From openmp-commits at lists.llvm.org Tue Oct 1 08:46:42 2024
From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits)
Date: Tue, 01 Oct 2024 08:46:42 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <66fc1962.050a0220.846e5.1e34@mx.google.com>

everythingfunctional wrote:

> Shouldn't `-fopenacc` then also emit such a warning?

Well, we could get into a philosophical discussion about whether software is ever *not* experimental. But in this case I think it's fine to be a bit proactive and warn users that we know there are still some unfinished aspects. Or we could maybe use the word "incomplete" instead of experimental. Either way I'm happy to add the warning for both, just let me know.

https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Tue Oct 1 10:10:48 2024
From: openmp-commits at lists.llvm.org (Valentin Clement バレンタイン クレメン via Openmp-commits)
Date: Tue, 01 Oct 2024 10:10:48 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <66fc2d18.170a0220.3babad.e63f@mx.google.com>

clementval wrote:

> > Could you add an experimental message ("The openmp support in Flang is experimental") when compiling with OpenMP? The warning can be in the Driver code that forwards the `-fopenmp` flag to the driver.
>
> Hm. I'm not yet fully convinced that we should be doing this. Shouldn't `-fopenacc` then also emit such a warning?

OpenACC has no codegen upstream yet so it will never produce an executable. When we start adding support for codegen/runtime, a warning is probably nice until good enough support is reached.
https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Tue Oct 1 14:38:49 2024
From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits)
Date: Tue, 01 Oct 2024 14:38:49 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <66fc6be9.630a0220.776f2.5007@mx.google.com>

everythingfunctional wrote:

Sounds like I should just add the warning to OpenMP for now. I'll work on getting that added and then figure out why the CI is failing.

https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Wed Oct 2 08:17:41 2024
From: openmp-commits at lists.llvm.org (Joel E. Denny via Openmp-commits)
Date: Wed, 02 Oct 2024 08:17:41 -0700 (PDT)
Subject: [Openmp-commits] [clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (PR #93365)
In-Reply-To:
Message-ID: <66fd6415.170a0220.59042.3dbe@mx.google.com>

================
@@ -1,12 +1,17 @@
-// RUN: %libomptarget-compile-generic -fprofile-instr-generate \
-// RUN: -Xclang "-fprofile-instrument=clang"
-// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
-// RUN: --check-prefix="CLANG-PGO"
 // RUN: %libomptarget-compile-generic -fprofile-generate \
 // RUN: -Xclang "-fprofile-instrument=llvm"
-// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
+// RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1
+// RUN: llvm-profdata show --all-functions --counts \
----------------
jdenny-ornl wrote:

We should probably have a substitution that locates the llvm-profdata from the build.
On a test system I use, the above instead picks up rocm's llvm-profdata and reports:

```
error: amdgcn-amd-amdhsa.llvm.profraw: unsupported instrumentation profile format version
```

https://github.com/llvm/llvm-project/pull/93365

From openmp-commits at lists.llvm.org Wed Oct 2 08:41:21 2024
From: openmp-commits at lists.llvm.org (Joel E. Denny via Openmp-commits)
Date: Wed, 02 Oct 2024 08:41:21 -0700 (PDT)
Subject: [Openmp-commits] [clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Add GPU profiling flags to driver (PR #94268)
In-Reply-To:
Message-ID: <66fd69a1.170a0220.ae5fe.731b@mx.google.com>

================
@@ -0,0 +1,82 @@
+// RUN: %libomptarget-compile-generic -fprofile-generate-gpu
----------------
jdenny-ornl wrote:

When targeting a V100, this command fails for me in both pgo1.c and pgo2.c. In the LTO case:

```
LLVM ERROR: Circular dependency found in global variable set
```

In the non-LTO case:

```
fatal error: error in backend: NVPTX aliasee must be a non-kernel function definition
```

I do not see this problem in PR #93365's pgo1.c.

https://github.com/llvm/llvm-project/pull/94268

From openmp-commits at lists.llvm.org Wed Oct 2 08:43:32 2024
From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits)
Date: Wed, 02 Oct 2024 08:43:32 -0700 (PDT)
Subject: [Openmp-commits] [clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Add GPU profiling flags to driver (PR #94268)
In-Reply-To:
Message-ID: <66fd6a24.a70a0220.1d18df.ab40@mx.google.com>

================
@@ -0,0 +1,82 @@
+// RUN: %libomptarget-compile-generic -fprofile-generate-gpu
----------------
jhuber6 wrote:

This is a limitation of the PTX target, globals cannot reference themselves. Most likely whatever NVIDIA engineer wrote the PTX parser found it annoying to reference something that wasn't fully parsed yet so he just decided to make it an error and here we are. See https://godbolt.org/z/53PP5c5ve.
https://github.com/llvm/llvm-project/pull/94268

From openmp-commits at lists.llvm.org Wed Oct 2 09:51:56 2024
From: openmp-commits at lists.llvm.org (Baodi Shan via Openmp-commits)
Date: Wed, 02 Oct 2024 09:51:56 -0700 (PDT)
Subject: [Openmp-commits] [openmp] Add 'offload' in OpenMP target doc. (PR #110141)
In-Reply-To:
Message-ID: <66fd7a2c.170a0220.ba28b.7f50@mx.google.com>

lwshanbd wrote:

@shiltian

https://github.com/llvm/llvm-project/pull/110141

From openmp-commits at lists.llvm.org Wed Oct 2 09:55:16 2024
From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits)
Date: Wed, 02 Oct 2024 09:55:16 -0700 (PDT)
Subject: [Openmp-commits] [openmp] Add 'offload' in OpenMP target doc. (PR #110141)
In-Reply-To:
Message-ID: <66fd7af4.170a0220.20aec3.4fab@mx.google.com>

https://github.com/shiltian approved this pull request.

https://github.com/llvm/llvm-project/pull/110141

From openmp-commits at lists.llvm.org Wed Oct 2 09:55:27 2024
From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits)
Date: Wed, 02 Oct 2024 09:55:27 -0700 (PDT)
Subject: [Openmp-commits] [openmp] [Offload][Doc] Add 'offload' in OpenMP target doc (PR #110141)
In-Reply-To:
Message-ID: <66fd7aff.170a0220.3cb298.7934@mx.google.com>

https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/110141

From openmp-commits at lists.llvm.org Wed Oct 2 09:55:32 2024
From: openmp-commits at lists.llvm.org (via Openmp-commits)
Date: Wed, 02 Oct 2024 09:55:32 -0700 (PDT)
Subject: [Openmp-commits] [openmp] 4123050 - [Offload][Doc] Add 'offload' in OpenMP target doc (#110141)
Message-ID: <66fd7b04.050a0220.3406d2.b0d1@mx.google.com>

Author: Baodi Shan
Date: 2024-10-02T12:55:28-04:00
New Revision: 4123050b965f685e8e56c74d413e99f64f35d38b

URL: https://github.com/llvm/llvm-project/commit/4123050b965f685e8e56c74d413e99f64f35d38b
DIFF: https://github.com/llvm/llvm-project/commit/4123050b965f685e8e56c74d413e99f64f35d38b.diff

LOG: [Offload][Doc] Add 'offload' in OpenMP target doc
(#110141)

Fix #106399

Added:


Modified:
    openmp/docs/SupportAndFAQ.rst

Removed:


################################################################################
diff --git a/openmp/docs/SupportAndFAQ.rst b/openmp/docs/SupportAndFAQ.rst
index cd2d6a47032214..dee707cf50f919 100644
--- a/openmp/docs/SupportAndFAQ.rst
+++ b/openmp/docs/SupportAndFAQ.rst
@@ -52,7 +52,7 @@ All patches go through the regular `LLVM review process
 Q: How to build an OpenMP GPU offload capable compiler?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 To build an *effective* OpenMP offload capable compiler, only one extra CMake
-option, ``LLVM_ENABLE_RUNTIMES="openmp"``, is needed when building LLVM (Generic
+option, ``LLVM_ENABLE_RUNTIMES="openmp;offload"``, is needed when building LLVM (Generic
 information about building LLVM is available
 `here `__.). Make sure all backends
 that are targeted by OpenMP are enabled. That can be done by adjusting the CMake

From openmp-commits at lists.llvm.org Wed Oct 2 09:55:33 2024
From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits)
Date: Wed, 02 Oct 2024 09:55:33 -0700 (PDT)
Subject: [Openmp-commits] [openmp] [Offload][Doc] Add 'offload' in OpenMP target doc (PR #110141)
In-Reply-To:
Message-ID: <66fd7b05.170a0220.92224.4c88@mx.google.com>

https://github.com/shiltian closed https://github.com/llvm/llvm-project/pull/110141

From openmp-commits at lists.llvm.org Wed Oct 2 10:13:12 2024
From: openmp-commits at lists.llvm.org (Leandro Lupori via Openmp-commits)
Date: Wed, 02 Oct 2024 10:13:12 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <66fd7f28.050a0220.20bfe.bd3f@mx.google.com>

================
@@ -339,11 +335,11 @@ just added using your new frontend option.
 ## CMake Support
 As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246)
-(and soon to be released CMake 3.24.0), `cmake` can detect `flang-new` as a
+(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a
----------------
luporl wrote:

```suggestion
(and CMake 3.24.0), `cmake` can detect `flang` as a
```
Can CMake already detect `flang` as well?

https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Wed Oct 2 11:01:36 2024
From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits)
Date: Wed, 02 Oct 2024 11:01:36 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <66fd8a80.050a0220.297bd2.cdb2@mx.google.com>

================
@@ -339,11 +335,11 @@ just added using your new frontend option.
 ## CMake Support
 As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246)
-(and soon to be released CMake 3.24.0), `cmake` can detect `flang-new` as a
+(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a
----------------
everythingfunctional wrote:

That I am not quite sure of. I will look into it and adjust that comment in the docs as appropriate.

https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Wed Oct 2 11:25:11 2024
From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits)
Date: Wed, 02 Oct 2024 11:25:11 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <66fd9007.050a0220.291356.be12@mx.google.com>

================
@@ -339,11 +335,11 @@ just added using your new frontend option.
 ## CMake Support
 As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246)
-(and soon to be released CMake 3.24.0), `cmake` can detect `flang-new` as a
+(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a
----------------
everythingfunctional wrote:

@banach-space appears to have been involved in that MR, do you happen to know the answer to the above questions? It looks like it doesn't rely on the `flang-new` name, but I'm not sure.

https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Wed Oct 2 11:45:06 2024
From: openmp-commits at lists.llvm.org (Håkon Strandenes via Openmp-commits)
Date: Wed, 02 Oct 2024 11:45:06 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <66fd94b2.170a0220.37fa11.6555@mx.google.com>

================
@@ -339,11 +335,11 @@ just added using your new frontend option.
 ## CMake Support
 As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246)
-(and soon to be released CMake 3.24.0), `cmake` can detect `flang-new` as a
+(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a
----------------
hakostra wrote:

1. A search in the CMake repository reveals that `flang-new` only appears in testcases and CI-related code, not in any code that eventually ends up in CMake. This indicates that the name `flang-new` is not a magic token for CMake to recognize LLVM Flang.
2. I think that [this is the merge](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246) that first introduced LLVM Flang to CMake. It seems like the check is through preprocessor macros defined by the different compilers. See [`Modules/CMakeFortranCompilerId.F.in`](https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/CMakeFortranCompilerId.F.in) in the CMake sources.
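
To illustrate the point about macro-based detection, here is a minimal toy sketch in Python. It is not CMake's implementation: the lookup table and the `identify_compiler` helper are hypothetical, and the macro-to-ID pairs are assumptions chosen for illustration. It only shows why a check keyed on predefined macros would be unaffected by renaming the executable.

```python
# Toy model of macro-based compiler identification.
# The table is an illustrative assumption, not a copy of CMake's sources.
MACRO_TO_COMPILER_ID = [
    ("__flang__", "LLVMFlang"),
    ("__GFORTRAN__", "GNU"),
]


def identify_compiler(predefined_macros):
    """Return a compiler ID based only on which macros the compiler defines."""
    for macro, compiler_id in MACRO_TO_COMPILER_ID:
        if macro in predefined_macros:
            return compiler_id
    return "Unknown"


# The executable name never enters the decision, so in this model a rename
# from `flang-new` to `flang` cannot change the detected ID.
for binary_name in ("flang-new", "flang"):
    print(binary_name, "->", identify_compiler({"__flang__"}))
```

In this model the binary name is only used for printing, which mirrors the observation above that `flang-new` is not a magic token.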
https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Wed Oct 2 13:58:09 2024
From: openmp-commits at lists.llvm.org (Andrzej Warzyński via Openmp-commits)
Date: Wed, 02 Oct 2024 13:58:09 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <66fdb3e1.050a0220.25e621.e9b3@mx.google.com>

================
@@ -339,11 +335,11 @@ just added using your new frontend option.
 ## CMake Support
 As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246)
-(and soon to be released CMake 3.24.0), `cmake` can detect `flang-new` as a
+(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a
----------------
banach-space wrote:

> It seems like the check is through preprocessor macros defined by the different compilers.

Yes, thank you @hakostra 🙏🏻

https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Thu Oct 3 04:58:58 2024
From: openmp-commits at lists.llvm.org (David Truby via Openmp-commits)
Date: Thu, 03 Oct 2024 04:58:58 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <66fe8702.170a0220.393cc6.3f58@mx.google.com>

================
@@ -339,11 +335,11 @@ just added using your new frontend option.
 ## CMake Support
 As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246)
-(and soon to be released CMake 3.24.0), `cmake` can detect `flang-new` as a
+(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a
----------------
DavidTruby wrote:

Just to add, for an actually working CMake with flang-new that detects the correct options and does the right thing reliably, you'll need CMake 3.28 as that's the first version with fully working flang-new support, so if updating this comment could you change 3.24 to 3.28 as well?
I am fairly sure the CMake support for LLVM flang is designed to work fine regardless of the name of the binary, as the CMake devs were anticipating this name change anyway. https://github.com/llvm/llvm-project/pull/110023 From openmp-commits at lists.llvm.org Thu Oct 3 14:12:59 2024 From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits) Date: Thu, 03 Oct 2024 14:12:59 -0700 (PDT) Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023) In-Reply-To: Message-ID: <66ff08db.050a0220.f40f1.96ca@mx.google.com> https://github.com/everythingfunctional updated https://github.com/llvm/llvm-project/pull/110023 >From 649a73478c78389560042030a9717a05e8e338a8 Mon Sep 17 00:00:00 2001 From: Brad Richardson Date: Wed, 25 Sep 2024 13:25:22 -0500 Subject: [PATCH 1/4] [flang][driver] rename flang-new to flang --- .github/workflows/release-binaries.yml | 2 +- clang/include/clang/Driver/Options.td | 4 +- clang/lib/Driver/Driver.cpp | 2 +- clang/lib/Driver/ToolChains/Flang.cpp | 6 +- clang/test/Driver/flang/flang.f90 | 2 +- clang/test/Driver/flang/flang_ucase.F90 | 2 +- .../Driver/flang/multiple-inputs-mixed.f90 | 2 +- clang/test/Driver/flang/multiple-inputs.f90 | 4 +- flang/docs/FlangDriver.md | 76 +++++++++---------- flang/docs/ImplementingASemanticCheck.md | 4 +- flang/docs/Overview.md | 26 +++---- .../FlangOmpReport/FlangOmpReport.cpp | 2 +- .../flang/Optimizer/Analysis/AliasAnalysis.h | 2 +- flang/include/flang/Tools/CrossToolHelpers.h | 2 +- flang/lib/Frontend/CompilerInvocation.cpp | 6 +- flang/lib/Frontend/FrontendActions.cpp | 2 +- .../ExecuteCompilerInvocation.cpp | 3 +- flang/runtime/CMakeLists.txt | 6 +- flang/test/CMakeLists.txt | 2 +- flang/test/Driver/aarch64-outline-atomics.f90 | 2 +- .../Driver/color-diagnostics-forwarding.f90 | 4 +- flang/test/Driver/compiler-options.f90 | 4 +- flang/test/Driver/convert.f90 | 2 +- .../test/Driver/disable-ext-name-interop.f90 | 2 +- 
flang/test/Driver/driver-version.f90 | 4 +- flang/test/Driver/escaped-backslash.f90 | 4 +- flang/test/Driver/fdefault.f90 | 28 +++---- flang/test/Driver/flarge-sizes.f90 | 20 ++--- .../test/Driver/frame-pointer-forwarding.f90 | 2 +- flang/test/Driver/frontend-forwarding.f90 | 4 +- flang/test/Driver/hlfir-no-hlfir-error.f90 | 4 +- flang/test/Driver/intrinsic-module-path.f90 | 2 +- flang/test/Driver/large-data-threshold.f90 | 6 +- flang/test/Driver/lto-flags.f90 | 2 +- flang/test/Driver/macro-def-undef.F90 | 4 +- flang/test/Driver/missing-input.f90 | 14 ++-- flang/test/Driver/multiple-input-files.f90 | 2 +- flang/test/Driver/omp-driver-offload.f90 | 66 ++++++++-------- .../predefined-macros-compiler-version.F90 | 4 +- flang/test/Driver/std2018-wrong.f90 | 2 +- flang/test/Driver/std2018.f90 | 2 +- .../Driver/supported-suffices/f03-suffix.f03 | 2 +- .../Driver/supported-suffices/f08-suffix.f08 | 2 +- flang/test/Driver/use-module-error.f90 | 4 +- flang/test/Driver/use-module.f90 | 4 +- flang/test/Driver/version-loops.f90 | 18 ++--- flang/test/Driver/wextra-ok.f90 | 2 +- flang/test/HLFIR/hlfir-flags.f90 | 2 +- .../Intrinsics/command_argument_count.f90 | 4 +- flang/test/Lower/Intrinsics/exit.f90 | 2 +- .../test/Lower/Intrinsics/ieee_is_normal.f90 | 2 +- flang/test/Lower/Intrinsics/isnan.f90 | 2 +- flang/test/Lower/Intrinsics/modulo.f90 | 2 +- .../OpenMP/Todo/omp-declarative-allocate.f90 | 2 +- .../OpenMP/Todo/omp-declare-reduction.f90 | 2 +- .../Lower/OpenMP/Todo/omp-declare-simd.f90 | 2 +- .../parallel-lastprivate-clause-scalar.f90 | 2 +- .../parallel-wsloop-reduction-byref.f90 | 2 +- .../OpenMP/parallel-wsloop-reduction.f90 | 2 +- flang/test/lit.cfg.py | 4 +- flang/tools/f18/CMakeLists.txt | 10 +-- flang/tools/flang-driver/CMakeLists.txt | 12 +-- flang/tools/flang-driver/driver.cpp | 6 +- llvm/runtimes/CMakeLists.txt | 10 +-- offload/CMakeLists.txt | 4 +- openmp/CMakeLists.txt | 4 +- 66 files changed, 220 insertions(+), 227 deletions(-) diff --git 
a/.github/workflows/release-binaries.yml b/.github/workflows/release-binaries.yml index 925912df6843e4..6073ebac9e6c2c 100644 --- a/.github/workflows/release-binaries.yml +++ b/.github/workflows/release-binaries.yml @@ -328,7 +328,7 @@ jobs: run: | # Build some of the mlir tools that take a long time to link if [ "${{ needs.prepare.outputs.build-flang }}" = "true" ]; then - ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ -j2 flang-new bbc + ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ -j2 flang bbc fi ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ \ mlir-bytecode-parser-fuzzer \ diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 932cf13edab53d..4a45a825da8fa1 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -6071,7 +6071,7 @@ def _sysroot_EQ : Joined<["--"], "sysroot=">, Visibility<[ClangOption, FlangOpti def _sysroot : Separate<["--"], "sysroot">, Alias<_sysroot_EQ>; //===----------------------------------------------------------------------===// -// pie/pic options (clang + flang-new) +// pie/pic options (clang + flang) //===----------------------------------------------------------------------===// let Visibility = [ClangOption, FlangOption] in { @@ -6087,7 +6087,7 @@ def fno_pie : Flag<["-"], "fno-pie">, Group; } // let Vis = [Default, FlangOption] //===----------------------------------------------------------------------===// -// Target Options (clang + flang-new) +// Target Options (clang + flang) //===----------------------------------------------------------------------===// let Flags = [TargetSpecific] in { let Visibility = [ClangOption, FlangOption] in { diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp index d0c8bdba0ede95..4243ee006c1553 100644 --- a/clang/lib/Driver/Driver.cpp +++ b/clang/lib/Driver/Driver.cpp @@ -2021,7 
+2021,7 @@ void Driver::PrintHelp(bool ShowHidden) const { void Driver::PrintVersion(const Compilation &C, raw_ostream &OS) const { if (IsFlangMode()) { - OS << getClangToolFullVersion("flang-new") << '\n'; + OS << getClangToolFullVersion("flang") << '\n'; } else { // FIXME: The following handlers should use a callback mechanism, we don't // know what the client would like to do. diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp index 98350690f8d20e..1ca12ff81389a3 100644 --- a/clang/lib/Driver/ToolChains/Flang.cpp +++ b/clang/lib/Driver/ToolChains/Flang.cpp @@ -881,14 +881,12 @@ void Flang::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back(Input.getFilename()); - // TODO: Replace flang-new with flang once the new driver replaces the - // throwaway driver - const char *Exec = Args.MakeArgString(D.GetProgramPath("flang-new", TC)); + const char *Exec = Args.MakeArgString(D.GetProgramPath("flang", TC)); C.addCommand(std::make_unique(JA, *this, ResponseFileSupport::AtFileUTF8(), Exec, CmdArgs, Inputs, Output)); } -Flang::Flang(const ToolChain &TC) : Tool("flang-new", "flang frontend", TC) {} +Flang::Flang(const ToolChain &TC) : Tool("flang", "flang frontend", TC) {} Flang::~Flang() {} diff --git a/clang/test/Driver/flang/flang.f90 b/clang/test/Driver/flang/flang.f90 index ad4a3a3b6bd44d..b52977ee66d7b0 100644 --- a/clang/test/Driver/flang/flang.f90 +++ b/clang/test/Driver/flang/flang.f90 @@ -13,7 +13,7 @@ ! * (no type specified, resulting in an object file) ! All invocations should begin with flang -fc1, consume up to here. -! ALL-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! ALL-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! Check that f90 files are not treated as "previously preprocessed" ! ... in --driver-mode=flang. 
diff --git a/clang/test/Driver/flang/flang_ucase.F90 b/clang/test/Driver/flang/flang_ucase.F90 index e89c053b327bc9..88aedc39fb94a7 100644 --- a/clang/test/Driver/flang/flang_ucase.F90 +++ b/clang/test/Driver/flang/flang_ucase.F90 @@ -13,7 +13,7 @@ ! * (no type specified, resulting in an object file) ! All invocations should begin with flang -fc1, consume up to here. -! ALL-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! ALL-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! Check that f90 files are not treated as "previously preprocessed" ! ... in --driver-mode=flang. diff --git a/clang/test/Driver/flang/multiple-inputs-mixed.f90 b/clang/test/Driver/flang/multiple-inputs-mixed.f90 index 2395dbecf1fe92..98d8cab00bdfdb 100644 --- a/clang/test/Driver/flang/multiple-inputs-mixed.f90 +++ b/clang/test/Driver/flang/multiple-inputs-mixed.f90 @@ -1,7 +1,7 @@ ! Check that flang can handle mixed C and fortran inputs. ! RUN: %clang --driver-mode=flang -### -fsyntax-only %S/Inputs/one.f90 %S/Inputs/other.c 2>&1 | FileCheck --check-prefixes=CHECK-SYNTAX-ONLY %s -! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/one.f90" ! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}clang{{[^"/]*}}" "-cc1" ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/other.c" diff --git a/clang/test/Driver/flang/multiple-inputs.f90 b/clang/test/Driver/flang/multiple-inputs.f90 index ada999e927a6a0..3c0f22e5d3e508 100644 --- a/clang/test/Driver/flang/multiple-inputs.f90 +++ b/clang/test/Driver/flang/multiple-inputs.f90 @@ -1,7 +1,7 @@ ! Check that flang driver can handle multiple inputs at once. ! RUN: %clang --driver-mode=flang -### -fsyntax-only %S/Inputs/one.f90 %S/Inputs/two.f90 2>&1 | FileCheck --check-prefixes=CHECK-SYNTAX-ONLY %s -! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/one.f90" -! 
CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/two.f90" diff --git a/flang/docs/FlangDriver.md b/flang/docs/FlangDriver.md index 815c26a28dfdfa..47cf078cf2d0d4 100644 --- a/flang/docs/FlangDriver.md +++ b/flang/docs/FlangDriver.md @@ -15,17 +15,13 @@ local: ``` There are two main drivers in Flang: -* the compiler driver, `flang-new` -* the frontend driver, `flang-new -fc1` - -> **_NOTE:_** The diagrams in this document refer to `flang` as opposed to -> `flang-new`. Eventually, `flang-new` will be renamed as `flang` and the -> diagrams reflect the final design that we are still working towards. +* the compiler driver, `flang` +* the frontend driver, `flang -fc1` The **compiler driver** will allow you to control all compilation phases (e.g. preprocessing, semantic checks, code-generation, code-optimisation, lowering and linking). For frontend specific tasks, the compiler driver creates a -Fortran compilation job and delegates it to `flang-new -fc1`, the frontend +Fortran compilation job and delegates it to `flang -fc1`, the frontend driver. For linking, it creates a linker job and calls an external linker (e.g. LLVM's [`lld`](https://lld.llvm.org/)). It can also call other tools such as external assemblers (e.g. [`as`](https://www.gnu.org/software/binutils/)). In @@ -47,7 +43,7 @@ frontend. It uses MLIR and LLVM for code-generation and can be viewed as a driver for Flang, LLVM and MLIR libraries. Contrary to the compiler driver, it is not capable of calling any external tools (including linkers). It is aware of all the frontend internals that are "hidden" from the compiler driver. It -accepts many frontend-specific options not available in `flang-new` and as such +accepts many frontend-specific options not available in `flang` and as such it provides a finer control over the frontend. Note that this tool is mostly intended for Flang developers. 
In particular, there are no guarantees about the stability of its interface and compiler developers can use it to experiment @@ -62,30 +58,30 @@ frontend specific flag from the _compiler_ directly to the _frontend_ driver, e.g.: ```bash -flang-new -Xflang -fdebug-dump-parse-tree input.f95 +flang -Xflang -fdebug-dump-parse-tree input.f95 ``` -In the invocation above, `-fdebug-dump-parse-tree` is forwarded to `flang-new +In the invocation above, `-fdebug-dump-parse-tree` is forwarded to `flang -fc1`. Without the forwarding flag, `-Xflang`, you would see the following warning: ```bash -flang-new: warning: argument unused during compilation: +flang: warning: argument unused during compilation: ``` -As `-fdebug-dump-parse-tree` is only supported by `flang-new -fc1`, `flang-new` +As `-fdebug-dump-parse-tree` is only supported by `flang -fc1`, `flang` will ignore it when used without `Xflang`. ## Why Do We Need Two Drivers? -As hinted above, `flang-new` and `flang-new -fc1` are two separate tools. The -fact that these tools are accessed through one binary, `flang-new`, is just an +As hinted above, `flang` and `flang -fc1` are two separate tools. The +fact that these tools are accessed through one binary, `flang`, is just an implementation detail. Each tool has a separate list of options, albeit defined in the same file: `clang/include/clang/Driver/Options.td`. The separation helps us split various tasks and allows us to implement more -specialised tools. In particular, `flang-new` is not aware of various +specialised tools. In particular, `flang` is not aware of various compilation phases within the frontend (e.g. scanning, parsing or semantic -checks). It does not have to be. Conversely, the frontend driver, `flang-new +checks). It does not have to be. Conversely, the frontend driver, `flang -fc1`, needs not to be concerned with linkers or other external tools like assemblers. 
Nor does it need to know where to look for various systems libraries, which is usually OS and platform specific. @@ -104,7 +100,7 @@ GCC](https://en.wikibooks.org/wiki/GNU_C_Compiler_Internals/GNU_C_Compiler_Archi In fact, Flang needs to adhere to this model in order to be able to re-use Clang's driver library. If you are more familiar with the [architecture of GFortran](https://gcc.gnu.org/onlinedocs/gcc-4.7.4/gfortran/About-GNU-Fortran.html) -than Clang, then `flang-new` corresponds to `gfortran` and `flang-new -fc1` to +than Clang, then `flang` corresponds to `gfortran` and `flang -fc1` to `f951`. ## Compiler Driver @@ -135,7 +131,7 @@ output from one action is the input for the subsequent one. You can use the `-ccc-print-phases` flag to see the sequence of actions that the driver will create for your compiler invocation: ```bash -flang-new -ccc-print-phases -E file.f +flang -ccc-print-phases -E file.f +- 0: input, "file.f", f95-cpp-input 1: preprocessor, {0}, f95 ``` @@ -143,7 +139,7 @@ As you can see, for `-E` the driver creates only two jobs and stops immediately after preprocessing. The first job simply prepares the input. For `-c`, the pipeline of the created jobs is more complex: ```bash -flang-new -ccc-print-phases -c file.f +flang -ccc-print-phases -c file.f +- 0: input, "file.f", f95-cpp-input +- 1: preprocessor, {0}, f95 +- 2: compiler, {1}, ir @@ -158,7 +154,7 @@ command to call the frontend driver is generated (more specifically, an instance of `clang::driver::Command`). Every command is bound to an instance of `clang::driver::Tool`. For Flang we introduced a specialisation of this class: `clang::driver::Flang`. This class implements the logic to either translate or -forward compiler options to the frontend driver, `flang-new -fc1`. +forward compiler options to the frontend driver, `flang -fc1`. You can read more on the design of `clangDriver` in Clang's [Driver Design & Internals](https://clang.llvm.org/docs/DriverInternals.html). 
@@ -232,12 +228,12 @@ driver, `clang -cc1` and consists of the following classes: This list is not exhaustive and only covers the main classes that implement the driver. The main entry point for the frontend driver, `fc1_main`, is implemented in `flang/tools/flang-driver/driver.cpp`. It can be accessed by -invoking the compiler driver, `flang-new`, with the `-fc1` flag. +invoking the compiler driver, `flang`, with the `-fc1` flag. The frontend driver will only run one action at a time. If you specify multiple action flags, only the last one will be taken into account. The default action is `ParseSyntaxOnlyAction`, which corresponds to `-fsyntax-only`. In other -words, `flang-new -fc1 ` is equivalent to `flang-new -fc1 -fsyntax-only +words, `flang -fc1 ` is equivalent to `flang -fc1 -fsyntax-only `. ## Adding new Compiler Options @@ -262,8 +258,8 @@ similar semantics to your new option and start by copying that. For every new option, you will also have to define the visibility of the new option. This is controlled through the `Visibility` field. You can use the following Flang specific visibility flags to control this: - * `FlangOption` - this option will be available in the `flang-new` compiler driver, - * `FC1Option` - this option will be available in the `flang-new -fc1` frontend driver, + * `FlangOption` - this option will be available in the `flang` compiler driver, + * `FC1Option` - this option will be available in the `flang -fc1` frontend driver, Options that are supported by clang should explicitly specify `ClangOption` in `Visibility`, and options that are only supported in Flang should not specify @@ -290,10 +286,10 @@ The parsing will depend on the semantics encoded in the TableGen definition. When adding a compiler driver option (i.e. 
an option that contains `FlangOption` among in it's `Visibility`) that you also intend to be understood -by the frontend, make sure that it is either forwarded to `flang-new -fc1` or +by the frontend, make sure that it is either forwarded to `flang -fc1` or translated into some other option that is accepted by the frontend driver. In the case of options that contain both `FlangOption` and `FC1Option` among its -flags, we usually just forward from `flang-new` to `flang-new -fc1`. This is +flags, we usually just forward from `flang` to `flang -fc1`. This is then tested in `flang/test/Driver/frontend-forward.F90`. What follows is usually very dependant on the meaning of the corresponding @@ -339,11 +335,11 @@ just added using your new frontend option. ## CMake Support As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246) -(and soon to be released CMake 3.24.0), `cmake` can detect `flang-new` as a +(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a supported Fortran compiler. You can configure your CMake projects to use -`flang-new` as follows: +`flang` as follows: ```bash -cmake -DCMAKE_Fortran_COMPILER= +cmake -DCMAKE_Fortran_COMPILER= ``` You should see the following in the output: ``` @@ -353,14 +349,14 @@ where `` corresponds to the LLVM Flang version. ## Testing In LIT, we define two variables that you can use to invoke Flang's drivers: -* `%flang` is expanded as `flang-new` (i.e. the compiler driver) -* `%flang_fc1` is expanded as `flang-new -fc1` (i.e. the frontend driver) +* `%flang` is expanded as `flang` (i.e. the compiler driver) +* `%flang_fc1` is expanded as `flang -fc1` (i.e. the frontend driver) For most regression tests for the frontend, you will want to use `%flang_fc1`. In some cases, the observable behaviour will be identical regardless of whether `%flang` or `%flang_fc1` is used. 
However, when you are using `%flang` instead of `%flang_fc1`, the compiler driver will add extra flags to the frontend -driver invocation (i.e. `flang-new -fc1 -`). In some cases that might +driver invocation (i.e. `flang -fc1 -`). In some cases that might be exactly what you want to test. In fact, you can check these additional flags by using the `-###` compiler driver command line option. @@ -380,7 +376,7 @@ plugins. The process for using plugins includes: * [Creating a plugin](#creating-a-plugin) * [Loading and running a plugin](#loading-and-running-a-plugin) -Flang plugins are limited to `flang-new -fc1` and are currently only available / +Flang plugins are limited to `flang -fc1` and are currently only available / been tested on Linux. ### Creating a Plugin @@ -465,14 +461,14 @@ static FrontendPluginRegistry::Add X( ### Loading and Running a Plugin In order to use plugins, there are 2 command line options made available to the -frontend driver, `flang-new -fc1`: +frontend driver, `flang -fc1`: * [`-load <dsopath>`](#the--load-dsopath-option) for loading the dynamic shared object of the plugin * [`-plugin <name>`](#the--plugin-name-option) for calling the registered plugin Invocation of the example plugin is done through: ```bash -flang-new -fc1 -load flangPrintFunctionNames.so -plugin print-fns file.f90 +flang -fc1 -load flangPrintFunctionNames.so -plugin print-fns file.f90 ``` Both these options are parsed in `flang/lib/Frontend/CompilerInvocation.cpp` and @@ -493,7 +489,7 @@ reports an error diagnostic and returns `nullptr`. ### Enabling In-Tree Plugins For in-tree plugins, there is the CMake flag `FLANG_PLUGIN_SUPPORT`, enabled by -default, that controls the exporting of executable symbols from `flang-new`, +default, that controls the exporting of executable symbols from `flang`, which plugins need access to. Additionally, there is the CMake flag `LLVM_BUILD_EXAMPLES`, turned off by default, that is used to control if the example programs are built. 
This includes plugins that are in the @@ -526,7 +522,7 @@ invocations `invokeFIROptEarlyEPCallbacks`, `invokeFIRInlinerCallback`, and `invokeFIROptLastEPCallbacks` for Flang drivers to be able to insert additonal passes at different points of the default pass pipeline. An example use of these extension point callbacks is shown in `registerDefaultInlinerPass` to invoke the -default inliner pass in `flang-new`. +default inliner pass in `flang`. ## LLVM Pass Plugins @@ -539,7 +535,7 @@ documentation for [`llvm::PassBuilder`](https://llvm.org/doxygen/classllvm_1_1PassBuilder.html) for details. -The framework to enable pass plugins in `flang-new` uses the exact same +The framework to enable pass plugins in `flang` uses the exact same machinery as that used by `clang` and thus has the same capabilities and limitations. @@ -547,7 +543,7 @@ In order to use a pass plugin, the pass(es) must be compiled into a dynamic shared object which is then loaded using the `-fpass-plugin` option. ``` -flang-new -fpass-plugin=/path/to/plugin.so +flang -fpass-plugin=/path/to/plugin.so ``` This option is available in both the compiler driver and the frontend driver. @@ -559,7 +555,7 @@ Pass extensions are similar to plugins, except that they can also be linked statically. Setting `-DLLVM_${NAME}_LINK_INTO_TOOLS` to `ON` in the cmake command turns the project into a statically linked extension. An example would be Polly, e.g., using `-DLLVM_POLLY_LINK_INTO_TOOLS=ON` would link Polly passes -into `flang-new` as built-in middle-end passes. +into `flang` as built-in middle-end passes. 
See the [`WritingAnLLVMNewPMPass`](https://llvm.org/docs/WritingAnLLVMNewPMPass.html#id9) diff --git a/flang/docs/ImplementingASemanticCheck.md b/flang/docs/ImplementingASemanticCheck.md index 5b583d4f8031b8..598ef696ad14bf 100644 --- a/flang/docs/ImplementingASemanticCheck.md +++ b/flang/docs/ImplementingASemanticCheck.md @@ -68,7 +68,7 @@ of the call to `intentOutFunc()`: I also used this program to produce a parse tree for the program using the command: ```bash - flang-new -fc1 -fdebug-dump-parse-tree testfun.f90 + flang -fc1 -fdebug-dump-parse-tree testfun.f90 ``` Here's the relevant fragment of the parse tree produced by the compiler: @@ -296,7 +296,7 @@ In `lib/Semantics/check-do.cpp`, I added an (almost empty) implementation: I then built the compiler with these changes and ran it on my test program. This time, I made sure to invoke semantic checking. Here's the command I used: ```bash - flang-new -fc1 -fdebug-unparse-with-symbols testfun.f90 + flang -fc1 -fdebug-unparse-with-symbols testfun.f90 ``` This produced the output: diff --git a/flang/docs/Overview.md b/flang/docs/Overview.md index 6eba19ea3a3c0d..dfb4d89264a755 100644 --- a/flang/docs/Overview.md +++ b/flang/docs/Overview.md @@ -65,8 +65,8 @@ See [Preprocessing.md](Preprocessing.md). **Entry point:** `parser::Parsing::Prescan` **Commands:** - - `flang-new -fc1 -E src.f90` dumps the cooked character stream - - `flang-new -fc1 -fdebug-dump-provenance src.f90` dumps provenance + - `flang -fc1 -E src.f90` dumps the cooked character stream + - `flang -fc1 -fdebug-dump-provenance src.f90` dumps provenance information ### Parsing @@ -80,10 +80,10 @@ representing a syntactically correct program, rooted at the program unit. 
See: **Entry point:** `parser::Parsing::Parse` **Commands:** - - `flang-new -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree - - `flang-new -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran - - `flang-new -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log - - `flang-new -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree + - `flang -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree + - `flang -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran + - `flang -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log + - `flang -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree ### Semantic processing @@ -121,9 +121,9 @@ In the course of semantic analysis, the compiler: At the end of semantic processing, all validation of the user's program is complete. This is the last detailed phase of analysis processing. **Commands:** - - `flang-new -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis - - `flang-new -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table - - `flang-new -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table + - `flang -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis + - `flang -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table + - `flang -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table ## Lowering @@ -163,8 +163,8 @@ contain a list of evaluations. All of these contain pointers back into the parse tree. The compiler walks the PFT generating FIR. 
**Commands:** - - `flang-new -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree - - `flang-new -fc1 -emit-mlir src.f90` dumps the FIR to the files src.mlir + - `flang -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree + - `flang -fc1 -emit-mlir src.f90` dumps the FIR to the files src.mlir ### Transformation passes @@ -180,8 +180,8 @@ perform various optimizations and transformations. The final pass creates an LLVM IR representation of the program. **Commands:** - - `flang-new -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error - - `flang-new -fc1 -emit-llvm src.f90` dumps the LLVM IR to src.ll + - `flang -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error + - `flang -fc1 -emit-llvm src.f90` dumps the LLVM IR to src.ll ## Object code generation and linking diff --git a/flang/examples/FlangOmpReport/FlangOmpReport.cpp b/flang/examples/FlangOmpReport/FlangOmpReport.cpp index 9c1f304b9741e7..709c5c5d305e51 100644 --- a/flang/examples/FlangOmpReport/FlangOmpReport.cpp +++ b/flang/examples/FlangOmpReport/FlangOmpReport.cpp @@ -9,7 +9,7 @@ // all the OpenMP constructs and clauses and which line they're located on. 
// // The plugin may be invoked as: -// ./bin/flang-new -fc1 -load lib/flangOmpReport.so -plugin flang-omp-report +// ./bin/flang -fc1 -load lib/flangOmpReport.so -plugin flang-omp-report // -fopenmp // //===----------------------------------------------------------------------===// diff --git a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h index 9a70b7fbfad2b6..8ab5150cd7c812 100644 --- a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h +++ b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h @@ -67,7 +67,7 @@ struct AliasAnalysis { // end subroutine // ------------------------------------------------- // - // flang-new -fc1 -emit-fir test.f90 -o test.fir + // flang -fc1 -emit-fir test.f90 -o test.fir // // ------------------- test.fir -------------------- // fir.global @_QMtopEa : !fir.box>> diff --git a/flang/include/flang/Tools/CrossToolHelpers.h b/flang/include/flang/Tools/CrossToolHelpers.h index 3e703de545950c..df4b21ada058fe 100644 --- a/flang/include/flang/Tools/CrossToolHelpers.h +++ b/flang/include/flang/Tools/CrossToolHelpers.h @@ -7,7 +7,7 @@ //===----------------------------------------------------------------------===// // A header file for containing functionallity that is used across Flang tools, // such as helper functions which apply or generate information needed accross -// tools like bbc and flang-new. +// tools like bbc and flang. 
//===----------------------------------------------------------------------===// #ifndef FORTRAN_TOOLS_CROSS_TOOL_HELPERS_H diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp index 05b03ba9ebdf30..18383eaafb1136 100644 --- a/flang/lib/Frontend/CompilerInvocation.cpp +++ b/flang/lib/Frontend/CompilerInvocation.cpp @@ -65,8 +65,8 @@ CompilerInvocationBase::~CompilerInvocationBase() = default; static bool parseShowColorsArgs(const llvm::opt::ArgList &args, bool defaultColor = true) { // Color diagnostics default to auto ("on" if terminal supports) in the - // compiler driver `flang-new` but default to off in the frontend driver - // `flang-new -fc1`, needing an explicit OPT_fdiagnostics_color. + // compiler driver `flang` but default to off in the frontend driver + // `flang -fc1`, needing an explicit OPT_fdiagnostics_color. // Support both clang's -f[no-]color-diagnostics and gcc's // -f[no-]diagnostics-colors[=never|always|auto]. enum { @@ -891,7 +891,7 @@ static bool parseDiagArgs(CompilerInvocation &res, llvm::opt::ArgList &args, } } - // Default to off for `flang-new -fc1`. + // Default to off for `flang -fc1`. 
res.getFrontendOpts().showColors = parseShowColorsArgs(args, /*defaultDiagColor=*/false); diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp index 4a52edc436e0ed..8f882bff170909 100644 --- a/flang/lib/Frontend/FrontendActions.cpp +++ b/flang/lib/Frontend/FrontendActions.cpp @@ -233,7 +233,7 @@ bool CodeGenAction::beginSourceFileAction() { llvm::SMDiagnostic err; llvmModule = llvm::parseIRFile(getCurrentInput().getFile(), err, *llvmCtx); if (!llvmModule || llvm::verifyModule(*llvmModule, &llvm::errs())) { - err.print("flang-new", llvm::errs()); + err.print("flang", llvm::errs()); unsigned diagID = ci.getDiagnostics().getCustomDiagID( clang::DiagnosticsEngine::Error, "Could not parse IR"); ci.getDiagnostics().Report(diagID); diff --git a/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp b/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp index e2cbd5112d6ea5..09ac129d3e6893 100644 --- a/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp +++ b/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp @@ -154,8 +154,7 @@ bool executeCompilerInvocation(CompilerInstance *flang) { // Honor -help. if (flang->getFrontendOpts().showHelp) { clang::driver::getDriverOptTable().printHelp( - llvm::outs(), "flang-new -fc1 [options] file...", - "LLVM 'Flang' Compiler", + llvm::outs(), "flang -fc1 [options] file...", "LLVM 'Flang' Compiler", /*ShowHidden=*/false, /*ShowAllAliases=*/false, llvm::opt::Visibility(clang::driver::options::FC1Option)); return true; diff --git a/flang/runtime/CMakeLists.txt b/flang/runtime/CMakeLists.txt index 0ad1b718d5875b..cdd2de541c6730 100644 --- a/flang/runtime/CMakeLists.txt +++ b/flang/runtime/CMakeLists.txt @@ -308,12 +308,12 @@ set_target_properties(FortranRuntime PROPERTIES FOLDER "Flang/Runtime Libraries" # If FortranRuntime is part of a Flang build (and not a separate build) then # add dependency to make sure that Fortran runtime library is being built after # we have the Flang compiler available. 
This also includes the MODULE files -# that compile when the 'flang-new' target is built. +# that compile when the 'flang' target is built. # # TODO: This is a workaround and should be updated when runtime build procedure # is changed to a regular runtime build. See discussion in PR #95388. -if (TARGET flang-new AND TARGET module_files) - add_dependencies(FortranRuntime flang-new module_files) +if (TARGET flang AND TARGET module_files) + add_dependencies(FortranRuntime flang module_files) endif() if (FLANG_CUF_RUNTIME) diff --git a/flang/test/CMakeLists.txt b/flang/test/CMakeLists.txt index a18a5c6519eda4..cab214c2ef4c8c 100644 --- a/flang/test/CMakeLists.txt +++ b/flang/test/CMakeLists.txt @@ -58,7 +58,7 @@ set(FLANG_TEST_PARAMS flang_site_config=${CMAKE_CURRENT_BINARY_DIR}/lit.site.cfg.py) set(FLANG_TEST_DEPENDS - flang-new + flang llvm-config FileCheck count diff --git a/flang/test/Driver/aarch64-outline-atomics.f90 b/flang/test/Driver/aarch64-outline-atomics.f90 index a1c874c20df5c7..530bfc8e962091 100644 --- a/flang/test/Driver/aarch64-outline-atomics.f90 +++ b/flang/test/Driver/aarch64-outline-atomics.f90 @@ -1,4 +1,4 @@ -! Test that flang-new forwards the -moutline-atomics and -mno-outline-atomics. +! Test that flang forwards the -moutline-atomics and -mno-outline-atomics. ! RUN: %flang -moutline-atomics --target=aarch64-none-none -### %s -o %t 2>&1 | FileCheck %s ! CHECK: "-target-feature" "+outline-atomics" diff --git a/flang/test/Driver/color-diagnostics-forwarding.f90 b/flang/test/Driver/color-diagnostics-forwarding.f90 index 368fa8834142ab..29061242cb0cbc 100644 --- a/flang/test/Driver/color-diagnostics-forwarding.f90 +++ b/flang/test/Driver/color-diagnostics-forwarding.f90 @@ -1,5 +1,5 @@ -! Test that flang-new forwards -f{no-}color-diagnostics and -! -f{no-}diagnostics-color options to flang-new -fc1 as expected. +! Test that flang forwards -f{no-}color-diagnostics and +! -f{no-}diagnostics-color options to flang -fc1 as expected. ! 
RUN: %flang -fsyntax-only -### %s -o %t 2>&1 -fcolor-diagnostics \ ! RUN: | FileCheck %s --check-prefix=CHECK-CD diff --git a/flang/test/Driver/compiler-options.f90 b/flang/test/Driver/compiler-options.f90 index 7ec29ce7ba7abf..cefa86836abd30 100644 --- a/flang/test/Driver/compiler-options.f90 +++ b/flang/test/Driver/compiler-options.f90 @@ -1,6 +1,6 @@ ! RUN: %flang -S -emit-llvm -flang-deprecated-no-hlfir -o - %s | FileCheck %s -! Test communication of COMPILER_OPTIONS from flang-new to flang-new -fc1. -! CHECK: [[OPTSVAR:@_QQclX[0-9a-f]+]] = {{[a-z]+}} constant [[[OPTSLEN:[0-9]+]] x i8] c"{{.*}}flang-new{{(\.exe)?}} {{.*}}-S -emit-llvm -flang-deprecated-no-hlfir -o - {{.*}}compiler-options.f90" +! Test communication of COMPILER_OPTIONS from flang to flang -fc1. +! CHECK: [[OPTSVAR:@_QQclX[0-9a-f]+]] = {{[a-z]+}} constant [[[OPTSLEN:[0-9]+]] x i8] c"{{.*}}flang{{(\.exe)?}} {{.*}}-S -emit-llvm -flang-deprecated-no-hlfir -o - {{.*}}compiler-options.f90" program main use ISO_FORTRAN_ENV, only: compiler_options implicit none diff --git a/flang/test/Driver/convert.f90 b/flang/test/Driver/convert.f90 index b2cf6c23efdb75..0ba31d2188cdf5 100755 --- a/flang/test/Driver/convert.f90 +++ b/flang/test/Driver/convert.f90 @@ -12,7 +12,7 @@ ! RUN: not %flang -fconvert=foobar %s 2>&1 | FileCheck %s --check-prefix=INVALID !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -emit-mlir -fconvert=unknown %s -o - | FileCheck %s --check-prefix=VALID_FC1 ! RUN: %flang_fc1 -emit-mlir -fconvert=native %s -o - | FileCheck %s --check-prefix=VALID_FC1 diff --git a/flang/test/Driver/disable-ext-name-interop.f90 b/flang/test/Driver/disable-ext-name-interop.f90 index 0c59a5b4c980f8..1ade84b996d043 100644 --- a/flang/test/Driver/disable-ext-name-interop.f90 +++ b/flang/test/Driver/disable-ext-name-interop.f90 @@ -1,4 +1,4 @@ -! 
Test that we can disable the ExternalNameConversion pass in flang-new. +! Test that we can disable the ExternalNameConversion pass in flang. ! RUN: %flang_fc1 -S %s -o - 2>&1 | FileCheck %s --check-prefix=EXTNAMES ! RUN: %flang_fc1 -S -mmlir -disable-external-name-interop %s -o - 2>&1 | FileCheck %s --check-prefix=INTNAMES diff --git a/flang/test/Driver/driver-version.f90 b/flang/test/Driver/driver-version.f90 index d1e1e1d90fe1f8..4c6aecb1c4fa7e 100644 --- a/flang/test/Driver/driver-version.f90 +++ b/flang/test/Driver/driver-version.f90 @@ -4,12 +4,12 @@ ! RUN: %flang_fc1 -version 2>&1 | FileCheck %s --check-prefix=VERSION-FC1 ! RUN: not %flang_fc1 --version 2>&1 | FileCheck %s --check-prefix=ERROR-FC1 -! VERSION: flang-new version +! VERSION: flang version ! VERSION-NEXT: Target: ! VERSION-NEXT: Thread model: ! VERSION-NEXT: InstalledDir: -! ERROR: flang-new: error: unknown argument '--versions'; did you mean '--version'? +! ERROR: flang: error: unknown argument '--versions'; did you mean '--version'? ! VERSION-FC1: LLVM version diff --git a/flang/test/Driver/escaped-backslash.f90 b/flang/test/Driver/escaped-backslash.f90 index ad07eae24e9fab..90dd1783dd1150 100644 --- a/flang/test/Driver/escaped-backslash.f90 +++ b/flang/test/Driver/escaped-backslash.f90 @@ -1,14 +1,14 @@ ! Ensure argument -fbackslash works as expected. !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang -E %s 2>&1 | FileCheck %s --check-prefix=ESCAPED ! RUN: %flang -E -fbackslash -fno-backslash %s 2>&1 | FileCheck %s --check-prefix=ESCAPED ! RUN: %flang -E -fbackslash %s 2>&1 | FileCheck %s --check-prefix=UNESCAPED !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --check-prefix=ESCAPED ! 
RUN: %flang_fc1 -E -fbackslash -fno-backslash %s 2>&1 | FileCheck %s --check-prefix=ESCAPED diff --git a/flang/test/Driver/fdefault.f90 b/flang/test/Driver/fdefault.f90 index 88592bfa3e87ee..7ce45b763a240f 100644 --- a/flang/test/Driver/fdefault.f90 +++ b/flang/test/Driver/fdefault.f90 @@ -2,25 +2,25 @@ ! TODO: Add checks when actual codegen is possible for this family !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOOPTION -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=REAL8 -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=DOUBLE8 +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOOPTION +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=REAL8 +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=DOUBLE8 ! RUN: not %flang -fsyntax-only -fdefault-double-8 %s 2>&1 | FileCheck %s --check-prefix=ERROR !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- -! 
RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOOPTION -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=REAL8 -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=DOUBLE8 +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOOPTION +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=REAL8 +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=DOUBLE8 ! RUN: not %flang_fc1 -fsyntax-only -fdefault-double-8 %s 2>&1 | FileCheck %s --check-prefix=ERROR ! NOOPTION: integer(4),parameter::real_kind=4_4 diff --git a/flang/test/Driver/flarge-sizes.f90 b/flang/test/Driver/flarge-sizes.f90 index 6ea5876676ed1f..6c41a03a830bfb 100644 --- a/flang/test/Driver/flarge-sizes.f90 +++ b/flang/test/Driver/flarge-sizes.f90 @@ -2,20 +2,20 @@ ! TODO: Add checks when actual codegen is possible. !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOLARGE -! 
RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -flarge-sizes -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=LARGE +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOLARGE +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -flarge-sizes -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=LARGE !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOLARGE -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -flarge-sizes -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=LARGE +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOLARGE +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -flarge-sizes -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=LARGE ! NOLARGE: real(4)::z(1_8:10_8) ! NOLARGE-NEXT: integer(4),parameter::size_kind=4_4 diff --git a/flang/test/Driver/frame-pointer-forwarding.f90 b/flang/test/Driver/frame-pointer-forwarding.f90 index 751494cc6a6017..9fcbd6e12f98b7 100644 --- a/flang/test/Driver/frame-pointer-forwarding.f90 +++ b/flang/test/Driver/frame-pointer-forwarding.f90 @@ -1,4 +1,4 @@ -! Test that flang-new forwards -fno-omit-frame-pointer and -fomit-frame-pointer Flang frontend +! 
Test that flang forwards -fno-omit-frame-pointer and -fomit-frame-pointer Flang frontend ! RUN: %flang --target=aarch64-none-none -fsyntax-only -### %s -o %t 2>&1 | FileCheck %s --check-prefix=CHECK-NOVALUE ! CHECK-NOVALUE: "-fc1"{{.*}}"-mframe-pointer=non-leaf" diff --git a/flang/test/Driver/frontend-forwarding.f90 b/flang/test/Driver/frontend-forwarding.f90 index 35adb47b56861e..0a56a1e3710d9d 100644 --- a/flang/test/Driver/frontend-forwarding.f90 +++ b/flang/test/Driver/frontend-forwarding.f90 @@ -1,5 +1,5 @@ -! Test that flang-new forwards Flang frontend -! options to flang-new -fc1 as expected. +! Test that flang forwards Flang frontend +! options to flang -fc1 as expected. ! RUN: %flang -fsyntax-only -### %s -o %t 2>&1 \ ! RUN: -finput-charset=utf-8 \ diff --git a/flang/test/Driver/hlfir-no-hlfir-error.f90 b/flang/test/Driver/hlfir-no-hlfir-error.f90 index 2410393b6cd9c1..59f8304db5c9ab 100644 --- a/flang/test/Driver/hlfir-no-hlfir-error.f90 +++ b/flang/test/Driver/hlfir-no-hlfir-error.f90 @@ -2,12 +2,12 @@ ! options cannot be both used. !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: not %flang -flang-experimental-hlfir -flang-deprecated-no-hlfir %s 2>&1 | FileCheck %s !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: not %flang_fc1 -emit-llvm -flang-experimental-hlfir -flang-deprecated-no-hlfir %s 2>&1 | FileCheck %s diff --git a/flang/test/Driver/intrinsic-module-path.f90 b/flang/test/Driver/intrinsic-module-path.f90 index 5523ed37b724cd..15d19dd83d963f 100644 --- a/flang/test/Driver/intrinsic-module-path.f90 +++ b/flang/test/Driver/intrinsic-module-path.f90 @@ -4,7 +4,7 @@ ! default one, causing a CHECKSUM error. !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! 
FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -fsyntax-only %s 2>&1 | FileCheck %s --allow-empty --check-prefix=WITHOUT ! RUN: not %flang_fc1 -fsyntax-only -fintrinsic-modules-path %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=GIVEN diff --git a/flang/test/Driver/large-data-threshold.f90 b/flang/test/Driver/large-data-threshold.f90 index 320566c4b2e43a..6a7eef79559d0b 100644 --- a/flang/test/Driver/large-data-threshold.f90 +++ b/flang/test/Driver/large-data-threshold.f90 @@ -7,11 +7,11 @@ ! RUN: not %flang -### -c --target=aarch64 -mcmodel=small -mlarge-data-threshold=32768 %s 2>&1 | FileCheck %s --check-prefix=NOT-SUPPORTED -! CHECK: "{{.*}}flang-new" "-fc1" +! CHECK: "{{.*}}flang" "-fc1" ! CHECK-SAME: "-mlarge-data-threshold=32768" -! CHECK-59000: "{{.*}}flang-new" "-fc1" +! CHECK-59000: "{{.*}}flang" "-fc1" ! CHECK-59000-SAME: "-mlarge-data-threshold=59000" -! CHECK-1M: "{{.*}}flang-new" "-fc1" +! CHECK-1M: "{{.*}}flang" "-fc1" ! CHECK-1M-SAME: "-mlarge-data-threshold=1048576" ! NO-MCMODEL: 'mlarge-data-threshold=' only applies to medium and large code models ! INVALID: error: invalid value 'nonsense' in '-mlarge-data-threshold=' diff --git a/flang/test/Driver/lto-flags.f90 b/flang/test/Driver/lto-flags.f90 index a51febc7009691..bad3d972e6bd6b 100644 --- a/flang/test/Driver/lto-flags.f90 +++ b/flang/test/Driver/lto-flags.f90 @@ -30,7 +30,7 @@ ! FULL-LTO: "-fc1" ! FULL-LTO-SAME: "-flto=full" -! THIN-LTO-ALL: flang-new: warning: the option '-flto=thin' is a work in progress +! THIN-LTO-ALL: flang: warning: the option '-flto=thin' is a work in progress ! THIN-LTO-ALL: "-fc1" ! THIN-LTO-ALL-SAME: "-flto=thin" ! THIN-LTO-LINKER-PLUGIN: "-plugin-opt=thinlto" diff --git a/flang/test/Driver/macro-def-undef.F90 b/flang/test/Driver/macro-def-undef.F90 index 1332c6d6c02708..b13a9040833dbf 100644 --- a/flang/test/Driver/macro-def-undef.F90 +++ b/flang/test/Driver/macro-def-undef.F90 @@ -1,14 +1,14 @@ ! 
Ensure arguments -D and -U work as expected. !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang -E -P %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED ! RUN: %flang -E -P -DX=A %s 2>&1 | FileCheck %s --check-prefix=DEFINED ! RUN: %flang -E -P -DX=A -UX %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -E -P %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED ! RUN: %flang_fc1 -E -P -DX=A %s 2>&1 | FileCheck %s --check-prefix=DEFINED diff --git a/flang/test/Driver/missing-input.f90 b/flang/test/Driver/missing-input.f90 index 236325e3578f1d..51d37a718c542f 100644 --- a/flang/test/Driver/missing-input.f90 +++ b/flang/test/Driver/missing-input.f90 @@ -1,26 +1,26 @@ ! Test the behaviour of the driver when input is missing or is invalid. Note -! that with the compiler driver (flang-new), the input _has_ to be specified. +! that with the compiler driver (flang), the input _has_ to be specified. ! Indeed, the driver decides what "job/command" to create based on the input ! file's extension. No input file means that it doesn't know what to do -! (compile? preprocess? link?). The frontend driver (flang-new -fc1) simply +! (compile? preprocess? link?). The frontend driver (flang -fc1) simply ! assumes that "no explicit input == read from stdin" !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: not %flang 2>&1 | FileCheck %s --check-prefix=FLANG-NO-FILE ! RUN: not %flang %t.f90 2>&1 | FileCheck %s --check-prefix=FLANG-NONEXISTENT-FILE !----------------------------------------- -! FLANG FRONTEND DRIVER (flang-new -fc1) +! FLANG FRONTEND DRIVER (flang -fc1) !----------------------------------------- ! 
RUN: not %flang_fc1 %t.f90 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-NONEXISTENT-FILE ! RUN: not %flang_fc1 %S 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-DIR -! FLANG-NO-FILE: flang-new: error: no input files +! FLANG-NO-FILE: flang: error: no input files -! FLANG-NONEXISTENT-FILE: flang-new: error: no such file or directory: {{.*}} -! FLANG-NONEXISTENT-FILE: flang-new: error: no input files +! FLANG-NONEXISTENT-FILE: flang: error: no such file or directory: {{.*}} +! FLANG-NONEXISTENT-FILE: flang: error: no input files ! FLANG-FC1-NONEXISTENT-FILE: error: {{.*}} does not exist ! FLANG-FC1-DIR: error: {{.*}} is not a regular file diff --git a/flang/test/Driver/multiple-input-files.f90 b/flang/test/Driver/multiple-input-files.f90 index 6c86f23f2b21fa..64ec8679abf94f 100644 --- a/flang/test/Driver/multiple-input-files.f90 +++ b/flang/test/Driver/multiple-input-files.f90 @@ -39,7 +39,7 @@ ! FLANG-NEXT:end program hello ! TEST 2: `-o` does not when multiple input files are present -! ERROR: flang-new: error: cannot specify -o when generating multiple output files +! ERROR: flang: error: cannot specify -o when generating multiple output files ! TEST 3: The output file _was not_ specified - `flang_fc1` will process all ! input files and generate one output file for every input file. diff --git a/flang/test/Driver/omp-driver-offload.f90 b/flang/test/Driver/omp-driver-offload.f90 index b0b94ab1386a74..7c51656f0001af 100644 --- a/flang/test/Driver/omp-driver-offload.f90 +++ b/flang/test/Driver/omp-driver-offload.f90 @@ -1,6 +1,6 @@ -! Test that flang-new OpenMP and OpenMP offload related +! Test that flang OpenMP and OpenMP offload related ! commands forward or expand to the appropriate commands -! for flang-new -fc1 as expected. Assumes a gfx90a, aarch64, +! for flang -fc1 as expected. Assumes a gfx90a, aarch64, ! and sm_70 architecture, but doesn't require one to be ! installed or compiled for, just testing the appropriate ! 
generation of jobs are created with the correct @@ -8,8 +8,8 @@ ! Test regular -fopenmp with no offload ! RUN: %flang -### -fopenmp %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP %s -! CHECK-OPENMP: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}}.f90" -! CHECK-OPENMP-NOT: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90" +! CHECK-OPENMP: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}}.f90" +! CHECK-OPENMP-NOT: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90" ! Test regular -fopenmp with offload, and invocation filtering options ! RUN: %flang -S -### %s -o %t 2>&1 \ @@ -22,47 +22,47 @@ ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=OFFLOAD-HOST-AND-DEVICE -! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" -! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" -! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "nvptx64-nvidia-cuda" -! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "nvptx64-nvidia-cuda" +! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! RUN: %flang -S -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-host-only \ ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=OFFLOAD-HOST -! OFFLOAD-HOST: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-HOST: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! OFFLOAD-HOST-NOT: "-triple" "amdgcn-amd-amdhsa" ! 
OFFLOAD-HOST-NOT: "-triple" "nvptx64-nvidia-cuda" -! OFFLOAD-HOST-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-HOST-NOT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! RUN: %flang -S -### %s 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-device-only \ ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=OFFLOAD-DEVICE -! OFFLOAD-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" -! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" -! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "nvptx64-nvidia-cuda" -! OFFLOAD-DEVICE-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "nvptx64-nvidia-cuda" +! OFFLOAD-DEVICE-NOT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! Test regular -fopenmp with offload for basic fopenmp-is-target-device flag addition and correct fopenmp ! RUN: %flang -### -fopenmp --offload-arch=gfx90a -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP-IS-TARGET-DEVICE %s -! CHECK-OPENMP-IS-TARGET-DEVICE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90" +! CHECK-OPENMP-IS-TARGET-DEVICE: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90" ! Testing appropriate flags are gnerated and appropriately assigned by the driver when offloading ! RUN: %flang -S -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=OPENMP-OFFLOAD-ARGS -! 
OPENMP-OFFLOAD-ARGS: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" {{.*}} "-fopenmp" {{.*}}.f90" -! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! OPENMP-OFFLOAD-ARGS: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" {{.*}} "-fopenmp" {{.*}}.f90" +! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp" ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp-host-ir-file-path" "{{.*}}.bc" "-fopenmp-is-target-device" ! OPENMP-OFFLOAD-ARGS-SAME: {{.*}}.f90" ! OPENMP-OFFLOAD-ARGS: "{{[^"]*}}clang-offload-packager{{.*}}" {{.*}} "--image=file={{.*}}.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp" -! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp" ! OPENMP-OFFLOAD-ARGS-SAME: "-fembed-offload-object={{.*}}.out" {{.*}}.bc" @@ -77,7 +77,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-threads-oversubscription \ ! RUN: | FileCheck %s --check-prefixes=CHECK-THREADS-OVS -! CHECK-THREADS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90" +! CHECK-THREADS-OVS: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -89,7 +89,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-teams-oversubscription \ ! RUN: | FileCheck %s --check-prefixes=CHECK-TEAMS-OVS -! CHECK-TEAMS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-teams-oversubscription" {{.*}}.f90" +! 
CHECK-TEAMS-OVS: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-teams-oversubscription" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -101,7 +101,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-no-nested-parallelism \ ! RUN: | FileCheck %s --check-prefixes=CHECK-NEST-PAR -! CHECK-NEST-PAR: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-nested-parallelism" {{.*}}.f90" +! CHECK-NEST-PAR: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-nested-parallelism" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -113,7 +113,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-no-thread-state \ ! RUN: | FileCheck %s --check-prefixes=CHECK-THREAD-STATE -! CHECK-THREAD-STATE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-thread-state" {{.*}}.f90" +! CHECK-THREAD-STATE: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-thread-state" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -125,7 +125,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-target-debug \ ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG -! CHECK-TARGET-DEBUG: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" {{.*}}.f90" +! CHECK-TARGET-DEBUG: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -137,7 +137,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-target-debug \ ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG -! 
CHECK-TARGET-DEBUG-EQ: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug=111" {{.*}}.f90" +! CHECK-TARGET-DEBUG-EQ: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug=111" {{.*}}.f90" ! RUN: %flang -S -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -153,7 +153,7 @@ ! RUN: -fopenmp-assume-teams-oversubscription -fopenmp-assume-no-nested-parallelism \ ! RUN: -fopenmp-assume-no-thread-state \ ! RUN: | FileCheck %s --check-prefixes=CHECK-RTL-ALL -! CHECK-RTL-ALL: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" "-fopenmp-assume-teams-oversubscription" +! CHECK-RTL-ALL: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" "-fopenmp-assume-teams-oversubscription" ! CHECK-RTL-ALL: "-fopenmp-assume-threads-oversubscription" "-fopenmp-assume-no-thread-state" "-fopenmp-assume-no-nested-parallelism" ! CHECK-RTL-ALL: {{.*}}.f90" @@ -167,7 +167,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-version=45 \ ! RUN: | FileCheck %s --check-prefixes=CHECK-OPENMP-VERSION -! CHECK-OPENMP-VERSION: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" "-fopenmp-version=45" {{.*}}.f90" +! CHECK-OPENMP-VERSION: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" "-fopenmp-version=45" {{.*}}.f90" ! Test diagnostic error when host IR file is non-existent ! RUN: not %flang_fc1 %s -o %t 2>&1 -fopenmp -fopenmp-is-target-device \ @@ -187,7 +187,7 @@ ! RUN: --target=aarch64-unknown-linux-gnu \ ! RUN: | FileCheck %s --check-prefix=FORCE-USM-NO-OFFLOAD -! FORCE-USM-NO-OFFLOAD: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! FORCE-USM-NO-OFFLOAD: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! FORCE-USM-NO-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm" ! Test -fopenmp-force-usm option with offload @@ -196,16 +196,16 @@ ! 
RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=FORCE-USM-OFFLOAD -! FORCE-USM-OFFLOAD: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! FORCE-USM-OFFLOAD: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm" -! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm" ! RUN: %flang -### -v --target=x86_64-unknown-linux-gnu -fopenmp \ ! RUN: --offload-arch=gfx900 \ ! RUN: --rocm-path=%S/Inputs/rocm %s 2>&1 \ ! RUN: | FileCheck --check-prefix=MLINK-BUILTIN-BITCODE %s -! MLINK-BUILTIN-BITCODE: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! MLINK-BUILTIN-BITCODE: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! MLINK-BUILTIN-BITCODE-SAME: "-mlink-builtin-bitcode" {{.*Inputs.*rocm.*amdgcn.*bitcode.*}}oclc_isa_version_900.bc ! Test that the -fopenmp-targets option is added to host compilation invocations @@ -219,9 +219,9 @@ ! RUN: --target=x86_64-unknown-linux-gnu -nogpulib \ ! RUN: | FileCheck %s --check-prefix=OFFLOAD-TARGETS -! OFFLOAD-TARGETS: "{{[^"]*}}flang-new" "-fc1" "-triple" "x86_64-unknown-linux-gnu" +! OFFLOAD-TARGETS: "{{[^"]*}}flang" "-fc1" "-triple" "x86_64-unknown-linux-gnu" ! OFFLOAD-TARGETS-SAME: "-fopenmp-targets=amdgcn-amd-amdhsa" -! OFFLOAD-TARGETS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! OFFLOAD-TARGETS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! OFFLOAD-TARGETS-NOT: -fopenmp-targets -! OFFLOAD-TARGETS: "{{[^"]*}}flang-new" "-fc1" "-triple" "x86_64-unknown-linux-gnu" +! OFFLOAD-TARGETS: "{{[^"]*}}flang" "-fc1" "-triple" "x86_64-unknown-linux-gnu" ! 
OFFLOAD-TARGETS-SAME: "-fopenmp-targets=amdgcn-amd-amdhsa" diff --git a/flang/test/Driver/predefined-macros-compiler-version.F90 b/flang/test/Driver/predefined-macros-compiler-version.F90 index 823a730f96845a..f6924479281562 100644 --- a/flang/test/Driver/predefined-macros-compiler-version.F90 +++ b/flang/test/Driver/predefined-macros-compiler-version.F90 @@ -1,12 +1,12 @@ ! Check that the driver correctly defines macros with the compiler version !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --ignore-case !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --ignore-case diff --git a/flang/test/Driver/std2018-wrong.f90 b/flang/test/Driver/std2018-wrong.f90 index 27ccc76bd39aad..93ba153d75f7f9 100644 --- a/flang/test/Driver/std2018-wrong.f90 +++ b/flang/test/Driver/std2018-wrong.f90 @@ -1,7 +1,7 @@ ! Ensure argument -std=f2018 works as expected. !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: not %flang_fc1 -std=90 %s 2>&1 | FileCheck %s --check-prefix=WRONG diff --git a/flang/test/Driver/std2018.f90 b/flang/test/Driver/std2018.f90 index cf461cf89e4e19..1727f92127b711 100644 --- a/flang/test/Driver/std2018.f90 +++ b/flang/test/Driver/std2018.f90 @@ -1,7 +1,7 @@ ! Ensure argument -std=f2018 works as expected. !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -fsyntax-only %s 2>&1 | FileCheck %s --allow-empty --check-prefix=WITHOUT ! 
RUN: %flang_fc1 -fsyntax-only -std=f2018 %s 2>&1 | FileCheck %s --check-prefix=GIVEN diff --git a/flang/test/Driver/supported-suffices/f03-suffix.f03 b/flang/test/Driver/supported-suffices/f03-suffix.f03 index 6e03f9f43fc602..1d850305cd040e 100644 --- a/flang/test/Driver/supported-suffices/f03-suffix.f03 +++ b/flang/test/Driver/supported-suffices/f03-suffix.f03 @@ -1,5 +1,5 @@ ! RUN: %flang -### %s 2>&1 | FileCheck %s -! CHECK: "{{.*}}flang-new" "-fc1" {{.*}} "-o" "{{.*}}.o" +! CHECK: "{{.*}}flang" "-fc1" {{.*}} "-o" "{{.*}}.o" program f03 end program f03 diff --git a/flang/test/Driver/supported-suffices/f08-suffix.f08 b/flang/test/Driver/supported-suffices/f08-suffix.f08 index d5bcf4ce1de1cc..2b31e4c21876ae 100644 --- a/flang/test/Driver/supported-suffices/f08-suffix.f08 +++ b/flang/test/Driver/supported-suffices/f08-suffix.f08 @@ -1,5 +1,5 @@ ! RUN: %flang -### %s 2>&1 | FileCheck %s -! CHECK: "{{.*}}flang-new" "-fc1" {{.*}} "-o" "{{.*}}.o" +! CHECK: "{{.*}}flang" "-fc1" {{.*}} "-o" "{{.*}}.o" program f08 end program f08 diff --git a/flang/test/Driver/use-module-error.f90 b/flang/test/Driver/use-module-error.f90 index 42d6650621c8c8..67335f61626817 100644 --- a/flang/test/Driver/use-module-error.f90 +++ b/flang/test/Driver/use-module-error.f90 @@ -1,14 +1,14 @@ ! Ensure that multiple module directories are not allowed !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: not %flang -fsyntax-only -J %S/Inputs/module-dir -J %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE ! RUN: not %flang -fsyntax-only -J %S/Inputs/module-dir -module-dir %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE ! RUN: not %flang -fsyntax-only -module-dir %S/Inputs/module-dir -J%S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! 
FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: not %flang_fc1 -fsyntax-only -J %S/Inputs/module-dir -J %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE ! RUN: not %flang_fc1 -fsyntax-only -J %S/Inputs/module-dir -module-dir %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE diff --git a/flang/test/Driver/use-module.f90 b/flang/test/Driver/use-module.f90 index 775c0424715883..2c3a38043fe16e 100644 --- a/flang/test/Driver/use-module.f90 +++ b/flang/test/Driver/use-module.f90 @@ -1,7 +1,7 @@ ! Checks that module search directories specified with `-J/-module-dir` and `-I` are handled correctly !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang -fsyntax-only -I %S/Inputs -I %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty ! RUN: %flang -fsyntax-only -I %S/Inputs -J %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty @@ -16,7 +16,7 @@ ! RUN: not %flang -fsyntax-only -module-dir %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=SINGLEINCLUDE !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -fsyntax-only -I %S/Inputs -I %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty ! RUN: %flang_fc1 -fsyntax-only -I %S/Inputs -J %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty diff --git a/flang/test/Driver/version-loops.f90 b/flang/test/Driver/version-loops.f90 index b0fa01d572512a..d206393a04f486 100644 --- a/flang/test/Driver/version-loops.f90 +++ b/flang/test/Driver/version-loops.f90 @@ -1,5 +1,5 @@ -! Test that flang-new forwards the -f{no-,}version-loops-for-stride -! options correctly to flang-new -fc1 for different variants of optimisation +! 
Test that flang forwards the -f{no-,}version-loops-for-stride +! options correctly to flang -fc1 for different variants of optimisation ! and explicit flags. ! RUN: %flang -### %s -o %t 2>&1 -O3 \ @@ -23,32 +23,32 @@ ! RUN: %flang -### %s -o %t 2>&1 -O3 -fno-version-loops-for-stride \ ! RUN: | FileCheck %s --check-prefix=CHECK-O3-no -! CHECK: "{{.*}}flang-new" "-fc1" +! CHECK: "{{.*}}flang" "-fc1" ! CHECK-SAME: "-fversion-loops-for-stride" ! CHECK-SAME: "-O3" -! CHECK-O2: "{{.*}}flang-new" "-fc1" +! CHECK-O2: "{{.*}}flang" "-fc1" ! CHECK-O2-NOT: "-fversion-loops-for-stride" ! CHECK-O2-SAME: "-O2" -! CHECK-O2-with: "{{.*}}flang-new" "-fc1" +! CHECK-O2-with: "{{.*}}flang" "-fc1" ! CHECK-O2-with-SAME: "-fversion-loops-for-stride" ! CHECK-O2-with-SAME: "-O2" -! CHECK-O4: "{{.*}}flang-new" "-fc1" +! CHECK-O4: "{{.*}}flang" "-fc1" ! CHECK-O4-SAME: "-fversion-loops-for-stride" ! CHECK-O4-SAME: "-O3" -! CHECK-Ofast: "{{.*}}flang-new" "-fc1" +! CHECK-Ofast: "{{.*}}flang" "-fc1" ! CHECK-Ofast-SAME: "-ffast-math" ! CHECK-Ofast-SAME: "-fversion-loops-for-stride" ! CHECK-Ofast-SAME: "-O3" -! CHECK-Ofast-no: "{{.*}}flang-new" "-fc1" +! CHECK-Ofast-no: "{{.*}}flang" "-fc1" ! CHECK-Ofast-no-SAME: "-ffast-math" ! CHECK-Ofast-no-NOT: "-fversion-loops-for-stride" ! CHECK-Ofast-no-SAME: "-O3" -! CHECK-O3-no: "{{.*}}flang-new" "-fc1" +! CHECK-O3-no: "{{.*}}flang" "-fc1" ! CHECK-O3-no-NOT: "-fversion-loops-for-stride" ! CHECK-O3-no-SAME: "-O3" diff --git a/flang/test/Driver/wextra-ok.f90 b/flang/test/Driver/wextra-ok.f90 index 6a38d9481a36b7..441029aa0af276 100644 --- a/flang/test/Driver/wextra-ok.f90 +++ b/flang/test/Driver/wextra-ok.f90 @@ -1,4 +1,4 @@ -! Ensure that supplying -Wextra into flang-new does not raise error +! Ensure that supplying -Wextra into flang does not raise error ! The first check should be changed if -Wextra is implemented ! 
RUN: %flang -std=f2018 -Wextra %s -c 2>&1 | FileCheck %s --check-prefix=CHECK-OK diff --git a/flang/test/HLFIR/hlfir-flags.f90 b/flang/test/HLFIR/hlfir-flags.f90 index b383a79d12c27b..0b1e80b1e3f636 100644 --- a/flang/test/HLFIR/hlfir-flags.f90 +++ b/flang/test/HLFIR/hlfir-flags.f90 @@ -1,4 +1,4 @@ -! Test -flang-deprecated-hlfir, -flang-experimental-hlfir (flang-new), and +! Test -flang-deprecated-hlfir, -flang-experimental-hlfir (flang), and ! -hlfir (bbc), -emit-hlfir, -emit-fir flags ! RUN: %flang_fc1 -emit-hlfir -o - %s | FileCheck --check-prefix HLFIR --check-prefix ALL %s ! RUN: bbc -emit-hlfir -o - %s | FileCheck --check-prefix HLFIR --check-prefix ALL %s diff --git a/flang/test/Lower/Intrinsics/command_argument_count.f90 b/flang/test/Lower/Intrinsics/command_argument_count.f90 index 0cf92d4444db98..a30b27d664fc0c 100644 --- a/flang/test/Lower/Intrinsics/command_argument_count.f90 +++ b/flang/test/Lower/Intrinsics/command_argument_count.f90 @@ -1,6 +1,6 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! bbc doesn't have a way to set the default kinds so we use flang-new driver -! RUN: flang-new -fc1 -fdefault-integer-8 -emit-fir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 %s +! bbc doesn't have a way to set the default kinds so we use flang driver +! RUN: flang -fc1 -fdefault-integer-8 -emit-fir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 %s ! CHECK-LABEL: argument_count_test subroutine argument_count_test() diff --git a/flang/test/Lower/Intrinsics/exit.f90 b/flang/test/Lower/Intrinsics/exit.f90 index c3110fcbec2b5a..bd551f7318a84a 100644 --- a/flang/test/Lower/Intrinsics/exit.f90 +++ b/flang/test/Lower/Intrinsics/exit.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir -hlfir=false %s -o - | FileCheck --check-prefixes=CHECK,CHECK-32 -DDEFAULT_INTEGER_SIZE=32 %s -! bbc doesn't have a way to set the default kinds so we use flang-new driver +! bbc doesn't have a way to set the default kinds so we use flang driver ! 
RUN: %flang_fc1 -fdefault-integer-8 -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 -DDEFAULT_INTEGER_SIZE=64 %s ! CHECK-LABEL: func @_QPexit_test1() { diff --git a/flang/test/Lower/Intrinsics/ieee_is_normal.f90 b/flang/test/Lower/Intrinsics/ieee_is_normal.f90 index f9ab01881d250d..9b864c9a9849c3 100644 --- a/flang/test/Lower/Intrinsics/ieee_is_normal.f90 +++ b/flang/test/Lower/Intrinsics/ieee_is_normal.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -emit-fir %s -o - | FileCheck %s +! RUN: flang -fc1 -emit-fir %s -o - | FileCheck %s ! CHECK-LABEL: ieee_is_normal_f16 subroutine ieee_is_normal_f16(r) diff --git a/flang/test/Lower/Intrinsics/isnan.f90 b/flang/test/Lower/Intrinsics/isnan.f90 index 700b2d1a67c656..62b98c8ea98bee 100644 --- a/flang/test/Lower/Intrinsics/isnan.f90 +++ b/flang/test/Lower/Intrinsics/isnan.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -emit-fir %s -o - | FileCheck %s +! RUN: flang -fc1 -emit-fir %s -o - | FileCheck %s ! CHECK-LABEL: isnan_f32 subroutine isnan_f32(r) diff --git a/flang/test/Lower/Intrinsics/modulo.f90 b/flang/test/Lower/Intrinsics/modulo.f90 index ac18e59033a6b6..781ef8296a2b7d 100644 --- a/flang/test/Lower/Intrinsics/modulo.f90 +++ b/flang/test/Lower/Intrinsics/modulo.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir -hlfir=false %s -o - | FileCheck %s -check-prefixes=HONORINF,ALL -! RUN: flang-new -fc1 -menable-no-infs -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck %s -check-prefixes=CHECK,ALL +! RUN: flang -fc1 -menable-no-infs -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck %s -check-prefixes=CHECK,ALL ! ALL-LABEL: func @_QPmodulo_testr( ! 
ALL-SAME: %[[arg0:.*]]: !fir.ref{{.*}}, %[[arg1:.*]]: !fir.ref{{.*}}, %[[arg2:.*]]: !fir.ref{{.*}}) { diff --git a/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 b/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 index f02884e5e92f38..425ccbc5dd56c5 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP allocate Directive. -// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s +// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s program main integer :: x, y diff --git a/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 b/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 index 3be61a1700ced3..7a7d28db8d6f5a 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP declare reduction Directive. -// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s +// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s subroutine declare_red() integer :: my_var diff --git a/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 b/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 index c6a0a8f2cd0d22..be1ac2db5dfa4a 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP declare simd Directive. 
-// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s +// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s subroutine sub(x, y) real, intent(inout) :: x, y diff --git a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 index 62bc247a1456a1..bc5baf4e1cf604 100644 --- a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 +++ b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 @@ -1,7 +1,7 @@ ! This test checks lowering of `LASTPRIVATE` clause for scalar types. ! RUN: bbc -fopenmp -emit-hlfir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s +! RUN: flang -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s !CHECK: func @_QPlastprivate_character(%[[ARG1:.*]]: !fir.boxchar<1>{{.*}}) { !CHECK-DAG: %[[ARG1_UNBOX:.*]]:2 = fir.unboxchar diff --git a/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 b/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 index 32caac39778dee..99c521406a7775 100644 --- a/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 +++ b/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 @@ -1,7 +1,7 @@ ! Check that for parallel do, reduction is only processed for the loop ! RUN: bbc -fopenmp --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -fopenmp -mmlir --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s +! RUN: flang -fc1 -fopenmp -mmlir --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s ! CHECK: omp.parallel { ! CHECK: omp.wsloop reduction(byref @add_reduction_byref_i32 diff --git a/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 b/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 index fdedbb06160761..cfeb5de83f4e82 100644 --- a/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 +++ b/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 @@ -1,7 +1,7 @@ ! 
Check that for parallel do, reduction is only processed for the loop ! RUN: bbc -fopenmp -emit-hlfir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s +! RUN: flang -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s ! CHECK: omp.parallel { ! CHECK: omp.wsloop reduction(@add_reduction_i32 diff --git a/flang/test/lit.cfg.py b/flang/test/lit.cfg.py index 4acbc0606d1977..f43234fb125b7e 100644 --- a/flang/test/lit.cfg.py +++ b/flang/test/lit.cfg.py @@ -132,13 +132,13 @@ tools = [ ToolSubst( "%flang", - command=FindTool("flang-new"), + command=FindTool("flang"), extra_args=isysroot_flag, unresolved="fatal", ), ToolSubst( "%flang_fc1", - command=FindTool("flang-new"), + command=FindTool("flang"), extra_args=["-fc1"], unresolved="fatal", ), diff --git a/flang/tools/f18/CMakeLists.txt b/flang/tools/f18/CMakeLists.txt index 9d7b8633958cb7..4362fcf0537616 100644 --- a/flang/tools/f18/CMakeLists.txt +++ b/flang/tools/f18/CMakeLists.txt @@ -55,7 +55,7 @@ endif() set(module_objects "") # Create module files directly from the top-level module source directory. -# If CMAKE_CROSSCOMPILING, then the newly built flang-new executable was +# If CMAKE_CROSSCOMPILING, then the newly built flang executable was # cross compiled, and thus can't be executed on the build system and thus # can't be used for generating module files. 
 if (NOT CMAKE_CROSSCOMPILING)
@@ -115,9 +115,9 @@ if (NOT CMAKE_CROSSCOMPILING)
     # TODO: We may need to flag this with conditional, in case Flang is built w/o OpenMP support
     add_custom_command(OUTPUT ${base}.mod ${object_output}
       COMMAND ${CMAKE_COMMAND} -E make_directory ${FLANG_INTRINSIC_MODULES_DIR}
-      COMMAND flang-new ${opts} ${decls} -cpp ${compile_with} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
+      COMMAND flang ${opts} ${decls} -cpp ${compile_with} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
         ${FLANG_SOURCE_DIR}/module/${filename}.f90
-      DEPENDS flang-new ${FLANG_SOURCE_DIR}/module/${filename}.f90 ${FLANG_SOURCE_DIR}/module/__fortran_builtins.f90 ${depends}
+      DEPENDS flang ${FLANG_SOURCE_DIR}/module/${filename}.f90 ${FLANG_SOURCE_DIR}/module/__fortran_builtins.f90 ${depends}
     )
     list(APPEND MODULE_FILES ${base}.mod)
     install(FILES ${base}.mod DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/flang")
@@ -142,9 +142,9 @@ if (NOT CMAKE_CROSSCOMPILING)
   set(base ${FLANG_INTRINSIC_MODULES_DIR}/omp_lib)
   add_custom_command(OUTPUT ${base}.mod ${base}_kinds.mod
     COMMAND ${CMAKE_COMMAND} -E make_directory ${FLANG_INTRINSIC_MODULES_DIR}
-    COMMAND flang-new -cpp -fsyntax-only ${opts} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
+    COMMAND flang -cpp -fsyntax-only ${opts} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
       ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90
-    DEPENDS flang-new ${FLANG_INTRINSIC_MODULES_DIR}/iso_c_binding.mod ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90 ${depends}
+    DEPENDS flang ${FLANG_INTRINSIC_MODULES_DIR}/iso_c_binding.mod ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90 ${depends}
   )
   add_custom_command(OUTPUT ${base}.f18.mod
     DEPENDS ${base}.mod
diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt
index 9f33cdfe3fa90f..615c673374faf4 100644
--- a/flang/tools/flang-driver/CMakeLists.txt
+++ b/flang/tools/flang-driver/CMakeLists.txt
@@ -11,18 +11,18 @@ set( LLVM_LINK_COMPONENTS
   TargetParser
 )

-add_flang_tool(flang-new
+add_flang_tool(flang
   driver.cpp
   fc1_main.cpp
 )

-target_link_libraries(flang-new
+target_link_libraries(flang
   PRIVATE
   flangFrontend
   flangFrontendTool
 )

-clang_target_link_libraries(flang-new
+clang_target_link_libraries(flang
   PRIVATE
   clangDriver
   clangBasic
@@ -30,9 +30,9 @@ clang_target_link_libraries(flang-new

 option(FLANG_PLUGIN_SUPPORT "Build Flang with plugin support." ON)

-# Enable support for plugins, which need access to symbols from flang-new
+# Enable support for plugins, which need access to symbols from flang
 if(FLANG_PLUGIN_SUPPORT)
-  export_executable_symbols_for_plugins(flang-new)
+  export_executable_symbols_for_plugins(flang)
 endif()

-install(TARGETS flang-new DESTINATION "${CMAKE_INSTALL_BINDIR}")
+install(TARGETS flang DESTINATION "${CMAKE_INSTALL_BINDIR}")
diff --git a/flang/tools/flang-driver/driver.cpp b/flang/tools/flang-driver/driver.cpp
index 52136df10c0b02..603aab4205836c 100644
--- a/flang/tools/flang-driver/driver.cpp
+++ b/flang/tools/flang-driver/driver.cpp
@@ -95,7 +95,7 @@ int main(int argc, const char **argv) {
   llvm::StringSaver saver(a);
   ExpandResponseFiles(saver, args);

-  // Check if flang-new is in the frontend mode
+  // Check if flang is in the frontend mode
   auto firstArg = std::find_if(args.begin() + 1, args.end(),
                                [](const char *a) { return a != nullptr; });
   if (firstArg != args.end()) {
@@ -104,7 +104,7 @@ int main(int argc, const char **argv) {
                    << "Valid tools include '-fc1'.\n";
       return 1;
     }
-    // Call flang-new frontend
+    // Call flang frontend
     if (llvm::StringRef(args[1]).starts_with("-fc1")) {
       return executeFC1Tool(args);
     }
@@ -140,7 +140,7 @@ int main(int argc, const char **argv) {

   // Set the environment variable, FLANG_COMPILER_OPTIONS_STRING, to contain all
   // the compiler options. This is intended for the frontend driver,
-  // flang-new -fc1, to enable the implementation of the COMPILER_OPTIONS
+  // flang -fc1, to enable the implementation of the COMPILER_OPTIONS
   // intrinsic. To this end, the frontend driver requires the list of the
   // original compiler options, which is not available through other means.
   // TODO: This way of passing information between the compiler and frontend
diff --git a/llvm/runtimes/CMakeLists.txt b/llvm/runtimes/CMakeLists.txt
index d948b7eb39b39c..9da1f926817a8b 100644
--- a/llvm/runtimes/CMakeLists.txt
+++ b/llvm/runtimes/CMakeLists.txt
@@ -504,15 +504,15 @@ if(build_runtimes)

   if("openmp" IN_LIST LLVM_ENABLE_RUNTIMES)
     if (${LLVM_TOOL_FLANG_BUILD})
-      message(STATUS "Configuring build of omp_lib.mod and omp_lib_kinds.mod via flang-new")
-      set(LIBOMP_FORTRAN_MODULES_COMPILER "${CMAKE_BINARY_DIR}/bin/flang-new")
+      message(STATUS "Configuring build of omp_lib.mod and omp_lib_kinds.mod via flang")
+      set(LIBOMP_FORTRAN_MODULES_COMPILER "${CMAKE_BINARY_DIR}/bin/flang")
       set(LIBOMP_MODULES_INSTALL_PATH "${CMAKE_INSTALL_INCLUDEDIR}/flang")
       # TODO: This is a workaround until flang becomes a first-class project
-      # in llvm/CMakeList.txt. Until then, this line ensures that flang-new is
-      # built before "openmp" is built as a runtime project. Besides "flang-new"
+      # in llvm/CMakeList.txt. Until then, this line ensures that flang is
+      # built before "openmp" is built as a runtime project. Besides "flang"
       # to build the compiler, we also need to add "module_files" to make sure
       # that all .mod files are also properly build.
- list(APPEND extra_deps "flang-new" "module_files") + list(APPEND extra_deps "flang" "module_files") endif() foreach(dep opt llvm-link llvm-extract clang clang-offload-packager) if(TARGET ${dep}) diff --git a/offload/CMakeLists.txt b/offload/CMakeLists.txt index 9ffe8f56b76e67..9b771d1116ee38 100644 --- a/offload/CMakeLists.txt +++ b/offload/CMakeLists.txt @@ -89,9 +89,9 @@ else() # Check for flang if (NOT MSVC) - set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new) + set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang) else() - set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new.exe) + set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang.exe) endif() # Set fortran test compiler if flang is found diff --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt index 3b4259dfa380e8..c206386fa6b614 100644 --- a/openmp/CMakeLists.txt +++ b/openmp/CMakeLists.txt @@ -69,9 +69,9 @@ else() # Check for flang if (NOT MSVC) - set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new) + set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang) else() - set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new.exe) + set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang.exe) endif() # Set fortran test compiler if flang is found >From b71c1d519cc61a751268b1ccda3fc59a966bab96 Mon Sep 17 00:00:00 2001 From: Brad Richardson Date: Thu, 26 Sep 2024 10:39:53 -0500 Subject: [PATCH 2/4] [flang][driver] restore flang-new as symlink Restore flang-new as a symlink to flang for backwards compatibility Co-authored-by: H. 
Vetinari Co-authored-by: Andrzej Warzynski --- clang/lib/Driver/ToolChain.cpp | 3 +++ flang/tools/flang-driver/CMakeLists.txt | 4 ++++ flang/tools/flang-driver/driver.cpp | 3 ++- 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index 16f9b629fc538c..c9f3dbd7707b77 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -381,6 +381,9 @@ static const DriverSuffix *FindDriverSuffix(StringRef ProgName, size_t &Pos) { {"cl", "--driver-mode=cl"}, {"++", "--driver-mode=g++"}, {"flang", "--driver-mode=flang"}, + // For backwards compatibility, we create a symlink for `flang` called + // `flang-new`. This will be removed in the future. + {"flang-new", "--driver-mode=flang"}, {"clang-dxc", "--driver-mode=dxc"}, }; diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt index 615c673374faf4..063acdd7dfe57c 100644 --- a/flang/tools/flang-driver/CMakeLists.txt +++ b/flang/tools/flang-driver/CMakeLists.txt @@ -36,3 +36,7 @@ if(FLANG_PLUGIN_SUPPORT) endif() install(TARGETS flang DESTINATION "${CMAKE_INSTALL_BINDIR}") + +# Keep "flang-new" as a symlink for backwards compatiblity. Remove once "flang" +# is a widely adopted name. 
+add_flang_symlink(flang-new flang) diff --git a/flang/tools/flang-driver/driver.cpp b/flang/tools/flang-driver/driver.cpp index 603aab4205836c..ed52988feaa59c 100644 --- a/flang/tools/flang-driver/driver.cpp +++ b/flang/tools/flang-driver/driver.cpp @@ -88,7 +88,8 @@ int main(int argc, const char **argv) { llvm::InitLLVM x(argc, argv); llvm::SmallVector args(argv, argv + argc); - clang::driver::ParsedClangName targetandMode("flang", "--driver-mode=flang"); + clang::driver::ParsedClangName targetandMode = + clang::driver::ToolChain::getTargetAndModeFromProgramName(argv[0]); std::string driverPath = getExecutablePath(args[0]); llvm::BumpPtrAllocator a; >From 443c951f8e0458e8b011424fad6a2e4b40b63144 Mon Sep 17 00:00:00 2001 From: Brad Richardson Date: Mon, 30 Sep 2024 10:16:59 -0500 Subject: [PATCH 3/4] [flang][driver] add version to flang executable --- flang/tools/flang-driver/CMakeLists.txt | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt index 063acdd7dfe57c..9a89a6185a3291 100644 --- a/flang/tools/flang-driver/CMakeLists.txt +++ b/flang/tools/flang-driver/CMakeLists.txt @@ -28,6 +28,12 @@ clang_target_link_libraries(flang clangBasic ) +# This creates the executable with a version appended +# and creates a symlink to it without the version +if(CYGWIN OR NOT WIN32) # but it doesn't work on Windows + set_target_properties(flang PROPERTIES VERSION ${FLANG_EXECUTABLE_VERSION}) +endif() + option(FLANG_PLUGIN_SUPPORT "Build Flang with plugin support." 
ON) # Enable support for plugins, which need access to symbols from flang >From 27ae40d86f235890d109ca88682dd0caba0d2c93 Mon Sep 17 00:00:00 2001 From: Brad Richardson Date: Thu, 3 Oct 2024 14:12:35 -0700 Subject: [PATCH 4/4] [flang][driver] add warning when using openmp --- clang/include/clang/Basic/DiagnosticDriverKinds.td | 3 +++ clang/include/clang/Basic/DiagnosticGroups.td | 4 ++++ clang/lib/Driver/ToolChains/Flang.cpp | 3 +++ 3 files changed, 10 insertions(+) diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td index 97573fcf20c1fb..68722ad9633120 100644 --- a/clang/include/clang/Basic/DiagnosticDriverKinds.td +++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td @@ -147,6 +147,9 @@ def warn_drv_unsupported_option_for_processor : Warning< def warn_drv_unsupported_openmp_library : Warning< "the library '%0=%1' is not supported, OpenMP will not be enabled">, InGroup; +def warn_openmp_experimental : Warning< + "OpenMP support in flang is still experimental">, + InGroup; def err_drv_invalid_thread_model_for_target : Error< "invalid thread model '%0' in '%1' for this target">; diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td index 7d81bdf827ea0c..bfa065f018f8d8 100644 --- a/clang/include/clang/Basic/DiagnosticGroups.td +++ b/clang/include/clang/Basic/DiagnosticGroups.td @@ -1582,3 +1582,7 @@ def ExtractAPIMisuse : DiagGroup<"extractapi-misuse">; // Warnings about using the non-standard extension having an explicit specialization // with a storage class specifier. 
def ExplicitSpecializationStorageClass : DiagGroup<"explicit-specialization-storage-class">; + +// A warning for options that enable a feature that is not yet complete +def ExperimentalOption : DiagGroup<"experimental-option">; + diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp index 1ca12ff81389a3..19b43594b00815 100644 --- a/clang/lib/Driver/ToolChains/Flang.cpp +++ b/clang/lib/Driver/ToolChains/Flang.cpp @@ -787,6 +787,9 @@ void Flang::ConstructJob(Compilation &C, const JobAction &JA, if (Args.hasArg(options::OPT_fopenmp_force_usm)) CmdArgs.push_back("-fopenmp-force-usm"); + // TODO: OpenMP support isn't "done" yet, so for now we warn that it + // is experimental. + D.Diag(diag::warn_openmp_experimental); // FIXME: Clang supports a whole bunch more flags here. break; From openmp-commits at lists.llvm.org Thu Oct 3 15:32:12 2024 From: openmp-commits at lists.llvm.org (Joel E. Denny via Openmp-commits) Date: Thu, 03 Oct 2024 15:32:12 -0700 (PDT) Subject: [Openmp-commits] [clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (PR #93365) In-Reply-To: Message-ID: <66ff1b6c.170a0220.d9bc6.729c@mx.google.com> ================ @@ -1311,4 +1314,96 @@ COMPILER_RT_VISIBILITY int __llvm_profile_set_file_object(FILE *File, return 0; } +int __llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, + const char *NamesEnd) { + int ReturnValue = 0, FilenameLength, TargetLength; + char *FilenameBuf, *TargetFilename; + const char *Filename; + + /* Save old profile data */ + FILE *oldFile = getProfileFile(); + + // Temporarily suspend getting SIGKILL when the parent exits. 
+  int PDeathSig = lprofSuspendSigKill();
+
+  if (lprofProfileDumped() || __llvm_profile_is_continuous_mode_enabled()) {
+    PROF_NOTE("Profile data not written to file: %s.\n", "already written");
+    if (PDeathSig == 1)
+      lprofRestoreSigKill();
+    return 0;
+  }
+
+  /* Get current filename */
+  FilenameLength = getCurFilenameLength();
+  FilenameBuf = (char *)COMPILER_RT_ALLOCA(FilenameLength + 1);
+  Filename = getCurFilename(FilenameBuf, 0);
+
+  /* Check the filename. */
+  if (!Filename) {
+    PROF_ERR("Failed to write file : %s\n", "Filename not set");
+    if (PDeathSig == 1)
+      lprofRestoreSigKill();
+    return -1;
+  }
+
+  /* Allocate new space for our target-specific PGO filename */
+  TargetLength = strlen(Target);
+  TargetFilename =
+      (char *)COMPILER_RT_ALLOCA(FilenameLength + TargetLength + 2);
+
+  /* Prepend "TARGET." to current filename */
----------------
jdenny-ornl wrote:

Can we make this handle file names with directory components? Otherwise, I end up with errors like:

```
LLVM Profile Error: Failed to open file : amdgcn-amd-amdhsa./home/jdenny/tmp/default_15853421304062701701_0.profraw
```

https://github.com/llvm/llvm-project/pull/93365

From openmp-commits at lists.llvm.org Thu Oct 3 16:58:52 2024
From: openmp-commits at lists.llvm.org (Joel E.
Denny via Openmp-commits)
Date: Thu, 03 Oct 2024 16:58:52 -0700 (PDT)
Subject: [Openmp-commits] [clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Add GPU profiling flags to driver (PR #94268)
In-Reply-To:
Message-ID: <66ff2fbc.170a0220.208f10.7901@mx.google.com>

jdenny-ornl wrote:

For some codes, I get the following error for a gfx906:

```
LLVM ERROR: Relocation for CG Profile could not be created: unknown relocation name
```

I see it for OpenMC, but the following is a simpler example:

```
$ cat test.c
#include <stdio.h>
#include <stdlib.h>
__attribute__((noinline))
double test(double x, int n) {
  double res = 1;
  for (int i = 0; i < n; ++i)
    res *= x;
  return res;
}
int main(int argc, char *argv[]) {
  double x = atof(argv[1]);
  unsigned n = atoi(argv[2]);
  #pragma omp target map(tofrom:x)
  x = test(x, n);
  printf("%f\n", x);
  return 0;
}
$ clang -O2 -g -fopenmp --offload-arch=native test.c -o test \
    -fprofile-generate -fprofile-generate-gpu
$ LLVM_PROFILE_FILE=test.profraw ./test 2 4
16.000000
$ llvm-profdata merge -output=test.profdata *.profraw
$ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
    -fprofile-use-gpu=test.profdata
```

I can prevent the error by lowering the last -O2 to -O1 or by removing the `__attribute__((noinline))`. Am I doing something wrong?

https://github.com/llvm/llvm-project/pull/94268

From openmp-commits at lists.llvm.org Fri Oct 4 03:40:30 2024
From: openmp-commits at lists.llvm.org (Tom Eccles via Openmp-commits)
Date: Fri, 04 Oct 2024 03:40:30 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <66ffc61e.620a0220.1e23d.c553@mx.google.com>

================
@@ -787,6 +787,9 @@ void Flang::ConstructJob(Compilation &C, const JobAction &JA,
     if (Args.hasArg(options::OPT_fopenmp_force_usm))
       CmdArgs.push_back("-fopenmp-force-usm");
+    // TODO: OpenMP support isn't "done" yet, so for now we warn that it
+    // is experimental.
+    D.Diag(diag::warn_openmp_experimental);
----------------
tblah wrote:

Please could you add a test that this warning is printed. `flang/test/Driver/fopenmp.f90` would be a good place. There is an example in the file already checking for a warning.

https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Fri Oct 4 06:20:14 2024
From: openmp-commits at lists.llvm.org (Michael Kruse via Openmp-commits)
Date: Fri, 04 Oct 2024 06:20:14 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [Clang][OpenMP] Add permutation clause (PR #92030)
In-Reply-To:
Message-ID: <66ffeb8e.a70a0220.262833.dc51@mx.google.com>

Meinersbur wrote:

@alexey-bataev Could you have another look?

https://github.com/llvm/llvm-project/pull/92030

From openmp-commits at lists.llvm.org Fri Oct 4 06:27:37 2024
From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits)
Date: Fri, 04 Oct 2024 06:27:37 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <66ffed49.a70a0220.d0e1e.d75c@mx.google.com>

https://github.com/everythingfunctional updated https://github.com/llvm/llvm-project/pull/110023

>From 649a73478c78389560042030a9717a05e8e338a8 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Wed, 25 Sep 2024 13:25:22 -0500
Subject: [PATCH 1/5] [flang][driver] rename flang-new to flang

---
 .github/workflows/release-binaries.yml | 2 +-
 clang/include/clang/Driver/Options.td | 4 +-
 clang/lib/Driver/Driver.cpp | 2 +-
 clang/lib/Driver/ToolChains/Flang.cpp | 6 +-
 clang/test/Driver/flang/flang.f90 | 2 +-
 clang/test/Driver/flang/flang_ucase.F90 | 2 +-
 .../Driver/flang/multiple-inputs-mixed.f90 | 2 +-
 clang/test/Driver/flang/multiple-inputs.f90 | 4 +-
 flang/docs/FlangDriver.md | 76 +++++++++----------
 flang/docs/ImplementingASemanticCheck.md | 4 +-
 flang/docs/Overview.md | 26 +++----
 .../FlangOmpReport/FlangOmpReport.cpp | 2 +-
.../flang/Optimizer/Analysis/AliasAnalysis.h | 2 +- flang/include/flang/Tools/CrossToolHelpers.h | 2 +- flang/lib/Frontend/CompilerInvocation.cpp | 6 +- flang/lib/Frontend/FrontendActions.cpp | 2 +- .../ExecuteCompilerInvocation.cpp | 3 +- flang/runtime/CMakeLists.txt | 6 +- flang/test/CMakeLists.txt | 2 +- flang/test/Driver/aarch64-outline-atomics.f90 | 2 +- .../Driver/color-diagnostics-forwarding.f90 | 4 +- flang/test/Driver/compiler-options.f90 | 4 +- flang/test/Driver/convert.f90 | 2 +- .../test/Driver/disable-ext-name-interop.f90 | 2 +- flang/test/Driver/driver-version.f90 | 4 +- flang/test/Driver/escaped-backslash.f90 | 4 +- flang/test/Driver/fdefault.f90 | 28 +++---- flang/test/Driver/flarge-sizes.f90 | 20 ++--- .../test/Driver/frame-pointer-forwarding.f90 | 2 +- flang/test/Driver/frontend-forwarding.f90 | 4 +- flang/test/Driver/hlfir-no-hlfir-error.f90 | 4 +- flang/test/Driver/intrinsic-module-path.f90 | 2 +- flang/test/Driver/large-data-threshold.f90 | 6 +- flang/test/Driver/lto-flags.f90 | 2 +- flang/test/Driver/macro-def-undef.F90 | 4 +- flang/test/Driver/missing-input.f90 | 14 ++-- flang/test/Driver/multiple-input-files.f90 | 2 +- flang/test/Driver/omp-driver-offload.f90 | 66 ++++++++-------- .../predefined-macros-compiler-version.F90 | 4 +- flang/test/Driver/std2018-wrong.f90 | 2 +- flang/test/Driver/std2018.f90 | 2 +- .../Driver/supported-suffices/f03-suffix.f03 | 2 +- .../Driver/supported-suffices/f08-suffix.f08 | 2 +- flang/test/Driver/use-module-error.f90 | 4 +- flang/test/Driver/use-module.f90 | 4 +- flang/test/Driver/version-loops.f90 | 18 ++--- flang/test/Driver/wextra-ok.f90 | 2 +- flang/test/HLFIR/hlfir-flags.f90 | 2 +- .../Intrinsics/command_argument_count.f90 | 4 +- flang/test/Lower/Intrinsics/exit.f90 | 2 +- .../test/Lower/Intrinsics/ieee_is_normal.f90 | 2 +- flang/test/Lower/Intrinsics/isnan.f90 | 2 +- flang/test/Lower/Intrinsics/modulo.f90 | 2 +- .../OpenMP/Todo/omp-declarative-allocate.f90 | 2 +- .../OpenMP/Todo/omp-declare-reduction.f90 
| 2 +- .../Lower/OpenMP/Todo/omp-declare-simd.f90 | 2 +- .../parallel-lastprivate-clause-scalar.f90 | 2 +- .../parallel-wsloop-reduction-byref.f90 | 2 +- .../OpenMP/parallel-wsloop-reduction.f90 | 2 +- flang/test/lit.cfg.py | 4 +- flang/tools/f18/CMakeLists.txt | 10 +-- flang/tools/flang-driver/CMakeLists.txt | 12 +-- flang/tools/flang-driver/driver.cpp | 6 +- llvm/runtimes/CMakeLists.txt | 10 +-- offload/CMakeLists.txt | 4 +- openmp/CMakeLists.txt | 4 +- 66 files changed, 220 insertions(+), 227 deletions(-) diff --git a/.github/workflows/release-binaries.yml b/.github/workflows/release-binaries.yml index 925912df6843e4..6073ebac9e6c2c 100644 --- a/.github/workflows/release-binaries.yml +++ b/.github/workflows/release-binaries.yml @@ -328,7 +328,7 @@ jobs: run: | # Build some of the mlir tools that take a long time to link if [ "${{ needs.prepare.outputs.build-flang }}" = "true" ]; then - ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ -j2 flang-new bbc + ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ -j2 flang bbc fi ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ \ mlir-bytecode-parser-fuzzer \ diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 932cf13edab53d..4a45a825da8fa1 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -6071,7 +6071,7 @@ def _sysroot_EQ : Joined<["--"], "sysroot=">, Visibility<[ClangOption, FlangOpti def _sysroot : Separate<["--"], "sysroot">, Alias<_sysroot_EQ>; //===----------------------------------------------------------------------===// -// pie/pic options (clang + flang-new) +// pie/pic options (clang + flang) //===----------------------------------------------------------------------===// let Visibility = [ClangOption, FlangOption] in { @@ -6087,7 +6087,7 @@ def fno_pie : Flag<["-"], "fno-pie">, Group; } // let Vis = [Default, 
FlangOption] //===----------------------------------------------------------------------===// -// Target Options (clang + flang-new) +// Target Options (clang + flang) //===----------------------------------------------------------------------===// let Flags = [TargetSpecific] in { let Visibility = [ClangOption, FlangOption] in { diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp index d0c8bdba0ede95..4243ee006c1553 100644 --- a/clang/lib/Driver/Driver.cpp +++ b/clang/lib/Driver/Driver.cpp @@ -2021,7 +2021,7 @@ void Driver::PrintHelp(bool ShowHidden) const { void Driver::PrintVersion(const Compilation &C, raw_ostream &OS) const { if (IsFlangMode()) { - OS << getClangToolFullVersion("flang-new") << '\n'; + OS << getClangToolFullVersion("flang") << '\n'; } else { // FIXME: The following handlers should use a callback mechanism, we don't // know what the client would like to do. diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp index 98350690f8d20e..1ca12ff81389a3 100644 --- a/clang/lib/Driver/ToolChains/Flang.cpp +++ b/clang/lib/Driver/ToolChains/Flang.cpp @@ -881,14 +881,12 @@ void Flang::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back(Input.getFilename()); - // TODO: Replace flang-new with flang once the new driver replaces the - // throwaway driver - const char *Exec = Args.MakeArgString(D.GetProgramPath("flang-new", TC)); + const char *Exec = Args.MakeArgString(D.GetProgramPath("flang", TC)); C.addCommand(std::make_unique(JA, *this, ResponseFileSupport::AtFileUTF8(), Exec, CmdArgs, Inputs, Output)); } -Flang::Flang(const ToolChain &TC) : Tool("flang-new", "flang frontend", TC) {} +Flang::Flang(const ToolChain &TC) : Tool("flang", "flang frontend", TC) {} Flang::~Flang() {} diff --git a/clang/test/Driver/flang/flang.f90 b/clang/test/Driver/flang/flang.f90 index ad4a3a3b6bd44d..b52977ee66d7b0 100644 --- a/clang/test/Driver/flang/flang.f90 +++ b/clang/test/Driver/flang/flang.f90 @@ 
-13,7 +13,7 @@ ! * (no type specified, resulting in an object file) ! All invocations should begin with flang -fc1, consume up to here. -! ALL-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! ALL-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! Check that f90 files are not treated as "previously preprocessed" ! ... in --driver-mode=flang. diff --git a/clang/test/Driver/flang/flang_ucase.F90 b/clang/test/Driver/flang/flang_ucase.F90 index e89c053b327bc9..88aedc39fb94a7 100644 --- a/clang/test/Driver/flang/flang_ucase.F90 +++ b/clang/test/Driver/flang/flang_ucase.F90 @@ -13,7 +13,7 @@ ! * (no type specified, resulting in an object file) ! All invocations should begin with flang -fc1, consume up to here. -! ALL-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! ALL-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! Check that f90 files are not treated as "previously preprocessed" ! ... in --driver-mode=flang. diff --git a/clang/test/Driver/flang/multiple-inputs-mixed.f90 b/clang/test/Driver/flang/multiple-inputs-mixed.f90 index 2395dbecf1fe92..98d8cab00bdfdb 100644 --- a/clang/test/Driver/flang/multiple-inputs-mixed.f90 +++ b/clang/test/Driver/flang/multiple-inputs-mixed.f90 @@ -1,7 +1,7 @@ ! Check that flang can handle mixed C and fortran inputs. ! RUN: %clang --driver-mode=flang -### -fsyntax-only %S/Inputs/one.f90 %S/Inputs/other.c 2>&1 | FileCheck --check-prefixes=CHECK-SYNTAX-ONLY %s -! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/one.f90" ! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}clang{{[^"/]*}}" "-cc1" ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/other.c" diff --git a/clang/test/Driver/flang/multiple-inputs.f90 b/clang/test/Driver/flang/multiple-inputs.f90 index ada999e927a6a0..3c0f22e5d3e508 100644 --- a/clang/test/Driver/flang/multiple-inputs.f90 +++ b/clang/test/Driver/flang/multiple-inputs.f90 @@ -1,7 +1,7 @@ ! 
Check that flang driver can handle multiple inputs at once. ! RUN: %clang --driver-mode=flang -### -fsyntax-only %S/Inputs/one.f90 %S/Inputs/two.f90 2>&1 | FileCheck --check-prefixes=CHECK-SYNTAX-ONLY %s -! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/one.f90" -! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/two.f90" diff --git a/flang/docs/FlangDriver.md b/flang/docs/FlangDriver.md index 815c26a28dfdfa..47cf078cf2d0d4 100644 --- a/flang/docs/FlangDriver.md +++ b/flang/docs/FlangDriver.md @@ -15,17 +15,13 @@ local: ``` There are two main drivers in Flang: -* the compiler driver, `flang-new` -* the frontend driver, `flang-new -fc1` - -> **_NOTE:_** The diagrams in this document refer to `flang` as opposed to -> `flang-new`. Eventually, `flang-new` will be renamed as `flang` and the -> diagrams reflect the final design that we are still working towards. +* the compiler driver, `flang` +* the frontend driver, `flang -fc1` The **compiler driver** will allow you to control all compilation phases (e.g. preprocessing, semantic checks, code-generation, code-optimisation, lowering and linking). For frontend specific tasks, the compiler driver creates a -Fortran compilation job and delegates it to `flang-new -fc1`, the frontend +Fortran compilation job and delegates it to `flang -fc1`, the frontend driver. For linking, it creates a linker job and calls an external linker (e.g. LLVM's [`lld`](https://lld.llvm.org/)). It can also call other tools such as external assemblers (e.g. [`as`](https://www.gnu.org/software/binutils/)). In @@ -47,7 +43,7 @@ frontend. It uses MLIR and LLVM for code-generation and can be viewed as a driver for Flang, LLVM and MLIR libraries. 
Contrary to the compiler driver, it is not capable of calling any external tools (including linkers). It is aware of all the frontend internals that are "hidden" from the compiler driver. It -accepts many frontend-specific options not available in `flang-new` and as such +accepts many frontend-specific options not available in `flang` and as such it provides a finer control over the frontend. Note that this tool is mostly intended for Flang developers. In particular, there are no guarantees about the stability of its interface and compiler developers can use it to experiment @@ -62,30 +58,30 @@ frontend specific flag from the _compiler_ directly to the _frontend_ driver, e.g.: ```bash -flang-new -Xflang -fdebug-dump-parse-tree input.f95 +flang -Xflang -fdebug-dump-parse-tree input.f95 ``` -In the invocation above, `-fdebug-dump-parse-tree` is forwarded to `flang-new +In the invocation above, `-fdebug-dump-parse-tree` is forwarded to `flang -fc1`. Without the forwarding flag, `-Xflang`, you would see the following warning: ```bash -flang-new: warning: argument unused during compilation: +flang: warning: argument unused during compilation: ``` -As `-fdebug-dump-parse-tree` is only supported by `flang-new -fc1`, `flang-new` +As `-fdebug-dump-parse-tree` is only supported by `flang -fc1`, `flang` will ignore it when used without `Xflang`. ## Why Do We Need Two Drivers? -As hinted above, `flang-new` and `flang-new -fc1` are two separate tools. The -fact that these tools are accessed through one binary, `flang-new`, is just an +As hinted above, `flang` and `flang -fc1` are two separate tools. The +fact that these tools are accessed through one binary, `flang`, is just an implementation detail. Each tool has a separate list of options, albeit defined in the same file: `clang/include/clang/Driver/Options.td`. The separation helps us split various tasks and allows us to implement more -specialised tools. In particular, `flang-new` is not aware of various +specialised tools. 
In particular, `flang` is not aware of various compilation phases within the frontend (e.g. scanning, parsing or semantic -checks). It does not have to be. Conversely, the frontend driver, `flang-new +checks). It does not have to be. Conversely, the frontend driver, `flang -fc1`, needs not to be concerned with linkers or other external tools like assemblers. Nor does it need to know where to look for various systems libraries, which is usually OS and platform specific. @@ -104,7 +100,7 @@ GCC](https://en.wikibooks.org/wiki/GNU_C_Compiler_Internals/GNU_C_Compiler_Archi In fact, Flang needs to adhere to this model in order to be able to re-use Clang's driver library. If you are more familiar with the [architecture of GFortran](https://gcc.gnu.org/onlinedocs/gcc-4.7.4/gfortran/About-GNU-Fortran.html) -than Clang, then `flang-new` corresponds to `gfortran` and `flang-new -fc1` to +than Clang, then `flang` corresponds to `gfortran` and `flang -fc1` to `f951`. ## Compiler Driver @@ -135,7 +131,7 @@ output from one action is the input for the subsequent one. You can use the `-ccc-print-phases` flag to see the sequence of actions that the driver will create for your compiler invocation: ```bash -flang-new -ccc-print-phases -E file.f +flang -ccc-print-phases -E file.f +- 0: input, "file.f", f95-cpp-input 1: preprocessor, {0}, f95 ``` @@ -143,7 +139,7 @@ As you can see, for `-E` the driver creates only two jobs and stops immediately after preprocessing. The first job simply prepares the input. For `-c`, the pipeline of the created jobs is more complex: ```bash -flang-new -ccc-print-phases -c file.f +flang -ccc-print-phases -c file.f +- 0: input, "file.f", f95-cpp-input +- 1: preprocessor, {0}, f95 +- 2: compiler, {1}, ir @@ -158,7 +154,7 @@ command to call the frontend driver is generated (more specifically, an instance of `clang::driver::Command`). Every command is bound to an instance of `clang::driver::Tool`. 
For Flang we introduced a specialisation of this class: `clang::driver::Flang`. This class implements the logic to either translate or -forward compiler options to the frontend driver, `flang-new -fc1`. +forward compiler options to the frontend driver, `flang -fc1`. You can read more on the design of `clangDriver` in Clang's [Driver Design & Internals](https://clang.llvm.org/docs/DriverInternals.html). @@ -232,12 +228,12 @@ driver, `clang -cc1` and consists of the following classes: This list is not exhaustive and only covers the main classes that implement the driver. The main entry point for the frontend driver, `fc1_main`, is implemented in `flang/tools/flang-driver/driver.cpp`. It can be accessed by -invoking the compiler driver, `flang-new`, with the `-fc1` flag. +invoking the compiler driver, `flang`, with the `-fc1` flag. The frontend driver will only run one action at a time. If you specify multiple action flags, only the last one will be taken into account. The default action is `ParseSyntaxOnlyAction`, which corresponds to `-fsyntax-only`. In other -words, `flang-new -fc1 ` is equivalent to `flang-new -fc1 -fsyntax-only +words, `flang -fc1 ` is equivalent to `flang -fc1 -fsyntax-only `. ## Adding new Compiler Options @@ -262,8 +258,8 @@ similar semantics to your new option and start by copying that. For every new option, you will also have to define the visibility of the new option. This is controlled through the `Visibility` field. 
You can use the following Flang specific visibility flags to control this: - * `FlangOption` - this option will be available in the `flang-new` compiler driver, - * `FC1Option` - this option will be available in the `flang-new -fc1` frontend driver, + * `FlangOption` - this option will be available in the `flang` compiler driver, + * `FC1Option` - this option will be available in the `flang -fc1` frontend driver, Options that are supported by clang should explicitly specify `ClangOption` in `Visibility`, and options that are only supported in Flang should not specify @@ -290,10 +286,10 @@ The parsing will depend on the semantics encoded in the TableGen definition. When adding a compiler driver option (i.e. an option that contains `FlangOption` among in it's `Visibility`) that you also intend to be understood -by the frontend, make sure that it is either forwarded to `flang-new -fc1` or +by the frontend, make sure that it is either forwarded to `flang -fc1` or translated into some other option that is accepted by the frontend driver. In the case of options that contain both `FlangOption` and `FC1Option` among its -flags, we usually just forward from `flang-new` to `flang-new -fc1`. This is +flags, we usually just forward from `flang` to `flang -fc1`. This is then tested in `flang/test/Driver/frontend-forward.F90`. What follows is usually very dependant on the meaning of the corresponding @@ -339,11 +335,11 @@ just added using your new frontend option. ## CMake Support As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246) -(and soon to be released CMake 3.24.0), `cmake` can detect `flang-new` as a +(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a supported Fortran compiler. 
You can configure your CMake projects to use -`flang-new` as follows: +`flang` as follows: ```bash -cmake -DCMAKE_Fortran_COMPILER= +cmake -DCMAKE_Fortran_COMPILER= ``` You should see the following in the output: ``` @@ -353,14 +349,14 @@ where `` corresponds to the LLVM Flang version. ## Testing In LIT, we define two variables that you can use to invoke Flang's drivers: -* `%flang` is expanded as `flang-new` (i.e. the compiler driver) -* `%flang_fc1` is expanded as `flang-new -fc1` (i.e. the frontend driver) +* `%flang` is expanded as `flang` (i.e. the compiler driver) +* `%flang_fc1` is expanded as `flang -fc1` (i.e. the frontend driver) For most regression tests for the frontend, you will want to use `%flang_fc1`. In some cases, the observable behaviour will be identical regardless of whether `%flang` or `%flang_fc1` is used. However, when you are using `%flang` instead of `%flang_fc1`, the compiler driver will add extra flags to the frontend -driver invocation (i.e. `flang-new -fc1 -`). In some cases that might +driver invocation (i.e. `flang -fc1 -`). In some cases that might be exactly what you want to test. In fact, you can check these additional flags by using the `-###` compiler driver command line option. @@ -380,7 +376,7 @@ plugins. The process for using plugins includes: * [Creating a plugin](#creating-a-plugin) * [Loading and running a plugin](#loading-and-running-a-plugin) -Flang plugins are limited to `flang-new -fc1` and are currently only available / +Flang plugins are limited to `flang -fc1` and are currently only available / been tested on Linux. 
 ### Creating a Plugin
@@ -465,14 +461,14 @@ static FrontendPluginRegistry::Add X(
 ### Loading and Running a Plugin
 In order to use plugins, there are 2 command line options made available to the
-frontend driver, `flang-new -fc1`:
+frontend driver, `flang -fc1`:
 * [`-load `](#the--load-dsopath-option) for loading the dynamic shared object of the plugin
 * [`-plugin `](#the--plugin-name-option) for calling the registered plugin
 Invocation of the example plugin is done through:
 ```bash
-flang-new -fc1 -load flangPrintFunctionNames.so -plugin print-fns file.f90
+flang -fc1 -load flangPrintFunctionNames.so -plugin print-fns file.f90
 ```
 Both these options are parsed in `flang/lib/Frontend/CompilerInvocation.cpp` and
@@ -493,7 +489,7 @@ reports an error diagnostic and returns `nullptr`.
 ### Enabling In-Tree Plugins
 For in-tree plugins, there is the CMake flag `FLANG_PLUGIN_SUPPORT`, enabled by
-default, that controls the exporting of executable symbols from `flang-new`,
+default, that controls the exporting of executable symbols from `flang`,
 which plugins need access to. Additionally, there is the CMake flag
 `LLVM_BUILD_EXAMPLES`, turned off by default, that is used to control if the
 example programs are built. This includes plugins that are in the
@@ -526,7 +522,7 @@ invocations `invokeFIROptEarlyEPCallbacks`, `invokeFIRInlinerCallback`, and
 `invokeFIROptLastEPCallbacks` for Flang drivers to be able to insert additonal
 passes at different points of the default pass pipeline. An example use of these
 extension point callbacks is shown in `registerDefaultInlinerPass` to invoke the
-default inliner pass in `flang-new`.
+default inliner pass in `flang`.
 ## LLVM Pass Plugins
@@ -539,7 +535,7 @@ documentation for
 [`llvm::PassBuilder`](https://llvm.org/doxygen/classllvm_1_1PassBuilder.html)
 for details.
-The framework to enable pass plugins in `flang-new` uses the exact same
+The framework to enable pass plugins in `flang` uses the exact same
 machinery as that used by `clang` and thus has the same capabilities and
 limitations.
@@ -547,7 +543,7 @@ In order to use a pass plugin, the pass(es) must be compiled into a dynamic
 shared object which is then loaded using the `-fpass-plugin` option.
 ```
-flang-new -fpass-plugin=/path/to/plugin.so
+flang -fpass-plugin=/path/to/plugin.so
 ```
 This option is available in both the compiler driver and the frontend driver.
@@ -559,7 +555,7 @@ Pass extensions are similar to plugins, except that they can also be linked
 statically. Setting `-DLLVM_${NAME}_LINK_INTO_TOOLS` to `ON` in the cmake
 command turns the project into a statically linked extension. An example would
 be Polly, e.g., using `-DLLVM_POLLY_LINK_INTO_TOOLS=ON` would link Polly passes
-into `flang-new` as built-in middle-end passes.
+into `flang` as built-in middle-end passes.
 See the
 [`WritingAnLLVMNewPMPass`](https://llvm.org/docs/WritingAnLLVMNewPMPass.html#id9)
diff --git a/flang/docs/ImplementingASemanticCheck.md b/flang/docs/ImplementingASemanticCheck.md
index 5b583d4f8031b8..598ef696ad14bf 100644
--- a/flang/docs/ImplementingASemanticCheck.md
+++ b/flang/docs/ImplementingASemanticCheck.md
@@ -68,7 +68,7 @@ of the call to `intentOutFunc()`:
 I also used this program to produce a parse tree for the program using the command:
 ```bash
- flang-new -fc1 -fdebug-dump-parse-tree testfun.f90
+ flang -fc1 -fdebug-dump-parse-tree testfun.f90
 ```
 Here's the relevant fragment of the parse tree produced by the compiler:
@@ -296,7 +296,7 @@ In `lib/Semantics/check-do.cpp`, I added an (almost empty) implementation:
 I then built the compiler with these changes and ran it on my test program.
 This time, I made sure to invoke semantic checking. Here's the command I used:
 ```bash
- flang-new -fc1 -fdebug-unparse-with-symbols testfun.f90
+ flang -fc1 -fdebug-unparse-with-symbols testfun.f90
 ```
 This produced the output:
diff --git a/flang/docs/Overview.md b/flang/docs/Overview.md
index 6eba19ea3a3c0d..dfb4d89264a755 100644
--- a/flang/docs/Overview.md
+++ b/flang/docs/Overview.md
@@ -65,8 +65,8 @@ See [Preprocessing.md](Preprocessing.md).
 **Entry point:** `parser::Parsing::Prescan`
 **Commands:**
- - `flang-new -fc1 -E src.f90` dumps the cooked character stream
- - `flang-new -fc1 -fdebug-dump-provenance src.f90` dumps provenance
+ - `flang -fc1 -E src.f90` dumps the cooked character stream
+ - `flang -fc1 -fdebug-dump-provenance src.f90` dumps provenance
 information
 ### Parsing
@@ -80,10 +80,10 @@ representing a syntactically correct program, rooted at the program unit. See:
 **Entry point:** `parser::Parsing::Parse`
 **Commands:**
- - `flang-new -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree
- - `flang-new -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran
- - `flang-new -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log
- - `flang-new -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree
+ - `flang -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree
+ - `flang -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran
+ - `flang -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log
+ - `flang -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree
 ### Semantic processing
@@ -121,9 +121,9 @@ In the course of semantic analysis, the compiler:
 At the end of semantic processing, all validation of the user's program is complete. This is the last detailed phase of analysis processing.
 **Commands:**
- - `flang-new -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis
- - `flang-new -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table
- - `flang-new -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table
+ - `flang -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis
+ - `flang -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table
+ - `flang -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table
 ## Lowering
@@ -163,8 +163,8 @@ contain a list of evaluations. All of these contain pointers back into the
 parse tree. The compiler walks the PFT generating FIR.
 **Commands:**
- - `flang-new -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree
- - `flang-new -fc1 -emit-mlir src.f90` dumps the FIR to the files src.mlir
+ - `flang -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree
+ - `flang -fc1 -emit-mlir src.f90` dumps the FIR to the files src.mlir
 ### Transformation passes
@@ -180,8 +180,8 @@ perform various optimizations and transformations. The final pass creates
 an LLVM IR representation of the program.
 **Commands:**
- - `flang-new -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error
- - `flang-new -fc1 -emit-llvm src.f90` dumps the LLVM IR to src.ll
+ - `flang -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error
+ - `flang -fc1 -emit-llvm src.f90` dumps the LLVM IR to src.ll
 ## Object code generation and linking
diff --git a/flang/examples/FlangOmpReport/FlangOmpReport.cpp b/flang/examples/FlangOmpReport/FlangOmpReport.cpp
index 9c1f304b9741e7..709c5c5d305e51 100644
--- a/flang/examples/FlangOmpReport/FlangOmpReport.cpp
+++ b/flang/examples/FlangOmpReport/FlangOmpReport.cpp
@@ -9,7 +9,7 @@
 // all the OpenMP constructs and clauses and which line they're located on.
 //
 // The plugin may be invoked as:
-// ./bin/flang-new -fc1 -load lib/flangOmpReport.so -plugin flang-omp-report
+// ./bin/flang -fc1 -load lib/flangOmpReport.so -plugin flang-omp-report
 // -fopenmp
 //
 //===----------------------------------------------------------------------===//
diff --git a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
index 9a70b7fbfad2b6..8ab5150cd7c812 100644
--- a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
+++ b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
@@ -67,7 +67,7 @@ struct AliasAnalysis {
   // end subroutine
   // -------------------------------------------------
   //
-  // flang-new -fc1 -emit-fir test.f90 -o test.fir
+  // flang -fc1 -emit-fir test.f90 -o test.fir
   //
   // ------------------- test.fir --------------------
   // fir.global @_QMtopEa : !fir.box>>
diff --git a/flang/include/flang/Tools/CrossToolHelpers.h b/flang/include/flang/Tools/CrossToolHelpers.h
index 3e703de545950c..df4b21ada058fe 100644
--- a/flang/include/flang/Tools/CrossToolHelpers.h
+++ b/flang/include/flang/Tools/CrossToolHelpers.h
@@ -7,7 +7,7 @@
 //===----------------------------------------------------------------------===//
 // A header file for containing functionallity that is used across Flang tools,
 // such as helper functions which apply or generate information needed accross
-// tools like bbc and flang-new.
+// tools like bbc and flang.
 //===----------------------------------------------------------------------===//
 #ifndef FORTRAN_TOOLS_CROSS_TOOL_HELPERS_H
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index 05b03ba9ebdf30..18383eaafb1136 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -65,8 +65,8 @@ CompilerInvocationBase::~CompilerInvocationBase() = default;
 static bool parseShowColorsArgs(const llvm::opt::ArgList &args,
                                 bool defaultColor = true) {
   // Color diagnostics default to auto ("on" if terminal supports) in the
-  // compiler driver `flang-new` but default to off in the frontend driver
-  // `flang-new -fc1`, needing an explicit OPT_fdiagnostics_color.
+  // compiler driver `flang` but default to off in the frontend driver
+  // `flang -fc1`, needing an explicit OPT_fdiagnostics_color.
   // Support both clang's -f[no-]color-diagnostics and gcc's
   // -f[no-]diagnostics-colors[=never|always|auto].
   enum {
@@ -891,7 +891,7 @@ static bool parseDiagArgs(CompilerInvocation &res, llvm::opt::ArgList &args,
     }
   }
-  // Default to off for `flang-new -fc1`.
+  // Default to off for `flang -fc1`.
   res.getFrontendOpts().showColors =
       parseShowColorsArgs(args, /*defaultDiagColor=*/false);
diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index 4a52edc436e0ed..8f882bff170909 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -233,7 +233,7 @@ bool CodeGenAction::beginSourceFileAction() {
     llvm::SMDiagnostic err;
     llvmModule = llvm::parseIRFile(getCurrentInput().getFile(), err, *llvmCtx);
     if (!llvmModule || llvm::verifyModule(*llvmModule, &llvm::errs())) {
-      err.print("flang-new", llvm::errs());
+      err.print("flang", llvm::errs());
       unsigned diagID = ci.getDiagnostics().getCustomDiagID(
           clang::DiagnosticsEngine::Error, "Could not parse IR");
       ci.getDiagnostics().Report(diagID);
diff --git a/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp b/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
index e2cbd5112d6ea5..09ac129d3e6893 100644
--- a/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
+++ b/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
@@ -154,8 +154,7 @@ bool executeCompilerInvocation(CompilerInstance *flang) {
   // Honor -help.
   if (flang->getFrontendOpts().showHelp) {
     clang::driver::getDriverOptTable().printHelp(
-        llvm::outs(), "flang-new -fc1 [options] file...",
-        "LLVM 'Flang' Compiler",
+        llvm::outs(), "flang -fc1 [options] file...", "LLVM 'Flang' Compiler",
         /*ShowHidden=*/false, /*ShowAllAliases=*/false,
         llvm::opt::Visibility(clang::driver::options::FC1Option));
     return true;
diff --git a/flang/runtime/CMakeLists.txt b/flang/runtime/CMakeLists.txt
index 0ad1b718d5875b..cdd2de541c6730 100644
--- a/flang/runtime/CMakeLists.txt
+++ b/flang/runtime/CMakeLists.txt
@@ -308,12 +308,12 @@ set_target_properties(FortranRuntime PROPERTIES FOLDER "Flang/Runtime Libraries"
 # If FortranRuntime is part of a Flang build (and not a separate build) then
 # add dependency to make sure that Fortran runtime library is being built after
 # we have the Flang compiler available. This also includes the MODULE files
-# that compile when the 'flang-new' target is built.
+# that compile when the 'flang' target is built.
 #
 # TODO: This is a workaround and should be updated when runtime build procedure
 # is changed to a regular runtime build. See discussion in PR #95388.
-if (TARGET flang-new AND TARGET module_files)
-  add_dependencies(FortranRuntime flang-new module_files)
+if (TARGET flang AND TARGET module_files)
+  add_dependencies(FortranRuntime flang module_files)
 endif()
 if (FLANG_CUF_RUNTIME)
diff --git a/flang/test/CMakeLists.txt b/flang/test/CMakeLists.txt
index a18a5c6519eda4..cab214c2ef4c8c 100644
--- a/flang/test/CMakeLists.txt
+++ b/flang/test/CMakeLists.txt
@@ -58,7 +58,7 @@ set(FLANG_TEST_PARAMS
   flang_site_config=${CMAKE_CURRENT_BINARY_DIR}/lit.site.cfg.py)
 set(FLANG_TEST_DEPENDS
-  flang-new
+  flang
   llvm-config
   FileCheck
   count
diff --git a/flang/test/Driver/aarch64-outline-atomics.f90 b/flang/test/Driver/aarch64-outline-atomics.f90
index a1c874c20df5c7..530bfc8e962091 100644
--- a/flang/test/Driver/aarch64-outline-atomics.f90
+++ b/flang/test/Driver/aarch64-outline-atomics.f90
@@ -1,4 +1,4 @@
-! Test that flang-new forwards the -moutline-atomics and -mno-outline-atomics.
+! Test that flang forwards the -moutline-atomics and -mno-outline-atomics.
 ! RUN: %flang -moutline-atomics --target=aarch64-none-none -### %s -o %t 2>&1 | FileCheck %s
 ! CHECK: "-target-feature" "+outline-atomics"
diff --git a/flang/test/Driver/color-diagnostics-forwarding.f90 b/flang/test/Driver/color-diagnostics-forwarding.f90
index 368fa8834142ab..29061242cb0cbc 100644
--- a/flang/test/Driver/color-diagnostics-forwarding.f90
+++ b/flang/test/Driver/color-diagnostics-forwarding.f90
@@ -1,5 +1,5 @@
-! Test that flang-new forwards -f{no-}color-diagnostics and
-! -f{no-}diagnostics-color options to flang-new -fc1 as expected.
+! Test that flang forwards -f{no-}color-diagnostics and
+! -f{no-}diagnostics-color options to flang -fc1 as expected.
 ! RUN: %flang -fsyntax-only -### %s -o %t 2>&1 -fcolor-diagnostics \
 ! RUN: | FileCheck %s --check-prefix=CHECK-CD
diff --git a/flang/test/Driver/compiler-options.f90 b/flang/test/Driver/compiler-options.f90
index 7ec29ce7ba7abf..cefa86836abd30 100644
--- a/flang/test/Driver/compiler-options.f90
+++ b/flang/test/Driver/compiler-options.f90
@@ -1,6 +1,6 @@
 ! RUN: %flang -S -emit-llvm -flang-deprecated-no-hlfir -o - %s | FileCheck %s
-! Test communication of COMPILER_OPTIONS from flang-new to flang-new -fc1.
-! CHECK: [[OPTSVAR:@_QQclX[0-9a-f]+]] = {{[a-z]+}} constant [[[OPTSLEN:[0-9]+]] x i8] c"{{.*}}flang-new{{(\.exe)?}} {{.*}}-S -emit-llvm -flang-deprecated-no-hlfir -o - {{.*}}compiler-options.f90"
+! Test communication of COMPILER_OPTIONS from flang to flang -fc1.
+! CHECK: [[OPTSVAR:@_QQclX[0-9a-f]+]] = {{[a-z]+}} constant [[[OPTSLEN:[0-9]+]] x i8] c"{{.*}}flang{{(\.exe)?}} {{.*}}-S -emit-llvm -flang-deprecated-no-hlfir -o - {{.*}}compiler-options.f90"
 program main
 use ISO_FORTRAN_ENV, only: compiler_options
 implicit none
diff --git a/flang/test/Driver/convert.f90 b/flang/test/Driver/convert.f90
index b2cf6c23efdb75..0ba31d2188cdf5 100755
--- a/flang/test/Driver/convert.f90
+++ b/flang/test/Driver/convert.f90
@@ -12,7 +12,7 @@
 ! RUN: not %flang -fconvert=foobar %s 2>&1 | FileCheck %s --check-prefix=INVALID
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: %flang_fc1 -emit-mlir -fconvert=unknown %s -o - | FileCheck %s --check-prefix=VALID_FC1
 ! RUN: %flang_fc1 -emit-mlir -fconvert=native %s -o - | FileCheck %s --check-prefix=VALID_FC1
diff --git a/flang/test/Driver/disable-ext-name-interop.f90 b/flang/test/Driver/disable-ext-name-interop.f90
index 0c59a5b4c980f8..1ade84b996d043 100644
--- a/flang/test/Driver/disable-ext-name-interop.f90
+++ b/flang/test/Driver/disable-ext-name-interop.f90
@@ -1,4 +1,4 @@
-! Test that we can disable the ExternalNameConversion pass in flang-new.
+! Test that we can disable the ExternalNameConversion pass in flang.
 ! RUN: %flang_fc1 -S %s -o - 2>&1 | FileCheck %s --check-prefix=EXTNAMES
 ! RUN: %flang_fc1 -S -mmlir -disable-external-name-interop %s -o - 2>&1 | FileCheck %s --check-prefix=INTNAMES
diff --git a/flang/test/Driver/driver-version.f90 b/flang/test/Driver/driver-version.f90
index d1e1e1d90fe1f8..4c6aecb1c4fa7e 100644
--- a/flang/test/Driver/driver-version.f90
+++ b/flang/test/Driver/driver-version.f90
@@ -4,12 +4,12 @@
 ! RUN: %flang_fc1 -version 2>&1 | FileCheck %s --check-prefix=VERSION-FC1
 ! RUN: not %flang_fc1 --version 2>&1 | FileCheck %s --check-prefix=ERROR-FC1
-! VERSION: flang-new version
+! VERSION: flang version
 ! VERSION-NEXT: Target:
 ! VERSION-NEXT: Thread model:
 ! VERSION-NEXT: InstalledDir:
-! ERROR: flang-new: error: unknown argument '--versions'; did you mean '--version'?
+! ERROR: flang: error: unknown argument '--versions'; did you mean '--version'?
 ! VERSION-FC1: LLVM version
diff --git a/flang/test/Driver/escaped-backslash.f90 b/flang/test/Driver/escaped-backslash.f90
index ad07eae24e9fab..90dd1783dd1150 100644
--- a/flang/test/Driver/escaped-backslash.f90
+++ b/flang/test/Driver/escaped-backslash.f90
@@ -1,14 +1,14 @@
 ! Ensure argument -fbackslash works as expected.
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
 ! RUN: %flang -E %s 2>&1 | FileCheck %s --check-prefix=ESCAPED
 ! RUN: %flang -E -fbackslash -fno-backslash %s 2>&1 | FileCheck %s --check-prefix=ESCAPED
 ! RUN: %flang -E -fbackslash %s 2>&1 | FileCheck %s --check-prefix=UNESCAPED
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --check-prefix=ESCAPED
 ! RUN: %flang_fc1 -E -fbackslash -fno-backslash %s 2>&1 | FileCheck %s --check-prefix=ESCAPED
diff --git a/flang/test/Driver/fdefault.f90 b/flang/test/Driver/fdefault.f90
index 88592bfa3e87ee..7ce45b763a240f 100644
--- a/flang/test/Driver/fdefault.f90
+++ b/flang/test/Driver/fdefault.f90
@@ -2,25 +2,25 @@
 ! TODO: Add checks when actual codegen is possible for this family
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOOPTION
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=REAL8
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=DOUBLE8
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOOPTION
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=REAL8
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=DOUBLE8
 ! RUN: not %flang -fsyntax-only -fdefault-double-8 %s 2>&1 | FileCheck %s --check-prefix=ERROR
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOOPTION
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=REAL8
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=DOUBLE8
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOOPTION
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=REAL8
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=DOUBLE8
 ! RUN: not %flang_fc1 -fsyntax-only -fdefault-double-8 %s 2>&1 | FileCheck %s --check-prefix=ERROR
 ! NOOPTION: integer(4),parameter::real_kind=4_4
diff --git a/flang/test/Driver/flarge-sizes.f90 b/flang/test/Driver/flarge-sizes.f90
index 6ea5876676ed1f..6c41a03a830bfb 100644
--- a/flang/test/Driver/flarge-sizes.f90
+++ b/flang/test/Driver/flarge-sizes.f90
@@ -2,20 +2,20 @@
 ! TODO: Add checks when actual codegen is possible.
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOLARGE
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -flarge-sizes -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=LARGE
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOLARGE
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -flarge-sizes -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=LARGE
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOLARGE
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -flarge-sizes -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=LARGE
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOLARGE
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -flarge-sizes -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=LARGE
 ! NOLARGE: real(4)::z(1_8:10_8)
 ! NOLARGE-NEXT: integer(4),parameter::size_kind=4_4
diff --git a/flang/test/Driver/frame-pointer-forwarding.f90 b/flang/test/Driver/frame-pointer-forwarding.f90
index 751494cc6a6017..9fcbd6e12f98b7 100644
--- a/flang/test/Driver/frame-pointer-forwarding.f90
+++ b/flang/test/Driver/frame-pointer-forwarding.f90
@@ -1,4 +1,4 @@
-! Test that flang-new forwards -fno-omit-frame-pointer and -fomit-frame-pointer Flang frontend
+! Test that flang forwards -fno-omit-frame-pointer and -fomit-frame-pointer Flang frontend
 ! RUN: %flang --target=aarch64-none-none -fsyntax-only -### %s -o %t 2>&1 | FileCheck %s --check-prefix=CHECK-NOVALUE
 ! CHECK-NOVALUE: "-fc1"{{.*}}"-mframe-pointer=non-leaf"
diff --git a/flang/test/Driver/frontend-forwarding.f90 b/flang/test/Driver/frontend-forwarding.f90
index 35adb47b56861e..0a56a1e3710d9d 100644
--- a/flang/test/Driver/frontend-forwarding.f90
+++ b/flang/test/Driver/frontend-forwarding.f90
@@ -1,5 +1,5 @@
-! Test that flang-new forwards Flang frontend
-! options to flang-new -fc1 as expected.
+! Test that flang forwards Flang frontend
+! options to flang -fc1 as expected.
 ! RUN: %flang -fsyntax-only -### %s -o %t 2>&1 \
 ! RUN: -finput-charset=utf-8 \
diff --git a/flang/test/Driver/hlfir-no-hlfir-error.f90 b/flang/test/Driver/hlfir-no-hlfir-error.f90
index 2410393b6cd9c1..59f8304db5c9ab 100644
--- a/flang/test/Driver/hlfir-no-hlfir-error.f90
+++ b/flang/test/Driver/hlfir-no-hlfir-error.f90
@@ -2,12 +2,12 @@
 ! options cannot be both used.
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
 ! RUN: not %flang -flang-experimental-hlfir -flang-deprecated-no-hlfir %s 2>&1 | FileCheck %s
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: not %flang_fc1 -emit-llvm -flang-experimental-hlfir -flang-deprecated-no-hlfir %s 2>&1 | FileCheck %s
diff --git a/flang/test/Driver/intrinsic-module-path.f90 b/flang/test/Driver/intrinsic-module-path.f90
index 5523ed37b724cd..15d19dd83d963f 100644
--- a/flang/test/Driver/intrinsic-module-path.f90
+++ b/flang/test/Driver/intrinsic-module-path.f90
@@ -4,7 +4,7 @@
 ! default one, causing a CHECKSUM error.
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: %flang_fc1 -fsyntax-only %s 2>&1 | FileCheck %s --allow-empty --check-prefix=WITHOUT
 ! RUN: not %flang_fc1 -fsyntax-only -fintrinsic-modules-path %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=GIVEN
diff --git a/flang/test/Driver/large-data-threshold.f90 b/flang/test/Driver/large-data-threshold.f90
index 320566c4b2e43a..6a7eef79559d0b 100644
--- a/flang/test/Driver/large-data-threshold.f90
+++ b/flang/test/Driver/large-data-threshold.f90
@@ -7,11 +7,11 @@
 ! RUN: not %flang -### -c --target=aarch64 -mcmodel=small -mlarge-data-threshold=32768 %s 2>&1 | FileCheck %s --check-prefix=NOT-SUPPORTED
-! CHECK: "{{.*}}flang-new" "-fc1"
+! CHECK: "{{.*}}flang" "-fc1"
 ! CHECK-SAME: "-mlarge-data-threshold=32768"
-! CHECK-59000: "{{.*}}flang-new" "-fc1"
+! CHECK-59000: "{{.*}}flang" "-fc1"
 ! CHECK-59000-SAME: "-mlarge-data-threshold=59000"
-! CHECK-1M: "{{.*}}flang-new" "-fc1"
+! CHECK-1M: "{{.*}}flang" "-fc1"
 ! CHECK-1M-SAME: "-mlarge-data-threshold=1048576"
 ! NO-MCMODEL: 'mlarge-data-threshold=' only applies to medium and large code models
 ! INVALID: error: invalid value 'nonsense' in '-mlarge-data-threshold='
diff --git a/flang/test/Driver/lto-flags.f90 b/flang/test/Driver/lto-flags.f90
index a51febc7009691..bad3d972e6bd6b 100644
--- a/flang/test/Driver/lto-flags.f90
+++ b/flang/test/Driver/lto-flags.f90
@@ -30,7 +30,7 @@
 ! FULL-LTO: "-fc1"
 ! FULL-LTO-SAME: "-flto=full"
-! THIN-LTO-ALL: flang-new: warning: the option '-flto=thin' is a work in progress
+! THIN-LTO-ALL: flang: warning: the option '-flto=thin' is a work in progress
 ! THIN-LTO-ALL: "-fc1"
 ! THIN-LTO-ALL-SAME: "-flto=thin"
 ! THIN-LTO-LINKER-PLUGIN: "-plugin-opt=thinlto"
diff --git a/flang/test/Driver/macro-def-undef.F90 b/flang/test/Driver/macro-def-undef.F90
index 1332c6d6c02708..b13a9040833dbf 100644
--- a/flang/test/Driver/macro-def-undef.F90
+++ b/flang/test/Driver/macro-def-undef.F90
@@ -1,14 +1,14 @@
 ! Ensure arguments -D and -U work as expected.
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
 ! RUN: %flang -E -P %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED
 ! RUN: %flang -E -P -DX=A %s 2>&1 | FileCheck %s --check-prefix=DEFINED
 ! RUN: %flang -E -P -DX=A -UX %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: %flang_fc1 -E -P %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED
 ! RUN: %flang_fc1 -E -P -DX=A %s 2>&1 | FileCheck %s --check-prefix=DEFINED
diff --git a/flang/test/Driver/missing-input.f90 b/flang/test/Driver/missing-input.f90
index 236325e3578f1d..51d37a718c542f 100644
--- a/flang/test/Driver/missing-input.f90
+++ b/flang/test/Driver/missing-input.f90
@@ -1,26 +1,26 @@
 ! Test the behaviour of the driver when input is missing or is invalid. Note
-! that with the compiler driver (flang-new), the input _has_ to be specified.
+! that with the compiler driver (flang), the input _has_ to be specified.
 ! Indeed, the driver decides what "job/command" to create based on the input
 ! file's extension. No input file means that it doesn't know what to do
-! (compile? preprocess? link?). The frontend driver (flang-new -fc1) simply
+! (compile? preprocess? link?). The frontend driver (flang -fc1) simply
 ! assumes that "no explicit input == read from stdin"
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
 ! RUN: not %flang 2>&1 | FileCheck %s --check-prefix=FLANG-NO-FILE
 ! RUN: not %flang %t.f90 2>&1 | FileCheck %s --check-prefix=FLANG-NONEXISTENT-FILE
 !-----------------------------------------
-! FLANG FRONTEND DRIVER (flang-new -fc1)
+! FLANG FRONTEND DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: not %flang_fc1 %t.f90 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-NONEXISTENT-FILE
 ! RUN: not %flang_fc1 %S 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-DIR
-! FLANG-NO-FILE: flang-new: error: no input files
+! FLANG-NO-FILE: flang: error: no input files
-! FLANG-NONEXISTENT-FILE: flang-new: error: no such file or directory: {{.*}}
-! FLANG-NONEXISTENT-FILE: flang-new: error: no input files
+! FLANG-NONEXISTENT-FILE: flang: error: no such file or directory: {{.*}}
+! FLANG-NONEXISTENT-FILE: flang: error: no input files
 ! FLANG-FC1-NONEXISTENT-FILE: error: {{.*}} does not exist
 ! FLANG-FC1-DIR: error: {{.*}} is not a regular file
diff --git a/flang/test/Driver/multiple-input-files.f90 b/flang/test/Driver/multiple-input-files.f90
index 6c86f23f2b21fa..64ec8679abf94f 100644
--- a/flang/test/Driver/multiple-input-files.f90
+++ b/flang/test/Driver/multiple-input-files.f90
@@ -39,7 +39,7 @@
 ! FLANG-NEXT:end program hello
 ! TEST 2: `-o` does not when multiple input files are present
-! ERROR: flang-new: error: cannot specify -o when generating multiple output files
+! ERROR: flang: error: cannot specify -o when generating multiple output files
 ! TEST 3: The output file _was not_ specified - `flang_fc1` will process all
 ! input files and generate one output file for every input file.
diff --git a/flang/test/Driver/omp-driver-offload.f90 b/flang/test/Driver/omp-driver-offload.f90
index b0b94ab1386a74..7c51656f0001af 100644
--- a/flang/test/Driver/omp-driver-offload.f90
+++ b/flang/test/Driver/omp-driver-offload.f90
@@ -1,6 +1,6 @@
-! Test that flang-new OpenMP and OpenMP offload related
+! Test that flang OpenMP and OpenMP offload related
 ! commands forward or expand to the appropriate commands
-! for flang-new -fc1 as expected. Assumes a gfx90a, aarch64,
+! for flang -fc1 as expected. Assumes a gfx90a, aarch64,
 ! and sm_70 architecture, but doesn't require one to be
 ! installed or compiled for, just testing the appropriate
 ! generation of jobs are created with the correct
@@ -8,8 +8,8 @@
 ! Test regular -fopenmp with no offload
 ! RUN: %flang -### -fopenmp %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP %s
-! CHECK-OPENMP: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}}.f90"
-! CHECK-OPENMP-NOT: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90"
+! CHECK-OPENMP: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}}.f90"
+! CHECK-OPENMP-NOT: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90"
 ! Test regular -fopenmp with offload, and invocation filtering options
 ! RUN: %flang -S -### %s -o %t 2>&1 \
@@ -22,47 +22,47 @@
 ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
 ! RUN: | FileCheck %s --check-prefix=OFFLOAD-HOST-AND-DEVICE
-! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
-! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
-! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "nvptx64-nvidia-cuda"
-! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa"
+! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "nvptx64-nvidia-cuda"
+! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
 ! RUN: %flang -S -### %s -o %t 2>&1 \
 ! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-host-only \
 ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
 ! RUN: | FileCheck %s --check-prefix=OFFLOAD-HOST
-! OFFLOAD-HOST: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OFFLOAD-HOST: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
 ! OFFLOAD-HOST-NOT: "-triple" "amdgcn-amd-amdhsa"
 ! OFFLOAD-HOST-NOT: "-triple" "nvptx64-nvidia-cuda"
-! OFFLOAD-HOST-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OFFLOAD-HOST-NOT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
 ! RUN: %flang -S -### %s 2>&1 \
 ! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-device-only \
 ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
 ! RUN: | FileCheck %s --check-prefix=OFFLOAD-DEVICE
-! OFFLOAD-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
-! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
-! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "nvptx64-nvidia-cuda"
-! OFFLOAD-DEVICE-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OFFLOAD-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa"
+! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "nvptx64-nvidia-cuda"
+! OFFLOAD-DEVICE-NOT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
 ! Test regular -fopenmp with offload for basic fopenmp-is-target-device flag addition and correct fopenmp
 ! RUN: %flang -### -fopenmp --offload-arch=gfx90a -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP-IS-TARGET-DEVICE %s
-! CHECK-OPENMP-IS-TARGET-DEVICE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90"
+! CHECK-OPENMP-IS-TARGET-DEVICE: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90"
 ! Testing appropriate flags are gnerated and appropriately assigned by the driver when offloading
 ! RUN: %flang -S -### %s -o %t 2>&1 \
 ! RUN: -fopenmp --offload-arch=gfx90a \
 ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
 ! RUN: | FileCheck %s --check-prefix=OPENMP-OFFLOAD-ARGS
-! OPENMP-OFFLOAD-ARGS: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" {{.*}} "-fopenmp" {{.*}}.f90"
-! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
+! OPENMP-OFFLOAD-ARGS: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" {{.*}} "-fopenmp" {{.*}}.f90"
+! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa"
 ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp"
 ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp-host-ir-file-path" "{{.*}}.bc" "-fopenmp-is-target-device"
 ! OPENMP-OFFLOAD-ARGS-SAME: {{.*}}.f90"
 ! OPENMP-OFFLOAD-ARGS: "{{[^"]*}}clang-offload-packager{{.*}}" {{.*}} "--image=file={{.*}}.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"
-! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
 ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp"
 ! OPENMP-OFFLOAD-ARGS-SAME: "-fembed-offload-object={{.*}}.out" {{.*}}.bc"
@@ -77,7 +77,7 @@
 ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
 ! RUN: -fopenmp-assume-threads-oversubscription \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-THREADS-OVS
-! CHECK-THREADS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90"
+! CHECK-THREADS-OVS: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90"
 ! RUN: %flang -### %s -o %t 2>&1 \
 ! RUN: -fopenmp --offload-arch=gfx90a \
@@ -89,7 +89,7 @@
 ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
 ! RUN: -fopenmp-assume-teams-oversubscription \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-TEAMS-OVS
-! CHECK-TEAMS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-teams-oversubscription" {{.*}}.f90"
+!
CHECK-TEAMS-OVS: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-teams-oversubscription" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -101,7 +101,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-no-nested-parallelism \ ! RUN: | FileCheck %s --check-prefixes=CHECK-NEST-PAR -! CHECK-NEST-PAR: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-nested-parallelism" {{.*}}.f90" +! CHECK-NEST-PAR: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-nested-parallelism" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -113,7 +113,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-no-thread-state \ ! RUN: | FileCheck %s --check-prefixes=CHECK-THREAD-STATE -! CHECK-THREAD-STATE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-thread-state" {{.*}}.f90" +! CHECK-THREAD-STATE: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-thread-state" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -125,7 +125,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-target-debug \ ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG -! CHECK-TARGET-DEBUG: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" {{.*}}.f90" +! CHECK-TARGET-DEBUG: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -137,7 +137,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-target-debug \ ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG -! 
CHECK-TARGET-DEBUG-EQ: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug=111" {{.*}}.f90" +! CHECK-TARGET-DEBUG-EQ: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug=111" {{.*}}.f90" ! RUN: %flang -S -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -153,7 +153,7 @@ ! RUN: -fopenmp-assume-teams-oversubscription -fopenmp-assume-no-nested-parallelism \ ! RUN: -fopenmp-assume-no-thread-state \ ! RUN: | FileCheck %s --check-prefixes=CHECK-RTL-ALL -! CHECK-RTL-ALL: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" "-fopenmp-assume-teams-oversubscription" +! CHECK-RTL-ALL: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" "-fopenmp-assume-teams-oversubscription" ! CHECK-RTL-ALL: "-fopenmp-assume-threads-oversubscription" "-fopenmp-assume-no-thread-state" "-fopenmp-assume-no-nested-parallelism" ! CHECK-RTL-ALL: {{.*}}.f90" @@ -167,7 +167,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-version=45 \ ! RUN: | FileCheck %s --check-prefixes=CHECK-OPENMP-VERSION -! CHECK-OPENMP-VERSION: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" "-fopenmp-version=45" {{.*}}.f90" +! CHECK-OPENMP-VERSION: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" "-fopenmp-version=45" {{.*}}.f90" ! Test diagnostic error when host IR file is non-existent ! RUN: not %flang_fc1 %s -o %t 2>&1 -fopenmp -fopenmp-is-target-device \ @@ -187,7 +187,7 @@ ! RUN: --target=aarch64-unknown-linux-gnu \ ! RUN: | FileCheck %s --check-prefix=FORCE-USM-NO-OFFLOAD -! FORCE-USM-NO-OFFLOAD: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! FORCE-USM-NO-OFFLOAD: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! FORCE-USM-NO-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm" ! Test -fopenmp-force-usm option with offload @@ -196,16 +196,16 @@ ! 
RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=FORCE-USM-OFFLOAD -! FORCE-USM-OFFLOAD: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! FORCE-USM-OFFLOAD: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm" -! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm" ! RUN: %flang -### -v --target=x86_64-unknown-linux-gnu -fopenmp \ ! RUN: --offload-arch=gfx900 \ ! RUN: --rocm-path=%S/Inputs/rocm %s 2>&1 \ ! RUN: | FileCheck --check-prefix=MLINK-BUILTIN-BITCODE %s -! MLINK-BUILTIN-BITCODE: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! MLINK-BUILTIN-BITCODE: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! MLINK-BUILTIN-BITCODE-SAME: "-mlink-builtin-bitcode" {{.*Inputs.*rocm.*amdgcn.*bitcode.*}}oclc_isa_version_900.bc ! Test that the -fopenmp-targets option is added to host compilation invocations @@ -219,9 +219,9 @@ ! RUN: --target=x86_64-unknown-linux-gnu -nogpulib \ ! RUN: | FileCheck %s --check-prefix=OFFLOAD-TARGETS -! OFFLOAD-TARGETS: "{{[^"]*}}flang-new" "-fc1" "-triple" "x86_64-unknown-linux-gnu" +! OFFLOAD-TARGETS: "{{[^"]*}}flang" "-fc1" "-triple" "x86_64-unknown-linux-gnu" ! OFFLOAD-TARGETS-SAME: "-fopenmp-targets=amdgcn-amd-amdhsa" -! OFFLOAD-TARGETS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! OFFLOAD-TARGETS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! OFFLOAD-TARGETS-NOT: -fopenmp-targets -! OFFLOAD-TARGETS: "{{[^"]*}}flang-new" "-fc1" "-triple" "x86_64-unknown-linux-gnu" +! OFFLOAD-TARGETS: "{{[^"]*}}flang" "-fc1" "-triple" "x86_64-unknown-linux-gnu" ! 
OFFLOAD-TARGETS-SAME: "-fopenmp-targets=amdgcn-amd-amdhsa" diff --git a/flang/test/Driver/predefined-macros-compiler-version.F90 b/flang/test/Driver/predefined-macros-compiler-version.F90 index 823a730f96845a..f6924479281562 100644 --- a/flang/test/Driver/predefined-macros-compiler-version.F90 +++ b/flang/test/Driver/predefined-macros-compiler-version.F90 @@ -1,12 +1,12 @@ ! Check that the driver correctly defines macros with the compiler version !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --ignore-case !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --ignore-case diff --git a/flang/test/Driver/std2018-wrong.f90 b/flang/test/Driver/std2018-wrong.f90 index 27ccc76bd39aad..93ba153d75f7f9 100644 --- a/flang/test/Driver/std2018-wrong.f90 +++ b/flang/test/Driver/std2018-wrong.f90 @@ -1,7 +1,7 @@ ! Ensure argument -std=f2018 works as expected. !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: not %flang_fc1 -std=90 %s 2>&1 | FileCheck %s --check-prefix=WRONG diff --git a/flang/test/Driver/std2018.f90 b/flang/test/Driver/std2018.f90 index cf461cf89e4e19..1727f92127b711 100644 --- a/flang/test/Driver/std2018.f90 +++ b/flang/test/Driver/std2018.f90 @@ -1,7 +1,7 @@ ! Ensure argument -std=f2018 works as expected. !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -fsyntax-only %s 2>&1 | FileCheck %s --allow-empty --check-prefix=WITHOUT ! 
RUN: %flang_fc1 -fsyntax-only -std=f2018 %s 2>&1 | FileCheck %s --check-prefix=GIVEN diff --git a/flang/test/Driver/supported-suffices/f03-suffix.f03 b/flang/test/Driver/supported-suffices/f03-suffix.f03 index 6e03f9f43fc602..1d850305cd040e 100644 --- a/flang/test/Driver/supported-suffices/f03-suffix.f03 +++ b/flang/test/Driver/supported-suffices/f03-suffix.f03 @@ -1,5 +1,5 @@ ! RUN: %flang -### %s 2>&1 | FileCheck %s -! CHECK: "{{.*}}flang-new" "-fc1" {{.*}} "-o" "{{.*}}.o" +! CHECK: "{{.*}}flang" "-fc1" {{.*}} "-o" "{{.*}}.o" program f03 end program f03 diff --git a/flang/test/Driver/supported-suffices/f08-suffix.f08 b/flang/test/Driver/supported-suffices/f08-suffix.f08 index d5bcf4ce1de1cc..2b31e4c21876ae 100644 --- a/flang/test/Driver/supported-suffices/f08-suffix.f08 +++ b/flang/test/Driver/supported-suffices/f08-suffix.f08 @@ -1,5 +1,5 @@ ! RUN: %flang -### %s 2>&1 | FileCheck %s -! CHECK: "{{.*}}flang-new" "-fc1" {{.*}} "-o" "{{.*}}.o" +! CHECK: "{{.*}}flang" "-fc1" {{.*}} "-o" "{{.*}}.o" program f08 end program f08 diff --git a/flang/test/Driver/use-module-error.f90 b/flang/test/Driver/use-module-error.f90 index 42d6650621c8c8..67335f61626817 100644 --- a/flang/test/Driver/use-module-error.f90 +++ b/flang/test/Driver/use-module-error.f90 @@ -1,14 +1,14 @@ ! Ensure that multiple module directories are not allowed !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: not %flang -fsyntax-only -J %S/Inputs/module-dir -J %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE ! RUN: not %flang -fsyntax-only -J %S/Inputs/module-dir -module-dir %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE ! RUN: not %flang -fsyntax-only -module-dir %S/Inputs/module-dir -J%S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! 
FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: not %flang_fc1 -fsyntax-only -J %S/Inputs/module-dir -J %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE ! RUN: not %flang_fc1 -fsyntax-only -J %S/Inputs/module-dir -module-dir %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE diff --git a/flang/test/Driver/use-module.f90 b/flang/test/Driver/use-module.f90 index 775c0424715883..2c3a38043fe16e 100644 --- a/flang/test/Driver/use-module.f90 +++ b/flang/test/Driver/use-module.f90 @@ -1,7 +1,7 @@ ! Checks that module search directories specified with `-J/-module-dir` and `-I` are handled correctly !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang -fsyntax-only -I %S/Inputs -I %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty ! RUN: %flang -fsyntax-only -I %S/Inputs -J %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty @@ -16,7 +16,7 @@ ! RUN: not %flang -fsyntax-only -module-dir %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=SINGLEINCLUDE !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -fsyntax-only -I %S/Inputs -I %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty ! RUN: %flang_fc1 -fsyntax-only -I %S/Inputs -J %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty diff --git a/flang/test/Driver/version-loops.f90 b/flang/test/Driver/version-loops.f90 index b0fa01d572512a..d206393a04f486 100644 --- a/flang/test/Driver/version-loops.f90 +++ b/flang/test/Driver/version-loops.f90 @@ -1,5 +1,5 @@ -! Test that flang-new forwards the -f{no-,}version-loops-for-stride -! options correctly to flang-new -fc1 for different variants of optimisation +! 
Test that flang forwards the -f{no-,}version-loops-for-stride +! options correctly to flang -fc1 for different variants of optimisation ! and explicit flags. ! RUN: %flang -### %s -o %t 2>&1 -O3 \ @@ -23,32 +23,32 @@ ! RUN: %flang -### %s -o %t 2>&1 -O3 -fno-version-loops-for-stride \ ! RUN: | FileCheck %s --check-prefix=CHECK-O3-no -! CHECK: "{{.*}}flang-new" "-fc1" +! CHECK: "{{.*}}flang" "-fc1" ! CHECK-SAME: "-fversion-loops-for-stride" ! CHECK-SAME: "-O3" -! CHECK-O2: "{{.*}}flang-new" "-fc1" +! CHECK-O2: "{{.*}}flang" "-fc1" ! CHECK-O2-NOT: "-fversion-loops-for-stride" ! CHECK-O2-SAME: "-O2" -! CHECK-O2-with: "{{.*}}flang-new" "-fc1" +! CHECK-O2-with: "{{.*}}flang" "-fc1" ! CHECK-O2-with-SAME: "-fversion-loops-for-stride" ! CHECK-O2-with-SAME: "-O2" -! CHECK-O4: "{{.*}}flang-new" "-fc1" +! CHECK-O4: "{{.*}}flang" "-fc1" ! CHECK-O4-SAME: "-fversion-loops-for-stride" ! CHECK-O4-SAME: "-O3" -! CHECK-Ofast: "{{.*}}flang-new" "-fc1" +! CHECK-Ofast: "{{.*}}flang" "-fc1" ! CHECK-Ofast-SAME: "-ffast-math" ! CHECK-Ofast-SAME: "-fversion-loops-for-stride" ! CHECK-Ofast-SAME: "-O3" -! CHECK-Ofast-no: "{{.*}}flang-new" "-fc1" +! CHECK-Ofast-no: "{{.*}}flang" "-fc1" ! CHECK-Ofast-no-SAME: "-ffast-math" ! CHECK-Ofast-no-NOT: "-fversion-loops-for-stride" ! CHECK-Ofast-no-SAME: "-O3" -! CHECK-O3-no: "{{.*}}flang-new" "-fc1" +! CHECK-O3-no: "{{.*}}flang" "-fc1" ! CHECK-O3-no-NOT: "-fversion-loops-for-stride" ! CHECK-O3-no-SAME: "-O3" diff --git a/flang/test/Driver/wextra-ok.f90 b/flang/test/Driver/wextra-ok.f90 index 6a38d9481a36b7..441029aa0af276 100644 --- a/flang/test/Driver/wextra-ok.f90 +++ b/flang/test/Driver/wextra-ok.f90 @@ -1,4 +1,4 @@ -! Ensure that supplying -Wextra into flang-new does not raise error +! Ensure that supplying -Wextra into flang does not raise error ! The first check should be changed if -Wextra is implemented ! 
RUN: %flang -std=f2018 -Wextra %s -c 2>&1 | FileCheck %s --check-prefix=CHECK-OK diff --git a/flang/test/HLFIR/hlfir-flags.f90 b/flang/test/HLFIR/hlfir-flags.f90 index b383a79d12c27b..0b1e80b1e3f636 100644 --- a/flang/test/HLFIR/hlfir-flags.f90 +++ b/flang/test/HLFIR/hlfir-flags.f90 @@ -1,4 +1,4 @@ -! Test -flang-deprecated-hlfir, -flang-experimental-hlfir (flang-new), and +! Test -flang-deprecated-hlfir, -flang-experimental-hlfir (flang), and ! -hlfir (bbc), -emit-hlfir, -emit-fir flags ! RUN: %flang_fc1 -emit-hlfir -o - %s | FileCheck --check-prefix HLFIR --check-prefix ALL %s ! RUN: bbc -emit-hlfir -o - %s | FileCheck --check-prefix HLFIR --check-prefix ALL %s diff --git a/flang/test/Lower/Intrinsics/command_argument_count.f90 b/flang/test/Lower/Intrinsics/command_argument_count.f90 index 0cf92d4444db98..a30b27d664fc0c 100644 --- a/flang/test/Lower/Intrinsics/command_argument_count.f90 +++ b/flang/test/Lower/Intrinsics/command_argument_count.f90 @@ -1,6 +1,6 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! bbc doesn't have a way to set the default kinds so we use flang-new driver -! RUN: flang-new -fc1 -fdefault-integer-8 -emit-fir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 %s +! bbc doesn't have a way to set the default kinds so we use flang driver +! RUN: flang -fc1 -fdefault-integer-8 -emit-fir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 %s ! CHECK-LABEL: argument_count_test subroutine argument_count_test() diff --git a/flang/test/Lower/Intrinsics/exit.f90 b/flang/test/Lower/Intrinsics/exit.f90 index c3110fcbec2b5a..bd551f7318a84a 100644 --- a/flang/test/Lower/Intrinsics/exit.f90 +++ b/flang/test/Lower/Intrinsics/exit.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir -hlfir=false %s -o - | FileCheck --check-prefixes=CHECK,CHECK-32 -DDEFAULT_INTEGER_SIZE=32 %s -! bbc doesn't have a way to set the default kinds so we use flang-new driver +! bbc doesn't have a way to set the default kinds so we use flang driver ! 
RUN: %flang_fc1 -fdefault-integer-8 -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 -DDEFAULT_INTEGER_SIZE=64 %s ! CHECK-LABEL: func @_QPexit_test1() { diff --git a/flang/test/Lower/Intrinsics/ieee_is_normal.f90 b/flang/test/Lower/Intrinsics/ieee_is_normal.f90 index f9ab01881d250d..9b864c9a9849c3 100644 --- a/flang/test/Lower/Intrinsics/ieee_is_normal.f90 +++ b/flang/test/Lower/Intrinsics/ieee_is_normal.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -emit-fir %s -o - | FileCheck %s +! RUN: flang -fc1 -emit-fir %s -o - | FileCheck %s ! CHECK-LABEL: ieee_is_normal_f16 subroutine ieee_is_normal_f16(r) diff --git a/flang/test/Lower/Intrinsics/isnan.f90 b/flang/test/Lower/Intrinsics/isnan.f90 index 700b2d1a67c656..62b98c8ea98bee 100644 --- a/flang/test/Lower/Intrinsics/isnan.f90 +++ b/flang/test/Lower/Intrinsics/isnan.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -emit-fir %s -o - | FileCheck %s +! RUN: flang -fc1 -emit-fir %s -o - | FileCheck %s ! CHECK-LABEL: isnan_f32 subroutine isnan_f32(r) diff --git a/flang/test/Lower/Intrinsics/modulo.f90 b/flang/test/Lower/Intrinsics/modulo.f90 index ac18e59033a6b6..781ef8296a2b7d 100644 --- a/flang/test/Lower/Intrinsics/modulo.f90 +++ b/flang/test/Lower/Intrinsics/modulo.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir -hlfir=false %s -o - | FileCheck %s -check-prefixes=HONORINF,ALL -! RUN: flang-new -fc1 -menable-no-infs -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck %s -check-prefixes=CHECK,ALL +! RUN: flang -fc1 -menable-no-infs -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck %s -check-prefixes=CHECK,ALL ! ALL-LABEL: func @_QPmodulo_testr( ! 
ALL-SAME: %[[arg0:.*]]: !fir.ref{{.*}}, %[[arg1:.*]]: !fir.ref{{.*}}, %[[arg2:.*]]: !fir.ref{{.*}}) { diff --git a/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 b/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 index f02884e5e92f38..425ccbc5dd56c5 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP allocate Directive. -// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s +// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s program main integer :: x, y diff --git a/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 b/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 index 3be61a1700ced3..7a7d28db8d6f5a 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP declare reduction Directive. -// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s +// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s subroutine declare_red() integer :: my_var diff --git a/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 b/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 index c6a0a8f2cd0d22..be1ac2db5dfa4a 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP declare simd Directive. 
-// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s +// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s subroutine sub(x, y) real, intent(inout) :: x, y diff --git a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 index 62bc247a1456a1..bc5baf4e1cf604 100644 --- a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 +++ b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 @@ -1,7 +1,7 @@ ! This test checks lowering of `LASTPRIVATE` clause for scalar types. ! RUN: bbc -fopenmp -emit-hlfir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s +! RUN: flang -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s !CHECK: func @_QPlastprivate_character(%[[ARG1:.*]]: !fir.boxchar<1>{{.*}}) { !CHECK-DAG: %[[ARG1_UNBOX:.*]]:2 = fir.unboxchar diff --git a/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 b/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 index 32caac39778dee..99c521406a7775 100644 --- a/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 +++ b/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 @@ -1,7 +1,7 @@ ! Check that for parallel do, reduction is only processed for the loop ! RUN: bbc -fopenmp --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -fopenmp -mmlir --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s +! RUN: flang -fc1 -fopenmp -mmlir --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s ! CHECK: omp.parallel { ! CHECK: omp.wsloop reduction(byref @add_reduction_byref_i32 diff --git a/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 b/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 index fdedbb06160761..cfeb5de83f4e82 100644 --- a/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 +++ b/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 @@ -1,7 +1,7 @@ ! 
Check that for parallel do, reduction is only processed for the loop ! RUN: bbc -fopenmp -emit-hlfir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s +! RUN: flang -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s ! CHECK: omp.parallel { ! CHECK: omp.wsloop reduction(@add_reduction_i32 diff --git a/flang/test/lit.cfg.py b/flang/test/lit.cfg.py index 4acbc0606d1977..f43234fb125b7e 100644 --- a/flang/test/lit.cfg.py +++ b/flang/test/lit.cfg.py @@ -132,13 +132,13 @@ tools = [ ToolSubst( "%flang", - command=FindTool("flang-new"), + command=FindTool("flang"), extra_args=isysroot_flag, unresolved="fatal", ), ToolSubst( "%flang_fc1", - command=FindTool("flang-new"), + command=FindTool("flang"), extra_args=["-fc1"], unresolved="fatal", ), diff --git a/flang/tools/f18/CMakeLists.txt b/flang/tools/f18/CMakeLists.txt index 9d7b8633958cb7..4362fcf0537616 100644 --- a/flang/tools/f18/CMakeLists.txt +++ b/flang/tools/f18/CMakeLists.txt @@ -55,7 +55,7 @@ endif() set(module_objects "") # Create module files directly from the top-level module source directory. -# If CMAKE_CROSSCOMPILING, then the newly built flang-new executable was +# If CMAKE_CROSSCOMPILING, then the newly built flang executable was # cross compiled, and thus can't be executed on the build system and thus # can't be used for generating module files. 
 if (NOT CMAKE_CROSSCOMPILING)
@@ -115,9 +115,9 @@ if (NOT CMAKE_CROSSCOMPILING)
     # TODO: We may need to flag this with conditional, in case Flang is built w/o OpenMP support
     add_custom_command(OUTPUT ${base}.mod ${object_output}
       COMMAND ${CMAKE_COMMAND} -E make_directory ${FLANG_INTRINSIC_MODULES_DIR}
-      COMMAND flang-new ${opts} ${decls} -cpp ${compile_with} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
+      COMMAND flang ${opts} ${decls} -cpp ${compile_with} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
         ${FLANG_SOURCE_DIR}/module/${filename}.f90
-      DEPENDS flang-new ${FLANG_SOURCE_DIR}/module/${filename}.f90 ${FLANG_SOURCE_DIR}/module/__fortran_builtins.f90 ${depends}
+      DEPENDS flang ${FLANG_SOURCE_DIR}/module/${filename}.f90 ${FLANG_SOURCE_DIR}/module/__fortran_builtins.f90 ${depends}
     )
     list(APPEND MODULE_FILES ${base}.mod)
     install(FILES ${base}.mod DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/flang")
@@ -142,9 +142,9 @@ if (NOT CMAKE_CROSSCOMPILING)
     set(base ${FLANG_INTRINSIC_MODULES_DIR}/omp_lib)
     add_custom_command(OUTPUT ${base}.mod ${base}_kinds.mod
       COMMAND ${CMAKE_COMMAND} -E make_directory ${FLANG_INTRINSIC_MODULES_DIR}
-      COMMAND flang-new -cpp -fsyntax-only ${opts} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
+      COMMAND flang -cpp -fsyntax-only ${opts} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
         ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90
-      DEPENDS flang-new ${FLANG_INTRINSIC_MODULES_DIR}/iso_c_binding.mod ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90 ${depends}
+      DEPENDS flang ${FLANG_INTRINSIC_MODULES_DIR}/iso_c_binding.mod ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90 ${depends}
     )
     add_custom_command(OUTPUT ${base}.f18.mod
       DEPENDS ${base}.mod
diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt
index 9f33cdfe3fa90f..615c673374faf4 100644
--- a/flang/tools/flang-driver/CMakeLists.txt
+++ b/flang/tools/flang-driver/CMakeLists.txt
@@ -11,18 +11,18 @@ set( LLVM_LINK_COMPONENTS
   TargetParser
 )

-add_flang_tool(flang-new
+add_flang_tool(flang
   driver.cpp
   fc1_main.cpp
 )

-target_link_libraries(flang-new
+target_link_libraries(flang
   PRIVATE
   flangFrontend
   flangFrontendTool
 )

-clang_target_link_libraries(flang-new
+clang_target_link_libraries(flang
   PRIVATE
   clangDriver
   clangBasic
@@ -30,9 +30,9 @@ clang_target_link_libraries(flang-new

 option(FLANG_PLUGIN_SUPPORT "Build Flang with plugin support." ON)

-# Enable support for plugins, which need access to symbols from flang-new
+# Enable support for plugins, which need access to symbols from flang
 if(FLANG_PLUGIN_SUPPORT)
-  export_executable_symbols_for_plugins(flang-new)
+  export_executable_symbols_for_plugins(flang)
 endif()

-install(TARGETS flang-new DESTINATION "${CMAKE_INSTALL_BINDIR}")
+install(TARGETS flang DESTINATION "${CMAKE_INSTALL_BINDIR}")
diff --git a/flang/tools/flang-driver/driver.cpp b/flang/tools/flang-driver/driver.cpp
index 52136df10c0b02..603aab4205836c 100644
--- a/flang/tools/flang-driver/driver.cpp
+++ b/flang/tools/flang-driver/driver.cpp
@@ -95,7 +95,7 @@ int main(int argc, const char **argv) {
   llvm::StringSaver saver(a);
   ExpandResponseFiles(saver, args);

-  // Check if flang-new is in the frontend mode
+  // Check if flang is in the frontend mode
   auto firstArg = std::find_if(args.begin() + 1, args.end(),
                                [](const char *a) { return a != nullptr; });
   if (firstArg != args.end()) {
@@ -104,7 +104,7 @@ int main(int argc, const char **argv) {
                    << "Valid tools include '-fc1'.\n";
       return 1;
     }
-    // Call flang-new frontend
+    // Call flang frontend
     if (llvm::StringRef(args[1]).starts_with("-fc1")) {
       return executeFC1Tool(args);
     }
@@ -140,7 +140,7 @@ int main(int argc, const char **argv) {

   // Set the environment variable, FLANG_COMPILER_OPTIONS_STRING, to contain all
   // the compiler options. This is intended for the frontend driver,
-  // flang-new -fc1, to enable the implementation of the COMPILER_OPTIONS
+  // flang -fc1, to enable the implementation of the COMPILER_OPTIONS
   // intrinsic. To this end, the frontend driver requires the list of the
   // original compiler options, which is not available through other means.
   // TODO: This way of passing information between the compiler and frontend
diff --git a/llvm/runtimes/CMakeLists.txt b/llvm/runtimes/CMakeLists.txt
index d948b7eb39b39c..9da1f926817a8b 100644
--- a/llvm/runtimes/CMakeLists.txt
+++ b/llvm/runtimes/CMakeLists.txt
@@ -504,15 +504,15 @@ if(build_runtimes)

   if("openmp" IN_LIST LLVM_ENABLE_RUNTIMES)
     if (${LLVM_TOOL_FLANG_BUILD})
-      message(STATUS "Configuring build of omp_lib.mod and omp_lib_kinds.mod via flang-new")
-      set(LIBOMP_FORTRAN_MODULES_COMPILER "${CMAKE_BINARY_DIR}/bin/flang-new")
+      message(STATUS "Configuring build of omp_lib.mod and omp_lib_kinds.mod via flang")
+      set(LIBOMP_FORTRAN_MODULES_COMPILER "${CMAKE_BINARY_DIR}/bin/flang")
       set(LIBOMP_MODULES_INSTALL_PATH "${CMAKE_INSTALL_INCLUDEDIR}/flang")
       # TODO: This is a workaround until flang becomes a first-class project
-      # in llvm/CMakeList.txt. Until then, this line ensures that flang-new is
-      # built before "openmp" is built as a runtime project. Besides "flang-new"
+      # in llvm/CMakeList.txt. Until then, this line ensures that flang is
+      # built before "openmp" is built as a runtime project. Besides "flang"
       # to build the compiler, we also need to add "module_files" to make sure
       # that all .mod files are also properly build.
-      list(APPEND extra_deps "flang-new" "module_files")
+      list(APPEND extra_deps "flang" "module_files")
     endif()
     foreach(dep opt llvm-link llvm-extract clang clang-offload-packager)
       if(TARGET ${dep})
diff --git a/offload/CMakeLists.txt b/offload/CMakeLists.txt
index 9ffe8f56b76e67..9b771d1116ee38 100644
--- a/offload/CMakeLists.txt
+++ b/offload/CMakeLists.txt
@@ -89,9 +89,9 @@ else()

   # Check for flang
   if (NOT MSVC)
-    set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new)
+    set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang)
   else()
-    set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new.exe)
+    set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang.exe)
   endif()

   # Set fortran test compiler if flang is found
diff --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index 3b4259dfa380e8..c206386fa6b614 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -69,9 +69,9 @@ else()

   # Check for flang
   if (NOT MSVC)
-    set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new)
+    set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang)
   else()
-    set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new.exe)
+    set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang.exe)
   endif()

   # Set fortran test compiler if flang is found

>From b71c1d519cc61a751268b1ccda3fc59a966bab96 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Thu, 26 Sep 2024 10:39:53 -0500
Subject: [PATCH 2/5] [flang][driver] restore flang-new as symlink

Restore flang-new as a symlink to flang for backwards compatibility

Co-authored-by: H. Vetinari
Co-authored-by: Andrzej Warzynski
---
 clang/lib/Driver/ToolChain.cpp          | 3 +++
 flang/tools/flang-driver/CMakeLists.txt | 4 ++++
 flang/tools/flang-driver/driver.cpp     | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 16f9b629fc538c..c9f3dbd7707b77 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -381,6 +381,9 @@ static const DriverSuffix *FindDriverSuffix(StringRef ProgName, size_t &Pos) {
       {"cl", "--driver-mode=cl"},
       {"++", "--driver-mode=g++"},
       {"flang", "--driver-mode=flang"},
+      // For backwards compatibility, we create a symlink for `flang` called
+      // `flang-new`. This will be removed in the future.
+      {"flang-new", "--driver-mode=flang"},
       {"clang-dxc", "--driver-mode=dxc"},
   };

diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt
index 615c673374faf4..063acdd7dfe57c 100644
--- a/flang/tools/flang-driver/CMakeLists.txt
+++ b/flang/tools/flang-driver/CMakeLists.txt
@@ -36,3 +36,7 @@ if(FLANG_PLUGIN_SUPPORT)
 endif()

 install(TARGETS flang DESTINATION "${CMAKE_INSTALL_BINDIR}")
+
+# Keep "flang-new" as a symlink for backwards compatiblity. Remove once "flang"
+# is a widely adopted name.
+add_flang_symlink(flang-new flang)
diff --git a/flang/tools/flang-driver/driver.cpp b/flang/tools/flang-driver/driver.cpp
index 603aab4205836c..ed52988feaa59c 100644
--- a/flang/tools/flang-driver/driver.cpp
+++ b/flang/tools/flang-driver/driver.cpp
@@ -88,7 +88,8 @@ int main(int argc, const char **argv) {
   llvm::InitLLVM x(argc, argv);
   llvm::SmallVector args(argv, argv + argc);
-  clang::driver::ParsedClangName targetandMode("flang", "--driver-mode=flang");
+  clang::driver::ParsedClangName targetandMode =
+      clang::driver::ToolChain::getTargetAndModeFromProgramName(argv[0]);
   std::string driverPath = getExecutablePath(args[0]);
   llvm::BumpPtrAllocator a;

>From 443c951f8e0458e8b011424fad6a2e4b40b63144 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Mon, 30 Sep 2024 10:16:59 -0500
Subject: [PATCH 3/5] [flang][driver] add version to flang executable

---
 flang/tools/flang-driver/CMakeLists.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt
index 063acdd7dfe57c..9a89a6185a3291 100644
--- a/flang/tools/flang-driver/CMakeLists.txt
+++ b/flang/tools/flang-driver/CMakeLists.txt
@@ -28,6 +28,12 @@ clang_target_link_libraries(flang
   clangBasic
 )
+# This creates the executable with a version appended
+# and creates a symlink to it without the version
+if(CYGWIN OR NOT WIN32) # but it doesn't work on Windows
+  set_target_properties(flang PROPERTIES VERSION ${FLANG_EXECUTABLE_VERSION})
+endif()
+
 option(FLANG_PLUGIN_SUPPORT "Build Flang with plugin support." ON)
 # Enable support for plugins, which need access to symbols from flang

>From 27ae40d86f235890d109ca88682dd0caba0d2c93 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Thu, 3 Oct 2024 14:12:35 -0700
Subject: [PATCH 4/5] [flang][driver] add warning when using openmp

---
 clang/include/clang/Basic/DiagnosticDriverKinds.td | 3 +++
 clang/include/clang/Basic/DiagnosticGroups.td | 4 ++++
 clang/lib/Driver/ToolChains/Flang.cpp | 3 +++
 3 files changed, 10 insertions(+)

diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index 97573fcf20c1fb..68722ad9633120 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -147,6 +147,9 @@ def warn_drv_unsupported_option_for_processor : Warning<
 def warn_drv_unsupported_openmp_library : Warning<
   "the library '%0=%1' is not supported, OpenMP will not be enabled">,
   InGroup;
+def warn_openmp_experimental : Warning<
+  "OpenMP support in flang is still experimental">,
+  InGroup;
 def err_drv_invalid_thread_model_for_target : Error<
   "invalid thread model '%0' in '%1' for this target">;
diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td
index 7d81bdf827ea0c..bfa065f018f8d8 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -1582,3 +1582,7 @@ def ExtractAPIMisuse : DiagGroup<"extractapi-misuse">;
 // Warnings about using the non-standard extension having an explicit specialization
 // with a storage class specifier.
 def ExplicitSpecializationStorageClass : DiagGroup<"explicit-specialization-storage-class">;
+
+// A warning for options that enable a feature that is not yet complete
+def ExperimentalOption : DiagGroup<"experimental-option">;
+
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 1ca12ff81389a3..19b43594b00815 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -787,6 +787,9 @@ void Flang::ConstructJob(Compilation &C, const JobAction &JA,
       if (Args.hasArg(options::OPT_fopenmp_force_usm))
         CmdArgs.push_back("-fopenmp-force-usm");
+      // TODO: OpenMP support isn't "done" yet, so for now we warn that it
+      // is experimental.
+      D.Diag(diag::warn_openmp_experimental);
       // FIXME: Clang supports a whole bunch more flags here.
       break;

>From d8f95da5712a7d03a935c8b38f06d373c21f7a1f Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Fri, 4 Oct 2024 06:27:05 -0700
Subject: [PATCH 5/5] [flang][doc] update note about CMake support

---
 flang/docs/FlangDriver.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/flang/docs/FlangDriver.md b/flang/docs/FlangDriver.md
index 47cf078cf2d0d4..23cbab30ee903e 100644
--- a/flang/docs/FlangDriver.md
+++ b/flang/docs/FlangDriver.md
@@ -335,7 +335,7 @@ just added using your new frontend option.
 ## CMake Support
 As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246)
-(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a
+(CMake 3.28.0), `cmake` can detect `flang` as a
 supported Fortran compiler.
You can configure your CMake projects to use
 `flang` as follows:
 ```bash

From openmp-commits at lists.llvm.org Fri Oct 4 06:28:37 2024
From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits)
Date: Fri, 04 Oct 2024 06:28:37 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <66ffed85.630a0220.37b37e.b13f@mx.google.com>

================
@@ -339,11 +335,11 @@ just added using your new frontend option.
 ## CMake Support
 As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246)
-(and soon to be released CMake 3.24.0), `cmake` can detect `flang-new` as a
+(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a
----------------
everythingfunctional wrote:

So, sounds like support is there in CMake, so I've updated the note and the identified version.

https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Fri Oct 4 07:11:38 2024
From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits)
Date: Fri, 04 Oct 2024 07:11:38 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <66fff79a.050a0220.86023.22d0@mx.google.com>

https://github.com/everythingfunctional updated https://github.com/llvm/llvm-project/pull/110023

>From 649a73478c78389560042030a9717a05e8e338a8 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Wed, 25 Sep 2024 13:25:22 -0500
Subject: [PATCH 1/6] [flang][driver] rename flang-new to flang

---
 .github/workflows/release-binaries.yml | 2 +-
 clang/include/clang/Driver/Options.td | 4 +-
 clang/lib/Driver/Driver.cpp | 2 +-
 clang/lib/Driver/ToolChains/Flang.cpp | 6 +-
 clang/test/Driver/flang/flang.f90 | 2 +-
 clang/test/Driver/flang/flang_ucase.F90 | 2 +-
 .../Driver/flang/multiple-inputs-mixed.f90 | 2 +-
 clang/test/Driver/flang/multiple-inputs.f90 | 4 +-
 flang/docs/FlangDriver.md | 76 +++++++++----------
 flang/docs/ImplementingASemanticCheck.md | 4 +-
 flang/docs/Overview.md | 26 +++----
 .../FlangOmpReport/FlangOmpReport.cpp | 2 +-
 .../flang/Optimizer/Analysis/AliasAnalysis.h | 2 +-
 flang/include/flang/Tools/CrossToolHelpers.h | 2 +-
 flang/lib/Frontend/CompilerInvocation.cpp | 6 +-
 flang/lib/Frontend/FrontendActions.cpp | 2 +-
 .../ExecuteCompilerInvocation.cpp | 3 +-
 flang/runtime/CMakeLists.txt | 6 +-
 flang/test/CMakeLists.txt | 2 +-
 flang/test/Driver/aarch64-outline-atomics.f90 | 2 +-
 .../Driver/color-diagnostics-forwarding.f90 | 4 +-
 flang/test/Driver/compiler-options.f90 | 4 +-
 flang/test/Driver/convert.f90 | 2 +-
 .../test/Driver/disable-ext-name-interop.f90 | 2 +-
 flang/test/Driver/driver-version.f90 | 4 +-
 flang/test/Driver/escaped-backslash.f90 | 4 +-
 flang/test/Driver/fdefault.f90 | 28 +++----
 flang/test/Driver/flarge-sizes.f90 | 20 ++---
 .../test/Driver/frame-pointer-forwarding.f90 | 2 +-
 flang/test/Driver/frontend-forwarding.f90 | 4 +-
 flang/test/Driver/hlfir-no-hlfir-error.f90 | 4 +-
 flang/test/Driver/intrinsic-module-path.f90 | 2 +-
 flang/test/Driver/large-data-threshold.f90 | 6 +-
 flang/test/Driver/lto-flags.f90 | 2 +-
 flang/test/Driver/macro-def-undef.F90 | 4 +-
 flang/test/Driver/missing-input.f90 | 14 ++--
 flang/test/Driver/multiple-input-files.f90 | 2 +-
 flang/test/Driver/omp-driver-offload.f90 | 66 ++++++++--------
 .../predefined-macros-compiler-version.F90 | 4 +-
 flang/test/Driver/std2018-wrong.f90 | 2 +-
 flang/test/Driver/std2018.f90 | 2 +-
 .../Driver/supported-suffices/f03-suffix.f03 | 2 +-
 .../Driver/supported-suffices/f08-suffix.f08 | 2 +-
 flang/test/Driver/use-module-error.f90 | 4 +-
 flang/test/Driver/use-module.f90 | 4 +-
 flang/test/Driver/version-loops.f90 | 18 ++---
 flang/test/Driver/wextra-ok.f90 | 2 +-
 flang/test/HLFIR/hlfir-flags.f90 | 2 +-
 .../Intrinsics/command_argument_count.f90 | 4 +-
 flang/test/Lower/Intrinsics/exit.f90 | 2 +-
 .../test/Lower/Intrinsics/ieee_is_normal.f90 | 2 +-
 flang/test/Lower/Intrinsics/isnan.f90 | 2 +-
 flang/test/Lower/Intrinsics/modulo.f90 | 2 +-
 .../OpenMP/Todo/omp-declarative-allocate.f90 | 2 +-
 .../OpenMP/Todo/omp-declare-reduction.f90 | 2 +-
 .../Lower/OpenMP/Todo/omp-declare-simd.f90 | 2 +-
 .../parallel-lastprivate-clause-scalar.f90 | 2 +-
 .../parallel-wsloop-reduction-byref.f90 | 2 +-
 .../OpenMP/parallel-wsloop-reduction.f90 | 2 +-
 flang/test/lit.cfg.py | 4 +-
 flang/tools/f18/CMakeLists.txt | 10 +--
 flang/tools/flang-driver/CMakeLists.txt | 12 +--
 flang/tools/flang-driver/driver.cpp | 6 +-
 llvm/runtimes/CMakeLists.txt | 10 +--
 offload/CMakeLists.txt | 4 +-
 openmp/CMakeLists.txt | 4 +-
 66 files changed, 220 insertions(+), 227 deletions(-)

diff --git a/.github/workflows/release-binaries.yml b/.github/workflows/release-binaries.yml
index 925912df6843e4..6073ebac9e6c2c 100644
--- a/.github/workflows/release-binaries.yml
+++ b/.github/workflows/release-binaries.yml
@@ -328,7 +328,7 @@ jobs:
         run: |
           # Build some of the mlir tools that take a long time to link
           if [ "${{ needs.prepare.outputs.build-flang }}" = "true" ]; then
-            ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ -j2 flang-new bbc
+            ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ -j2 flang bbc
           fi
           ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ \
             mlir-bytecode-parser-fuzzer \
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 932cf13edab53d..4a45a825da8fa1 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6071,7 +6071,7 @@ def _sysroot_EQ : Joined<["--"], "sysroot=">, Visibility<[ClangOption, FlangOpti
 def _sysroot : Separate<["--"], "sysroot">, Alias<_sysroot_EQ>;
 //===----------------------------------------------------------------------===//
-// pie/pic options (clang + flang-new)
+// pie/pic options (clang + flang)
 //===----------------------------------------------------------------------===//
 let Visibility = [ClangOption, FlangOption] in {
@@ -6087,7 +6087,7 @@ def fno_pie : Flag<["-"], "fno-pie">, Group;
 } // let Vis = [Default, FlangOption]
 //===----------------------------------------------------------------------===//
-// Target Options (clang + flang-new)
+// Target Options (clang + flang)
 //===----------------------------------------------------------------------===//
 let Flags = [TargetSpecific] in {
 let Visibility = [ClangOption, FlangOption] in {
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index d0c8bdba0ede95..4243ee006c1553 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -2021,7 +2021,7 @@ void Driver::PrintHelp(bool ShowHidden) const {
 void Driver::PrintVersion(const Compilation &C, raw_ostream &OS) const {
   if (IsFlangMode()) {
-    OS << getClangToolFullVersion("flang-new") << '\n';
+    OS << getClangToolFullVersion("flang") << '\n';
   } else {
     // FIXME: The following handlers should use a callback mechanism, we don't
     // know what the client would like to do.
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 98350690f8d20e..1ca12ff81389a3 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -881,14 +881,12 @@ void Flang::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back(Input.getFilename());
-  // TODO: Replace flang-new with flang once the new driver replaces the
-  // throwaway driver
-  const char *Exec = Args.MakeArgString(D.GetProgramPath("flang-new", TC));
+  const char *Exec = Args.MakeArgString(D.GetProgramPath("flang", TC));
   C.addCommand(std::make_unique(JA, *this,
                                 ResponseFileSupport::AtFileUTF8(),
                                 Exec, CmdArgs, Inputs, Output));
 }
-Flang::Flang(const ToolChain &TC) : Tool("flang-new", "flang frontend", TC) {}
+Flang::Flang(const ToolChain &TC) : Tool("flang", "flang frontend", TC) {}
 Flang::~Flang() {}
diff --git a/clang/test/Driver/flang/flang.f90 b/clang/test/Driver/flang/flang.f90
index ad4a3a3b6bd44d..b52977ee66d7b0 100644
--- a/clang/test/Driver/flang/flang.f90
+++ b/clang/test/Driver/flang/flang.f90
@@ -13,7 +13,7 @@
 ! * (no type specified, resulting in an object file)
 ! All invocations should begin with flang -fc1, consume up to here.
-! ALL-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1"
+! ALL-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1"
 ! Check that f90 files are not treated as "previously preprocessed"
 ! ... in --driver-mode=flang.
diff --git a/clang/test/Driver/flang/flang_ucase.F90 b/clang/test/Driver/flang/flang_ucase.F90
index e89c053b327bc9..88aedc39fb94a7 100644
--- a/clang/test/Driver/flang/flang_ucase.F90
+++ b/clang/test/Driver/flang/flang_ucase.F90
@@ -13,7 +13,7 @@
 ! * (no type specified, resulting in an object file)
 ! All invocations should begin with flang -fc1, consume up to here.
-! ALL-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1"
+! ALL-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1"
 ! Check that f90 files are not treated as "previously preprocessed"
 ! ... in --driver-mode=flang.
diff --git a/clang/test/Driver/flang/multiple-inputs-mixed.f90 b/clang/test/Driver/flang/multiple-inputs-mixed.f90
index 2395dbecf1fe92..98d8cab00bdfdb 100644
--- a/clang/test/Driver/flang/multiple-inputs-mixed.f90
+++ b/clang/test/Driver/flang/multiple-inputs-mixed.f90
@@ -1,7 +1,7 @@
 ! Check that flang can handle mixed C and fortran inputs.
 ! RUN: %clang --driver-mode=flang -### -fsyntax-only %S/Inputs/one.f90 %S/Inputs/other.c 2>&1 | FileCheck --check-prefixes=CHECK-SYNTAX-ONLY %s
-! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1"
+! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1"
 ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/one.f90"
 ! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}clang{{[^"/]*}}" "-cc1"
 ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/other.c"
diff --git a/clang/test/Driver/flang/multiple-inputs.f90 b/clang/test/Driver/flang/multiple-inputs.f90
index ada999e927a6a0..3c0f22e5d3e508 100644
--- a/clang/test/Driver/flang/multiple-inputs.f90
+++ b/clang/test/Driver/flang/multiple-inputs.f90
@@ -1,7 +1,7 @@
 ! Check that flang driver can handle multiple inputs at once.
 ! RUN: %clang --driver-mode=flang -### -fsyntax-only %S/Inputs/one.f90 %S/Inputs/two.f90 2>&1 | FileCheck --check-prefixes=CHECK-SYNTAX-ONLY %s
-! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1"
+! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1"
 ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/one.f90"
-! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1"
+! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1"
 ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/two.f90"
diff --git a/flang/docs/FlangDriver.md b/flang/docs/FlangDriver.md
index 815c26a28dfdfa..47cf078cf2d0d4 100644
--- a/flang/docs/FlangDriver.md
+++ b/flang/docs/FlangDriver.md
@@ -15,17 +15,13 @@ local:
 ```
 There are two main drivers in Flang:
-* the compiler driver, `flang-new`
-* the frontend driver, `flang-new -fc1`
-
-> **_NOTE:_** The diagrams in this document refer to `flang` as opposed to
-> `flang-new`. Eventually, `flang-new` will be renamed as `flang` and the
-> diagrams reflect the final design that we are still working towards.
+* the compiler driver, `flang`
+* the frontend driver, `flang -fc1`
 The **compiler driver** will allow you to control all compilation phases (e.g.
 preprocessing, semantic checks, code-generation, code-optimisation, lowering
 and linking). For frontend specific tasks, the compiler driver creates a
-Fortran compilation job and delegates it to `flang-new -fc1`, the frontend
+Fortran compilation job and delegates it to `flang -fc1`, the frontend
 driver. For linking, it creates a linker job and calls an external linker (e.g.
 LLVM's [`lld`](https://lld.llvm.org/)). It can also call other tools such as
 external assemblers (e.g. [`as`](https://www.gnu.org/software/binutils/)). In
@@ -47,7 +43,7 @@ frontend. It uses MLIR and LLVM for code-generation and can be viewed as a
 driver for Flang, LLVM and MLIR libraries. Contrary to the compiler driver, it
 is not capable of calling any external tools (including linkers). It is aware
 of all the frontend internals that are "hidden" from the compiler driver. It
-accepts many frontend-specific options not available in `flang-new` and as such
+accepts many frontend-specific options not available in `flang` and as such
 it provides a finer control over the frontend. Note that this tool is mostly
 intended for Flang developers.
In particular, there are no guarantees about the
 stability of its interface and compiler developers can use it to experiment
@@ -62,30 +58,30 @@ frontend specific flag from the _compiler_ directly to the _frontend_ driver,
 e.g.:
 ```bash
-flang-new -Xflang -fdebug-dump-parse-tree input.f95
+flang -Xflang -fdebug-dump-parse-tree input.f95
 ```
-In the invocation above, `-fdebug-dump-parse-tree` is forwarded to `flang-new
+In the invocation above, `-fdebug-dump-parse-tree` is forwarded to `flang
 -fc1`. Without the forwarding flag, `-Xflang`, you would see the following
 warning:
 ```bash
-flang-new: warning: argument unused during compilation:
+flang: warning: argument unused during compilation:
 ```
-As `-fdebug-dump-parse-tree` is only supported by `flang-new -fc1`, `flang-new`
+As `-fdebug-dump-parse-tree` is only supported by `flang -fc1`, `flang`
 will ignore it when used without `Xflang`.
 ## Why Do We Need Two Drivers?
-As hinted above, `flang-new` and `flang-new -fc1` are two separate tools. The
-fact that these tools are accessed through one binary, `flang-new`, is just an
+As hinted above, `flang` and `flang -fc1` are two separate tools. The
+fact that these tools are accessed through one binary, `flang`, is just an
 implementation detail. Each tool has a separate list of options, albeit defined
 in the same file: `clang/include/clang/Driver/Options.td`.
 The separation helps us split various tasks and allows us to implement more
-specialised tools. In particular, `flang-new` is not aware of various
+specialised tools. In particular, `flang` is not aware of various
 compilation phases within the frontend (e.g. scanning, parsing or semantic
-checks). It does not have to be. Conversely, the frontend driver, `flang-new
+checks). It does not have to be. Conversely, the frontend driver, `flang
 -fc1`, needs not to be concerned with linkers or other external tools like
 assemblers.
Nor does it need to know where to look for various systems libraries, which is
 usually OS and platform specific.
@@ -104,7 +100,7 @@ GCC](https://en.wikibooks.org/wiki/GNU_C_Compiler_Internals/GNU_C_Compiler_Archi
 In fact, Flang needs to adhere to this model in order to be able to re-use
 Clang's driver library. If you are more familiar with the [architecture of
 GFortran](https://gcc.gnu.org/onlinedocs/gcc-4.7.4/gfortran/About-GNU-Fortran.html)
-than Clang, then `flang-new` corresponds to `gfortran` and `flang-new -fc1` to
+than Clang, then `flang` corresponds to `gfortran` and `flang -fc1` to
 `f951`.
 ## Compiler Driver
@@ -135,7 +131,7 @@ output from one action is the input for the subsequent one. You can use the
 `-ccc-print-phases` flag to see the sequence of actions that the driver will
 create for your compiler invocation:
 ```bash
-flang-new -ccc-print-phases -E file.f
+flang -ccc-print-phases -E file.f
 +- 0: input, "file.f", f95-cpp-input
 1: preprocessor, {0}, f95
 ```
@@ -143,7 +139,7 @@ As you can see, for `-E` the driver creates only two jobs and stops immediately
 after preprocessing. The first job simply prepares the input. For `-c`, the
 pipeline of the created jobs is more complex:
 ```bash
-flang-new -ccc-print-phases -c file.f
+flang -ccc-print-phases -c file.f
 +- 0: input, "file.f", f95-cpp-input
 +- 1: preprocessor, {0}, f95
 +- 2: compiler, {1}, ir
@@ -158,7 +154,7 @@ command to call the frontend driver is generated (more specifically, an instance
 of `clang::driver::Command`). Every command is bound to an instance of
 `clang::driver::Tool`. For Flang we introduced a specialisation of this class:
 `clang::driver::Flang`. This class implements the logic to either translate or
-forward compiler options to the frontend driver, `flang-new -fc1`.
+forward compiler options to the frontend driver, `flang -fc1`.
 You can read more on the design of `clangDriver` in Clang's [Driver Design &
 Internals](https://clang.llvm.org/docs/DriverInternals.html).
@@ -232,12 +228,12 @@ driver, `clang -cc1` and consists of the following classes:
 This list is not exhaustive and only covers the main classes that implement the
 driver. The main entry point for the frontend driver, `fc1_main`, is
 implemented in `flang/tools/flang-driver/driver.cpp`. It can be accessed by
-invoking the compiler driver, `flang-new`, with the `-fc1` flag.
+invoking the compiler driver, `flang`, with the `-fc1` flag.
 The frontend driver will only run one action at a time. If you specify multiple
 action flags, only the last one will be taken into account. The default action
 is `ParseSyntaxOnlyAction`, which corresponds to `-fsyntax-only`. In other
-words, `flang-new -fc1 ` is equivalent to `flang-new -fc1 -fsyntax-only
+words, `flang -fc1 ` is equivalent to `flang -fc1 -fsyntax-only
 `.
 ## Adding new Compiler Options
@@ -262,8 +258,8 @@ similar semantics to your new option and start by copying that.
 For every new option, you will also have to define the visibility of the new
 option. This is controlled through the `Visibility` field. You can use the
 following Flang specific visibility flags to control this:
-  * `FlangOption` - this option will be available in the `flang-new` compiler driver,
-  * `FC1Option` - this option will be available in the `flang-new -fc1` frontend driver,
+  * `FlangOption` - this option will be available in the `flang` compiler driver,
+  * `FC1Option` - this option will be available in the `flang -fc1` frontend driver,
 Options that are supported by clang should explicitly specify `ClangOption` in
 `Visibility`, and options that are only supported in Flang should not specify
@@ -290,10 +286,10 @@ The parsing will depend on the semantics encoded in the TableGen definition.
 When adding a compiler driver option (i.e.
an option that contains
 `FlangOption` among in it's `Visibility`) that you also intend to be understood
-by the frontend, make sure that it is either forwarded to `flang-new -fc1` or
+by the frontend, make sure that it is either forwarded to `flang -fc1` or
 translated into some other option that is accepted by the frontend driver. In
 the case of options that contain both `FlangOption` and `FC1Option` among its
-flags, we usually just forward from `flang-new` to `flang-new -fc1`. This is
+flags, we usually just forward from `flang` to `flang -fc1`. This is
 then tested in `flang/test/Driver/frontend-forward.F90`.
 What follows is usually very dependant on the meaning of the corresponding
@@ -339,11 +335,11 @@ just added using your new frontend option.
 ## CMake Support
 As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246)
-(and soon to be released CMake 3.24.0), `cmake` can detect `flang-new` as a
+(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a
 supported Fortran compiler. You can configure your CMake projects to use
-`flang-new` as follows:
+`flang` as follows:
 ```bash
-cmake -DCMAKE_Fortran_COMPILER=
+cmake -DCMAKE_Fortran_COMPILER=
 ```
 You should see the following in the output:
 ```
@@ -353,14 +349,14 @@ where `` corresponds to the LLVM Flang version.
 ## Testing
 In LIT, we define two variables that you can use to invoke Flang's drivers:
-* `%flang` is expanded as `flang-new` (i.e. the compiler driver)
-* `%flang_fc1` is expanded as `flang-new -fc1` (i.e. the frontend driver)
+* `%flang` is expanded as `flang` (i.e. the compiler driver)
+* `%flang_fc1` is expanded as `flang -fc1` (i.e. the frontend driver)
 For most regression tests for the frontend, you will want to use `%flang_fc1`.
 In some cases, the observable behaviour will be identical regardless of whether
 `%flang` or `%flang_fc1` is used.
However, when you are using `%flang` instead
 of `%flang_fc1`, the compiler driver will add extra flags to the frontend
-driver invocation (i.e. `flang-new -fc1 -`). In some cases that might
+driver invocation (i.e. `flang -fc1 -`). In some cases that might
 be exactly what you want to test. In fact, you can check these additional
 flags by using the `-###` compiler driver command line option.
@@ -380,7 +376,7 @@ plugins. The process for using plugins includes:
 * [Creating a plugin](#creating-a-plugin)
 * [Loading and running a plugin](#loading-and-running-a-plugin)
-Flang plugins are limited to `flang-new -fc1` and are currently only available /
+Flang plugins are limited to `flang -fc1` and are currently only available /
 been tested on Linux.
 ### Creating a Plugin
@@ -465,14 +461,14 @@ static FrontendPluginRegistry::Add X(
 ### Loading and Running a Plugin
 In order to use plugins, there are 2 command line options made available to the
-frontend driver, `flang-new -fc1`:
+frontend driver, `flang -fc1`:
 * [`-load `](#the--load-dsopath-option) for loading the dynamic shared object of the
   plugin
 * [`-plugin `](#the--plugin-name-option) for calling the registered plugin
 Invocation of the example plugin is done through:
 ```bash
-flang-new -fc1 -load flangPrintFunctionNames.so -plugin print-fns file.f90
+flang -fc1 -load flangPrintFunctionNames.so -plugin print-fns file.f90
 ```
 Both these options are parsed in `flang/lib/Frontend/CompilerInvocation.cpp` and
@@ -493,7 +489,7 @@ reports an error diagnostic and returns `nullptr`.
 ### Enabling In-Tree Plugins
 For in-tree plugins, there is the CMake flag `FLANG_PLUGIN_SUPPORT`, enabled by
-default, that controls the exporting of executable symbols from `flang-new`,
+default, that controls the exporting of executable symbols from `flang`,
 which plugins need access to. Additionally, there is the CMake flag
 `LLVM_BUILD_EXAMPLES`, turned off by default, that is used to control if the
 example programs are built.
This includes plugins that are in the
@@ -526,7 +522,7 @@ invocations `invokeFIROptEarlyEPCallbacks`, `invokeFIRInlinerCallback`, and
 `invokeFIROptLastEPCallbacks` for Flang drivers to be able to insert additonal
 passes at different points of the default pass pipeline. An example use of these
 extension point callbacks is shown in `registerDefaultInlinerPass` to invoke the
-default inliner pass in `flang-new`.
+default inliner pass in `flang`.
 ## LLVM Pass Plugins
@@ -539,7 +535,7 @@ documentation for
 [`llvm::PassBuilder`](https://llvm.org/doxygen/classllvm_1_1PassBuilder.html)
 for details.
-The framework to enable pass plugins in `flang-new` uses the exact same
+The framework to enable pass plugins in `flang` uses the exact same
 machinery as that used by `clang` and thus has the same capabilities and
 limitations.
@@ -547,7 +543,7 @@ In order to use a pass plugin, the pass(es) must be compiled into a dynamic
 shared object which is then loaded using the `-fpass-plugin` option.
 ```
-flang-new -fpass-plugin=/path/to/plugin.so
+flang -fpass-plugin=/path/to/plugin.so
 ```
 This option is available in both the compiler driver and the frontend driver.
@@ -559,7 +555,7 @@ Pass extensions are similar to plugins, except that they can also be linked
 statically. Setting `-DLLVM_${NAME}_LINK_INTO_TOOLS` to `ON` in the cmake
 command turns the project into a statically linked extension. An example would
 be Polly, e.g., using `-DLLVM_POLLY_LINK_INTO_TOOLS=ON` would link Polly passes
-into `flang-new` as built-in middle-end passes.
+into `flang` as built-in middle-end passes.
 See the [`WritingAnLLVMNewPMPass`](https://llvm.org/docs/WritingAnLLVMNewPMPass.html#id9)
diff --git a/flang/docs/ImplementingASemanticCheck.md b/flang/docs/ImplementingASemanticCheck.md
index 5b583d4f8031b8..598ef696ad14bf 100644
--- a/flang/docs/ImplementingASemanticCheck.md
+++ b/flang/docs/ImplementingASemanticCheck.md
@@ -68,7 +68,7 @@ of the call to `intentOutFunc()`:
 I also used this program to produce a parse tree for the program using the command:
 ```bash
-  flang-new -fc1 -fdebug-dump-parse-tree testfun.f90
+  flang -fc1 -fdebug-dump-parse-tree testfun.f90
 ```
 Here's the relevant fragment of the parse tree produced by the compiler:
@@ -296,7 +296,7 @@ In `lib/Semantics/check-do.cpp`, I added an (almost empty) implementation:
 I then built the compiler with these changes and ran it on my test program.
 This time, I made sure to invoke semantic checking. Here's the command I used:
 ```bash
-  flang-new -fc1 -fdebug-unparse-with-symbols testfun.f90
+  flang -fc1 -fdebug-unparse-with-symbols testfun.f90
 ```
 This produced the output:
diff --git a/flang/docs/Overview.md b/flang/docs/Overview.md
index 6eba19ea3a3c0d..dfb4d89264a755 100644
--- a/flang/docs/Overview.md
+++ b/flang/docs/Overview.md
@@ -65,8 +65,8 @@ See [Preprocessing.md](Preprocessing.md).
 **Entry point:** `parser::Parsing::Prescan`
 **Commands:**
-  - `flang-new -fc1 -E src.f90` dumps the cooked character stream
-  - `flang-new -fc1 -fdebug-dump-provenance src.f90` dumps provenance
+  - `flang -fc1 -E src.f90` dumps the cooked character stream
+  - `flang -fc1 -fdebug-dump-provenance src.f90` dumps provenance
     information
 ### Parsing
@@ -80,10 +80,10 @@ representing a syntactically correct program, rooted at the program unit. See:
 **Entry point:** `parser::Parsing::Parse`
 **Commands:**
-  - `flang-new -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree
-  - `flang-new -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran
-  - `flang-new -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log
-  - `flang-new -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree
+  - `flang -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree
+  - `flang -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran
+  - `flang -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log
+  - `flang -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree
 ### Semantic processing
@@ -121,9 +121,9 @@ In the course of semantic analysis, the compiler:
 At the end of semantic processing, all validation of the user's program is complete. This is the last
 detailed phase of analysis processing.
 **Commands:**
-  - `flang-new -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis
-  - `flang-new -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table
-  - `flang-new -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table
+  - `flang -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis
+  - `flang -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table
+  - `flang -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table
 ## Lowering
@@ -163,8 +163,8 @@ contain a list of evaluations. All of these contain pointers back into the parse tree.
 The compiler walks the PFT generating FIR.
 **Commands:**
- - `flang-new -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree
- - `flang-new -fc1 -emit-mlir src.f90` dumps the FIR to the files src.mlir
+ - `flang -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree
+ - `flang -fc1 -emit-mlir src.f90` dumps the FIR to the files src.mlir

 ### Transformation passes

@@ -180,8 +180,8 @@ perform various optimizations and transformations. The final pass creates an LLVM IR representation of the program.

 **Commands:**
- - `flang-new -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error
- - `flang-new -fc1 -emit-llvm src.f90` dumps the LLVM IR to src.ll
+ - `flang -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error
+ - `flang -fc1 -emit-llvm src.f90` dumps the LLVM IR to src.ll

 ## Object code generation and linking

diff --git a/flang/examples/FlangOmpReport/FlangOmpReport.cpp b/flang/examples/FlangOmpReport/FlangOmpReport.cpp
index 9c1f304b9741e7..709c5c5d305e51 100644
--- a/flang/examples/FlangOmpReport/FlangOmpReport.cpp
+++ b/flang/examples/FlangOmpReport/FlangOmpReport.cpp
@@ -9,7 +9,7 @@
 // all the OpenMP constructs and clauses and which line they're located on.
 //
 // The plugin may be invoked as:
-// ./bin/flang-new -fc1 -load lib/flangOmpReport.so -plugin flang-omp-report
+// ./bin/flang -fc1 -load lib/flangOmpReport.so -plugin flang-omp-report
 // -fopenmp
 //
 //===----------------------------------------------------------------------===//
diff --git a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
index 9a70b7fbfad2b6..8ab5150cd7c812 100644
--- a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
+++ b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
@@ -67,7 +67,7 @@ struct AliasAnalysis {
 // end subroutine
 // -------------------------------------------------
 //
- // flang-new -fc1 -emit-fir test.f90 -o test.fir
+ // flang -fc1 -emit-fir test.f90 -o test.fir
 //
 // ------------------- test.fir --------------------
 // fir.global @_QMtopEa : !fir.box>>
diff --git a/flang/include/flang/Tools/CrossToolHelpers.h b/flang/include/flang/Tools/CrossToolHelpers.h
index 3e703de545950c..df4b21ada058fe 100644
--- a/flang/include/flang/Tools/CrossToolHelpers.h
+++ b/flang/include/flang/Tools/CrossToolHelpers.h
@@ -7,7 +7,7 @@
 //===----------------------------------------------------------------------===//
 // A header file for containing functionallity that is used across Flang tools,
 // such as helper functions which apply or generate information needed accross
-// tools like bbc and flang-new.
+// tools like bbc and flang.
 //===----------------------------------------------------------------------===//

 #ifndef FORTRAN_TOOLS_CROSS_TOOL_HELPERS_H
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index 05b03ba9ebdf30..18383eaafb1136 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -65,8 +65,8 @@ CompilerInvocationBase::~CompilerInvocationBase() = default;
 static bool parseShowColorsArgs(const llvm::opt::ArgList &args,
 bool defaultColor = true) {
 // Color diagnostics default to auto ("on" if terminal supports) in the
- // compiler driver `flang-new` but default to off in the frontend driver
- // `flang-new -fc1`, needing an explicit OPT_fdiagnostics_color.
+ // compiler driver `flang` but default to off in the frontend driver
+ // `flang -fc1`, needing an explicit OPT_fdiagnostics_color.
 // Support both clang's -f[no-]color-diagnostics and gcc's
 // -f[no-]diagnostics-colors[=never|always|auto].
 enum {
@@ -891,7 +891,7 @@ static bool parseDiagArgs(CompilerInvocation &res, llvm::opt::ArgList &args,
 }
 }

- // Default to off for `flang-new -fc1`.
+ // Default to off for `flang -fc1`.
res.getFrontendOpts().showColors = parseShowColorsArgs(args, /*defaultDiagColor=*/false); diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp index 4a52edc436e0ed..8f882bff170909 100644 --- a/flang/lib/Frontend/FrontendActions.cpp +++ b/flang/lib/Frontend/FrontendActions.cpp @@ -233,7 +233,7 @@ bool CodeGenAction::beginSourceFileAction() { llvm::SMDiagnostic err; llvmModule = llvm::parseIRFile(getCurrentInput().getFile(), err, *llvmCtx); if (!llvmModule || llvm::verifyModule(*llvmModule, &llvm::errs())) { - err.print("flang-new", llvm::errs()); + err.print("flang", llvm::errs()); unsigned diagID = ci.getDiagnostics().getCustomDiagID( clang::DiagnosticsEngine::Error, "Could not parse IR"); ci.getDiagnostics().Report(diagID); diff --git a/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp b/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp index e2cbd5112d6ea5..09ac129d3e6893 100644 --- a/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp +++ b/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp @@ -154,8 +154,7 @@ bool executeCompilerInvocation(CompilerInstance *flang) { // Honor -help. if (flang->getFrontendOpts().showHelp) { clang::driver::getDriverOptTable().printHelp( - llvm::outs(), "flang-new -fc1 [options] file...", - "LLVM 'Flang' Compiler", + llvm::outs(), "flang -fc1 [options] file...", "LLVM 'Flang' Compiler", /*ShowHidden=*/false, /*ShowAllAliases=*/false, llvm::opt::Visibility(clang::driver::options::FC1Option)); return true; diff --git a/flang/runtime/CMakeLists.txt b/flang/runtime/CMakeLists.txt index 0ad1b718d5875b..cdd2de541c6730 100644 --- a/flang/runtime/CMakeLists.txt +++ b/flang/runtime/CMakeLists.txt @@ -308,12 +308,12 @@ set_target_properties(FortranRuntime PROPERTIES FOLDER "Flang/Runtime Libraries" # If FortranRuntime is part of a Flang build (and not a separate build) then # add dependency to make sure that Fortran runtime library is being built after # we have the Flang compiler available. 
This also includes the MODULE files -# that compile when the 'flang-new' target is built. +# that compile when the 'flang' target is built. # # TODO: This is a workaround and should be updated when runtime build procedure # is changed to a regular runtime build. See discussion in PR #95388. -if (TARGET flang-new AND TARGET module_files) - add_dependencies(FortranRuntime flang-new module_files) +if (TARGET flang AND TARGET module_files) + add_dependencies(FortranRuntime flang module_files) endif() if (FLANG_CUF_RUNTIME) diff --git a/flang/test/CMakeLists.txt b/flang/test/CMakeLists.txt index a18a5c6519eda4..cab214c2ef4c8c 100644 --- a/flang/test/CMakeLists.txt +++ b/flang/test/CMakeLists.txt @@ -58,7 +58,7 @@ set(FLANG_TEST_PARAMS flang_site_config=${CMAKE_CURRENT_BINARY_DIR}/lit.site.cfg.py) set(FLANG_TEST_DEPENDS - flang-new + flang llvm-config FileCheck count diff --git a/flang/test/Driver/aarch64-outline-atomics.f90 b/flang/test/Driver/aarch64-outline-atomics.f90 index a1c874c20df5c7..530bfc8e962091 100644 --- a/flang/test/Driver/aarch64-outline-atomics.f90 +++ b/flang/test/Driver/aarch64-outline-atomics.f90 @@ -1,4 +1,4 @@ -! Test that flang-new forwards the -moutline-atomics and -mno-outline-atomics. +! Test that flang forwards the -moutline-atomics and -mno-outline-atomics. ! RUN: %flang -moutline-atomics --target=aarch64-none-none -### %s -o %t 2>&1 | FileCheck %s ! CHECK: "-target-feature" "+outline-atomics" diff --git a/flang/test/Driver/color-diagnostics-forwarding.f90 b/flang/test/Driver/color-diagnostics-forwarding.f90 index 368fa8834142ab..29061242cb0cbc 100644 --- a/flang/test/Driver/color-diagnostics-forwarding.f90 +++ b/flang/test/Driver/color-diagnostics-forwarding.f90 @@ -1,5 +1,5 @@ -! Test that flang-new forwards -f{no-}color-diagnostics and -! -f{no-}diagnostics-color options to flang-new -fc1 as expected. +! Test that flang forwards -f{no-}color-diagnostics and +! -f{no-}diagnostics-color options to flang -fc1 as expected. ! 
RUN: %flang -fsyntax-only -### %s -o %t 2>&1 -fcolor-diagnostics \ ! RUN: | FileCheck %s --check-prefix=CHECK-CD diff --git a/flang/test/Driver/compiler-options.f90 b/flang/test/Driver/compiler-options.f90 index 7ec29ce7ba7abf..cefa86836abd30 100644 --- a/flang/test/Driver/compiler-options.f90 +++ b/flang/test/Driver/compiler-options.f90 @@ -1,6 +1,6 @@ ! RUN: %flang -S -emit-llvm -flang-deprecated-no-hlfir -o - %s | FileCheck %s -! Test communication of COMPILER_OPTIONS from flang-new to flang-new -fc1. -! CHECK: [[OPTSVAR:@_QQclX[0-9a-f]+]] = {{[a-z]+}} constant [[[OPTSLEN:[0-9]+]] x i8] c"{{.*}}flang-new{{(\.exe)?}} {{.*}}-S -emit-llvm -flang-deprecated-no-hlfir -o - {{.*}}compiler-options.f90" +! Test communication of COMPILER_OPTIONS from flang to flang -fc1. +! CHECK: [[OPTSVAR:@_QQclX[0-9a-f]+]] = {{[a-z]+}} constant [[[OPTSLEN:[0-9]+]] x i8] c"{{.*}}flang{{(\.exe)?}} {{.*}}-S -emit-llvm -flang-deprecated-no-hlfir -o - {{.*}}compiler-options.f90" program main use ISO_FORTRAN_ENV, only: compiler_options implicit none diff --git a/flang/test/Driver/convert.f90 b/flang/test/Driver/convert.f90 index b2cf6c23efdb75..0ba31d2188cdf5 100755 --- a/flang/test/Driver/convert.f90 +++ b/flang/test/Driver/convert.f90 @@ -12,7 +12,7 @@ ! RUN: not %flang -fconvert=foobar %s 2>&1 | FileCheck %s --check-prefix=INVALID !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -emit-mlir -fconvert=unknown %s -o - | FileCheck %s --check-prefix=VALID_FC1 ! RUN: %flang_fc1 -emit-mlir -fconvert=native %s -o - | FileCheck %s --check-prefix=VALID_FC1 diff --git a/flang/test/Driver/disable-ext-name-interop.f90 b/flang/test/Driver/disable-ext-name-interop.f90 index 0c59a5b4c980f8..1ade84b996d043 100644 --- a/flang/test/Driver/disable-ext-name-interop.f90 +++ b/flang/test/Driver/disable-ext-name-interop.f90 @@ -1,4 +1,4 @@ -! 
Test that we can disable the ExternalNameConversion pass in flang-new. +! Test that we can disable the ExternalNameConversion pass in flang. ! RUN: %flang_fc1 -S %s -o - 2>&1 | FileCheck %s --check-prefix=EXTNAMES ! RUN: %flang_fc1 -S -mmlir -disable-external-name-interop %s -o - 2>&1 | FileCheck %s --check-prefix=INTNAMES diff --git a/flang/test/Driver/driver-version.f90 b/flang/test/Driver/driver-version.f90 index d1e1e1d90fe1f8..4c6aecb1c4fa7e 100644 --- a/flang/test/Driver/driver-version.f90 +++ b/flang/test/Driver/driver-version.f90 @@ -4,12 +4,12 @@ ! RUN: %flang_fc1 -version 2>&1 | FileCheck %s --check-prefix=VERSION-FC1 ! RUN: not %flang_fc1 --version 2>&1 | FileCheck %s --check-prefix=ERROR-FC1 -! VERSION: flang-new version +! VERSION: flang version ! VERSION-NEXT: Target: ! VERSION-NEXT: Thread model: ! VERSION-NEXT: InstalledDir: -! ERROR: flang-new: error: unknown argument '--versions'; did you mean '--version'? +! ERROR: flang: error: unknown argument '--versions'; did you mean '--version'? ! VERSION-FC1: LLVM version diff --git a/flang/test/Driver/escaped-backslash.f90 b/flang/test/Driver/escaped-backslash.f90 index ad07eae24e9fab..90dd1783dd1150 100644 --- a/flang/test/Driver/escaped-backslash.f90 +++ b/flang/test/Driver/escaped-backslash.f90 @@ -1,14 +1,14 @@ ! Ensure argument -fbackslash works as expected. !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang -E %s 2>&1 | FileCheck %s --check-prefix=ESCAPED ! RUN: %flang -E -fbackslash -fno-backslash %s 2>&1 | FileCheck %s --check-prefix=ESCAPED ! RUN: %flang -E -fbackslash %s 2>&1 | FileCheck %s --check-prefix=UNESCAPED !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --check-prefix=ESCAPED ! 
RUN: %flang_fc1 -E -fbackslash -fno-backslash %s 2>&1 | FileCheck %s --check-prefix=ESCAPED diff --git a/flang/test/Driver/fdefault.f90 b/flang/test/Driver/fdefault.f90 index 88592bfa3e87ee..7ce45b763a240f 100644 --- a/flang/test/Driver/fdefault.f90 +++ b/flang/test/Driver/fdefault.f90 @@ -2,25 +2,25 @@ ! TODO: Add checks when actual codegen is possible for this family !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOOPTION -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=REAL8 -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=DOUBLE8 +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOOPTION +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=REAL8 +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=DOUBLE8 ! RUN: not %flang -fsyntax-only -fdefault-double-8 %s 2>&1 | FileCheck %s --check-prefix=ERROR !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- -! 
RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOOPTION -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=REAL8 -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=DOUBLE8 +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOOPTION +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=REAL8 +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=DOUBLE8 ! RUN: not %flang_fc1 -fsyntax-only -fdefault-double-8 %s 2>&1 | FileCheck %s --check-prefix=ERROR ! NOOPTION: integer(4),parameter::real_kind=4_4 diff --git a/flang/test/Driver/flarge-sizes.f90 b/flang/test/Driver/flarge-sizes.f90 index 6ea5876676ed1f..6c41a03a830bfb 100644 --- a/flang/test/Driver/flarge-sizes.f90 +++ b/flang/test/Driver/flarge-sizes.f90 @@ -2,20 +2,20 @@ ! TODO: Add checks when actual codegen is possible. !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOLARGE -! 
RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -flarge-sizes -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=LARGE +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOLARGE +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -flarge-sizes -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=LARGE !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOLARGE -! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -flarge-sizes -module-dir %t/dir-flang-new %s 2>&1 -! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=LARGE +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOLARGE +! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -flarge-sizes -module-dir %t/dir-flang %s 2>&1 +! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=LARGE ! NOLARGE: real(4)::z(1_8:10_8) ! NOLARGE-NEXT: integer(4),parameter::size_kind=4_4 diff --git a/flang/test/Driver/frame-pointer-forwarding.f90 b/flang/test/Driver/frame-pointer-forwarding.f90 index 751494cc6a6017..9fcbd6e12f98b7 100644 --- a/flang/test/Driver/frame-pointer-forwarding.f90 +++ b/flang/test/Driver/frame-pointer-forwarding.f90 @@ -1,4 +1,4 @@ -! Test that flang-new forwards -fno-omit-frame-pointer and -fomit-frame-pointer Flang frontend +! 
Test that flang forwards -fno-omit-frame-pointer and -fomit-frame-pointer Flang frontend ! RUN: %flang --target=aarch64-none-none -fsyntax-only -### %s -o %t 2>&1 | FileCheck %s --check-prefix=CHECK-NOVALUE ! CHECK-NOVALUE: "-fc1"{{.*}}"-mframe-pointer=non-leaf" diff --git a/flang/test/Driver/frontend-forwarding.f90 b/flang/test/Driver/frontend-forwarding.f90 index 35adb47b56861e..0a56a1e3710d9d 100644 --- a/flang/test/Driver/frontend-forwarding.f90 +++ b/flang/test/Driver/frontend-forwarding.f90 @@ -1,5 +1,5 @@ -! Test that flang-new forwards Flang frontend -! options to flang-new -fc1 as expected. +! Test that flang forwards Flang frontend +! options to flang -fc1 as expected. ! RUN: %flang -fsyntax-only -### %s -o %t 2>&1 \ ! RUN: -finput-charset=utf-8 \ diff --git a/flang/test/Driver/hlfir-no-hlfir-error.f90 b/flang/test/Driver/hlfir-no-hlfir-error.f90 index 2410393b6cd9c1..59f8304db5c9ab 100644 --- a/flang/test/Driver/hlfir-no-hlfir-error.f90 +++ b/flang/test/Driver/hlfir-no-hlfir-error.f90 @@ -2,12 +2,12 @@ ! options cannot be both used. !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: not %flang -flang-experimental-hlfir -flang-deprecated-no-hlfir %s 2>&1 | FileCheck %s !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: not %flang_fc1 -emit-llvm -flang-experimental-hlfir -flang-deprecated-no-hlfir %s 2>&1 | FileCheck %s diff --git a/flang/test/Driver/intrinsic-module-path.f90 b/flang/test/Driver/intrinsic-module-path.f90 index 5523ed37b724cd..15d19dd83d963f 100644 --- a/flang/test/Driver/intrinsic-module-path.f90 +++ b/flang/test/Driver/intrinsic-module-path.f90 @@ -4,7 +4,7 @@ ! default one, causing a CHECKSUM error. !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! 
FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -fsyntax-only %s 2>&1 | FileCheck %s --allow-empty --check-prefix=WITHOUT ! RUN: not %flang_fc1 -fsyntax-only -fintrinsic-modules-path %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=GIVEN diff --git a/flang/test/Driver/large-data-threshold.f90 b/flang/test/Driver/large-data-threshold.f90 index 320566c4b2e43a..6a7eef79559d0b 100644 --- a/flang/test/Driver/large-data-threshold.f90 +++ b/flang/test/Driver/large-data-threshold.f90 @@ -7,11 +7,11 @@ ! RUN: not %flang -### -c --target=aarch64 -mcmodel=small -mlarge-data-threshold=32768 %s 2>&1 | FileCheck %s --check-prefix=NOT-SUPPORTED -! CHECK: "{{.*}}flang-new" "-fc1" +! CHECK: "{{.*}}flang" "-fc1" ! CHECK-SAME: "-mlarge-data-threshold=32768" -! CHECK-59000: "{{.*}}flang-new" "-fc1" +! CHECK-59000: "{{.*}}flang" "-fc1" ! CHECK-59000-SAME: "-mlarge-data-threshold=59000" -! CHECK-1M: "{{.*}}flang-new" "-fc1" +! CHECK-1M: "{{.*}}flang" "-fc1" ! CHECK-1M-SAME: "-mlarge-data-threshold=1048576" ! NO-MCMODEL: 'mlarge-data-threshold=' only applies to medium and large code models ! INVALID: error: invalid value 'nonsense' in '-mlarge-data-threshold=' diff --git a/flang/test/Driver/lto-flags.f90 b/flang/test/Driver/lto-flags.f90 index a51febc7009691..bad3d972e6bd6b 100644 --- a/flang/test/Driver/lto-flags.f90 +++ b/flang/test/Driver/lto-flags.f90 @@ -30,7 +30,7 @@ ! FULL-LTO: "-fc1" ! FULL-LTO-SAME: "-flto=full" -! THIN-LTO-ALL: flang-new: warning: the option '-flto=thin' is a work in progress +! THIN-LTO-ALL: flang: warning: the option '-flto=thin' is a work in progress ! THIN-LTO-ALL: "-fc1" ! THIN-LTO-ALL-SAME: "-flto=thin" ! THIN-LTO-LINKER-PLUGIN: "-plugin-opt=thinlto" diff --git a/flang/test/Driver/macro-def-undef.F90 b/flang/test/Driver/macro-def-undef.F90 index 1332c6d6c02708..b13a9040833dbf 100644 --- a/flang/test/Driver/macro-def-undef.F90 +++ b/flang/test/Driver/macro-def-undef.F90 @@ -1,14 +1,14 @@ ! 
Ensure arguments -D and -U work as expected. !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang -E -P %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED ! RUN: %flang -E -P -DX=A %s 2>&1 | FileCheck %s --check-prefix=DEFINED ! RUN: %flang -E -P -DX=A -UX %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -E -P %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED ! RUN: %flang_fc1 -E -P -DX=A %s 2>&1 | FileCheck %s --check-prefix=DEFINED diff --git a/flang/test/Driver/missing-input.f90 b/flang/test/Driver/missing-input.f90 index 236325e3578f1d..51d37a718c542f 100644 --- a/flang/test/Driver/missing-input.f90 +++ b/flang/test/Driver/missing-input.f90 @@ -1,26 +1,26 @@ ! Test the behaviour of the driver when input is missing or is invalid. Note -! that with the compiler driver (flang-new), the input _has_ to be specified. +! that with the compiler driver (flang), the input _has_ to be specified. ! Indeed, the driver decides what "job/command" to create based on the input ! file's extension. No input file means that it doesn't know what to do -! (compile? preprocess? link?). The frontend driver (flang-new -fc1) simply +! (compile? preprocess? link?). The frontend driver (flang -fc1) simply ! assumes that "no explicit input == read from stdin" !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: not %flang 2>&1 | FileCheck %s --check-prefix=FLANG-NO-FILE ! RUN: not %flang %t.f90 2>&1 | FileCheck %s --check-prefix=FLANG-NONEXISTENT-FILE !----------------------------------------- -! FLANG FRONTEND DRIVER (flang-new -fc1) +! FLANG FRONTEND DRIVER (flang -fc1) !----------------------------------------- ! 
RUN: not %flang_fc1 %t.f90 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-NONEXISTENT-FILE ! RUN: not %flang_fc1 %S 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-DIR -! FLANG-NO-FILE: flang-new: error: no input files +! FLANG-NO-FILE: flang: error: no input files -! FLANG-NONEXISTENT-FILE: flang-new: error: no such file or directory: {{.*}} -! FLANG-NONEXISTENT-FILE: flang-new: error: no input files +! FLANG-NONEXISTENT-FILE: flang: error: no such file or directory: {{.*}} +! FLANG-NONEXISTENT-FILE: flang: error: no input files ! FLANG-FC1-NONEXISTENT-FILE: error: {{.*}} does not exist ! FLANG-FC1-DIR: error: {{.*}} is not a regular file diff --git a/flang/test/Driver/multiple-input-files.f90 b/flang/test/Driver/multiple-input-files.f90 index 6c86f23f2b21fa..64ec8679abf94f 100644 --- a/flang/test/Driver/multiple-input-files.f90 +++ b/flang/test/Driver/multiple-input-files.f90 @@ -39,7 +39,7 @@ ! FLANG-NEXT:end program hello ! TEST 2: `-o` does not when multiple input files are present -! ERROR: flang-new: error: cannot specify -o when generating multiple output files +! ERROR: flang: error: cannot specify -o when generating multiple output files ! TEST 3: The output file _was not_ specified - `flang_fc1` will process all ! input files and generate one output file for every input file. diff --git a/flang/test/Driver/omp-driver-offload.f90 b/flang/test/Driver/omp-driver-offload.f90 index b0b94ab1386a74..7c51656f0001af 100644 --- a/flang/test/Driver/omp-driver-offload.f90 +++ b/flang/test/Driver/omp-driver-offload.f90 @@ -1,6 +1,6 @@ -! Test that flang-new OpenMP and OpenMP offload related +! Test that flang OpenMP and OpenMP offload related ! commands forward or expand to the appropriate commands -! for flang-new -fc1 as expected. Assumes a gfx90a, aarch64, +! for flang -fc1 as expected. Assumes a gfx90a, aarch64, ! and sm_70 architecture, but doesn't require one to be ! installed or compiled for, just testing the appropriate ! 
generation of jobs are created with the correct @@ -8,8 +8,8 @@ ! Test regular -fopenmp with no offload ! RUN: %flang -### -fopenmp %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP %s -! CHECK-OPENMP: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}}.f90" -! CHECK-OPENMP-NOT: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90" +! CHECK-OPENMP: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}}.f90" +! CHECK-OPENMP-NOT: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90" ! Test regular -fopenmp with offload, and invocation filtering options ! RUN: %flang -S -### %s -o %t 2>&1 \ @@ -22,47 +22,47 @@ ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=OFFLOAD-HOST-AND-DEVICE -! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" -! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" -! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "nvptx64-nvidia-cuda" -! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "nvptx64-nvidia-cuda" +! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! RUN: %flang -S -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-host-only \ ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=OFFLOAD-HOST -! OFFLOAD-HOST: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-HOST: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! OFFLOAD-HOST-NOT: "-triple" "amdgcn-amd-amdhsa" ! 
OFFLOAD-HOST-NOT: "-triple" "nvptx64-nvidia-cuda" -! OFFLOAD-HOST-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-HOST-NOT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! RUN: %flang -S -### %s 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-device-only \ ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=OFFLOAD-DEVICE -! OFFLOAD-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" -! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" -! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "nvptx64-nvidia-cuda" -! OFFLOAD-DEVICE-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "nvptx64-nvidia-cuda" +! OFFLOAD-DEVICE-NOT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! Test regular -fopenmp with offload for basic fopenmp-is-target-device flag addition and correct fopenmp ! RUN: %flang -### -fopenmp --offload-arch=gfx90a -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP-IS-TARGET-DEVICE %s -! CHECK-OPENMP-IS-TARGET-DEVICE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90" +! CHECK-OPENMP-IS-TARGET-DEVICE: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90" ! Testing appropriate flags are gnerated and appropriately assigned by the driver when offloading ! RUN: %flang -S -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=OPENMP-OFFLOAD-ARGS -! 
OPENMP-OFFLOAD-ARGS: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" {{.*}} "-fopenmp" {{.*}}.f90" -! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! OPENMP-OFFLOAD-ARGS: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" {{.*}} "-fopenmp" {{.*}}.f90" +! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp" ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp-host-ir-file-path" "{{.*}}.bc" "-fopenmp-is-target-device" ! OPENMP-OFFLOAD-ARGS-SAME: {{.*}}.f90" ! OPENMP-OFFLOAD-ARGS: "{{[^"]*}}clang-offload-packager{{.*}}" {{.*}} "--image=file={{.*}}.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp" -! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp" ! OPENMP-OFFLOAD-ARGS-SAME: "-fembed-offload-object={{.*}}.out" {{.*}}.bc" @@ -77,7 +77,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-threads-oversubscription \ ! RUN: | FileCheck %s --check-prefixes=CHECK-THREADS-OVS -! CHECK-THREADS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90" +! CHECK-THREADS-OVS: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -89,7 +89,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-teams-oversubscription \ ! RUN: | FileCheck %s --check-prefixes=CHECK-TEAMS-OVS -! CHECK-TEAMS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-teams-oversubscription" {{.*}}.f90" +! 
CHECK-TEAMS-OVS: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-teams-oversubscription" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -101,7 +101,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-no-nested-parallelism \ ! RUN: | FileCheck %s --check-prefixes=CHECK-NEST-PAR -! CHECK-NEST-PAR: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-nested-parallelism" {{.*}}.f90" +! CHECK-NEST-PAR: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-nested-parallelism" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -113,7 +113,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-no-thread-state \ ! RUN: | FileCheck %s --check-prefixes=CHECK-THREAD-STATE -! CHECK-THREAD-STATE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-thread-state" {{.*}}.f90" +! CHECK-THREAD-STATE: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-thread-state" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -125,7 +125,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-target-debug \ ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG -! CHECK-TARGET-DEBUG: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" {{.*}}.f90" +! CHECK-TARGET-DEBUG: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -137,7 +137,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-target-debug \ ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG -! 
CHECK-TARGET-DEBUG-EQ: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug=111" {{.*}}.f90" +! CHECK-TARGET-DEBUG-EQ: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug=111" {{.*}}.f90" ! RUN: %flang -S -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -153,7 +153,7 @@ ! RUN: -fopenmp-assume-teams-oversubscription -fopenmp-assume-no-nested-parallelism \ ! RUN: -fopenmp-assume-no-thread-state \ ! RUN: | FileCheck %s --check-prefixes=CHECK-RTL-ALL -! CHECK-RTL-ALL: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" "-fopenmp-assume-teams-oversubscription" +! CHECK-RTL-ALL: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" "-fopenmp-assume-teams-oversubscription" ! CHECK-RTL-ALL: "-fopenmp-assume-threads-oversubscription" "-fopenmp-assume-no-thread-state" "-fopenmp-assume-no-nested-parallelism" ! CHECK-RTL-ALL: {{.*}}.f90" @@ -167,7 +167,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-version=45 \ ! RUN: | FileCheck %s --check-prefixes=CHECK-OPENMP-VERSION -! CHECK-OPENMP-VERSION: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" "-fopenmp-version=45" {{.*}}.f90" +! CHECK-OPENMP-VERSION: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" "-fopenmp-version=45" {{.*}}.f90" ! Test diagnostic error when host IR file is non-existent ! RUN: not %flang_fc1 %s -o %t 2>&1 -fopenmp -fopenmp-is-target-device \ @@ -187,7 +187,7 @@ ! RUN: --target=aarch64-unknown-linux-gnu \ ! RUN: | FileCheck %s --check-prefix=FORCE-USM-NO-OFFLOAD -! FORCE-USM-NO-OFFLOAD: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! FORCE-USM-NO-OFFLOAD: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! FORCE-USM-NO-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm" ! Test -fopenmp-force-usm option with offload @@ -196,16 +196,16 @@ ! 
RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=FORCE-USM-OFFLOAD -! FORCE-USM-OFFLOAD: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! FORCE-USM-OFFLOAD: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm" -! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm" ! RUN: %flang -### -v --target=x86_64-unknown-linux-gnu -fopenmp \ ! RUN: --offload-arch=gfx900 \ ! RUN: --rocm-path=%S/Inputs/rocm %s 2>&1 \ ! RUN: | FileCheck --check-prefix=MLINK-BUILTIN-BITCODE %s -! MLINK-BUILTIN-BITCODE: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! MLINK-BUILTIN-BITCODE: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! MLINK-BUILTIN-BITCODE-SAME: "-mlink-builtin-bitcode" {{.*Inputs.*rocm.*amdgcn.*bitcode.*}}oclc_isa_version_900.bc ! Test that the -fopenmp-targets option is added to host compilation invocations @@ -219,9 +219,9 @@ ! RUN: --target=x86_64-unknown-linux-gnu -nogpulib \ ! RUN: | FileCheck %s --check-prefix=OFFLOAD-TARGETS -! OFFLOAD-TARGETS: "{{[^"]*}}flang-new" "-fc1" "-triple" "x86_64-unknown-linux-gnu" +! OFFLOAD-TARGETS: "{{[^"]*}}flang" "-fc1" "-triple" "x86_64-unknown-linux-gnu" ! OFFLOAD-TARGETS-SAME: "-fopenmp-targets=amdgcn-amd-amdhsa" -! OFFLOAD-TARGETS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! OFFLOAD-TARGETS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! OFFLOAD-TARGETS-NOT: -fopenmp-targets -! OFFLOAD-TARGETS: "{{[^"]*}}flang-new" "-fc1" "-triple" "x86_64-unknown-linux-gnu" +! OFFLOAD-TARGETS: "{{[^"]*}}flang" "-fc1" "-triple" "x86_64-unknown-linux-gnu" ! 
OFFLOAD-TARGETS-SAME: "-fopenmp-targets=amdgcn-amd-amdhsa" diff --git a/flang/test/Driver/predefined-macros-compiler-version.F90 b/flang/test/Driver/predefined-macros-compiler-version.F90 index 823a730f96845a..f6924479281562 100644 --- a/flang/test/Driver/predefined-macros-compiler-version.F90 +++ b/flang/test/Driver/predefined-macros-compiler-version.F90 @@ -1,12 +1,12 @@ ! Check that the driver correctly defines macros with the compiler version !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --ignore-case !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --ignore-case diff --git a/flang/test/Driver/std2018-wrong.f90 b/flang/test/Driver/std2018-wrong.f90 index 27ccc76bd39aad..93ba153d75f7f9 100644 --- a/flang/test/Driver/std2018-wrong.f90 +++ b/flang/test/Driver/std2018-wrong.f90 @@ -1,7 +1,7 @@ ! Ensure argument -std=f2018 works as expected. !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: not %flang_fc1 -std=90 %s 2>&1 | FileCheck %s --check-prefix=WRONG diff --git a/flang/test/Driver/std2018.f90 b/flang/test/Driver/std2018.f90 index cf461cf89e4e19..1727f92127b711 100644 --- a/flang/test/Driver/std2018.f90 +++ b/flang/test/Driver/std2018.f90 @@ -1,7 +1,7 @@ ! Ensure argument -std=f2018 works as expected. !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -fsyntax-only %s 2>&1 | FileCheck %s --allow-empty --check-prefix=WITHOUT ! 
RUN: %flang_fc1 -fsyntax-only -std=f2018 %s 2>&1 | FileCheck %s --check-prefix=GIVEN diff --git a/flang/test/Driver/supported-suffices/f03-suffix.f03 b/flang/test/Driver/supported-suffices/f03-suffix.f03 index 6e03f9f43fc602..1d850305cd040e 100644 --- a/flang/test/Driver/supported-suffices/f03-suffix.f03 +++ b/flang/test/Driver/supported-suffices/f03-suffix.f03 @@ -1,5 +1,5 @@ ! RUN: %flang -### %s 2>&1 | FileCheck %s -! CHECK: "{{.*}}flang-new" "-fc1" {{.*}} "-o" "{{.*}}.o" +! CHECK: "{{.*}}flang" "-fc1" {{.*}} "-o" "{{.*}}.o" program f03 end program f03 diff --git a/flang/test/Driver/supported-suffices/f08-suffix.f08 b/flang/test/Driver/supported-suffices/f08-suffix.f08 index d5bcf4ce1de1cc..2b31e4c21876ae 100644 --- a/flang/test/Driver/supported-suffices/f08-suffix.f08 +++ b/flang/test/Driver/supported-suffices/f08-suffix.f08 @@ -1,5 +1,5 @@ ! RUN: %flang -### %s 2>&1 | FileCheck %s -! CHECK: "{{.*}}flang-new" "-fc1" {{.*}} "-o" "{{.*}}.o" +! CHECK: "{{.*}}flang" "-fc1" {{.*}} "-o" "{{.*}}.o" program f08 end program f08 diff --git a/flang/test/Driver/use-module-error.f90 b/flang/test/Driver/use-module-error.f90 index 42d6650621c8c8..67335f61626817 100644 --- a/flang/test/Driver/use-module-error.f90 +++ b/flang/test/Driver/use-module-error.f90 @@ -1,14 +1,14 @@ ! Ensure that multiple module directories are not allowed !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: not %flang -fsyntax-only -J %S/Inputs/module-dir -J %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE ! RUN: not %flang -fsyntax-only -J %S/Inputs/module-dir -module-dir %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE ! RUN: not %flang -fsyntax-only -module-dir %S/Inputs/module-dir -J%S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! 
FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: not %flang_fc1 -fsyntax-only -J %S/Inputs/module-dir -J %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE ! RUN: not %flang_fc1 -fsyntax-only -J %S/Inputs/module-dir -module-dir %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE diff --git a/flang/test/Driver/use-module.f90 b/flang/test/Driver/use-module.f90 index 775c0424715883..2c3a38043fe16e 100644 --- a/flang/test/Driver/use-module.f90 +++ b/flang/test/Driver/use-module.f90 @@ -1,7 +1,7 @@ ! Checks that module search directories specified with `-J/-module-dir` and `-I` are handled correctly !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang -fsyntax-only -I %S/Inputs -I %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty ! RUN: %flang -fsyntax-only -I %S/Inputs -J %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty @@ -16,7 +16,7 @@ ! RUN: not %flang -fsyntax-only -module-dir %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=SINGLEINCLUDE !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -fsyntax-only -I %S/Inputs -I %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty ! RUN: %flang_fc1 -fsyntax-only -I %S/Inputs -J %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty diff --git a/flang/test/Driver/version-loops.f90 b/flang/test/Driver/version-loops.f90 index b0fa01d572512a..d206393a04f486 100644 --- a/flang/test/Driver/version-loops.f90 +++ b/flang/test/Driver/version-loops.f90 @@ -1,5 +1,5 @@ -! Test that flang-new forwards the -f{no-,}version-loops-for-stride -! options correctly to flang-new -fc1 for different variants of optimisation +! 
Test that flang forwards the -f{no-,}version-loops-for-stride +! options correctly to flang -fc1 for different variants of optimisation ! and explicit flags. ! RUN: %flang -### %s -o %t 2>&1 -O3 \ @@ -23,32 +23,32 @@ ! RUN: %flang -### %s -o %t 2>&1 -O3 -fno-version-loops-for-stride \ ! RUN: | FileCheck %s --check-prefix=CHECK-O3-no -! CHECK: "{{.*}}flang-new" "-fc1" +! CHECK: "{{.*}}flang" "-fc1" ! CHECK-SAME: "-fversion-loops-for-stride" ! CHECK-SAME: "-O3" -! CHECK-O2: "{{.*}}flang-new" "-fc1" +! CHECK-O2: "{{.*}}flang" "-fc1" ! CHECK-O2-NOT: "-fversion-loops-for-stride" ! CHECK-O2-SAME: "-O2" -! CHECK-O2-with: "{{.*}}flang-new" "-fc1" +! CHECK-O2-with: "{{.*}}flang" "-fc1" ! CHECK-O2-with-SAME: "-fversion-loops-for-stride" ! CHECK-O2-with-SAME: "-O2" -! CHECK-O4: "{{.*}}flang-new" "-fc1" +! CHECK-O4: "{{.*}}flang" "-fc1" ! CHECK-O4-SAME: "-fversion-loops-for-stride" ! CHECK-O4-SAME: "-O3" -! CHECK-Ofast: "{{.*}}flang-new" "-fc1" +! CHECK-Ofast: "{{.*}}flang" "-fc1" ! CHECK-Ofast-SAME: "-ffast-math" ! CHECK-Ofast-SAME: "-fversion-loops-for-stride" ! CHECK-Ofast-SAME: "-O3" -! CHECK-Ofast-no: "{{.*}}flang-new" "-fc1" +! CHECK-Ofast-no: "{{.*}}flang" "-fc1" ! CHECK-Ofast-no-SAME: "-ffast-math" ! CHECK-Ofast-no-NOT: "-fversion-loops-for-stride" ! CHECK-Ofast-no-SAME: "-O3" -! CHECK-O3-no: "{{.*}}flang-new" "-fc1" +! CHECK-O3-no: "{{.*}}flang" "-fc1" ! CHECK-O3-no-NOT: "-fversion-loops-for-stride" ! CHECK-O3-no-SAME: "-O3" diff --git a/flang/test/Driver/wextra-ok.f90 b/flang/test/Driver/wextra-ok.f90 index 6a38d9481a36b7..441029aa0af276 100644 --- a/flang/test/Driver/wextra-ok.f90 +++ b/flang/test/Driver/wextra-ok.f90 @@ -1,4 +1,4 @@ -! Ensure that supplying -Wextra into flang-new does not raise error +! Ensure that supplying -Wextra into flang does not raise error ! The first check should be changed if -Wextra is implemented ! 
RUN: %flang -std=f2018 -Wextra %s -c 2>&1 | FileCheck %s --check-prefix=CHECK-OK diff --git a/flang/test/HLFIR/hlfir-flags.f90 b/flang/test/HLFIR/hlfir-flags.f90 index b383a79d12c27b..0b1e80b1e3f636 100644 --- a/flang/test/HLFIR/hlfir-flags.f90 +++ b/flang/test/HLFIR/hlfir-flags.f90 @@ -1,4 +1,4 @@ -! Test -flang-deprecated-hlfir, -flang-experimental-hlfir (flang-new), and +! Test -flang-deprecated-hlfir, -flang-experimental-hlfir (flang), and ! -hlfir (bbc), -emit-hlfir, -emit-fir flags ! RUN: %flang_fc1 -emit-hlfir -o - %s | FileCheck --check-prefix HLFIR --check-prefix ALL %s ! RUN: bbc -emit-hlfir -o - %s | FileCheck --check-prefix HLFIR --check-prefix ALL %s diff --git a/flang/test/Lower/Intrinsics/command_argument_count.f90 b/flang/test/Lower/Intrinsics/command_argument_count.f90 index 0cf92d4444db98..a30b27d664fc0c 100644 --- a/flang/test/Lower/Intrinsics/command_argument_count.f90 +++ b/flang/test/Lower/Intrinsics/command_argument_count.f90 @@ -1,6 +1,6 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! bbc doesn't have a way to set the default kinds so we use flang-new driver -! RUN: flang-new -fc1 -fdefault-integer-8 -emit-fir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 %s +! bbc doesn't have a way to set the default kinds so we use flang driver +! RUN: flang -fc1 -fdefault-integer-8 -emit-fir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 %s ! CHECK-LABEL: argument_count_test subroutine argument_count_test() diff --git a/flang/test/Lower/Intrinsics/exit.f90 b/flang/test/Lower/Intrinsics/exit.f90 index c3110fcbec2b5a..bd551f7318a84a 100644 --- a/flang/test/Lower/Intrinsics/exit.f90 +++ b/flang/test/Lower/Intrinsics/exit.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir -hlfir=false %s -o - | FileCheck --check-prefixes=CHECK,CHECK-32 -DDEFAULT_INTEGER_SIZE=32 %s -! bbc doesn't have a way to set the default kinds so we use flang-new driver +! bbc doesn't have a way to set the default kinds so we use flang driver ! 
RUN: %flang_fc1 -fdefault-integer-8 -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 -DDEFAULT_INTEGER_SIZE=64 %s ! CHECK-LABEL: func @_QPexit_test1() { diff --git a/flang/test/Lower/Intrinsics/ieee_is_normal.f90 b/flang/test/Lower/Intrinsics/ieee_is_normal.f90 index f9ab01881d250d..9b864c9a9849c3 100644 --- a/flang/test/Lower/Intrinsics/ieee_is_normal.f90 +++ b/flang/test/Lower/Intrinsics/ieee_is_normal.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -emit-fir %s -o - | FileCheck %s +! RUN: flang -fc1 -emit-fir %s -o - | FileCheck %s ! CHECK-LABEL: ieee_is_normal_f16 subroutine ieee_is_normal_f16(r) diff --git a/flang/test/Lower/Intrinsics/isnan.f90 b/flang/test/Lower/Intrinsics/isnan.f90 index 700b2d1a67c656..62b98c8ea98bee 100644 --- a/flang/test/Lower/Intrinsics/isnan.f90 +++ b/flang/test/Lower/Intrinsics/isnan.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -emit-fir %s -o - | FileCheck %s +! RUN: flang -fc1 -emit-fir %s -o - | FileCheck %s ! CHECK-LABEL: isnan_f32 subroutine isnan_f32(r) diff --git a/flang/test/Lower/Intrinsics/modulo.f90 b/flang/test/Lower/Intrinsics/modulo.f90 index ac18e59033a6b6..781ef8296a2b7d 100644 --- a/flang/test/Lower/Intrinsics/modulo.f90 +++ b/flang/test/Lower/Intrinsics/modulo.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir -hlfir=false %s -o - | FileCheck %s -check-prefixes=HONORINF,ALL -! RUN: flang-new -fc1 -menable-no-infs -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck %s -check-prefixes=CHECK,ALL +! RUN: flang -fc1 -menable-no-infs -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck %s -check-prefixes=CHECK,ALL ! ALL-LABEL: func @_QPmodulo_testr( ! 
ALL-SAME: %[[arg0:.*]]: !fir.ref{{.*}}, %[[arg1:.*]]: !fir.ref{{.*}}, %[[arg2:.*]]: !fir.ref{{.*}}) { diff --git a/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 b/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 index f02884e5e92f38..425ccbc5dd56c5 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP allocate Directive. -// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s +// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s program main integer :: x, y diff --git a/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 b/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 index 3be61a1700ced3..7a7d28db8d6f5a 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP declare reduction Directive. -// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s +// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s subroutine declare_red() integer :: my_var diff --git a/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 b/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 index c6a0a8f2cd0d22..be1ac2db5dfa4a 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP declare simd Directive. 
-// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s +// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s subroutine sub(x, y) real, intent(inout) :: x, y diff --git a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 index 62bc247a1456a1..bc5baf4e1cf604 100644 --- a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 +++ b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 @@ -1,7 +1,7 @@ ! This test checks lowering of `LASTPRIVATE` clause for scalar types. ! RUN: bbc -fopenmp -emit-hlfir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s +! RUN: flang -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s !CHECK: func @_QPlastprivate_character(%[[ARG1:.*]]: !fir.boxchar<1>{{.*}}) { !CHECK-DAG: %[[ARG1_UNBOX:.*]]:2 = fir.unboxchar diff --git a/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 b/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 index 32caac39778dee..99c521406a7775 100644 --- a/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 +++ b/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 @@ -1,7 +1,7 @@ ! Check that for parallel do, reduction is only processed for the loop ! RUN: bbc -fopenmp --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -fopenmp -mmlir --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s +! RUN: flang -fc1 -fopenmp -mmlir --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s ! CHECK: omp.parallel { ! CHECK: omp.wsloop reduction(byref @add_reduction_byref_i32 diff --git a/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 b/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 index fdedbb06160761..cfeb5de83f4e82 100644 --- a/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 +++ b/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 @@ -1,7 +1,7 @@ ! 
Check that for parallel do, reduction is only processed for the loop ! RUN: bbc -fopenmp -emit-hlfir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s +! RUN: flang -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s ! CHECK: omp.parallel { ! CHECK: omp.wsloop reduction(@add_reduction_i32 diff --git a/flang/test/lit.cfg.py b/flang/test/lit.cfg.py index 4acbc0606d1977..f43234fb125b7e 100644 --- a/flang/test/lit.cfg.py +++ b/flang/test/lit.cfg.py @@ -132,13 +132,13 @@ tools = [ ToolSubst( "%flang", - command=FindTool("flang-new"), + command=FindTool("flang"), extra_args=isysroot_flag, unresolved="fatal", ), ToolSubst( "%flang_fc1", - command=FindTool("flang-new"), + command=FindTool("flang"), extra_args=["-fc1"], unresolved="fatal", ), diff --git a/flang/tools/f18/CMakeLists.txt b/flang/tools/f18/CMakeLists.txt index 9d7b8633958cb7..4362fcf0537616 100644 --- a/flang/tools/f18/CMakeLists.txt +++ b/flang/tools/f18/CMakeLists.txt @@ -55,7 +55,7 @@ endif() set(module_objects "") # Create module files directly from the top-level module source directory. -# If CMAKE_CROSSCOMPILING, then the newly built flang-new executable was +# If CMAKE_CROSSCOMPILING, then the newly built flang executable was # cross compiled, and thus can't be executed on the build system and thus # can't be used for generating module files. 
 if (NOT CMAKE_CROSSCOMPILING)
@@ -115,9 +115,9 @@ if (NOT CMAKE_CROSSCOMPILING)
 # TODO: We may need to flag this with conditional, in case Flang is built w/o OpenMP support
 add_custom_command(OUTPUT ${base}.mod ${object_output}
 COMMAND ${CMAKE_COMMAND} -E make_directory ${FLANG_INTRINSIC_MODULES_DIR}
- COMMAND flang-new ${opts} ${decls} -cpp ${compile_with} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
+ COMMAND flang ${opts} ${decls} -cpp ${compile_with} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
 ${FLANG_SOURCE_DIR}/module/${filename}.f90
- DEPENDS flang-new ${FLANG_SOURCE_DIR}/module/${filename}.f90 ${FLANG_SOURCE_DIR}/module/__fortran_builtins.f90 ${depends}
+ DEPENDS flang ${FLANG_SOURCE_DIR}/module/${filename}.f90 ${FLANG_SOURCE_DIR}/module/__fortran_builtins.f90 ${depends}
 )
 list(APPEND MODULE_FILES ${base}.mod)
 install(FILES ${base}.mod DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/flang")
@@ -142,9 +142,9 @@ if (NOT CMAKE_CROSSCOMPILING)
 set(base ${FLANG_INTRINSIC_MODULES_DIR}/omp_lib)
 add_custom_command(OUTPUT ${base}.mod ${base}_kinds.mod
 COMMAND ${CMAKE_COMMAND} -E make_directory ${FLANG_INTRINSIC_MODULES_DIR}
- COMMAND flang-new -cpp -fsyntax-only ${opts} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
+ COMMAND flang -cpp -fsyntax-only ${opts} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
 ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90
- DEPENDS flang-new ${FLANG_INTRINSIC_MODULES_DIR}/iso_c_binding.mod ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90 ${depends}
+ DEPENDS flang ${FLANG_INTRINSIC_MODULES_DIR}/iso_c_binding.mod ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90 ${depends}
 )
 add_custom_command(OUTPUT ${base}.f18.mod
 DEPENDS ${base}.mod
diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt
index 9f33cdfe3fa90f..615c673374faf4 100644
--- a/flang/tools/flang-driver/CMakeLists.txt
+++ b/flang/tools/flang-driver/CMakeLists.txt
@@ -11,18 +11,18 @@ set( LLVM_LINK_COMPONENTS
 TargetParser
 )
-add_flang_tool(flang-new
+add_flang_tool(flang
 driver.cpp
 fc1_main.cpp
 )
-target_link_libraries(flang-new
+target_link_libraries(flang
 PRIVATE
 flangFrontend
 flangFrontendTool
 )
-clang_target_link_libraries(flang-new
+clang_target_link_libraries(flang
 PRIVATE
 clangDriver
 clangBasic
@@ -30,9 +30,9 @@ clang_target_link_libraries(flang-new
 option(FLANG_PLUGIN_SUPPORT "Build Flang with plugin support." ON)
-# Enable support for plugins, which need access to symbols from flang-new
+# Enable support for plugins, which need access to symbols from flang
 if(FLANG_PLUGIN_SUPPORT)
- export_executable_symbols_for_plugins(flang-new)
+ export_executable_symbols_for_plugins(flang)
 endif()
-install(TARGETS flang-new DESTINATION "${CMAKE_INSTALL_BINDIR}")
+install(TARGETS flang DESTINATION "${CMAKE_INSTALL_BINDIR}")
diff --git a/flang/tools/flang-driver/driver.cpp b/flang/tools/flang-driver/driver.cpp
index 52136df10c0b02..603aab4205836c 100644
--- a/flang/tools/flang-driver/driver.cpp
+++ b/flang/tools/flang-driver/driver.cpp
@@ -95,7 +95,7 @@ int main(int argc, const char **argv) {
 llvm::StringSaver saver(a);
 ExpandResponseFiles(saver, args);
- // Check if flang-new is in the frontend mode
+ // Check if flang is in the frontend mode
 auto firstArg = std::find_if(args.begin() + 1, args.end(),
 [](const char *a) { return a != nullptr; });
 if (firstArg != args.end()) {
@@ -104,7 +104,7 @@ int main(int argc, const char **argv) {
 << "Valid tools include '-fc1'.\n";
 return 1;
 }
- // Call flang-new frontend
+ // Call flang frontend
 if (llvm::StringRef(args[1]).starts_with("-fc1")) {
 return executeFC1Tool(args);
 }
@@ -140,7 +140,7 @@ int main(int argc, const char **argv) {
 // Set the environment variable, FLANG_COMPILER_OPTIONS_STRING, to contain all
 // the compiler options. This is intended for the frontend driver,
- // flang-new -fc1, to enable the implementation of the COMPILER_OPTIONS
+ // flang -fc1, to enable the implementation of the COMPILER_OPTIONS
 // intrinsic. To this end, the frontend driver requires the list of the
 // original compiler options, which is not available through other means.
 // TODO: This way of passing information between the compiler and frontend
diff --git a/llvm/runtimes/CMakeLists.txt b/llvm/runtimes/CMakeLists.txt
index d948b7eb39b39c..9da1f926817a8b 100644
--- a/llvm/runtimes/CMakeLists.txt
+++ b/llvm/runtimes/CMakeLists.txt
@@ -504,15 +504,15 @@ if(build_runtimes)
 if("openmp" IN_LIST LLVM_ENABLE_RUNTIMES)
 if (${LLVM_TOOL_FLANG_BUILD})
- message(STATUS "Configuring build of omp_lib.mod and omp_lib_kinds.mod via flang-new")
- set(LIBOMP_FORTRAN_MODULES_COMPILER "${CMAKE_BINARY_DIR}/bin/flang-new")
+ message(STATUS "Configuring build of omp_lib.mod and omp_lib_kinds.mod via flang")
+ set(LIBOMP_FORTRAN_MODULES_COMPILER "${CMAKE_BINARY_DIR}/bin/flang")
 set(LIBOMP_MODULES_INSTALL_PATH "${CMAKE_INSTALL_INCLUDEDIR}/flang")
 # TODO: This is a workaround until flang becomes a first-class project
- # in llvm/CMakeList.txt. Until then, this line ensures that flang-new is
- # built before "openmp" is built as a runtime project. Besides "flang-new"
+ # in llvm/CMakeList.txt. Until then, this line ensures that flang is
+ # built before "openmp" is built as a runtime project. Besides "flang"
 # to build the compiler, we also need to add "module_files" to make sure
 # that all .mod files are also properly build.
- list(APPEND extra_deps "flang-new" "module_files")
+ list(APPEND extra_deps "flang" "module_files")
 endif()
 foreach(dep opt llvm-link llvm-extract clang clang-offload-packager)
 if(TARGET ${dep})
diff --git a/offload/CMakeLists.txt b/offload/CMakeLists.txt
index 9ffe8f56b76e67..9b771d1116ee38 100644
--- a/offload/CMakeLists.txt
+++ b/offload/CMakeLists.txt
@@ -89,9 +89,9 @@ else()
 # Check for flang
 if (NOT MSVC)
- set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new)
+ set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang)
 else()
- set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new.exe)
+ set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang.exe)
 endif()
 # Set fortran test compiler if flang is found
diff --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index 3b4259dfa380e8..c206386fa6b614 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -69,9 +69,9 @@ else()
 # Check for flang
 if (NOT MSVC)
- set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new)
+ set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang)
 else()
- set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new.exe)
+ set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang.exe)
 endif()
 # Set fortran test compiler if flang is found

>From b71c1d519cc61a751268b1ccda3fc59a966bab96 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Thu, 26 Sep 2024 10:39:53 -0500
Subject: [PATCH 2/6] [flang][driver] restore flang-new as symlink

Restore flang-new as a symlink to flang for backwards compatibility

Co-authored-by: H. Vetinari
Co-authored-by: Andrzej Warzynski
---
 clang/lib/Driver/ToolChain.cpp | 3 +++
 flang/tools/flang-driver/CMakeLists.txt | 4 ++++
 flang/tools/flang-driver/driver.cpp | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 16f9b629fc538c..c9f3dbd7707b77 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -381,6 +381,9 @@ static const DriverSuffix *FindDriverSuffix(StringRef ProgName, size_t &Pos) {
 {"cl", "--driver-mode=cl"},
 {"++", "--driver-mode=g++"},
 {"flang", "--driver-mode=flang"},
+ // For backwards compatibility, we create a symlink for `flang` called
+ // `flang-new`. This will be removed in the future.
+ {"flang-new", "--driver-mode=flang"},
 {"clang-dxc", "--driver-mode=dxc"},
 };
diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt
index 615c673374faf4..063acdd7dfe57c 100644
--- a/flang/tools/flang-driver/CMakeLists.txt
+++ b/flang/tools/flang-driver/CMakeLists.txt
@@ -36,3 +36,7 @@ if(FLANG_PLUGIN_SUPPORT)
 endif()
 install(TARGETS flang DESTINATION "${CMAKE_INSTALL_BINDIR}")
+
+# Keep "flang-new" as a symlink for backwards compatibility. Remove once "flang"
+# is a widely adopted name.
+add_flang_symlink(flang-new flang)
diff --git a/flang/tools/flang-driver/driver.cpp b/flang/tools/flang-driver/driver.cpp
index 603aab4205836c..ed52988feaa59c 100644
--- a/flang/tools/flang-driver/driver.cpp
+++ b/flang/tools/flang-driver/driver.cpp
@@ -88,7 +88,8 @@ int main(int argc, const char **argv) {
 llvm::InitLLVM x(argc, argv);
 llvm::SmallVector args(argv, argv + argc);
- clang::driver::ParsedClangName targetandMode("flang", "--driver-mode=flang");
+ clang::driver::ParsedClangName targetandMode =
+ clang::driver::ToolChain::getTargetAndModeFromProgramName(argv[0]);
 std::string driverPath = getExecutablePath(args[0]);
 llvm::BumpPtrAllocator a;

>From 443c951f8e0458e8b011424fad6a2e4b40b63144 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Mon, 30 Sep 2024 10:16:59 -0500
Subject: [PATCH 3/6] [flang][driver] add version to flang executable

---
 flang/tools/flang-driver/CMakeLists.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt
index 063acdd7dfe57c..9a89a6185a3291 100644
--- a/flang/tools/flang-driver/CMakeLists.txt
+++ b/flang/tools/flang-driver/CMakeLists.txt
@@ -28,6 +28,12 @@ clang_target_link_libraries(flang
 clangBasic
 )
+# This creates the executable with a version appended
+# and creates a symlink to it without the version
+if(CYGWIN OR NOT WIN32) # but it doesn't work on Windows
+ set_target_properties(flang PROPERTIES VERSION ${FLANG_EXECUTABLE_VERSION})
+endif()
+
 option(FLANG_PLUGIN_SUPPORT "Build Flang with plugin support." ON)
 # Enable support for plugins, which need access to symbols from flang

>From 27ae40d86f235890d109ca88682dd0caba0d2c93 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Thu, 3 Oct 2024 14:12:35 -0700
Subject: [PATCH 4/6] [flang][driver] add warning when using openmp

---
 clang/include/clang/Basic/DiagnosticDriverKinds.td | 3 +++
 clang/include/clang/Basic/DiagnosticGroups.td | 4 ++++
 clang/lib/Driver/ToolChains/Flang.cpp | 3 +++
 3 files changed, 10 insertions(+)

diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index 97573fcf20c1fb..68722ad9633120 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -147,6 +147,9 @@ def warn_drv_unsupported_option_for_processor : Warning<
 def warn_drv_unsupported_openmp_library : Warning<
 "the library '%0=%1' is not supported, OpenMP will not be enabled">,
 InGroup;
+def warn_openmp_experimental : Warning<
+ "OpenMP support in flang is still experimental">,
+ InGroup;
 def err_drv_invalid_thread_model_for_target : Error<
 "invalid thread model '%0' in '%1' for this target">;
diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td
index 7d81bdf827ea0c..bfa065f018f8d8 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -1582,3 +1582,7 @@ def ExtractAPIMisuse : DiagGroup<"extractapi-misuse">;
 // Warnings about using the non-standard extension having an explicit specialization
 // with a storage class specifier.
def ExplicitSpecializationStorageClass : DiagGroup<"explicit-specialization-storage-class">; + +// A warning for options that enable a feature that is not yet complete +def ExperimentalOption : DiagGroup<"experimental-option">; + diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp index 1ca12ff81389a3..19b43594b00815 100644 --- a/clang/lib/Driver/ToolChains/Flang.cpp +++ b/clang/lib/Driver/ToolChains/Flang.cpp @@ -787,6 +787,9 @@ void Flang::ConstructJob(Compilation &C, const JobAction &JA, if (Args.hasArg(options::OPT_fopenmp_force_usm)) CmdArgs.push_back("-fopenmp-force-usm"); + // TODO: OpenMP support isn't "done" yet, so for now we warn that it + // is experimental. + D.Diag(diag::warn_openmp_experimental); // FIXME: Clang supports a whole bunch more flags here. break; >From d8f95da5712a7d03a935c8b38f06d373c21f7a1f Mon Sep 17 00:00:00 2001 From: Brad Richardson Date: Fri, 4 Oct 2024 06:27:05 -0700 Subject: [PATCH 5/6] [flang][doc] update note about CMake support --- flang/docs/FlangDriver.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/flang/docs/FlangDriver.md b/flang/docs/FlangDriver.md index 47cf078cf2d0d4..23cbab30ee903e 100644 --- a/flang/docs/FlangDriver.md +++ b/flang/docs/FlangDriver.md @@ -335,7 +335,7 @@ just added using your new frontend option. ## CMake Support As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246) -(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a +(CMake 3.28.0), `cmake` can detect `flang` as a supported Fortran compiler. 
You can configure your CMake projects to use `flang` as follows: ```bash >From a35343fd31314a59f671474474258c8707c123ab Mon Sep 17 00:00:00 2001 From: Brad Richardson Date: Fri, 4 Oct 2024 07:11:16 -0700 Subject: [PATCH 6/6] [flang][test] fix tests broken by rename --- flang/test/Driver/driver-version.f90 | 2 +- flang/test/Driver/lto-flags.f90 | 2 +- flang/test/Driver/missing-input.f90 | 6 +++--- flang/test/Driver/multiple-input-files.f90 | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/flang/test/Driver/driver-version.f90 b/flang/test/Driver/driver-version.f90 index 4c6aecb1c4fa7e..6daeb0e767c0e0 100644 --- a/flang/test/Driver/driver-version.f90 +++ b/flang/test/Driver/driver-version.f90 @@ -9,7 +9,7 @@ ! VERSION-NEXT: Thread model: ! VERSION-NEXT: InstalledDir: -! ERROR: flang: error: unknown argument '--versions'; did you mean '--version'? +! ERROR: flang{{.*}}: error: unknown argument '--versions'; did you mean '--version'? ! VERSION-FC1: LLVM version diff --git a/flang/test/Driver/lto-flags.f90 b/flang/test/Driver/lto-flags.f90 index bad3d972e6bd6b..be9416810716a9 100644 --- a/flang/test/Driver/lto-flags.f90 +++ b/flang/test/Driver/lto-flags.f90 @@ -30,7 +30,7 @@ ! FULL-LTO: "-fc1" ! FULL-LTO-SAME: "-flto=full" -! THIN-LTO-ALL: flang: warning: the option '-flto=thin' is a work in progress +! THIN-LTO-ALL: flang{{.*}}: warning: the option '-flto=thin' is a work in progress ! THIN-LTO-ALL: "-fc1" ! THIN-LTO-ALL-SAME: "-flto=thin" ! THIN-LTO-LINKER-PLUGIN: "-plugin-opt=thinlto" diff --git a/flang/test/Driver/missing-input.f90 b/flang/test/Driver/missing-input.f90 index 51d37a718c542f..aeefbe14c20563 100644 --- a/flang/test/Driver/missing-input.f90 +++ b/flang/test/Driver/missing-input.f90 @@ -17,10 +17,10 @@ ! RUN: not %flang_fc1 %t.f90 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-NONEXISTENT-FILE ! RUN: not %flang_fc1 %S 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-DIR -! FLANG-NO-FILE: flang: error: no input files +! 
FLANG-NO-FILE: flang{{.*}}: error: no input files -! FLANG-NONEXISTENT-FILE: flang: error: no such file or directory: {{.*}} -! FLANG-NONEXISTENT-FILE: flang: error: no input files +! FLANG-NONEXISTENT-FILE: flang{{.*}}: error: no such file or directory: {{.*}} +! FLANG-NONEXISTENT-FILE: flang{{.*}}: error: no input files ! FLANG-FC1-NONEXISTENT-FILE: error: {{.*}} does not exist ! FLANG-FC1-DIR: error: {{.*}} is not a regular file diff --git a/flang/test/Driver/multiple-input-files.f90 b/flang/test/Driver/multiple-input-files.f90 index 64ec8679abf94f..0242db288babf2 100644 --- a/flang/test/Driver/multiple-input-files.f90 +++ b/flang/test/Driver/multiple-input-files.f90 @@ -39,7 +39,7 @@ ! FLANG-NEXT:end program hello ! TEST 2: `-o` does not when multiple input files are present -! ERROR: flang: error: cannot specify -o when generating multiple output files +! ERROR: flang{{.*}}: error: cannot specify -o when generating multiple output files ! TEST 3: The output file _was not_ specified - `flang_fc1` will process all ! input files and generate one output file for every input file. 
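
[Editorial illustration, not part of the patch series.] The test fixes in PATCH 6/6 replace the literal prefix `flang:` with `flang{{.*}}:` in the FileCheck patterns. A minimal sketch of why that helps, assuming the driver can report itself under a name with a suffix: the names `flang-20` and `flang.exe` below are hypothetical examples, and `filecheck_to_regex` is a simplified stand-in for FileCheck's matching (literal text outside `{{...}}`, a regular expression inside), not the real implementation.

```python
import re


def filecheck_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified FileCheck pattern into a Python regex.

    Assumption: only the {{regex}} syntax is modelled. Real FileCheck also
    handles [[variables]], whitespace canonicalisation, and more.
    """
    # re.split with one capture group yields: literal, regex, literal, ...
    parts = re.split(r"\{\{(.*?)\}\}", pattern)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pieces.append(re.escape(part))   # literal text
        else:
            pieces.append(f"(?:{part})")     # embedded regex
    return re.compile("".join(pieces))


def matches(rx: re.Pattern, line: str) -> bool:
    # FileCheck looks for the pattern anywhere in the line, hence search().
    return rx.search(line) is not None


OLD = filecheck_to_regex("flang: error: no input files")
NEW = filecheck_to_regex("flang{{.*}}: error: no input files")

if __name__ == "__main__":
    # Hypothetical driver names; only the first is taken from the tests.
    for name in ("flang", "flang-20", "flang.exe"):
        line = f"{name}: error: no input files"
        print(f"{line!r}: old={matches(OLD, line)} new={matches(NEW, line)}")
```

Under these assumptions the old pattern only accepts the bare `flang:` prefix, while the new one also accepts a suffixed executable name, which appears to be the intent of the change.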
From openmp-commits at lists.llvm.org Fri Oct 4 07:56:26 2024
From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits)
Date: Fri, 04 Oct 2024 07:56:26 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <6700021a.170a0220.12a9d5.c54f@mx.google.com>

https://github.com/everythingfunctional updated https://github.com/llvm/llvm-project/pull/110023

>From 649a73478c78389560042030a9717a05e8e338a8 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Wed, 25 Sep 2024 13:25:22 -0500
Subject: [PATCH 1/7] [flang][driver] rename flang-new to flang

---
 .github/workflows/release-binaries.yml | 2 +-
 clang/include/clang/Driver/Options.td | 4 +-
 clang/lib/Driver/Driver.cpp | 2 +-
 clang/lib/Driver/ToolChains/Flang.cpp | 6 +-
 clang/test/Driver/flang/flang.f90 | 2 +-
 clang/test/Driver/flang/flang_ucase.F90 | 2 +-
 .../Driver/flang/multiple-inputs-mixed.f90 | 2 +-
 clang/test/Driver/flang/multiple-inputs.f90 | 4 +-
 flang/docs/FlangDriver.md | 76 +++++++++----------
 flang/docs/ImplementingASemanticCheck.md | 4 +-
 flang/docs/Overview.md | 26 +++----
 .../FlangOmpReport/FlangOmpReport.cpp | 2 +-
 .../flang/Optimizer/Analysis/AliasAnalysis.h | 2 +-
 flang/include/flang/Tools/CrossToolHelpers.h | 2 +-
 flang/lib/Frontend/CompilerInvocation.cpp | 6 +-
 flang/lib/Frontend/FrontendActions.cpp | 2 +-
 .../ExecuteCompilerInvocation.cpp | 3 +-
 flang/runtime/CMakeLists.txt | 6 +-
 flang/test/CMakeLists.txt | 2 +-
 flang/test/Driver/aarch64-outline-atomics.f90 | 2 +-
 .../Driver/color-diagnostics-forwarding.f90 | 4 +-
 flang/test/Driver/compiler-options.f90 | 4 +-
 flang/test/Driver/convert.f90 | 2 +-
 .../test/Driver/disable-ext-name-interop.f90 | 2 +-
 flang/test/Driver/driver-version.f90 | 4 +-
 flang/test/Driver/escaped-backslash.f90 | 4 +-
 flang/test/Driver/fdefault.f90 | 28 +++----
 flang/test/Driver/flarge-sizes.f90 | 20 ++---
 .../test/Driver/frame-pointer-forwarding.f90 | 2 +-
 flang/test/Driver/frontend-forwarding.f90 | 4 +-
 flang/test/Driver/hlfir-no-hlfir-error.f90 | 4 +-
 flang/test/Driver/intrinsic-module-path.f90 | 2 +-
 flang/test/Driver/large-data-threshold.f90 | 6 +-
 flang/test/Driver/lto-flags.f90 | 2 +-
 flang/test/Driver/macro-def-undef.F90 | 4 +-
 flang/test/Driver/missing-input.f90 | 14 ++--
 flang/test/Driver/multiple-input-files.f90 | 2 +-
 flang/test/Driver/omp-driver-offload.f90 | 66 ++++++++--------
 .../predefined-macros-compiler-version.F90 | 4 +-
 flang/test/Driver/std2018-wrong.f90 | 2 +-
 flang/test/Driver/std2018.f90 | 2 +-
 .../Driver/supported-suffices/f03-suffix.f03 | 2 +-
 .../Driver/supported-suffices/f08-suffix.f08 | 2 +-
 flang/test/Driver/use-module-error.f90 | 4 +-
 flang/test/Driver/use-module.f90 | 4 +-
 flang/test/Driver/version-loops.f90 | 18 ++---
 flang/test/Driver/wextra-ok.f90 | 2 +-
 flang/test/HLFIR/hlfir-flags.f90 | 2 +-
 .../Intrinsics/command_argument_count.f90 | 4 +-
 flang/test/Lower/Intrinsics/exit.f90 | 2 +-
 .../test/Lower/Intrinsics/ieee_is_normal.f90 | 2 +-
 flang/test/Lower/Intrinsics/isnan.f90 | 2 +-
 flang/test/Lower/Intrinsics/modulo.f90 | 2 +-
 .../OpenMP/Todo/omp-declarative-allocate.f90 | 2 +-
 .../OpenMP/Todo/omp-declare-reduction.f90 | 2 +-
 .../Lower/OpenMP/Todo/omp-declare-simd.f90 | 2 +-
 .../parallel-lastprivate-clause-scalar.f90 | 2 +-
 .../parallel-wsloop-reduction-byref.f90 | 2 +-
 .../OpenMP/parallel-wsloop-reduction.f90 | 2 +-
 flang/test/lit.cfg.py | 4 +-
 flang/tools/f18/CMakeLists.txt | 10 +--
 flang/tools/flang-driver/CMakeLists.txt | 12 +--
 flang/tools/flang-driver/driver.cpp | 6 +-
 llvm/runtimes/CMakeLists.txt | 10 +--
 offload/CMakeLists.txt | 4 +-
 openmp/CMakeLists.txt | 4 +-
 66 files changed, 220 insertions(+), 227 deletions(-)

diff --git a/.github/workflows/release-binaries.yml b/.github/workflows/release-binaries.yml
index 925912df6843e4..6073ebac9e6c2c 100644
--- a/.github/workflows/release-binaries.yml
+++ b/.github/workflows/release-binaries.yml
@@ -328,7 +328,7 @@ jobs:
         run: |
           # Build some of the mlir tools that take a long time to link
           if [ "${{ needs.prepare.outputs.build-flang }}" = "true" ]; then
-            ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ -j2 flang-new bbc
+            ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ -j2 flang bbc
           fi
           ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ \
             mlir-bytecode-parser-fuzzer \
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 932cf13edab53d..4a45a825da8fa1 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6071,7 +6071,7 @@ def _sysroot_EQ : Joined<["--"], "sysroot=">, Visibility<[ClangOption, FlangOpti
 def _sysroot : Separate<["--"], "sysroot">, Alias<_sysroot_EQ>;

 //===----------------------------------------------------------------------===//
-// pie/pic options (clang + flang-new)
+// pie/pic options (clang + flang)
 //===----------------------------------------------------------------------===//

 let Visibility = [ClangOption, FlangOption] in {
@@ -6087,7 +6087,7 @@ def fno_pie : Flag<["-"], "fno-pie">, Group;
 } // let Vis = [Default, FlangOption]

 //===----------------------------------------------------------------------===//
-// Target Options (clang + flang-new)
+// Target Options (clang + flang)
 //===----------------------------------------------------------------------===//
 let Flags = [TargetSpecific] in {
 let Visibility = [ClangOption, FlangOption] in {
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index d0c8bdba0ede95..4243ee006c1553 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -2021,7 +2021,7 @@ void Driver::PrintHelp(bool ShowHidden) const {

 void Driver::PrintVersion(const Compilation &C, raw_ostream &OS) const {
   if (IsFlangMode()) {
-    OS << getClangToolFullVersion("flang-new") << '\n';
+    OS << getClangToolFullVersion("flang") << '\n';
   } else {
     // FIXME: The following handlers should use a callback mechanism, we don't
     // know what the client would like to do.
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 98350690f8d20e..1ca12ff81389a3 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -881,14 +881,12 @@ void Flang::ConstructJob(Compilation &C, const JobAction &JA,

   CmdArgs.push_back(Input.getFilename());

-  // TODO: Replace flang-new with flang once the new driver replaces the
-  // throwaway driver
-  const char *Exec = Args.MakeArgString(D.GetProgramPath("flang-new", TC));
+  const char *Exec = Args.MakeArgString(D.GetProgramPath("flang", TC));
   C.addCommand(std::make_unique(JA, *this,
                                          ResponseFileSupport::AtFileUTF8(),
                                          Exec, CmdArgs, Inputs, Output));
 }

-Flang::Flang(const ToolChain &TC) : Tool("flang-new", "flang frontend", TC) {}
+Flang::Flang(const ToolChain &TC) : Tool("flang", "flang frontend", TC) {}

 Flang::~Flang() {}
diff --git a/clang/test/Driver/flang/flang.f90 b/clang/test/Driver/flang/flang.f90
index ad4a3a3b6bd44d..b52977ee66d7b0 100644
--- a/clang/test/Driver/flang/flang.f90
+++ b/clang/test/Driver/flang/flang.f90
@@ -13,7 +13,7 @@
 ! * (no type specified, resulting in an object file)

 ! All invocations should begin with flang -fc1, consume up to here.
-! ALL-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1"
+! ALL-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1"

 ! Check that f90 files are not treated as "previously preprocessed"
 ! ... in --driver-mode=flang.
diff --git a/clang/test/Driver/flang/flang_ucase.F90 b/clang/test/Driver/flang/flang_ucase.F90
index e89c053b327bc9..88aedc39fb94a7 100644
--- a/clang/test/Driver/flang/flang_ucase.F90
+++ b/clang/test/Driver/flang/flang_ucase.F90
@@ -13,7 +13,7 @@
 ! * (no type specified, resulting in an object file)

 ! All invocations should begin with flang -fc1, consume up to here.
-! ALL-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1"
+! ALL-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1"

 ! Check that f90 files are not treated as "previously preprocessed"
 ! ... in --driver-mode=flang.
diff --git a/clang/test/Driver/flang/multiple-inputs-mixed.f90 b/clang/test/Driver/flang/multiple-inputs-mixed.f90
index 2395dbecf1fe92..98d8cab00bdfdb 100644
--- a/clang/test/Driver/flang/multiple-inputs-mixed.f90
+++ b/clang/test/Driver/flang/multiple-inputs-mixed.f90
@@ -1,7 +1,7 @@
 ! Check that flang can handle mixed C and fortran inputs.

 ! RUN: %clang --driver-mode=flang -### -fsyntax-only %S/Inputs/one.f90 %S/Inputs/other.c 2>&1 | FileCheck --check-prefixes=CHECK-SYNTAX-ONLY %s
-! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1"
+! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1"
 ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/one.f90"
 ! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}clang{{[^"/]*}}" "-cc1"
 ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/other.c"
diff --git a/clang/test/Driver/flang/multiple-inputs.f90 b/clang/test/Driver/flang/multiple-inputs.f90
index ada999e927a6a0..3c0f22e5d3e508 100644
--- a/clang/test/Driver/flang/multiple-inputs.f90
+++ b/clang/test/Driver/flang/multiple-inputs.f90
@@ -1,7 +1,7 @@
 ! Check that flang driver can handle multiple inputs at once.

 ! RUN: %clang --driver-mode=flang -### -fsyntax-only %S/Inputs/one.f90 %S/Inputs/two.f90 2>&1 | FileCheck --check-prefixes=CHECK-SYNTAX-ONLY %s
-! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1"
+! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1"
 ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/one.f90"
-! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1"
+! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1"
 ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/two.f90"
diff --git a/flang/docs/FlangDriver.md b/flang/docs/FlangDriver.md
index 815c26a28dfdfa..47cf078cf2d0d4 100644
--- a/flang/docs/FlangDriver.md
+++ b/flang/docs/FlangDriver.md
@@ -15,17 +15,13 @@ local:
 ```

 There are two main drivers in Flang:
-* the compiler driver, `flang-new`
-* the frontend driver, `flang-new -fc1`
-
-> **_NOTE:_** The diagrams in this document refer to `flang` as opposed to
-> `flang-new`. Eventually, `flang-new` will be renamed as `flang` and the
-> diagrams reflect the final design that we are still working towards.
+* the compiler driver, `flang`
+* the frontend driver, `flang -fc1`

 The **compiler driver** will allow you to control all compilation phases (e.g.
 preprocessing, semantic checks, code-generation, code-optimisation, lowering
 and linking). For frontend specific tasks, the compiler driver creates a
-Fortran compilation job and delegates it to `flang-new -fc1`, the frontend
+Fortran compilation job and delegates it to `flang -fc1`, the frontend
 driver. For linking, it creates a linker job and calls an external linker (e.g.
 LLVM's [`lld`](https://lld.llvm.org/)). It can also call other tools such as
 external assemblers (e.g. [`as`](https://www.gnu.org/software/binutils/)). In
@@ -47,7 +43,7 @@ frontend. It uses MLIR and LLVM for code-generation and can be viewed as a
 driver for Flang, LLVM and MLIR libraries. Contrary to the compiler driver, it
 is not capable of calling any external tools (including linkers). It is aware
 of all the frontend internals that are "hidden" from the compiler driver. It
-accepts many frontend-specific options not available in `flang-new` and as such
+accepts many frontend-specific options not available in `flang` and as such
 it provides a finer control over the frontend. Note that this tool is mostly
 intended for Flang developers. In particular, there are no guarantees about the
 stability of its interface and compiler developers can use it to experiment
@@ -62,30 +58,30 @@ frontend specific flag from the _compiler_ directly to the _frontend_ driver,
 e.g.:

 ```bash
-flang-new -Xflang -fdebug-dump-parse-tree input.f95
+flang -Xflang -fdebug-dump-parse-tree input.f95
 ```

-In the invocation above, `-fdebug-dump-parse-tree` is forwarded to `flang-new
+In the invocation above, `-fdebug-dump-parse-tree` is forwarded to `flang
 -fc1`. Without the forwarding flag, `-Xflang`, you would see the following
 warning:

 ```bash
-flang-new: warning: argument unused during compilation:
+flang: warning: argument unused during compilation:
 ```

-As `-fdebug-dump-parse-tree` is only supported by `flang-new -fc1`, `flang-new`
+As `-fdebug-dump-parse-tree` is only supported by `flang -fc1`, `flang`
 will ignore it when used without `Xflang`.

 ## Why Do We Need Two Drivers?
-As hinted above, `flang-new` and `flang-new -fc1` are two separate tools. The
-fact that these tools are accessed through one binary, `flang-new`, is just an
+As hinted above, `flang` and `flang -fc1` are two separate tools. The
+fact that these tools are accessed through one binary, `flang`, is just an
 implementation detail. Each tool has a separate list of options, albeit defined
 in the same file: `clang/include/clang/Driver/Options.td`.

 The separation helps us split various tasks and allows us to implement more
-specialised tools. In particular, `flang-new` is not aware of various
+specialised tools. In particular, `flang` is not aware of various
 compilation phases within the frontend (e.g. scanning, parsing or semantic
-checks). It does not have to be. Conversely, the frontend driver, `flang-new
+checks). It does not have to be. Conversely, the frontend driver, `flang
 -fc1`, needs not to be concerned with linkers or other external tools like
 assemblers. Nor does it need to know where to look for various systems
 libraries, which is usually OS and platform specific.
@@ -104,7 +100,7 @@ GCC](https://en.wikibooks.org/wiki/GNU_C_Compiler_Internals/GNU_C_Compiler_Archi
 In fact, Flang needs to adhere to this model in order to be able to re-use
 Clang's driver library. If you are more familiar with the [architecture of
 GFortran](https://gcc.gnu.org/onlinedocs/gcc-4.7.4/gfortran/About-GNU-Fortran.html)
-than Clang, then `flang-new` corresponds to `gfortran` and `flang-new -fc1` to
+than Clang, then `flang` corresponds to `gfortran` and `flang -fc1` to
 `f951`.

 ## Compiler Driver
@@ -135,7 +131,7 @@ output from one action is the input for the subsequent one. You can use the
 `-ccc-print-phases` flag to see the sequence of actions that the driver will
 create for your compiler invocation:
 ```bash
-flang-new -ccc-print-phases -E file.f
+flang -ccc-print-phases -E file.f
 +- 0: input, "file.f", f95-cpp-input
 1: preprocessor, {0}, f95
 ```
@@ -143,7 +139,7 @@ As you can see, for `-E` the driver creates only two jobs and stops immediately
 after preprocessing. The first job simply prepares the input. For `-c`, the
 pipeline of the created jobs is more complex:
 ```bash
-flang-new -ccc-print-phases -c file.f
+flang -ccc-print-phases -c file.f
 +- 0: input, "file.f", f95-cpp-input
 +- 1: preprocessor, {0}, f95
 +- 2: compiler, {1}, ir
@@ -158,7 +154,7 @@ command to call the frontend driver is generated (more specifically, an instance
 of `clang::driver::Command`). Every command is bound to an instance of
 `clang::driver::Tool`. For Flang we introduced a specialisation of this class:
 `clang::driver::Flang`. This class implements the logic to either translate or
-forward compiler options to the frontend driver, `flang-new -fc1`.
+forward compiler options to the frontend driver, `flang -fc1`.

 You can read more on the design of `clangDriver` in Clang's [Driver Design &
 Internals](https://clang.llvm.org/docs/DriverInternals.html).
@@ -232,12 +228,12 @@ driver, `clang -cc1` and consists of the following classes:
 This list is not exhaustive and only covers the main classes that implement the
 driver. The main entry point for the frontend driver, `fc1_main`, is
 implemented in `flang/tools/flang-driver/driver.cpp`. It can be accessed by
-invoking the compiler driver, `flang-new`, with the `-fc1` flag.
+invoking the compiler driver, `flang`, with the `-fc1` flag.

 The frontend driver will only run one action at a time. If you specify multiple
 action flags, only the last one will be taken into account. The default action
 is `ParseSyntaxOnlyAction`, which corresponds to `-fsyntax-only`. In other
-words, `flang-new -fc1 ` is equivalent to `flang-new -fc1 -fsyntax-only
+words, `flang -fc1 ` is equivalent to `flang -fc1 -fsyntax-only
 `.

 ## Adding new Compiler Options
@@ -262,8 +258,8 @@ similar semantics to your new option and start by copying that.
 For every new option, you will also have to define the visibility of the new
 option. This is controlled through the `Visibility` field. You can use the
 following Flang specific visibility flags to control this:
-  * `FlangOption` - this option will be available in the `flang-new` compiler driver,
-  * `FC1Option` - this option will be available in the `flang-new -fc1` frontend driver,
+  * `FlangOption` - this option will be available in the `flang` compiler driver,
+  * `FC1Option` - this option will be available in the `flang -fc1` frontend driver,

 Options that are supported by clang should explicitly specify `ClangOption` in
 `Visibility`, and options that are only supported in Flang should not specify
@@ -290,10 +286,10 @@ The parsing will depend on the semantics encoded in the TableGen definition.

 When adding a compiler driver option (i.e. an option that contains
 `FlangOption` among in it's `Visibility`) that you also intend to be understood
-by the frontend, make sure that it is either forwarded to `flang-new -fc1` or
+by the frontend, make sure that it is either forwarded to `flang -fc1` or
 translated into some other option that is accepted by the frontend driver. In
 the case of options that contain both `FlangOption` and `FC1Option` among its
-flags, we usually just forward from `flang-new` to `flang-new -fc1`. This is
+flags, we usually just forward from `flang` to `flang -fc1`. This is
 then tested in `flang/test/Driver/frontend-forward.F90`.

 What follows is usually very dependant on the meaning of the corresponding
@@ -339,11 +335,11 @@ just added using your new frontend option.

 ## CMake Support
 As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246)
-(and soon to be released CMake 3.24.0), `cmake` can detect `flang-new` as a
+(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a
 supported Fortran compiler. You can configure your CMake projects to use
-`flang-new` as follows:
+`flang` as follows:
 ```bash
-cmake -DCMAKE_Fortran_COMPILER=
+cmake -DCMAKE_Fortran_COMPILER=
 ```
 You should see the following in the output:
 ```
@@ -353,14 +349,14 @@ where `` corresponds to the LLVM Flang version.

 ## Testing
 In LIT, we define two variables that you can use to invoke Flang's drivers:
-* `%flang` is expanded as `flang-new` (i.e. the compiler driver)
-* `%flang_fc1` is expanded as `flang-new -fc1` (i.e. the frontend driver)
+* `%flang` is expanded as `flang` (i.e. the compiler driver)
+* `%flang_fc1` is expanded as `flang -fc1` (i.e. the frontend driver)

 For most regression tests for the frontend, you will want to use `%flang_fc1`.
 In some cases, the observable behaviour will be identical regardless of
 whether `%flang` or `%flang_fc1` is used. However, when you are using `%flang`
 instead of `%flang_fc1`, the compiler driver will add extra flags to the frontend
-driver invocation (i.e. `flang-new -fc1 -`). In some cases that might
+driver invocation (i.e. `flang -fc1 -`). In some cases that might
 be exactly what you want to test. In fact, you can check these additional
 flags by using the `-###` compiler driver command line option.

@@ -380,7 +376,7 @@ plugins. The process for using plugins includes:
 * [Creating a plugin](#creating-a-plugin)
 * [Loading and running a plugin](#loading-and-running-a-plugin)

-Flang plugins are limited to `flang-new -fc1` and are currently only available /
+Flang plugins are limited to `flang -fc1` and are currently only available /
 been tested on Linux.

 ### Creating a Plugin
@@ -465,14 +461,14 @@ static FrontendPluginRegistry::Add X(

 ### Loading and Running a Plugin
 In order to use plugins, there are 2 command line options made available to the
-frontend driver, `flang-new -fc1`:
+frontend driver, `flang -fc1`:
 * [`-load `](#the--load-dsopath-option) for loading the dynamic shared
   object of the plugin
 * [`-plugin `](#the--plugin-name-option) for calling the registered plugin

 Invocation of the example plugin is done through:
 ```bash
-flang-new -fc1 -load flangPrintFunctionNames.so -plugin print-fns file.f90
+flang -fc1 -load flangPrintFunctionNames.so -plugin print-fns file.f90
 ```

 Both these options are parsed in `flang/lib/Frontend/CompilerInvocation.cpp` and
@@ -493,7 +489,7 @@ reports an error diagnostic and returns `nullptr`.

 ### Enabling In-Tree Plugins
 For in-tree plugins, there is the CMake flag `FLANG_PLUGIN_SUPPORT`, enabled by
-default, that controls the exporting of executable symbols from `flang-new`,
+default, that controls the exporting of executable symbols from `flang`,
 which plugins need access to. Additionally, there is the CMake flag
 `LLVM_BUILD_EXAMPLES`, turned off by default, that is used to control if the
 example programs are built. This includes plugins that are in the
@@ -526,7 +522,7 @@ invocations `invokeFIROptEarlyEPCallbacks`, `invokeFIRInlinerCallback`, and
 `invokeFIROptLastEPCallbacks` for Flang drivers to be able to insert additonal
 passes at different points of the default pass pipeline. An example use of these
 extension point callbacks is shown in `registerDefaultInlinerPass` to invoke the
-default inliner pass in `flang-new`.
+default inliner pass in `flang`.

 ## LLVM Pass Plugins

@@ -539,7 +535,7 @@ documentation for
 [`llvm::PassBuilder`](https://llvm.org/doxygen/classllvm_1_1PassBuilder.html)
 for details.

-The framework to enable pass plugins in `flang-new` uses the exact same
+The framework to enable pass plugins in `flang` uses the exact same
 machinery as that used by `clang` and thus has the same capabilities and
 limitations.

@@ -547,7 +543,7 @@ In order to use a pass plugin, the pass(es) must be compiled into a dynamic
 shared object which is then loaded using the `-fpass-plugin` option.

 ```
-flang-new -fpass-plugin=/path/to/plugin.so
+flang -fpass-plugin=/path/to/plugin.so
 ```

 This option is available in both the compiler driver and the frontend driver.
@@ -559,7 +555,7 @@ Pass extensions are similar to plugins, except that they can also be linked
 statically. Setting `-DLLVM_${NAME}_LINK_INTO_TOOLS` to `ON` in the cmake
 command turns the project into a statically linked extension. An example would
 be Polly, e.g., using `-DLLVM_POLLY_LINK_INTO_TOOLS=ON` would link Polly passes
-into `flang-new` as built-in middle-end passes.
+into `flang` as built-in middle-end passes.
 See the
 [`WritingAnLLVMNewPMPass`](https://llvm.org/docs/WritingAnLLVMNewPMPass.html#id9)
diff --git a/flang/docs/ImplementingASemanticCheck.md b/flang/docs/ImplementingASemanticCheck.md
index 5b583d4f8031b8..598ef696ad14bf 100644
--- a/flang/docs/ImplementingASemanticCheck.md
+++ b/flang/docs/ImplementingASemanticCheck.md
@@ -68,7 +68,7 @@ of the call to `intentOutFunc()`:

 I also used this program to produce a parse tree for the program using the command:
 ```bash
-  flang-new -fc1 -fdebug-dump-parse-tree testfun.f90
+  flang -fc1 -fdebug-dump-parse-tree testfun.f90
 ```

 Here's the relevant fragment of the parse tree produced by the compiler:
@@ -296,7 +296,7 @@ In `lib/Semantics/check-do.cpp`, I added an (almost empty) implementation:
 I then built the compiler with these changes and ran it on my test program.
 This time, I made sure to invoke semantic checking. Here's the command I used:
 ```bash
-  flang-new -fc1 -fdebug-unparse-with-symbols testfun.f90
+  flang -fc1 -fdebug-unparse-with-symbols testfun.f90
 ```

 This produced the output:
diff --git a/flang/docs/Overview.md b/flang/docs/Overview.md
index 6eba19ea3a3c0d..dfb4d89264a755 100644
--- a/flang/docs/Overview.md
+++ b/flang/docs/Overview.md
@@ -65,8 +65,8 @@ See [Preprocessing.md](Preprocessing.md).
 **Entry point:** `parser::Parsing::Prescan`

 **Commands:**
-  - `flang-new -fc1 -E src.f90` dumps the cooked character stream
-  - `flang-new -fc1 -fdebug-dump-provenance src.f90` dumps provenance
+  - `flang -fc1 -E src.f90` dumps the cooked character stream
+  - `flang -fc1 -fdebug-dump-provenance src.f90` dumps provenance
     information

 ### Parsing
@@ -80,10 +80,10 @@ representing a syntactically correct program, rooted at the program unit. See:
 **Entry point:** `parser::Parsing::Parse`

 **Commands:**
-  - `flang-new -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree
-  - `flang-new -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran
-  - `flang-new -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log
-  - `flang-new -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree
+  - `flang -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree
+  - `flang -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran
+  - `flang -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log
+  - `flang -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree

 ### Semantic processing

@@ -121,9 +121,9 @@ In the course of semantic analysis, the compiler:

 At the end of semantic processing, all validation of the user's program is complete. This is the last detailed phase of analysis processing.

 **Commands:**
-  - `flang-new -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis
-  - `flang-new -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table
-  - `flang-new -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table
+  - `flang -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis
+  - `flang -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table
+  - `flang -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table

 ## Lowering

@@ -163,8 +163,8 @@ contain a list of evaluations. All of these contain pointers back into the
 parse tree. The compiler walks the PFT generating FIR.
**Commands:** - - `flang-new -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree - - `flang-new -fc1 -emit-mlir src.f90` dumps the FIR to the files src.mlir + - `flang -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree + - `flang -fc1 -emit-mlir src.f90` dumps the FIR to the files src.mlir ### Transformation passes @@ -180,8 +180,8 @@ perform various optimizations and transformations. The final pass creates an LLVM IR representation of the program. **Commands:** - - `flang-new -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error - - `flang-new -fc1 -emit-llvm src.f90` dumps the LLVM IR to src.ll + - `flang -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error + - `flang -fc1 -emit-llvm src.f90` dumps the LLVM IR to src.ll ## Object code generation and linking diff --git a/flang/examples/FlangOmpReport/FlangOmpReport.cpp b/flang/examples/FlangOmpReport/FlangOmpReport.cpp index 9c1f304b9741e7..709c5c5d305e51 100644 --- a/flang/examples/FlangOmpReport/FlangOmpReport.cpp +++ b/flang/examples/FlangOmpReport/FlangOmpReport.cpp @@ -9,7 +9,7 @@ // all the OpenMP constructs and clauses and which line they're located on. 
 //
 // The plugin may be invoked as:
-// ./bin/flang-new -fc1 -load lib/flangOmpReport.so -plugin flang-omp-report
+// ./bin/flang -fc1 -load lib/flangOmpReport.so -plugin flang-omp-report
 // -fopenmp
 //
 //===----------------------------------------------------------------------===//
diff --git a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
index 9a70b7fbfad2b6..8ab5150cd7c812 100644
--- a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
+++ b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
@@ -67,7 +67,7 @@ struct AliasAnalysis {
   // end subroutine
   // -------------------------------------------------
   //
-  // flang-new -fc1 -emit-fir test.f90 -o test.fir
+  // flang -fc1 -emit-fir test.f90 -o test.fir
   //
   // ------------------- test.fir --------------------
   // fir.global @_QMtopEa : !fir.box>>
diff --git a/flang/include/flang/Tools/CrossToolHelpers.h b/flang/include/flang/Tools/CrossToolHelpers.h
index 3e703de545950c..df4b21ada058fe 100644
--- a/flang/include/flang/Tools/CrossToolHelpers.h
+++ b/flang/include/flang/Tools/CrossToolHelpers.h
@@ -7,7 +7,7 @@
 //===----------------------------------------------------------------------===//
 // A header file for containing functionallity that is used across Flang tools,
 // such as helper functions which apply or generate information needed accross
-// tools like bbc and flang-new.
+// tools like bbc and flang.
 //===----------------------------------------------------------------------===//

 #ifndef FORTRAN_TOOLS_CROSS_TOOL_HELPERS_H
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index 05b03ba9ebdf30..18383eaafb1136 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -65,8 +65,8 @@ CompilerInvocationBase::~CompilerInvocationBase() = default;
 static bool parseShowColorsArgs(const llvm::opt::ArgList &args,
                                 bool defaultColor = true) {
   // Color diagnostics default to auto ("on" if terminal supports) in the
-  // compiler driver `flang-new` but default to off in the frontend driver
-  // `flang-new -fc1`, needing an explicit OPT_fdiagnostics_color.
+  // compiler driver `flang` but default to off in the frontend driver
+  // `flang -fc1`, needing an explicit OPT_fdiagnostics_color.
   // Support both clang's -f[no-]color-diagnostics and gcc's
   // -f[no-]diagnostics-colors[=never|always|auto].
   enum {
@@ -891,7 +891,7 @@ static bool parseDiagArgs(CompilerInvocation &res, llvm::opt::ArgList &args,
     }
   }

-  // Default to off for `flang-new -fc1`.
+  // Default to off for `flang -fc1`.
 res.getFrontendOpts().showColors =
 parseShowColorsArgs(args, /*defaultDiagColor=*/false);
diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index 4a52edc436e0ed..8f882bff170909 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -233,7 +233,7 @@ bool CodeGenAction::beginSourceFileAction() {
 llvm::SMDiagnostic err;
 llvmModule = llvm::parseIRFile(getCurrentInput().getFile(), err, *llvmCtx);
 if (!llvmModule || llvm::verifyModule(*llvmModule, &llvm::errs())) {
- err.print("flang-new", llvm::errs());
+ err.print("flang", llvm::errs());
 unsigned diagID = ci.getDiagnostics().getCustomDiagID(
 clang::DiagnosticsEngine::Error, "Could not parse IR");
 ci.getDiagnostics().Report(diagID);
diff --git a/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp b/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
index e2cbd5112d6ea5..09ac129d3e6893 100644
--- a/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
+++ b/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
@@ -154,8 +154,7 @@ bool executeCompilerInvocation(CompilerInstance *flang) {
 // Honor -help.
 if (flang->getFrontendOpts().showHelp) {
 clang::driver::getDriverOptTable().printHelp(
- llvm::outs(), "flang-new -fc1 [options] file...",
- "LLVM 'Flang' Compiler",
+ llvm::outs(), "flang -fc1 [options] file...", "LLVM 'Flang' Compiler",
 /*ShowHidden=*/false, /*ShowAllAliases=*/false,
 llvm::opt::Visibility(clang::driver::options::FC1Option));
 return true;
diff --git a/flang/runtime/CMakeLists.txt b/flang/runtime/CMakeLists.txt
index 0ad1b718d5875b..cdd2de541c6730 100644
--- a/flang/runtime/CMakeLists.txt
+++ b/flang/runtime/CMakeLists.txt
@@ -308,12 +308,12 @@ set_target_properties(FortranRuntime PROPERTIES FOLDER "Flang/Runtime Libraries"
 # If FortranRuntime is part of a Flang build (and not a separate build) then
 # add dependency to make sure that Fortran runtime library is being built after
 # we have the Flang compiler available.
This also includes the MODULE files
-# that compile when the 'flang-new' target is built.
+# that compile when the 'flang' target is built.
 #
 # TODO: This is a workaround and should be updated when runtime build procedure
 # is changed to a regular runtime build. See discussion in PR #95388.
-if (TARGET flang-new AND TARGET module_files)
- add_dependencies(FortranRuntime flang-new module_files)
+if (TARGET flang AND TARGET module_files)
+ add_dependencies(FortranRuntime flang module_files)
 endif()
 if (FLANG_CUF_RUNTIME)
diff --git a/flang/test/CMakeLists.txt b/flang/test/CMakeLists.txt
index a18a5c6519eda4..cab214c2ef4c8c 100644
--- a/flang/test/CMakeLists.txt
+++ b/flang/test/CMakeLists.txt
@@ -58,7 +58,7 @@ set(FLANG_TEST_PARAMS
 flang_site_config=${CMAKE_CURRENT_BINARY_DIR}/lit.site.cfg.py)
 set(FLANG_TEST_DEPENDS
- flang-new
+ flang
 llvm-config
 FileCheck
 count
diff --git a/flang/test/Driver/aarch64-outline-atomics.f90 b/flang/test/Driver/aarch64-outline-atomics.f90
index a1c874c20df5c7..530bfc8e962091 100644
--- a/flang/test/Driver/aarch64-outline-atomics.f90
+++ b/flang/test/Driver/aarch64-outline-atomics.f90
@@ -1,4 +1,4 @@
-! Test that flang-new forwards the -moutline-atomics and -mno-outline-atomics.
+! Test that flang forwards the -moutline-atomics and -mno-outline-atomics.
 ! RUN: %flang -moutline-atomics --target=aarch64-none-none -### %s -o %t 2>&1 | FileCheck %s
 ! CHECK: "-target-feature" "+outline-atomics"
diff --git a/flang/test/Driver/color-diagnostics-forwarding.f90 b/flang/test/Driver/color-diagnostics-forwarding.f90
index 368fa8834142ab..29061242cb0cbc 100644
--- a/flang/test/Driver/color-diagnostics-forwarding.f90
+++ b/flang/test/Driver/color-diagnostics-forwarding.f90
@@ -1,5 +1,5 @@
-! Test that flang-new forwards -f{no-}color-diagnostics and
-! -f{no-}diagnostics-color options to flang-new -fc1 as expected.
+! Test that flang forwards -f{no-}color-diagnostics and
+! -f{no-}diagnostics-color options to flang -fc1 as expected.
 !
RUN: %flang -fsyntax-only -### %s -o %t 2>&1 -fcolor-diagnostics \
 ! RUN: | FileCheck %s --check-prefix=CHECK-CD
diff --git a/flang/test/Driver/compiler-options.f90 b/flang/test/Driver/compiler-options.f90
index 7ec29ce7ba7abf..cefa86836abd30 100644
--- a/flang/test/Driver/compiler-options.f90
+++ b/flang/test/Driver/compiler-options.f90
@@ -1,6 +1,6 @@
 ! RUN: %flang -S -emit-llvm -flang-deprecated-no-hlfir -o - %s | FileCheck %s
-! Test communication of COMPILER_OPTIONS from flang-new to flang-new -fc1.
-! CHECK: [[OPTSVAR:@_QQclX[0-9a-f]+]] = {{[a-z]+}} constant [[[OPTSLEN:[0-9]+]] x i8] c"{{.*}}flang-new{{(\.exe)?}} {{.*}}-S -emit-llvm -flang-deprecated-no-hlfir -o - {{.*}}compiler-options.f90"
+! Test communication of COMPILER_OPTIONS from flang to flang -fc1.
+! CHECK: [[OPTSVAR:@_QQclX[0-9a-f]+]] = {{[a-z]+}} constant [[[OPTSLEN:[0-9]+]] x i8] c"{{.*}}flang{{(\.exe)?}} {{.*}}-S -emit-llvm -flang-deprecated-no-hlfir -o - {{.*}}compiler-options.f90"
 program main
 use ISO_FORTRAN_ENV, only: compiler_options
 implicit none
diff --git a/flang/test/Driver/convert.f90 b/flang/test/Driver/convert.f90
index b2cf6c23efdb75..0ba31d2188cdf5 100755
--- a/flang/test/Driver/convert.f90
+++ b/flang/test/Driver/convert.f90
@@ -12,7 +12,7 @@
 ! RUN: not %flang -fconvert=foobar %s 2>&1 | FileCheck %s --check-prefix=INVALID
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: %flang_fc1 -emit-mlir -fconvert=unknown %s -o - | FileCheck %s --check-prefix=VALID_FC1
 ! RUN: %flang_fc1 -emit-mlir -fconvert=native %s -o - | FileCheck %s --check-prefix=VALID_FC1
diff --git a/flang/test/Driver/disable-ext-name-interop.f90 b/flang/test/Driver/disable-ext-name-interop.f90
index 0c59a5b4c980f8..1ade84b996d043 100644
--- a/flang/test/Driver/disable-ext-name-interop.f90
+++ b/flang/test/Driver/disable-ext-name-interop.f90
@@ -1,4 +1,4 @@
-!
Test that we can disable the ExternalNameConversion pass in flang-new.
+! Test that we can disable the ExternalNameConversion pass in flang.
 ! RUN: %flang_fc1 -S %s -o - 2>&1 | FileCheck %s --check-prefix=EXTNAMES
 ! RUN: %flang_fc1 -S -mmlir -disable-external-name-interop %s -o - 2>&1 | FileCheck %s --check-prefix=INTNAMES
diff --git a/flang/test/Driver/driver-version.f90 b/flang/test/Driver/driver-version.f90
index d1e1e1d90fe1f8..4c6aecb1c4fa7e 100644
--- a/flang/test/Driver/driver-version.f90
+++ b/flang/test/Driver/driver-version.f90
@@ -4,12 +4,12 @@
 ! RUN: %flang_fc1 -version 2>&1 | FileCheck %s --check-prefix=VERSION-FC1
 ! RUN: not %flang_fc1 --version 2>&1 | FileCheck %s --check-prefix=ERROR-FC1
-! VERSION: flang-new version
+! VERSION: flang version
 ! VERSION-NEXT: Target:
 ! VERSION-NEXT: Thread model:
 ! VERSION-NEXT: InstalledDir:
-! ERROR: flang-new: error: unknown argument '--versions'; did you mean '--version'?
+! ERROR: flang: error: unknown argument '--versions'; did you mean '--version'?
 ! VERSION-FC1: LLVM version
diff --git a/flang/test/Driver/escaped-backslash.f90 b/flang/test/Driver/escaped-backslash.f90
index ad07eae24e9fab..90dd1783dd1150 100644
--- a/flang/test/Driver/escaped-backslash.f90
+++ b/flang/test/Driver/escaped-backslash.f90
@@ -1,14 +1,14 @@
 ! Ensure argument -fbackslash works as expected.
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
 ! RUN: %flang -E %s 2>&1 | FileCheck %s --check-prefix=ESCAPED
 ! RUN: %flang -E -fbackslash -fno-backslash %s 2>&1 | FileCheck %s --check-prefix=ESCAPED
 ! RUN: %flang -E -fbackslash %s 2>&1 | FileCheck %s --check-prefix=UNESCAPED
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --check-prefix=ESCAPED
 !
RUN: %flang_fc1 -E -fbackslash -fno-backslash %s 2>&1 | FileCheck %s --check-prefix=ESCAPED
diff --git a/flang/test/Driver/fdefault.f90 b/flang/test/Driver/fdefault.f90
index 88592bfa3e87ee..7ce45b763a240f 100644
--- a/flang/test/Driver/fdefault.f90
+++ b/flang/test/Driver/fdefault.f90
@@ -2,25 +2,25 @@
 ! TODO: Add checks when actual codegen is possible for this family
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOOPTION
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=REAL8
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=DOUBLE8
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOOPTION
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=REAL8
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=DOUBLE8
 ! RUN: not %flang -fsyntax-only -fdefault-double-8 %s 2>&1 | FileCheck %s --check-prefix=ERROR
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
-!
RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOOPTION
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=REAL8
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=DOUBLE8
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOOPTION
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=REAL8
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=DOUBLE8
 ! RUN: not %flang_fc1 -fsyntax-only -fdefault-double-8 %s 2>&1 | FileCheck %s --check-prefix=ERROR
 ! NOOPTION: integer(4),parameter::real_kind=4_4
diff --git a/flang/test/Driver/flarge-sizes.f90 b/flang/test/Driver/flarge-sizes.f90
index 6ea5876676ed1f..6c41a03a830bfb 100644
--- a/flang/test/Driver/flarge-sizes.f90
+++ b/flang/test/Driver/flarge-sizes.f90
@@ -2,20 +2,20 @@
 ! TODO: Add checks when actual codegen is possible.
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOLARGE
-!
RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -flarge-sizes -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=LARGE
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOLARGE
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -flarge-sizes -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=LARGE
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOLARGE
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -flarge-sizes -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=LARGE
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOLARGE
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -flarge-sizes -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=LARGE
 ! NOLARGE: real(4)::z(1_8:10_8)
 ! NOLARGE-NEXT: integer(4),parameter::size_kind=4_4
diff --git a/flang/test/Driver/frame-pointer-forwarding.f90 b/flang/test/Driver/frame-pointer-forwarding.f90
index 751494cc6a6017..9fcbd6e12f98b7 100644
--- a/flang/test/Driver/frame-pointer-forwarding.f90
+++ b/flang/test/Driver/frame-pointer-forwarding.f90
@@ -1,4 +1,4 @@
-! Test that flang-new forwards -fno-omit-frame-pointer and -fomit-frame-pointer Flang frontend
+!
Test that flang forwards -fno-omit-frame-pointer and -fomit-frame-pointer Flang frontend
 ! RUN: %flang --target=aarch64-none-none -fsyntax-only -### %s -o %t 2>&1 | FileCheck %s --check-prefix=CHECK-NOVALUE
 ! CHECK-NOVALUE: "-fc1"{{.*}}"-mframe-pointer=non-leaf"
diff --git a/flang/test/Driver/frontend-forwarding.f90 b/flang/test/Driver/frontend-forwarding.f90
index 35adb47b56861e..0a56a1e3710d9d 100644
--- a/flang/test/Driver/frontend-forwarding.f90
+++ b/flang/test/Driver/frontend-forwarding.f90
@@ -1,5 +1,5 @@
-! Test that flang-new forwards Flang frontend
-! options to flang-new -fc1 as expected.
+! Test that flang forwards Flang frontend
+! options to flang -fc1 as expected.
 ! RUN: %flang -fsyntax-only -### %s -o %t 2>&1 \
 ! RUN: -finput-charset=utf-8 \
diff --git a/flang/test/Driver/hlfir-no-hlfir-error.f90 b/flang/test/Driver/hlfir-no-hlfir-error.f90
index 2410393b6cd9c1..59f8304db5c9ab 100644
--- a/flang/test/Driver/hlfir-no-hlfir-error.f90
+++ b/flang/test/Driver/hlfir-no-hlfir-error.f90
@@ -2,12 +2,12 @@
 ! options cannot be both used.
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
 ! RUN: not %flang -flang-experimental-hlfir -flang-deprecated-no-hlfir %s 2>&1 | FileCheck %s
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: not %flang_fc1 -emit-llvm -flang-experimental-hlfir -flang-deprecated-no-hlfir %s 2>&1 | FileCheck %s
diff --git a/flang/test/Driver/intrinsic-module-path.f90 b/flang/test/Driver/intrinsic-module-path.f90
index 5523ed37b724cd..15d19dd83d963f 100644
--- a/flang/test/Driver/intrinsic-module-path.f90
+++ b/flang/test/Driver/intrinsic-module-path.f90
@@ -4,7 +4,7 @@
 ! default one, causing a CHECKSUM error.
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+!
FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: %flang_fc1 -fsyntax-only %s 2>&1 | FileCheck %s --allow-empty --check-prefix=WITHOUT
 ! RUN: not %flang_fc1 -fsyntax-only -fintrinsic-modules-path %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=GIVEN
diff --git a/flang/test/Driver/large-data-threshold.f90 b/flang/test/Driver/large-data-threshold.f90
index 320566c4b2e43a..6a7eef79559d0b 100644
--- a/flang/test/Driver/large-data-threshold.f90
+++ b/flang/test/Driver/large-data-threshold.f90
@@ -7,11 +7,11 @@
 ! RUN: not %flang -### -c --target=aarch64 -mcmodel=small -mlarge-data-threshold=32768 %s 2>&1 | FileCheck %s --check-prefix=NOT-SUPPORTED
-! CHECK: "{{.*}}flang-new" "-fc1"
+! CHECK: "{{.*}}flang" "-fc1"
 ! CHECK-SAME: "-mlarge-data-threshold=32768"
-! CHECK-59000: "{{.*}}flang-new" "-fc1"
+! CHECK-59000: "{{.*}}flang" "-fc1"
 ! CHECK-59000-SAME: "-mlarge-data-threshold=59000"
-! CHECK-1M: "{{.*}}flang-new" "-fc1"
+! CHECK-1M: "{{.*}}flang" "-fc1"
 ! CHECK-1M-SAME: "-mlarge-data-threshold=1048576"
 ! NO-MCMODEL: 'mlarge-data-threshold=' only applies to medium and large code models
 ! INVALID: error: invalid value 'nonsense' in '-mlarge-data-threshold='
diff --git a/flang/test/Driver/lto-flags.f90 b/flang/test/Driver/lto-flags.f90
index a51febc7009691..bad3d972e6bd6b 100644
--- a/flang/test/Driver/lto-flags.f90
+++ b/flang/test/Driver/lto-flags.f90
@@ -30,7 +30,7 @@
 ! FULL-LTO: "-fc1"
 ! FULL-LTO-SAME: "-flto=full"
-! THIN-LTO-ALL: flang-new: warning: the option '-flto=thin' is a work in progress
+! THIN-LTO-ALL: flang: warning: the option '-flto=thin' is a work in progress
 ! THIN-LTO-ALL: "-fc1"
 ! THIN-LTO-ALL-SAME: "-flto=thin"
 ! THIN-LTO-LINKER-PLUGIN: "-plugin-opt=thinlto"
diff --git a/flang/test/Driver/macro-def-undef.F90 b/flang/test/Driver/macro-def-undef.F90
index 1332c6d6c02708..b13a9040833dbf 100644
--- a/flang/test/Driver/macro-def-undef.F90
+++ b/flang/test/Driver/macro-def-undef.F90
@@ -1,14 +1,14 @@
 !
Ensure arguments -D and -U work as expected.
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
 ! RUN: %flang -E -P %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED
 ! RUN: %flang -E -P -DX=A %s 2>&1 | FileCheck %s --check-prefix=DEFINED
 ! RUN: %flang -E -P -DX=A -UX %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: %flang_fc1 -E -P %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED
 ! RUN: %flang_fc1 -E -P -DX=A %s 2>&1 | FileCheck %s --check-prefix=DEFINED
diff --git a/flang/test/Driver/missing-input.f90 b/flang/test/Driver/missing-input.f90
index 236325e3578f1d..51d37a718c542f 100644
--- a/flang/test/Driver/missing-input.f90
+++ b/flang/test/Driver/missing-input.f90
@@ -1,26 +1,26 @@
 ! Test the behaviour of the driver when input is missing or is invalid. Note
-! that with the compiler driver (flang-new), the input _has_ to be specified.
+! that with the compiler driver (flang), the input _has_ to be specified.
 ! Indeed, the driver decides what "job/command" to create based on the input
 ! file's extension. No input file means that it doesn't know what to do
-! (compile? preprocess? link?). The frontend driver (flang-new -fc1) simply
+! (compile? preprocess? link?). The frontend driver (flang -fc1) simply
 ! assumes that "no explicit input == read from stdin"
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
 ! RUN: not %flang 2>&1 | FileCheck %s --check-prefix=FLANG-NO-FILE
 ! RUN: not %flang %t.f90 2>&1 | FileCheck %s --check-prefix=FLANG-NONEXISTENT-FILE
 !-----------------------------------------
-! FLANG FRONTEND DRIVER (flang-new -fc1)
+! FLANG FRONTEND DRIVER (flang -fc1)
 !-----------------------------------------
 !
RUN: not %flang_fc1 %t.f90 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-NONEXISTENT-FILE
 ! RUN: not %flang_fc1 %S 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-DIR
-! FLANG-NO-FILE: flang-new: error: no input files
+! FLANG-NO-FILE: flang: error: no input files
-! FLANG-NONEXISTENT-FILE: flang-new: error: no such file or directory: {{.*}}
-! FLANG-NONEXISTENT-FILE: flang-new: error: no input files
+! FLANG-NONEXISTENT-FILE: flang: error: no such file or directory: {{.*}}
+! FLANG-NONEXISTENT-FILE: flang: error: no input files
 ! FLANG-FC1-NONEXISTENT-FILE: error: {{.*}} does not exist
 ! FLANG-FC1-DIR: error: {{.*}} is not a regular file
diff --git a/flang/test/Driver/multiple-input-files.f90 b/flang/test/Driver/multiple-input-files.f90
index 6c86f23f2b21fa..64ec8679abf94f 100644
--- a/flang/test/Driver/multiple-input-files.f90
+++ b/flang/test/Driver/multiple-input-files.f90
@@ -39,7 +39,7 @@
 ! FLANG-NEXT:end program hello
 ! TEST 2: `-o` does not when multiple input files are present
-! ERROR: flang-new: error: cannot specify -o when generating multiple output files
+! ERROR: flang: error: cannot specify -o when generating multiple output files
 ! TEST 3: The output file _was not_ specified - `flang_fc1` will process all
 ! input files and generate one output file for every input file.
diff --git a/flang/test/Driver/omp-driver-offload.f90 b/flang/test/Driver/omp-driver-offload.f90
index b0b94ab1386a74..7c51656f0001af 100644
--- a/flang/test/Driver/omp-driver-offload.f90
+++ b/flang/test/Driver/omp-driver-offload.f90
@@ -1,6 +1,6 @@
-! Test that flang-new OpenMP and OpenMP offload related
+! Test that flang OpenMP and OpenMP offload related
 ! commands forward or expand to the appropriate commands
-! for flang-new -fc1 as expected. Assumes a gfx90a, aarch64,
+! for flang -fc1 as expected. Assumes a gfx90a, aarch64,
 ! and sm_70 architecture, but doesn't require one to be
 ! installed or compiled for, just testing the appropriate
 !
generation of jobs are created with the correct
@@ -8,8 +8,8 @@
 ! Test regular -fopenmp with no offload
 ! RUN: %flang -### -fopenmp %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP %s
-! CHECK-OPENMP: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}}.f90"
-! CHECK-OPENMP-NOT: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90"
+! CHECK-OPENMP: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}}.f90"
+! CHECK-OPENMP-NOT: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90"
 ! Test regular -fopenmp with offload, and invocation filtering options
 ! RUN: %flang -S -### %s -o %t 2>&1 \
@@ -22,47 +22,47 @@
 ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
 ! RUN: | FileCheck %s --check-prefix=OFFLOAD-HOST-AND-DEVICE
-! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
-! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
-! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "nvptx64-nvidia-cuda"
-! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa"
+! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "nvptx64-nvidia-cuda"
+! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
 ! RUN: %flang -S -### %s -o %t 2>&1 \
 ! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-host-only \
 ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
 ! RUN: | FileCheck %s --check-prefix=OFFLOAD-HOST
-! OFFLOAD-HOST: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OFFLOAD-HOST: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
 ! OFFLOAD-HOST-NOT: "-triple" "amdgcn-amd-amdhsa"
 !
OFFLOAD-HOST-NOT: "-triple" "nvptx64-nvidia-cuda"
-! OFFLOAD-HOST-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OFFLOAD-HOST-NOT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
 ! RUN: %flang -S -### %s 2>&1 \
 ! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-device-only \
 ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
 ! RUN: | FileCheck %s --check-prefix=OFFLOAD-DEVICE
-! OFFLOAD-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
-! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
-! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "nvptx64-nvidia-cuda"
-! OFFLOAD-DEVICE-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OFFLOAD-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa"
+! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "nvptx64-nvidia-cuda"
+! OFFLOAD-DEVICE-NOT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
 ! Test regular -fopenmp with offload for basic fopenmp-is-target-device flag addition and correct fopenmp
 ! RUN: %flang -### -fopenmp --offload-arch=gfx90a -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP-IS-TARGET-DEVICE %s
-! CHECK-OPENMP-IS-TARGET-DEVICE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90"
+! CHECK-OPENMP-IS-TARGET-DEVICE: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90"
 ! Testing appropriate flags are gnerated and appropriately assigned by the driver when offloading
 ! RUN: %flang -S -### %s -o %t 2>&1 \
 ! RUN: -fopenmp --offload-arch=gfx90a \
 ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
 ! RUN: | FileCheck %s --check-prefix=OPENMP-OFFLOAD-ARGS
-!
OPENMP-OFFLOAD-ARGS: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" {{.*}} "-fopenmp" {{.*}}.f90"
-! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
+! OPENMP-OFFLOAD-ARGS: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" {{.*}} "-fopenmp" {{.*}}.f90"
+! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa"
 ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp"
 ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp-host-ir-file-path" "{{.*}}.bc" "-fopenmp-is-target-device"
 ! OPENMP-OFFLOAD-ARGS-SAME: {{.*}}.f90"
 ! OPENMP-OFFLOAD-ARGS: "{{[^"]*}}clang-offload-packager{{.*}}" {{.*}} "--image=file={{.*}}.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"
-! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
 ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp"
 ! OPENMP-OFFLOAD-ARGS-SAME: "-fembed-offload-object={{.*}}.out" {{.*}}.bc"
@@ -77,7 +77,7 @@
 ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
 ! RUN: -fopenmp-assume-threads-oversubscription \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-THREADS-OVS
-! CHECK-THREADS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90"
+! CHECK-THREADS-OVS: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90"
 ! RUN: %flang -### %s -o %t 2>&1 \
 ! RUN: -fopenmp --offload-arch=gfx90a \
@@ -89,7 +89,7 @@
 ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
 ! RUN: -fopenmp-assume-teams-oversubscription \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-TEAMS-OVS
-! CHECK-TEAMS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-teams-oversubscription" {{.*}}.f90"
+!
CHECK-TEAMS-OVS: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-teams-oversubscription" {{.*}}.f90"
 ! RUN: %flang -### %s -o %t 2>&1 \
 ! RUN: -fopenmp --offload-arch=gfx90a \
@@ -101,7 +101,7 @@
 ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
 ! RUN: -fopenmp-assume-no-nested-parallelism \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-NEST-PAR
-! CHECK-NEST-PAR: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-nested-parallelism" {{.*}}.f90"
+! CHECK-NEST-PAR: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-nested-parallelism" {{.*}}.f90"
 ! RUN: %flang -### %s -o %t 2>&1 \
 ! RUN: -fopenmp --offload-arch=gfx90a \
@@ -113,7 +113,7 @@
 ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
 ! RUN: -fopenmp-assume-no-thread-state \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-THREAD-STATE
-! CHECK-THREAD-STATE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-thread-state" {{.*}}.f90"
+! CHECK-THREAD-STATE: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-thread-state" {{.*}}.f90"
 ! RUN: %flang -### %s -o %t 2>&1 \
 ! RUN: -fopenmp --offload-arch=gfx90a \
@@ -125,7 +125,7 @@
 ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
 ! RUN: -fopenmp-target-debug \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
-! CHECK-TARGET-DEBUG: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" {{.*}}.f90"
+! CHECK-TARGET-DEBUG: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" {{.*}}.f90"
 ! RUN: %flang -### %s -o %t 2>&1 \
 ! RUN: -fopenmp --offload-arch=gfx90a \
@@ -137,7 +137,7 @@
 ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
 ! RUN: -fopenmp-target-debug \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
-! CHECK-TARGET-DEBUG-EQ: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug=111" {{.*}}.f90"
+!
CHECK-TARGET-DEBUG-EQ: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug=111" {{.*}}.f90"
 ! RUN: %flang -S -### %s -o %t 2>&1 \
 ! RUN: -fopenmp --offload-arch=gfx90a \
@@ -153,7 +153,7 @@
 ! RUN: -fopenmp-assume-teams-oversubscription -fopenmp-assume-no-nested-parallelism \
 ! RUN: -fopenmp-assume-no-thread-state \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-RTL-ALL
-! CHECK-RTL-ALL: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" "-fopenmp-assume-teams-oversubscription"
+! CHECK-RTL-ALL: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" "-fopenmp-assume-teams-oversubscription"
 ! CHECK-RTL-ALL: "-fopenmp-assume-threads-oversubscription" "-fopenmp-assume-no-thread-state" "-fopenmp-assume-no-nested-parallelism"
 ! CHECK-RTL-ALL: {{.*}}.f90"
@@ -167,7 +167,7 @@
 ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
 ! RUN: -fopenmp-version=45 \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-OPENMP-VERSION
-! CHECK-OPENMP-VERSION: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" "-fopenmp-version=45" {{.*}}.f90"
+! CHECK-OPENMP-VERSION: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" "-fopenmp-version=45" {{.*}}.f90"
 ! Test diagnostic error when host IR file is non-existent
 ! RUN: not %flang_fc1 %s -o %t 2>&1 -fopenmp -fopenmp-is-target-device \
@@ -187,7 +187,7 @@
 ! RUN: --target=aarch64-unknown-linux-gnu \
 ! RUN: | FileCheck %s --check-prefix=FORCE-USM-NO-OFFLOAD
-! FORCE-USM-NO-OFFLOAD: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! FORCE-USM-NO-OFFLOAD: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
 ! FORCE-USM-NO-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm"
 ! Test -fopenmp-force-usm option with offload
@@ -196,16 +196,16 @@
 !
RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
 ! RUN: | FileCheck %s --check-prefix=FORCE-USM-OFFLOAD
-! FORCE-USM-OFFLOAD: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
+! FORCE-USM-OFFLOAD: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
 ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm"
-! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
+! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa"
 ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm"
 ! RUN: %flang -### -v --target=x86_64-unknown-linux-gnu -fopenmp \
 ! RUN: --offload-arch=gfx900 \
 ! RUN: --rocm-path=%S/Inputs/rocm %s 2>&1 \
 ! RUN: | FileCheck --check-prefix=MLINK-BUILTIN-BITCODE %s
-! MLINK-BUILTIN-BITCODE: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
+! MLINK-BUILTIN-BITCODE: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa"
 ! MLINK-BUILTIN-BITCODE-SAME: "-mlink-builtin-bitcode" {{.*Inputs.*rocm.*amdgcn.*bitcode.*}}oclc_isa_version_900.bc
 ! Test that the -fopenmp-targets option is added to host compilation invocations
@@ -219,9 +219,9 @@
 ! RUN: --target=x86_64-unknown-linux-gnu -nogpulib \
 ! RUN: | FileCheck %s --check-prefix=OFFLOAD-TARGETS
-! OFFLOAD-TARGETS: "{{[^"]*}}flang-new" "-fc1" "-triple" "x86_64-unknown-linux-gnu"
+! OFFLOAD-TARGETS: "{{[^"]*}}flang" "-fc1" "-triple" "x86_64-unknown-linux-gnu"
 ! OFFLOAD-TARGETS-SAME: "-fopenmp-targets=amdgcn-amd-amdhsa"
-! OFFLOAD-TARGETS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
+! OFFLOAD-TARGETS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa"
 ! OFFLOAD-TARGETS-NOT: -fopenmp-targets
-! OFFLOAD-TARGETS: "{{[^"]*}}flang-new" "-fc1" "-triple" "x86_64-unknown-linux-gnu"
+! OFFLOAD-TARGETS: "{{[^"]*}}flang" "-fc1" "-triple" "x86_64-unknown-linux-gnu"
 !
OFFLOAD-TARGETS-SAME: "-fopenmp-targets=amdgcn-amd-amdhsa" diff --git a/flang/test/Driver/predefined-macros-compiler-version.F90 b/flang/test/Driver/predefined-macros-compiler-version.F90 index 823a730f96845a..f6924479281562 100644 --- a/flang/test/Driver/predefined-macros-compiler-version.F90 +++ b/flang/test/Driver/predefined-macros-compiler-version.F90 @@ -1,12 +1,12 @@ ! Check that the driver correctly defines macros with the compiler version !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --ignore-case !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --ignore-case diff --git a/flang/test/Driver/std2018-wrong.f90 b/flang/test/Driver/std2018-wrong.f90 index 27ccc76bd39aad..93ba153d75f7f9 100644 --- a/flang/test/Driver/std2018-wrong.f90 +++ b/flang/test/Driver/std2018-wrong.f90 @@ -1,7 +1,7 @@ ! Ensure argument -std=f2018 works as expected. !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: not %flang_fc1 -std=90 %s 2>&1 | FileCheck %s --check-prefix=WRONG diff --git a/flang/test/Driver/std2018.f90 b/flang/test/Driver/std2018.f90 index cf461cf89e4e19..1727f92127b711 100644 --- a/flang/test/Driver/std2018.f90 +++ b/flang/test/Driver/std2018.f90 @@ -1,7 +1,7 @@ ! Ensure argument -std=f2018 works as expected. !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -fsyntax-only %s 2>&1 | FileCheck %s --allow-empty --check-prefix=WITHOUT ! 
RUN: %flang_fc1 -fsyntax-only -std=f2018 %s 2>&1 | FileCheck %s --check-prefix=GIVEN diff --git a/flang/test/Driver/supported-suffices/f03-suffix.f03 b/flang/test/Driver/supported-suffices/f03-suffix.f03 index 6e03f9f43fc602..1d850305cd040e 100644 --- a/flang/test/Driver/supported-suffices/f03-suffix.f03 +++ b/flang/test/Driver/supported-suffices/f03-suffix.f03 @@ -1,5 +1,5 @@ ! RUN: %flang -### %s 2>&1 | FileCheck %s -! CHECK: "{{.*}}flang-new" "-fc1" {{.*}} "-o" "{{.*}}.o" +! CHECK: "{{.*}}flang" "-fc1" {{.*}} "-o" "{{.*}}.o" program f03 end program f03 diff --git a/flang/test/Driver/supported-suffices/f08-suffix.f08 b/flang/test/Driver/supported-suffices/f08-suffix.f08 index d5bcf4ce1de1cc..2b31e4c21876ae 100644 --- a/flang/test/Driver/supported-suffices/f08-suffix.f08 +++ b/flang/test/Driver/supported-suffices/f08-suffix.f08 @@ -1,5 +1,5 @@ ! RUN: %flang -### %s 2>&1 | FileCheck %s -! CHECK: "{{.*}}flang-new" "-fc1" {{.*}} "-o" "{{.*}}.o" +! CHECK: "{{.*}}flang" "-fc1" {{.*}} "-o" "{{.*}}.o" program f08 end program f08 diff --git a/flang/test/Driver/use-module-error.f90 b/flang/test/Driver/use-module-error.f90 index 42d6650621c8c8..67335f61626817 100644 --- a/flang/test/Driver/use-module-error.f90 +++ b/flang/test/Driver/use-module-error.f90 @@ -1,14 +1,14 @@ ! Ensure that multiple module directories are not allowed !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: not %flang -fsyntax-only -J %S/Inputs/module-dir -J %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE ! RUN: not %flang -fsyntax-only -J %S/Inputs/module-dir -module-dir %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE ! RUN: not %flang -fsyntax-only -module-dir %S/Inputs/module-dir -J%S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! 
FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: not %flang_fc1 -fsyntax-only -J %S/Inputs/module-dir -J %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE ! RUN: not %flang_fc1 -fsyntax-only -J %S/Inputs/module-dir -module-dir %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE diff --git a/flang/test/Driver/use-module.f90 b/flang/test/Driver/use-module.f90 index 775c0424715883..2c3a38043fe16e 100644 --- a/flang/test/Driver/use-module.f90 +++ b/flang/test/Driver/use-module.f90 @@ -1,7 +1,7 @@ ! Checks that module search directories specified with `-J/-module-dir` and `-I` are handled correctly !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang -fsyntax-only -I %S/Inputs -I %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty ! RUN: %flang -fsyntax-only -I %S/Inputs -J %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty @@ -16,7 +16,7 @@ ! RUN: not %flang -fsyntax-only -module-dir %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=SINGLEINCLUDE !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -fsyntax-only -I %S/Inputs -I %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty ! RUN: %flang_fc1 -fsyntax-only -I %S/Inputs -J %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty diff --git a/flang/test/Driver/version-loops.f90 b/flang/test/Driver/version-loops.f90 index b0fa01d572512a..d206393a04f486 100644 --- a/flang/test/Driver/version-loops.f90 +++ b/flang/test/Driver/version-loops.f90 @@ -1,5 +1,5 @@ -! Test that flang-new forwards the -f{no-,}version-loops-for-stride -! options correctly to flang-new -fc1 for different variants of optimisation +! 
Test that flang forwards the -f{no-,}version-loops-for-stride +! options correctly to flang -fc1 for different variants of optimisation ! and explicit flags. ! RUN: %flang -### %s -o %t 2>&1 -O3 \ @@ -23,32 +23,32 @@ ! RUN: %flang -### %s -o %t 2>&1 -O3 -fno-version-loops-for-stride \ ! RUN: | FileCheck %s --check-prefix=CHECK-O3-no -! CHECK: "{{.*}}flang-new" "-fc1" +! CHECK: "{{.*}}flang" "-fc1" ! CHECK-SAME: "-fversion-loops-for-stride" ! CHECK-SAME: "-O3" -! CHECK-O2: "{{.*}}flang-new" "-fc1" +! CHECK-O2: "{{.*}}flang" "-fc1" ! CHECK-O2-NOT: "-fversion-loops-for-stride" ! CHECK-O2-SAME: "-O2" -! CHECK-O2-with: "{{.*}}flang-new" "-fc1" +! CHECK-O2-with: "{{.*}}flang" "-fc1" ! CHECK-O2-with-SAME: "-fversion-loops-for-stride" ! CHECK-O2-with-SAME: "-O2" -! CHECK-O4: "{{.*}}flang-new" "-fc1" +! CHECK-O4: "{{.*}}flang" "-fc1" ! CHECK-O4-SAME: "-fversion-loops-for-stride" ! CHECK-O4-SAME: "-O3" -! CHECK-Ofast: "{{.*}}flang-new" "-fc1" +! CHECK-Ofast: "{{.*}}flang" "-fc1" ! CHECK-Ofast-SAME: "-ffast-math" ! CHECK-Ofast-SAME: "-fversion-loops-for-stride" ! CHECK-Ofast-SAME: "-O3" -! CHECK-Ofast-no: "{{.*}}flang-new" "-fc1" +! CHECK-Ofast-no: "{{.*}}flang" "-fc1" ! CHECK-Ofast-no-SAME: "-ffast-math" ! CHECK-Ofast-no-NOT: "-fversion-loops-for-stride" ! CHECK-Ofast-no-SAME: "-O3" -! CHECK-O3-no: "{{.*}}flang-new" "-fc1" +! CHECK-O3-no: "{{.*}}flang" "-fc1" ! CHECK-O3-no-NOT: "-fversion-loops-for-stride" ! CHECK-O3-no-SAME: "-O3" diff --git a/flang/test/Driver/wextra-ok.f90 b/flang/test/Driver/wextra-ok.f90 index 6a38d9481a36b7..441029aa0af276 100644 --- a/flang/test/Driver/wextra-ok.f90 +++ b/flang/test/Driver/wextra-ok.f90 @@ -1,4 +1,4 @@ -! Ensure that supplying -Wextra into flang-new does not raise error +! Ensure that supplying -Wextra into flang does not raise error ! The first check should be changed if -Wextra is implemented ! 
RUN: %flang -std=f2018 -Wextra %s -c 2>&1 | FileCheck %s --check-prefix=CHECK-OK diff --git a/flang/test/HLFIR/hlfir-flags.f90 b/flang/test/HLFIR/hlfir-flags.f90 index b383a79d12c27b..0b1e80b1e3f636 100644 --- a/flang/test/HLFIR/hlfir-flags.f90 +++ b/flang/test/HLFIR/hlfir-flags.f90 @@ -1,4 +1,4 @@ -! Test -flang-deprecated-hlfir, -flang-experimental-hlfir (flang-new), and +! Test -flang-deprecated-hlfir, -flang-experimental-hlfir (flang), and ! -hlfir (bbc), -emit-hlfir, -emit-fir flags ! RUN: %flang_fc1 -emit-hlfir -o - %s | FileCheck --check-prefix HLFIR --check-prefix ALL %s ! RUN: bbc -emit-hlfir -o - %s | FileCheck --check-prefix HLFIR --check-prefix ALL %s diff --git a/flang/test/Lower/Intrinsics/command_argument_count.f90 b/flang/test/Lower/Intrinsics/command_argument_count.f90 index 0cf92d4444db98..a30b27d664fc0c 100644 --- a/flang/test/Lower/Intrinsics/command_argument_count.f90 +++ b/flang/test/Lower/Intrinsics/command_argument_count.f90 @@ -1,6 +1,6 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! bbc doesn't have a way to set the default kinds so we use flang-new driver -! RUN: flang-new -fc1 -fdefault-integer-8 -emit-fir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 %s +! bbc doesn't have a way to set the default kinds so we use flang driver +! RUN: flang -fc1 -fdefault-integer-8 -emit-fir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 %s ! CHECK-LABEL: argument_count_test subroutine argument_count_test() diff --git a/flang/test/Lower/Intrinsics/exit.f90 b/flang/test/Lower/Intrinsics/exit.f90 index c3110fcbec2b5a..bd551f7318a84a 100644 --- a/flang/test/Lower/Intrinsics/exit.f90 +++ b/flang/test/Lower/Intrinsics/exit.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir -hlfir=false %s -o - | FileCheck --check-prefixes=CHECK,CHECK-32 -DDEFAULT_INTEGER_SIZE=32 %s -! bbc doesn't have a way to set the default kinds so we use flang-new driver +! bbc doesn't have a way to set the default kinds so we use flang driver ! 
RUN: %flang_fc1 -fdefault-integer-8 -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 -DDEFAULT_INTEGER_SIZE=64 %s ! CHECK-LABEL: func @_QPexit_test1() { diff --git a/flang/test/Lower/Intrinsics/ieee_is_normal.f90 b/flang/test/Lower/Intrinsics/ieee_is_normal.f90 index f9ab01881d250d..9b864c9a9849c3 100644 --- a/flang/test/Lower/Intrinsics/ieee_is_normal.f90 +++ b/flang/test/Lower/Intrinsics/ieee_is_normal.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -emit-fir %s -o - | FileCheck %s +! RUN: flang -fc1 -emit-fir %s -o - | FileCheck %s ! CHECK-LABEL: ieee_is_normal_f16 subroutine ieee_is_normal_f16(r) diff --git a/flang/test/Lower/Intrinsics/isnan.f90 b/flang/test/Lower/Intrinsics/isnan.f90 index 700b2d1a67c656..62b98c8ea98bee 100644 --- a/flang/test/Lower/Intrinsics/isnan.f90 +++ b/flang/test/Lower/Intrinsics/isnan.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -emit-fir %s -o - | FileCheck %s +! RUN: flang -fc1 -emit-fir %s -o - | FileCheck %s ! CHECK-LABEL: isnan_f32 subroutine isnan_f32(r) diff --git a/flang/test/Lower/Intrinsics/modulo.f90 b/flang/test/Lower/Intrinsics/modulo.f90 index ac18e59033a6b6..781ef8296a2b7d 100644 --- a/flang/test/Lower/Intrinsics/modulo.f90 +++ b/flang/test/Lower/Intrinsics/modulo.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir -hlfir=false %s -o - | FileCheck %s -check-prefixes=HONORINF,ALL -! RUN: flang-new -fc1 -menable-no-infs -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck %s -check-prefixes=CHECK,ALL +! RUN: flang -fc1 -menable-no-infs -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck %s -check-prefixes=CHECK,ALL ! ALL-LABEL: func @_QPmodulo_testr( ! 
ALL-SAME: %[[arg0:.*]]: !fir.ref{{.*}}, %[[arg1:.*]]: !fir.ref{{.*}}, %[[arg2:.*]]: !fir.ref{{.*}}) { diff --git a/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 b/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 index f02884e5e92f38..425ccbc5dd56c5 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP allocate Directive. -// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s +// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s program main integer :: x, y diff --git a/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 b/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 index 3be61a1700ced3..7a7d28db8d6f5a 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP declare reduction Directive. -// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s +// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s subroutine declare_red() integer :: my_var diff --git a/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 b/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 index c6a0a8f2cd0d22..be1ac2db5dfa4a 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP declare simd Directive. 
-// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s
+// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s
 subroutine sub(x, y)
 real, intent(inout) :: x, y
diff --git a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
index 62bc247a1456a1..bc5baf4e1cf604 100644
--- a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
+++ b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
@@ -1,7 +1,7 @@
 ! This test checks lowering of `LASTPRIVATE` clause for scalar types.
 ! RUN: bbc -fopenmp -emit-hlfir %s -o - | FileCheck %s
-! RUN: flang-new -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s
+! RUN: flang -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s
 !CHECK: func @_QPlastprivate_character(%[[ARG1:.*]]: !fir.boxchar<1>{{.*}}) {
 !CHECK-DAG: %[[ARG1_UNBOX:.*]]:2 = fir.unboxchar
diff --git a/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 b/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90
index 32caac39778dee..99c521406a7775 100644
--- a/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90
+++ b/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90
@@ -1,7 +1,7 @@
 ! Check that for parallel do, reduction is only processed for the loop
 ! RUN: bbc -fopenmp --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s
-! RUN: flang-new -fc1 -fopenmp -mmlir --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s
+! RUN: flang -fc1 -fopenmp -mmlir --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s
 ! CHECK: omp.parallel {
 ! CHECK: omp.wsloop reduction(byref @add_reduction_byref_i32
diff --git a/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 b/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90
index fdedbb06160761..cfeb5de83f4e82 100644
--- a/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90
+++ b/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90
@@ -1,7 +1,7 @@
 ! Check that for parallel do, reduction is only processed for the loop
 ! RUN: bbc -fopenmp -emit-hlfir %s -o - | FileCheck %s
-! RUN: flang-new -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s
+! RUN: flang -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s
 ! CHECK: omp.parallel {
 ! CHECK: omp.wsloop reduction(@add_reduction_i32
diff --git a/flang/test/lit.cfg.py b/flang/test/lit.cfg.py
index 4acbc0606d1977..f43234fb125b7e 100644
--- a/flang/test/lit.cfg.py
+++ b/flang/test/lit.cfg.py
@@ -132,13 +132,13 @@
 tools = [
 ToolSubst(
 "%flang",
- command=FindTool("flang-new"),
+ command=FindTool("flang"),
 extra_args=isysroot_flag,
 unresolved="fatal",
 ),
 ToolSubst(
 "%flang_fc1",
- command=FindTool("flang-new"),
+ command=FindTool("flang"),
 extra_args=["-fc1"],
 unresolved="fatal",
 ),
diff --git a/flang/tools/f18/CMakeLists.txt b/flang/tools/f18/CMakeLists.txt
index 9d7b8633958cb7..4362fcf0537616 100644
--- a/flang/tools/f18/CMakeLists.txt
+++ b/flang/tools/f18/CMakeLists.txt
@@ -55,7 +55,7 @@ endif()
 set(module_objects "")
 # Create module files directly from the top-level module source directory.
-# If CMAKE_CROSSCOMPILING, then the newly built flang-new executable was
+# If CMAKE_CROSSCOMPILING, then the newly built flang executable was
 # cross compiled, and thus can't be executed on the build system and thus
 # can't be used for generating module files.
 if (NOT CMAKE_CROSSCOMPILING)
@@ -115,9 +115,9 @@ if (NOT CMAKE_CROSSCOMPILING)
 # TODO: We may need to flag this with conditional, in case Flang is built w/o OpenMP support
 add_custom_command(OUTPUT ${base}.mod ${object_output}
 COMMAND ${CMAKE_COMMAND} -E make_directory ${FLANG_INTRINSIC_MODULES_DIR}
- COMMAND flang-new ${opts} ${decls} -cpp ${compile_with} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
+ COMMAND flang ${opts} ${decls} -cpp ${compile_with} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
 ${FLANG_SOURCE_DIR}/module/${filename}.f90
- DEPENDS flang-new ${FLANG_SOURCE_DIR}/module/${filename}.f90 ${FLANG_SOURCE_DIR}/module/__fortran_builtins.f90 ${depends}
+ DEPENDS flang ${FLANG_SOURCE_DIR}/module/${filename}.f90 ${FLANG_SOURCE_DIR}/module/__fortran_builtins.f90 ${depends}
 )
 list(APPEND MODULE_FILES ${base}.mod)
 install(FILES ${base}.mod DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/flang")
@@ -142,9 +142,9 @@ if (NOT CMAKE_CROSSCOMPILING)
 set(base ${FLANG_INTRINSIC_MODULES_DIR}/omp_lib)
 add_custom_command(OUTPUT ${base}.mod ${base}_kinds.mod
 COMMAND ${CMAKE_COMMAND} -E make_directory ${FLANG_INTRINSIC_MODULES_DIR}
- COMMAND flang-new -cpp -fsyntax-only ${opts} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
+ COMMAND flang -cpp -fsyntax-only ${opts} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
 ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90
- DEPENDS flang-new ${FLANG_INTRINSIC_MODULES_DIR}/iso_c_binding.mod ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90 ${depends}
+ DEPENDS flang ${FLANG_INTRINSIC_MODULES_DIR}/iso_c_binding.mod ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90 ${depends}
 )
 add_custom_command(OUTPUT ${base}.f18.mod
 DEPENDS ${base}.mod
diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt
index 9f33cdfe3fa90f..615c673374faf4 100644
--- a/flang/tools/flang-driver/CMakeLists.txt
+++ b/flang/tools/flang-driver/CMakeLists.txt
@@ -11,18 +11,18 @@ set( LLVM_LINK_COMPONENTS
 TargetParser
 )
-add_flang_tool(flang-new
+add_flang_tool(flang
 driver.cpp
 fc1_main.cpp
 )
-target_link_libraries(flang-new
+target_link_libraries(flang
 PRIVATE
 flangFrontend
 flangFrontendTool
 )
-clang_target_link_libraries(flang-new
+clang_target_link_libraries(flang
 PRIVATE
 clangDriver
 clangBasic
@@ -30,9 +30,9 @@ clang_target_link_libraries(flang-new
 option(FLANG_PLUGIN_SUPPORT "Build Flang with plugin support." ON)
-# Enable support for plugins, which need access to symbols from flang-new
+# Enable support for plugins, which need access to symbols from flang
 if(FLANG_PLUGIN_SUPPORT)
- export_executable_symbols_for_plugins(flang-new)
+ export_executable_symbols_for_plugins(flang)
 endif()
-install(TARGETS flang-new DESTINATION "${CMAKE_INSTALL_BINDIR}")
+install(TARGETS flang DESTINATION "${CMAKE_INSTALL_BINDIR}")
diff --git a/flang/tools/flang-driver/driver.cpp b/flang/tools/flang-driver/driver.cpp
index 52136df10c0b02..603aab4205836c 100644
--- a/flang/tools/flang-driver/driver.cpp
+++ b/flang/tools/flang-driver/driver.cpp
@@ -95,7 +95,7 @@ int main(int argc, const char **argv) {
 llvm::StringSaver saver(a);
 ExpandResponseFiles(saver, args);
- // Check if flang-new is in the frontend mode
+ // Check if flang is in the frontend mode
 auto firstArg = std::find_if(args.begin() + 1, args.end(),
 [](const char *a) { return a != nullptr; });
 if (firstArg != args.end()) {
@@ -104,7 +104,7 @@ int main(int argc, const char **argv) {
 << "Valid tools include '-fc1'.\n";
 return 1;
 }
- // Call flang-new frontend
+ // Call flang frontend
 if (llvm::StringRef(args[1]).starts_with("-fc1")) {
 return executeFC1Tool(args);
 }
@@ -140,7 +140,7 @@ int main(int argc, const char **argv) {
 // Set the environment variable, FLANG_COMPILER_OPTIONS_STRING, to contain all
 // the compiler options. This is intended for the frontend driver,
- // flang-new -fc1, to enable the implementation of the COMPILER_OPTIONS
+ // flang -fc1, to enable the implementation of the COMPILER_OPTIONS
 // intrinsic. To this end, the frontend driver requires the list of the
 // original compiler options, which is not available through other means.
 // TODO: This way of passing information between the compiler and frontend
diff --git a/llvm/runtimes/CMakeLists.txt b/llvm/runtimes/CMakeLists.txt
index d948b7eb39b39c..9da1f926817a8b 100644
--- a/llvm/runtimes/CMakeLists.txt
+++ b/llvm/runtimes/CMakeLists.txt
@@ -504,15 +504,15 @@ if(build_runtimes)
 if("openmp" IN_LIST LLVM_ENABLE_RUNTIMES)
 if (${LLVM_TOOL_FLANG_BUILD})
- message(STATUS "Configuring build of omp_lib.mod and omp_lib_kinds.mod via flang-new")
- set(LIBOMP_FORTRAN_MODULES_COMPILER "${CMAKE_BINARY_DIR}/bin/flang-new")
+ message(STATUS "Configuring build of omp_lib.mod and omp_lib_kinds.mod via flang")
+ set(LIBOMP_FORTRAN_MODULES_COMPILER "${CMAKE_BINARY_DIR}/bin/flang")
 set(LIBOMP_MODULES_INSTALL_PATH "${CMAKE_INSTALL_INCLUDEDIR}/flang")
 # TODO: This is a workaround until flang becomes a first-class project
- # in llvm/CMakeList.txt. Until then, this line ensures that flang-new is
- # built before "openmp" is built as a runtime project. Besides "flang-new"
+ # in llvm/CMakeList.txt. Until then, this line ensures that flang is
+ # built before "openmp" is built as a runtime project. Besides "flang"
 # to build the compiler, we also need to add "module_files" to make sure
 # that all .mod files are also properly build.
- list(APPEND extra_deps "flang-new" "module_files")
+ list(APPEND extra_deps "flang" "module_files")
 endif()
 foreach(dep opt llvm-link llvm-extract clang clang-offload-packager)
 if(TARGET ${dep})
diff --git a/offload/CMakeLists.txt b/offload/CMakeLists.txt
index 9ffe8f56b76e67..9b771d1116ee38 100644
--- a/offload/CMakeLists.txt
+++ b/offload/CMakeLists.txt
@@ -89,9 +89,9 @@ else()
 # Check for flang
 if (NOT MSVC)
- set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new)
+ set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang)
 else()
- set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new.exe)
+ set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang.exe)
 endif()
 # Set fortran test compiler if flang is found
diff --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index 3b4259dfa380e8..c206386fa6b614 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -69,9 +69,9 @@ else()
 # Check for flang
 if (NOT MSVC)
- set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new)
+ set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang)
 else()
- set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new.exe)
+ set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang.exe)
 endif()
 # Set fortran test compiler if flang is found

>From b71c1d519cc61a751268b1ccda3fc59a966bab96 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Thu, 26 Sep 2024 10:39:53 -0500
Subject: [PATCH 2/7] [flang][driver] restore flang-new as symlink

Restore flang-new as a symlink to flang for backwards compatibility

Co-authored-by: H. Vetinari
Co-authored-by: Andrzej Warzynski
---
 clang/lib/Driver/ToolChain.cpp | 3 +++
 flang/tools/flang-driver/CMakeLists.txt | 4 ++++
 flang/tools/flang-driver/driver.cpp | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 16f9b629fc538c..c9f3dbd7707b77 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -381,6 +381,9 @@ static const DriverSuffix *FindDriverSuffix(StringRef ProgName, size_t &Pos) {
 {"cl", "--driver-mode=cl"},
 {"++", "--driver-mode=g++"},
 {"flang", "--driver-mode=flang"},
+ // For backwards compatibility, we create a symlink for `flang` called
+ // `flang-new`. This will be removed in the future.
+ {"flang-new", "--driver-mode=flang"},
 {"clang-dxc", "--driver-mode=dxc"},
 };
diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt
index 615c673374faf4..063acdd7dfe57c 100644
--- a/flang/tools/flang-driver/CMakeLists.txt
+++ b/flang/tools/flang-driver/CMakeLists.txt
@@ -36,3 +36,7 @@ if(FLANG_PLUGIN_SUPPORT)
 endif()
 install(TARGETS flang DESTINATION "${CMAKE_INSTALL_BINDIR}")
+
+# Keep "flang-new" as a symlink for backwards compatiblity. Remove once "flang"
+# is a widely adopted name.
+add_flang_symlink(flang-new flang)
diff --git a/flang/tools/flang-driver/driver.cpp b/flang/tools/flang-driver/driver.cpp
index 603aab4205836c..ed52988feaa59c 100644
--- a/flang/tools/flang-driver/driver.cpp
+++ b/flang/tools/flang-driver/driver.cpp
@@ -88,7 +88,8 @@ int main(int argc, const char **argv) {
 llvm::InitLLVM x(argc, argv);
 llvm::SmallVector args(argv, argv + argc);
- clang::driver::ParsedClangName targetandMode("flang", "--driver-mode=flang");
+ clang::driver::ParsedClangName targetandMode =
+ clang::driver::ToolChain::getTargetAndModeFromProgramName(argv[0]);
 std::string driverPath = getExecutablePath(args[0]);
 llvm::BumpPtrAllocator a;

>From 443c951f8e0458e8b011424fad6a2e4b40b63144 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Mon, 30 Sep 2024 10:16:59 -0500
Subject: [PATCH 3/7] [flang][driver] add version to flang executable

---
 flang/tools/flang-driver/CMakeLists.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt
index 063acdd7dfe57c..9a89a6185a3291 100644
--- a/flang/tools/flang-driver/CMakeLists.txt
+++ b/flang/tools/flang-driver/CMakeLists.txt
@@ -28,6 +28,12 @@ clang_target_link_libraries(flang
 clangBasic
 )
+# This creates the executable with a version appended
+# and creates a symlink to it without the version
+if(CYGWIN OR NOT WIN32) # but it doesn't work on Windows
+ set_target_properties(flang PROPERTIES VERSION ${FLANG_EXECUTABLE_VERSION})
+endif()
+
 option(FLANG_PLUGIN_SUPPORT "Build Flang with plugin support." ON)
 # Enable support for plugins, which need access to symbols from flang

>From 27ae40d86f235890d109ca88682dd0caba0d2c93 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Thu, 3 Oct 2024 14:12:35 -0700
Subject: [PATCH 4/7] [flang][driver] add warning when using openmp

---
 clang/include/clang/Basic/DiagnosticDriverKinds.td | 3 +++
 clang/include/clang/Basic/DiagnosticGroups.td | 4 ++++
 clang/lib/Driver/ToolChains/Flang.cpp | 3 +++
 3 files changed, 10 insertions(+)

diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index 97573fcf20c1fb..68722ad9633120 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -147,6 +147,9 @@ def warn_drv_unsupported_option_for_processor : Warning<
 def warn_drv_unsupported_openmp_library : Warning<
 "the library '%0=%1' is not supported, OpenMP will not be enabled">,
 InGroup;
+def warn_openmp_experimental : Warning<
+ "OpenMP support in flang is still experimental">,
+ InGroup;
 def err_drv_invalid_thread_model_for_target : Error<
 "invalid thread model '%0' in '%1' for this target">;
diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td
index 7d81bdf827ea0c..bfa065f018f8d8 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -1582,3 +1582,7 @@ def ExtractAPIMisuse : DiagGroup<"extractapi-misuse">;
 // Warnings about using the non-standard extension having an explicit specialization
 // with a storage class specifier.
def ExplicitSpecializationStorageClass : DiagGroup<"explicit-specialization-storage-class">; + +// A warning for options that enable a feature that is not yet complete +def ExperimentalOption : DiagGroup<"experimental-option">; + diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp index 1ca12ff81389a3..19b43594b00815 100644 --- a/clang/lib/Driver/ToolChains/Flang.cpp +++ b/clang/lib/Driver/ToolChains/Flang.cpp @@ -787,6 +787,9 @@ void Flang::ConstructJob(Compilation &C, const JobAction &JA, if (Args.hasArg(options::OPT_fopenmp_force_usm)) CmdArgs.push_back("-fopenmp-force-usm"); + // TODO: OpenMP support isn't "done" yet, so for now we warn that it + // is experimental. + D.Diag(diag::warn_openmp_experimental); // FIXME: Clang supports a whole bunch more flags here. break; >From d8f95da5712a7d03a935c8b38f06d373c21f7a1f Mon Sep 17 00:00:00 2001 From: Brad Richardson Date: Fri, 4 Oct 2024 06:27:05 -0700 Subject: [PATCH 5/7] [flang][doc] update note about CMake support --- flang/docs/FlangDriver.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/flang/docs/FlangDriver.md b/flang/docs/FlangDriver.md index 47cf078cf2d0d4..23cbab30ee903e 100644 --- a/flang/docs/FlangDriver.md +++ b/flang/docs/FlangDriver.md @@ -335,7 +335,7 @@ just added using your new frontend option. ## CMake Support As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246) -(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a +(CMake 3.28.0), `cmake` can detect `flang` as a supported Fortran compiler. 
You can configure your CMake projects to use `flang` as follows: ```bash >From a35343fd31314a59f671474474258c8707c123ab Mon Sep 17 00:00:00 2001 From: Brad Richardson Date: Fri, 4 Oct 2024 07:11:16 -0700 Subject: [PATCH 6/7] [flang][test] fix tests broken by rename --- flang/test/Driver/driver-version.f90 | 2 +- flang/test/Driver/lto-flags.f90 | 2 +- flang/test/Driver/missing-input.f90 | 6 +++--- flang/test/Driver/multiple-input-files.f90 | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/flang/test/Driver/driver-version.f90 b/flang/test/Driver/driver-version.f90 index 4c6aecb1c4fa7e..6daeb0e767c0e0 100644 --- a/flang/test/Driver/driver-version.f90 +++ b/flang/test/Driver/driver-version.f90 @@ -9,7 +9,7 @@ ! VERSION-NEXT: Thread model: ! VERSION-NEXT: InstalledDir: -! ERROR: flang: error: unknown argument '--versions'; did you mean '--version'? +! ERROR: flang{{.*}}: error: unknown argument '--versions'; did you mean '--version'? ! VERSION-FC1: LLVM version diff --git a/flang/test/Driver/lto-flags.f90 b/flang/test/Driver/lto-flags.f90 index bad3d972e6bd6b..be9416810716a9 100644 --- a/flang/test/Driver/lto-flags.f90 +++ b/flang/test/Driver/lto-flags.f90 @@ -30,7 +30,7 @@ ! FULL-LTO: "-fc1" ! FULL-LTO-SAME: "-flto=full" -! THIN-LTO-ALL: flang: warning: the option '-flto=thin' is a work in progress +! THIN-LTO-ALL: flang{{.*}}: warning: the option '-flto=thin' is a work in progress ! THIN-LTO-ALL: "-fc1" ! THIN-LTO-ALL-SAME: "-flto=thin" ! THIN-LTO-LINKER-PLUGIN: "-plugin-opt=thinlto" diff --git a/flang/test/Driver/missing-input.f90 b/flang/test/Driver/missing-input.f90 index 51d37a718c542f..aeefbe14c20563 100644 --- a/flang/test/Driver/missing-input.f90 +++ b/flang/test/Driver/missing-input.f90 @@ -17,10 +17,10 @@ ! RUN: not %flang_fc1 %t.f90 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-NONEXISTENT-FILE ! RUN: not %flang_fc1 %S 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-DIR -! FLANG-NO-FILE: flang: error: no input files +! 
FLANG-NO-FILE: flang{{.*}}: error: no input files -! FLANG-NONEXISTENT-FILE: flang: error: no such file or directory: {{.*}} -! FLANG-NONEXISTENT-FILE: flang: error: no input files +! FLANG-NONEXISTENT-FILE: flang{{.*}}: error: no such file or directory: {{.*}} +! FLANG-NONEXISTENT-FILE: flang{{.*}}: error: no input files ! FLANG-FC1-NONEXISTENT-FILE: error: {{.*}} does not exist ! FLANG-FC1-DIR: error: {{.*}} is not a regular file diff --git a/flang/test/Driver/multiple-input-files.f90 b/flang/test/Driver/multiple-input-files.f90 index 64ec8679abf94f..0242db288babf2 100644 --- a/flang/test/Driver/multiple-input-files.f90 +++ b/flang/test/Driver/multiple-input-files.f90 @@ -39,7 +39,7 @@ ! FLANG-NEXT:end program hello ! TEST 2: `-o` does not when multiple input files are present -! ERROR: flang: error: cannot specify -o when generating multiple output files +! ERROR: flang{{.*}}: error: cannot specify -o when generating multiple output files ! TEST 3: The output file _was not_ specified - `flang_fc1` will process all ! input files and generate one output file for every input file. >From 811b6a7f37f017b0713e6251d95a23c050ab7670 Mon Sep 17 00:00:00 2001 From: Brad Richardson Date: Fri, 4 Oct 2024 07:56:02 -0700 Subject: [PATCH 7/7] [flang][test] add check for OpenMP experimental warning --- flang/test/Driver/fopenmp.f90 | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/flang/test/Driver/fopenmp.f90 b/flang/test/Driver/fopenmp.f90 index 9b4dc5ffb1f690..b3c3547800bdba 100644 --- a/flang/test/Driver/fopenmp.f90 +++ b/flang/test/Driver/fopenmp.f90 @@ -73,3 +73,7 @@ ! ! CHECK-LD-ANYMD: "{{.*}}ld{{(.exe)?}}" ! CHECK-LD-ANYMD: "-l{{(omp|gomp|iomp5md)}}" +! +! RUN: %flang -fopenmp -c %s -### 2>&1 | FileCheck %s --check-prefix=CHECK-EXPERIMENTAL +! +! 
CHECK-EXPERIMENTAL: flang{{.*}}: warning: OpenMP support in flang is still experimental

From openmp-commits at lists.llvm.org Fri Oct 4 07:57:42 2024
From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits)
Date: Fri, 04 Oct 2024 07:57:42 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <67000266.050a0220.17ef4.fe52@mx.google.com>

================
@@ -787,6 +787,9 @@ void Flang::ConstructJob(Compilation &C, const JobAction &JA,
   if (Args.hasArg(options::OPT_fopenmp_force_usm))
     CmdArgs.push_back("-fopenmp-force-usm");
+  // TODO: OpenMP support isn't "done" yet, so for now we warn that it
+  // is experimental.
+  D.Diag(diag::warn_openmp_experimental);
----------------
everythingfunctional wrote:

While I was slightly hesitant to add a test for a temporary warning, I've done so. If anyone else thinks it doesn't belong I'll undo it.

https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Fri Oct 4 07:59:33 2024
From: openmp-commits at lists.llvm.org (Andrzej Warzyński via Openmp-commits)
Date: Fri, 04 Oct 2024 07:59:33 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <670002d5.170a0220.2510dd.4a72@mx.google.com>

================
@@ -339,11 +335,11 @@ just added using your new frontend option.
 ## CMake Support
 As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246)
-(and soon to be released CMake 3.24.0), `cmake` can detect `flang-new` as a
+(CMake 3.28.0), `cmake` can detect `flang` as a
----------------
banach-space wrote:

https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246 pre-dates CMake 3.28.0 by quite a few releases, I would just remove the link. @DavidTruby what makes 3.28 special? Should 3.28 be marked as requirement?
https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Fri Oct 4 10:06:37 2024
From: openmp-commits at lists.llvm.org (Alexey Bataev via Openmp-commits)
Date: Fri, 04 Oct 2024 10:06:37 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [Clang][OpenMP] Add permutation clause (PR #92030)
In-Reply-To:
Message-ID: <6700209d.170a0220.366c87.1539@mx.google.com>

================
@@ -16029,6 +16074,44 @@ OMPClause *SemaOpenMP::ActOnOpenMPSizesClause(ArrayRef SizeExprs,
                                 SanitizedSizeExprs);
 }
+OMPClause *SemaOpenMP::ActOnOpenMPPermutationClause(ArrayRef PermExprs,
+                                                    SourceLocation StartLoc,
+                                                    SourceLocation LParenLoc,
+                                                    SourceLocation EndLoc) {
+  size_t NumLoops = PermExprs.size();
+  SmallVector SanitizedPermExprs;
+  llvm::append_range(SanitizedPermExprs, PermExprs);
+
+  for (Expr *&PermExpr : SanitizedPermExprs) {
+    // Skip if template-dependent or already sanitized, e.g. during a partial
+    // template instantiation.
+    if (!PermExpr || PermExpr->isInstantiationDependent())
+      continue;
+
+    llvm::APSInt PermVal;
+    ExprResult PermEvalExpr = SemaRef.VerifyIntegerConstantExpression(
+        PermExpr, &PermVal, Sema::AllowFold);
+    bool IsValid = PermEvalExpr.isUsable();
+    if (IsValid)
----------------
alexey-bataev wrote:

```suggestion
    if (!IsValid) {
      PermExpr = nullptr;
      continue;
    }
```
?

https://github.com/llvm/llvm-project/pull/92030

From openmp-commits at lists.llvm.org Mon Oct 7 07:18:46 2024
From: openmp-commits at lists.llvm.org (Xing Xue via Openmp-commits)
Date: Mon, 07 Oct 2024 07:18:46 -0700 (PDT)
Subject: [Openmp-commits] [openmp] [libomp][AIX] Use SO version "1" for AIX libomp (PR #111384)
Message-ID:

https://github.com/xingxue-ibm created https://github.com/llvm/llvm-project/pull/111384

For `libomp` on AIX, we build shared object `libomp.so` first and then archive it into libomp.a.
This patch changes to use SO version `1` and name the shared object `libomp.so.1` so that it is consistent with the naming of other shared objects in AIX libraries, e.g., `libc++.so.1` in `libc++.a`. With this change, the change made in commit bde51d9b0d473447ea12fb14924f14ea167eec85 to ensure only `libomp.a` is published on AIX is no longer necessary and is removed. >From 71cc26f08fb647002e8f7e84d251ac05211dd7a3 Mon Sep 17 00:00:00 2001 From: Xing Xue Date: Fri, 4 Oct 2024 13:57:33 -0400 Subject: [PATCH] Use SO version "1" for AIX libomp. --- openmp/runtime/src/CMakeLists.txt | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/openmp/runtime/src/CMakeLists.txt b/openmp/runtime/src/CMakeLists.txt index 2dd54b5116d920..439cc20963a129 100644 --- a/openmp/runtime/src/CMakeLists.txt +++ b/openmp/runtime/src/CMakeLists.txt @@ -212,6 +212,15 @@ if(OPENMP_MSVC_NAME_SCHEME) LINK_FLAGS "${LIBOMP_CONFIGURED_LDFLAGS}" LINKER_LANGUAGE ${LIBOMP_LINKER_LANGUAGE} ) +elseif(${CMAKE_SYSTEM_NAME} MATCHES "AIX") + set(LIBOMP_SHARED_OUTPUT_NAME "omp" CACHE STRING "Output name for the shared libomp runtime library.") + set_target_properties(omp PROPERTIES + OUTPUT_NAME "${LIBOMP_SHARED_OUTPUT_NAME}" + LINK_FLAGS "${LIBOMP_CONFIGURED_LDFLAGS}" + LINKER_LANGUAGE ${LIBOMP_LINKER_LANGUAGE} + VERSION "1.0" + SOVERSION "1" + ) else() set_target_properties(omp PROPERTIES PREFIX "" SUFFIX "" OUTPUT_NAME "${LIBOMP_LIB_FILE}" @@ -426,11 +435,7 @@ if(WIN32) endforeach() else() - if(${CMAKE_SYSTEM_NAME} MATCHES "AIX") - install(FILES ${LIBOMP_LIBRARY_DIR}/libomp.a DESTINATION "${OPENMP_INSTALL_LIBDIR}" COMPONENT runtime) - else() - install(TARGETS omp ${export_to_llvmexports} ${LIBOMP_INSTALL_KIND} DESTINATION "${OPENMP_INSTALL_LIBDIR}") - endif() + install(TARGETS omp ${export_to_llvmexports} ${LIBOMP_INSTALL_KIND} DESTINATION "${OPENMP_INSTALL_LIBDIR}") if(${LIBOMP_INSTALL_ALIASES}) # Create aliases (symlinks) of the library for backwards compatibility From 
openmp-commits at lists.llvm.org Mon Oct 7 09:51:11 2024
From: openmp-commits at lists.llvm.org (Daniel Chen via Openmp-commits)
Date: Mon, 07 Oct 2024 09:51:11 -0700 (PDT)
Subject: [Openmp-commits] [openmp] [libomp][AIX] Use SO version "1" for AIX libomp (PR #111384)
In-Reply-To:
Message-ID: <6704117f.170a0220.27d277.e588@mx.google.com>

https://github.com/DanielCChen approved this pull request.

LGTM.

https://github.com/llvm/llvm-project/pull/111384

From openmp-commits at lists.llvm.org Tue Oct 8 03:04:16 2024
From: openmp-commits at lists.llvm.org (via Openmp-commits)
Date: Tue, 08 Oct 2024 03:04:16 -0700 (PDT)
Subject: [Openmp-commits] [openmp] c62e61a - [libomp][AIX] Use SO version "1" for AIX libomp (#111384)
Message-ID: <670503a0.170a0220.acce1.2b6c@mx.google.com>

Author: Xing Xue
Date: 2024-10-08T06:04:13-04:00
New Revision: c62e61acb428bb46ad834f8288b0c6f8c2ea8d31

URL: https://github.com/llvm/llvm-project/commit/c62e61acb428bb46ad834f8288b0c6f8c2ea8d31
DIFF: https://github.com/llvm/llvm-project/commit/c62e61acb428bb46ad834f8288b0c6f8c2ea8d31.diff

LOG: [libomp][AIX] Use SO version "1" for AIX libomp (#111384)

For `libomp` on AIX, we build shared object `libomp.so` first and then archive it into libomp.a. This patch changes to use SO version `1` and name the shared object `libomp.so.1` so that it is consistent with the naming of other shared objects in AIX libraries, e.g., `libc++.so.1` in `libc++.a`. With this change, the change made in commit bde51d9b0d473447ea12fb14924f14ea167eec85 to ensure only `libomp.a` is published on AIX is no longer necessary and is removed.
Added: Modified: openmp/runtime/src/CMakeLists.txt Removed: ################################################################################ diff --git a/openmp/runtime/src/CMakeLists.txt b/openmp/runtime/src/CMakeLists.txt index 2dd54b5116d920..439cc20963a129 100644 --- a/openmp/runtime/src/CMakeLists.txt +++ b/openmp/runtime/src/CMakeLists.txt @@ -212,6 +212,15 @@ if(OPENMP_MSVC_NAME_SCHEME) LINK_FLAGS "${LIBOMP_CONFIGURED_LDFLAGS}" LINKER_LANGUAGE ${LIBOMP_LINKER_LANGUAGE} ) +elseif(${CMAKE_SYSTEM_NAME} MATCHES "AIX") + set(LIBOMP_SHARED_OUTPUT_NAME "omp" CACHE STRING "Output name for the shared libomp runtime library.") + set_target_properties(omp PROPERTIES + OUTPUT_NAME "${LIBOMP_SHARED_OUTPUT_NAME}" + LINK_FLAGS "${LIBOMP_CONFIGURED_LDFLAGS}" + LINKER_LANGUAGE ${LIBOMP_LINKER_LANGUAGE} + VERSION "1.0" + SOVERSION "1" + ) else() set_target_properties(omp PROPERTIES PREFIX "" SUFFIX "" OUTPUT_NAME "${LIBOMP_LIB_FILE}" @@ -426,11 +435,7 @@ if(WIN32) endforeach() else() - if(${CMAKE_SYSTEM_NAME} MATCHES "AIX") - install(FILES ${LIBOMP_LIBRARY_DIR}/libomp.a DESTINATION "${OPENMP_INSTALL_LIBDIR}" COMPONENT runtime) - else() - install(TARGETS omp ${export_to_llvmexports} ${LIBOMP_INSTALL_KIND} DESTINATION "${OPENMP_INSTALL_LIBDIR}") - endif() + install(TARGETS omp ${export_to_llvmexports} ${LIBOMP_INSTALL_KIND} DESTINATION "${OPENMP_INSTALL_LIBDIR}") if(${LIBOMP_INSTALL_ALIASES}) # Create aliases (symlinks) of the library for backwards compatibility From openmp-commits at lists.llvm.org Tue Oct 8 03:04:20 2024 From: openmp-commits at lists.llvm.org (Xing Xue via Openmp-commits) Date: Tue, 08 Oct 2024 03:04:20 -0700 (PDT) Subject: [Openmp-commits] [openmp] [libomp][AIX] Use SO version "1" for AIX libomp (PR #111384) In-Reply-To: Message-ID: <670503a4.620a0220.17b39d.2dbd@mx.google.com> https://github.com/xingxue-ibm closed https://github.com/llvm/llvm-project/pull/111384 From openmp-commits at lists.llvm.org Tue Oct 8 08:30:53 2024 From: openmp-commits at 
lists.llvm.org (Nikita Popov via Openmp-commits) Date: Tue, 08 Oct 2024 08:30:53 -0700 (PDT) Subject: [Openmp-commits] [openmp] [openmp] Add option to disable tsan tests (PR #111548) Message-ID: https://github.com/nikic created https://github.com/llvm/llvm-project/pull/111548 This adds a OPENMP_TEST_ENABLE_TSAN option that allows to override whether tests using tsan will be enabled. The option defaults to the existing auto-detection. The background here is https://github.com/llvm/llvm-project/issues/111492, where we have some systems where tsan doesn't work, but we do still want to build it. >From 0d22314aac0875c6b6e748a2a87dddfa68f8416a Mon Sep 17 00:00:00 2001 From: Nikita Popov Date: Tue, 8 Oct 2024 17:12:52 +0200 Subject: [PATCH] [openmp] Add option to disable tsan tests This adds a OPENMP_TEST_ENABLE_TSAN option that allows to override whether tests using tsan will be enabled. The option defaults to the existing auto-detection. --- openmp/cmake/OpenMPTesting.cmake | 3 +++ openmp/tools/archer/tests/CMakeLists.txt | 2 +- openmp/tools/archer/tests/lit.site.cfg.in | 2 +- 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/openmp/cmake/OpenMPTesting.cmake b/openmp/cmake/OpenMPTesting.cmake index c67ad8b1cbd9cc..14cc5c67d84c2d 100644 --- a/openmp/cmake/OpenMPTesting.cmake +++ b/openmp/cmake/OpenMPTesting.cmake @@ -163,6 +163,9 @@ else() set(OPENMP_TEST_COMPILER_HAS_OMIT_FRAME_POINTER_FLAGS 1) endif() +set(OPENMP_TEST_ENABLE_TSAN "${OPENMP_TEST_COMPILER_HAS_TSAN_FLAGS}" CACHE BOOL + "Whether to enable tests using tsan") + # Function to set compiler features for use in lit. 
function(update_test_compiler_features) set(FEATURES "[") diff --git a/openmp/tools/archer/tests/CMakeLists.txt b/openmp/tools/archer/tests/CMakeLists.txt index 5de91148fa4b38..412c7d63725eb2 100644 --- a/openmp/tools/archer/tests/CMakeLists.txt +++ b/openmp/tools/archer/tests/CMakeLists.txt @@ -28,7 +28,7 @@ macro(pythonize_bool var) endmacro() pythonize_bool(LIBARCHER_HAVE_LIBATOMIC) -pythonize_bool(OPENMP_TEST_COMPILER_HAS_TSAN_FLAGS) +pythonize_bool(OPENMP_TEST_ENABLE_TSAN) set(ARCHER_TSAN_TEST_DEPENDENCE "") if(TARGET tsan) diff --git a/openmp/tools/archer/tests/lit.site.cfg.in b/openmp/tools/archer/tests/lit.site.cfg.in index 55edfde9738e3b..ddcb7b8bc3a56b 100644 --- a/openmp/tools/archer/tests/lit.site.cfg.in +++ b/openmp/tools/archer/tests/lit.site.cfg.in @@ -12,7 +12,7 @@ config.omp_library_dir = "@LIBOMP_LIBRARY_DIR@" config.omp_header_dir = "@LIBOMP_INCLUDE_DIR@" config.operating_system = "@CMAKE_SYSTEM_NAME@" config.has_libatomic = @LIBARCHER_HAVE_LIBATOMIC@ -config.has_tsan = @OPENMP_TEST_COMPILER_HAS_TSAN_FLAGS@ +config.has_tsan = @OPENMP_TEST_ENABLE_TSAN@ config.test_archer_flags = "@LIBARCHER_TEST_FLAGS@" config.libarcher_obj_root = "@CMAKE_CURRENT_BINARY_DIR@" From openmp-commits at lists.llvm.org Tue Oct 8 08:33:24 2024 From: openmp-commits at lists.llvm.org (Nikita Popov via Openmp-commits) Date: Tue, 08 Oct 2024 08:33:24 -0700 (PDT) Subject: [Openmp-commits] [openmp] [openmp] Add option to disable tsan tests (PR #111548) In-Reply-To: Message-ID: <670550c4.170a0220.336069.43cf@mx.google.com> https://github.com/nikic edited https://github.com/llvm/llvm-project/pull/111548 From openmp-commits at lists.llvm.org Tue Oct 8 08:35:12 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Tue, 08 Oct 2024 08:35:12 -0700 (PDT) Subject: [Openmp-commits] [openmp] [openmp] Add option to disable tsan tests (PR #111548) In-Reply-To: Message-ID: <67055130.050a0220.17fb49.7c1d@mx.google.com> https://github.com/jprotze approved this pull request. 
lgtm

https://github.com/llvm/llvm-project/pull/111548

From openmp-commits at lists.llvm.org Tue Oct 8 08:53:52 2024
From: openmp-commits at lists.llvm.org (Michael Kruse via Openmp-commits)
Date: Tue, 08 Oct 2024 08:53:52 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [Clang][OpenMP] Add permutation clause (PR #92030)
In-Reply-To:
Message-ID: <67055590.170a0220.16e512.30f3@mx.google.com>

================
@@ -16029,6 +16074,44 @@ OMPClause *SemaOpenMP::ActOnOpenMPSizesClause(ArrayRef SizeExprs,
                                 SanitizedSizeExprs);
 }
+OMPClause *SemaOpenMP::ActOnOpenMPPermutationClause(ArrayRef PermExprs,
+                                                    SourceLocation StartLoc,
+                                                    SourceLocation LParenLoc,
+                                                    SourceLocation EndLoc) {
+  size_t NumLoops = PermExprs.size();
+  SmallVector SanitizedPermExprs;
+  llvm::append_range(SanitizedPermExprs, PermExprs);
+
+  for (Expr *&PermExpr : SanitizedPermExprs) {
+    // Skip if template-dependent or already sanitized, e.g. during a partial
+    // template instantiation.
+    if (!PermExpr || PermExpr->isInstantiationDependent())
+      continue;
+
+    llvm::APSInt PermVal;
+    ExprResult PermEvalExpr = SemaRef.VerifyIntegerConstantExpression(
+        PermExpr, &PermVal, Sema::AllowFold);
+    bool IsValid = PermEvalExpr.isUsable();
+    if (IsValid)
----------------
Meinersbur wrote:

In partial template instantiations the expression may not evaluate to a constant yet because it depends on another template parameter. We cannot set `PermExpr` to nullptr because we need to preserve the expression to be evaluated at another call to `ActOnOpenMPPermutationClause` when the template is fully instantiated.
https://github.com/llvm/llvm-project/pull/92030

From openmp-commits at lists.llvm.org Tue Oct 8 09:02:33 2024
From: openmp-commits at lists.llvm.org (Alexey Bataev via Openmp-commits)
Date: Tue, 08 Oct 2024 09:02:33 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [Clang][OpenMP] Add permutation clause (PR #92030)
In-Reply-To:
Message-ID: <67055799.a70a0220.caebc.66e8@mx.google.com>

https://github.com/alexey-bataev approved this pull request.

https://github.com/llvm/llvm-project/pull/92030

From openmp-commits at lists.llvm.org Tue Oct 8 09:12:12 2024
From: openmp-commits at lists.llvm.org (Kiran Chandramohan via Openmp-commits)
Date: Tue, 08 Oct 2024 09:12:12 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <670559dc.170a0220.347f51.bd33@mx.google.com>

https://github.com/kiranchandramohan approved this pull request.

LGTM. Thanks for adding the warning about OpenMP. Please ask in the Flang community call tomorrow and in the discourse post before you submit.

https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Wed Oct 9 02:29:33 2024
From: openmp-commits at lists.llvm.org (via Openmp-commits)
Date: Wed, 09 Oct 2024 02:29:33 -0700 (PDT)
Subject: [Openmp-commits] [openmp] b2edeb5 - [openmp] Add option to disable tsan tests (#111548)
Message-ID: <67064cfd.050a0220.369330.a4fe@mx.google.com>

Author: Nikita Popov
Date: 2024-10-09T11:29:30+02:00
New Revision: b2edeb58b8cb3268acee425cd52b406eb60a8095

URL: https://github.com/llvm/llvm-project/commit/b2edeb58b8cb3268acee425cd52b406eb60a8095
DIFF: https://github.com/llvm/llvm-project/commit/b2edeb58b8cb3268acee425cd52b406eb60a8095.diff

LOG: [openmp] Add option to disable tsan tests (#111548)

This adds a OPENMP_TEST_ENABLE_TSAN option that allows to override whether tests using tsan will be enabled. The option defaults to the existing auto-detection.
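
As a rough sketch of how such an override might be passed at configure time (the checkout and build paths, the generator, and the runtimes list are placeholders assumed for illustration, not taken from the patch; only `OPENMP_TEST_ENABLE_TSAN` comes from it):

```bash
# Hypothetical layout: ./llvm-project is a source checkout, ./build is the build tree.
# OPENMP_TEST_ENABLE_TSAN is the cache option introduced by this patch. A value
# given with -D takes precedence over the auto-detected default, so OFF here
# overrides OPENMP_TEST_COMPILER_HAS_TSAN_FLAGS for the tests.
cmake -S llvm-project/runtimes -B build -G Ninja \
  -DLLVM_ENABLE_RUNTIMES=openmp \
  -DOPENMP_TEST_ENABLE_TSAN=OFF
```

With the patch applied, this value is what the archer lit site configuration receives as `config.has_tsan`; leaving the option unset keeps the previous auto-detected behaviour.
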
The background here is https://github.com/llvm/llvm-project/issues/111492, where we have some systems where tsan doesn't work, but we do still want to build it and run tests that don't use tsan. Added: Modified: openmp/cmake/OpenMPTesting.cmake openmp/tools/archer/tests/CMakeLists.txt openmp/tools/archer/tests/lit.site.cfg.in Removed: ################################################################################ diff --git a/openmp/cmake/OpenMPTesting.cmake b/openmp/cmake/OpenMPTesting.cmake index c67ad8b1cbd9cc..14cc5c67d84c2d 100644 --- a/openmp/cmake/OpenMPTesting.cmake +++ b/openmp/cmake/OpenMPTesting.cmake @@ -163,6 +163,9 @@ else() set(OPENMP_TEST_COMPILER_HAS_OMIT_FRAME_POINTER_FLAGS 1) endif() +set(OPENMP_TEST_ENABLE_TSAN "${OPENMP_TEST_COMPILER_HAS_TSAN_FLAGS}" CACHE BOOL + "Whether to enable tests using tsan") + # Function to set compiler features for use in lit. function(update_test_compiler_features) set(FEATURES "[") diff --git a/openmp/tools/archer/tests/CMakeLists.txt b/openmp/tools/archer/tests/CMakeLists.txt index 5de91148fa4b38..412c7d63725eb2 100644 --- a/openmp/tools/archer/tests/CMakeLists.txt +++ b/openmp/tools/archer/tests/CMakeLists.txt @@ -28,7 +28,7 @@ macro(pythonize_bool var) endmacro() pythonize_bool(LIBARCHER_HAVE_LIBATOMIC) -pythonize_bool(OPENMP_TEST_COMPILER_HAS_TSAN_FLAGS) +pythonize_bool(OPENMP_TEST_ENABLE_TSAN) set(ARCHER_TSAN_TEST_DEPENDENCE "") if(TARGET tsan) diff --git a/openmp/tools/archer/tests/lit.site.cfg.in b/openmp/tools/archer/tests/lit.site.cfg.in index 55edfde9738e3b..ddcb7b8bc3a56b 100644 --- a/openmp/tools/archer/tests/lit.site.cfg.in +++ b/openmp/tools/archer/tests/lit.site.cfg.in @@ -12,7 +12,7 @@ config.omp_library_dir = "@LIBOMP_LIBRARY_DIR@" config.omp_header_dir = "@LIBOMP_INCLUDE_DIR@" config.operating_system = "@CMAKE_SYSTEM_NAME@" config.has_libatomic = @LIBARCHER_HAVE_LIBATOMIC@ -config.has_tsan = @OPENMP_TEST_COMPILER_HAS_TSAN_FLAGS@ +config.has_tsan = @OPENMP_TEST_ENABLE_TSAN@ config.test_archer_flags 
= "@LIBARCHER_TEST_FLAGS@" config.libarcher_obj_root = "@CMAKE_CURRENT_BINARY_DIR@" From openmp-commits at lists.llvm.org Wed Oct 9 02:29:38 2024 From: openmp-commits at lists.llvm.org (Nikita Popov via Openmp-commits) Date: Wed, 09 Oct 2024 02:29:38 -0700 (PDT) Subject: [Openmp-commits] [openmp] [openmp] Add option to disable tsan tests (PR #111548) In-Reply-To: Message-ID: <67064d02.170a0220.2c27de.2d03@mx.google.com> https://github.com/nikic closed https://github.com/llvm/llvm-project/pull/111548 From openmp-commits at lists.llvm.org Wed Oct 9 05:57:06 2024 From: openmp-commits at lists.llvm.org (Michael Kruse via Openmp-commits) Date: Wed, 09 Oct 2024 05:57:06 -0700 (PDT) Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [Clang][OpenMP] Add permutation clause (PR #92030) In-Reply-To: Message-ID: <67067da2.170a0220.12b5df.9026@mx.google.com> https://github.com/Meinersbur closed https://github.com/llvm/llvm-project/pull/92030 From openmp-commits at lists.llvm.org Wed Oct 9 11:00:56 2024 From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits) Date: Wed, 09 Oct 2024 11:00:56 -0700 (PDT) Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023) In-Reply-To: Message-ID: <6706c4d8.170a0220.19ac54.4f76@mx.google.com> https://github.com/everythingfunctional updated https://github.com/llvm/llvm-project/pull/110023 >From c19e89f7358dc638687be4da8f5a51cb483b3637 Mon Sep 17 00:00:00 2001 From: Brad Richardson Date: Wed, 25 Sep 2024 13:25:22 -0500 Subject: [PATCH 1/7] [flang][driver] rename flang-new to flang --- .github/workflows/release-binaries.yml | 2 +- clang/include/clang/Driver/Options.td | 4 +- clang/lib/Driver/Driver.cpp | 2 +- clang/lib/Driver/ToolChains/Flang.cpp | 6 +- clang/test/Driver/flang/flang.f90 | 2 +- clang/test/Driver/flang/flang_ucase.F90 | 2 +- .../Driver/flang/multiple-inputs-mixed.f90 | 2 +- clang/test/Driver/flang/multiple-inputs.f90 | 4 +- flang/docs/FlangDriver.md | 
76 +++++++++---------- flang/docs/ImplementingASemanticCheck.md | 4 +- flang/docs/Overview.md | 26 +++---- .../FlangOmpReport/FlangOmpReport.cpp | 2 +- .../flang/Optimizer/Analysis/AliasAnalysis.h | 2 +- flang/include/flang/Tools/CrossToolHelpers.h | 2 +- flang/lib/Frontend/CompilerInvocation.cpp | 6 +- flang/lib/Frontend/FrontendActions.cpp | 2 +- .../ExecuteCompilerInvocation.cpp | 3 +- flang/runtime/CMakeLists.txt | 6 +- flang/test/CMakeLists.txt | 2 +- flang/test/Driver/aarch64-outline-atomics.f90 | 2 +- .../Driver/color-diagnostics-forwarding.f90 | 4 +- flang/test/Driver/compiler-options.f90 | 4 +- flang/test/Driver/convert.f90 | 2 +- .../test/Driver/disable-ext-name-interop.f90 | 2 +- flang/test/Driver/driver-version.f90 | 4 +- flang/test/Driver/escaped-backslash.f90 | 4 +- flang/test/Driver/fdefault.f90 | 28 +++---- flang/test/Driver/flarge-sizes.f90 | 20 ++--- .../test/Driver/frame-pointer-forwarding.f90 | 2 +- flang/test/Driver/frontend-forwarding.f90 | 4 +- flang/test/Driver/hlfir-no-hlfir-error.f90 | 4 +- flang/test/Driver/intrinsic-module-path.f90 | 2 +- flang/test/Driver/large-data-threshold.f90 | 6 +- flang/test/Driver/lto-flags.f90 | 2 +- flang/test/Driver/macro-def-undef.F90 | 4 +- flang/test/Driver/missing-input.f90 | 14 ++-- flang/test/Driver/multiple-input-files.f90 | 2 +- flang/test/Driver/omp-driver-offload.f90 | 66 ++++++++-------- .../predefined-macros-compiler-version.F90 | 4 +- flang/test/Driver/std2018-wrong.f90 | 2 +- flang/test/Driver/std2018.f90 | 2 +- .../Driver/supported-suffices/f03-suffix.f03 | 2 +- .../Driver/supported-suffices/f08-suffix.f08 | 2 +- flang/test/Driver/use-module-error.f90 | 4 +- flang/test/Driver/use-module.f90 | 4 +- flang/test/Driver/version-loops.f90 | 18 ++--- flang/test/Driver/wextra-ok.f90 | 2 +- flang/test/HLFIR/hlfir-flags.f90 | 2 +- .../Intrinsics/command_argument_count.f90 | 4 +- flang/test/Lower/Intrinsics/exit.f90 | 2 +- .../test/Lower/Intrinsics/ieee_is_normal.f90 | 2 +- 
flang/test/Lower/Intrinsics/isnan.f90 | 2 +- flang/test/Lower/Intrinsics/modulo.f90 | 2 +- .../OpenMP/Todo/omp-declarative-allocate.f90 | 2 +- .../OpenMP/Todo/omp-declare-reduction.f90 | 2 +- .../Lower/OpenMP/Todo/omp-declare-simd.f90 | 2 +- .../parallel-lastprivate-clause-scalar.f90 | 2 +- .../parallel-wsloop-reduction-byref.f90 | 2 +- .../OpenMP/parallel-wsloop-reduction.f90 | 2 +- flang/test/lit.cfg.py | 4 +- flang/tools/f18/CMakeLists.txt | 10 +-- flang/tools/flang-driver/CMakeLists.txt | 12 +-- flang/tools/flang-driver/driver.cpp | 6 +- llvm/runtimes/CMakeLists.txt | 10 +-- offload/CMakeLists.txt | 4 +- openmp/CMakeLists.txt | 4 +- 66 files changed, 220 insertions(+), 227 deletions(-) diff --git a/.github/workflows/release-binaries.yml b/.github/workflows/release-binaries.yml index f24e25879b96bd..1cde628d3f66c3 100644 --- a/.github/workflows/release-binaries.yml +++ b/.github/workflows/release-binaries.yml @@ -328,7 +328,7 @@ jobs: run: | # Build some of the mlir tools that take a long time to link if [ "${{ needs.prepare.outputs.build-flang }}" = "true" ]; then - ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ -j2 flang-new bbc + ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ -j2 flang bbc fi ninja -C ${{ steps.setup-stage.outputs.build-prefix }}/build/tools/clang/stage2-bins/ \ mlir-bytecode-parser-fuzzer \ diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 90f0c4f2df2130..8fb9edd7a2a927 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -6077,7 +6077,7 @@ def _sysroot_EQ : Joined<["--"], "sysroot=">, Visibility<[ClangOption, FlangOpti def _sysroot : Separate<["--"], "sysroot">, Alias<_sysroot_EQ>; //===----------------------------------------------------------------------===// -// pie/pic options (clang + flang-new) +// pie/pic options (clang + flang) 
//===----------------------------------------------------------------------===// let Visibility = [ClangOption, FlangOption] in { @@ -6093,7 +6093,7 @@ def fno_pie : Flag<["-"], "fno-pie">, Group; } // let Vis = [Default, FlangOption] //===----------------------------------------------------------------------===// -// Target Options (clang + flang-new) +// Target Options (clang + flang) //===----------------------------------------------------------------------===// let Flags = [TargetSpecific] in { let Visibility = [ClangOption, FlangOption] in { diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp index a5d43bdac23735..ba850cf3803e9b 100644 --- a/clang/lib/Driver/Driver.cpp +++ b/clang/lib/Driver/Driver.cpp @@ -2029,7 +2029,7 @@ void Driver::PrintHelp(bool ShowHidden) const { void Driver::PrintVersion(const Compilation &C, raw_ostream &OS) const { if (IsFlangMode()) { - OS << getClangToolFullVersion("flang-new") << '\n'; + OS << getClangToolFullVersion("flang") << '\n'; } else { // FIXME: The following handlers should use a callback mechanism, we don't // know what the client would like to do. 
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp index 98350690f8d20e..1ca12ff81389a3 100644 --- a/clang/lib/Driver/ToolChains/Flang.cpp +++ b/clang/lib/Driver/ToolChains/Flang.cpp @@ -881,14 +881,12 @@ void Flang::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back(Input.getFilename()); - // TODO: Replace flang-new with flang once the new driver replaces the - // throwaway driver - const char *Exec = Args.MakeArgString(D.GetProgramPath("flang-new", TC)); + const char *Exec = Args.MakeArgString(D.GetProgramPath("flang", TC)); C.addCommand(std::make_unique(JA, *this, ResponseFileSupport::AtFileUTF8(), Exec, CmdArgs, Inputs, Output)); } -Flang::Flang(const ToolChain &TC) : Tool("flang-new", "flang frontend", TC) {} +Flang::Flang(const ToolChain &TC) : Tool("flang", "flang frontend", TC) {} Flang::~Flang() {} diff --git a/clang/test/Driver/flang/flang.f90 b/clang/test/Driver/flang/flang.f90 index ad4a3a3b6bd44d..b52977ee66d7b0 100644 --- a/clang/test/Driver/flang/flang.f90 +++ b/clang/test/Driver/flang/flang.f90 @@ -13,7 +13,7 @@ ! * (no type specified, resulting in an object file) ! All invocations should begin with flang -fc1, consume up to here. -! ALL-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! ALL-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! Check that f90 files are not treated as "previously preprocessed" ! ... in --driver-mode=flang. diff --git a/clang/test/Driver/flang/flang_ucase.F90 b/clang/test/Driver/flang/flang_ucase.F90 index e89c053b327bc9..88aedc39fb94a7 100644 --- a/clang/test/Driver/flang/flang_ucase.F90 +++ b/clang/test/Driver/flang/flang_ucase.F90 @@ -13,7 +13,7 @@ ! * (no type specified, resulting in an object file) ! All invocations should begin with flang -fc1, consume up to here. -! ALL-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! ALL-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! Check that f90 files are not treated as "previously preprocessed" ! ... in --driver-mode=flang. 
diff --git a/clang/test/Driver/flang/multiple-inputs-mixed.f90 b/clang/test/Driver/flang/multiple-inputs-mixed.f90 index 2395dbecf1fe92..98d8cab00bdfdb 100644 --- a/clang/test/Driver/flang/multiple-inputs-mixed.f90 +++ b/clang/test/Driver/flang/multiple-inputs-mixed.f90 @@ -1,7 +1,7 @@ ! Check that flang can handle mixed C and fortran inputs. ! RUN: %clang --driver-mode=flang -### -fsyntax-only %S/Inputs/one.f90 %S/Inputs/other.c 2>&1 | FileCheck --check-prefixes=CHECK-SYNTAX-ONLY %s -! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/one.f90" ! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}clang{{[^"/]*}}" "-cc1" ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/other.c" diff --git a/clang/test/Driver/flang/multiple-inputs.f90 b/clang/test/Driver/flang/multiple-inputs.f90 index ada999e927a6a0..3c0f22e5d3e508 100644 --- a/clang/test/Driver/flang/multiple-inputs.f90 +++ b/clang/test/Driver/flang/multiple-inputs.f90 @@ -1,7 +1,7 @@ ! Check that flang driver can handle multiple inputs at once. ! RUN: %clang --driver-mode=flang -### -fsyntax-only %S/Inputs/one.f90 %S/Inputs/two.f90 2>&1 | FileCheck --check-prefixes=CHECK-SYNTAX-ONLY %s -! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/one.f90" -! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang-new{{[^"/]*}}" "-fc1" +! CHECK-SYNTAX-ONLY-LABEL: "{{[^"]*}}flang{{[^"/]*}}" "-fc1" ! 
CHECK-SYNTAX-ONLY: "{{[^"]*}}/Inputs/two.f90" diff --git a/flang/docs/FlangDriver.md b/flang/docs/FlangDriver.md index 815c26a28dfdfa..47cf078cf2d0d4 100644 --- a/flang/docs/FlangDriver.md +++ b/flang/docs/FlangDriver.md @@ -15,17 +15,13 @@ local: ``` There are two main drivers in Flang: -* the compiler driver, `flang-new` -* the frontend driver, `flang-new -fc1` - -> **_NOTE:_** The diagrams in this document refer to `flang` as opposed to -> `flang-new`. Eventually, `flang-new` will be renamed as `flang` and the -> diagrams reflect the final design that we are still working towards. +* the compiler driver, `flang` +* the frontend driver, `flang -fc1` The **compiler driver** will allow you to control all compilation phases (e.g. preprocessing, semantic checks, code-generation, code-optimisation, lowering and linking). For frontend specific tasks, the compiler driver creates a -Fortran compilation job and delegates it to `flang-new -fc1`, the frontend +Fortran compilation job and delegates it to `flang -fc1`, the frontend driver. For linking, it creates a linker job and calls an external linker (e.g. LLVM's [`lld`](https://lld.llvm.org/)). It can also call other tools such as external assemblers (e.g. [`as`](https://www.gnu.org/software/binutils/)). In @@ -47,7 +43,7 @@ frontend. It uses MLIR and LLVM for code-generation and can be viewed as a driver for Flang, LLVM and MLIR libraries. Contrary to the compiler driver, it is not capable of calling any external tools (including linkers). It is aware of all the frontend internals that are "hidden" from the compiler driver. It -accepts many frontend-specific options not available in `flang-new` and as such +accepts many frontend-specific options not available in `flang` and as such it provides a finer control over the frontend. Note that this tool is mostly intended for Flang developers. 
In particular, there are no guarantees about the stability of its interface and compiler developers can use it to experiment @@ -62,30 +58,30 @@ frontend specific flag from the _compiler_ directly to the _frontend_ driver, e.g.: ```bash -flang-new -Xflang -fdebug-dump-parse-tree input.f95 +flang -Xflang -fdebug-dump-parse-tree input.f95 ``` -In the invocation above, `-fdebug-dump-parse-tree` is forwarded to `flang-new +In the invocation above, `-fdebug-dump-parse-tree` is forwarded to `flang -fc1`. Without the forwarding flag, `-Xflang`, you would see the following warning: ```bash -flang-new: warning: argument unused during compilation: +flang: warning: argument unused during compilation: ``` -As `-fdebug-dump-parse-tree` is only supported by `flang-new -fc1`, `flang-new` +As `-fdebug-dump-parse-tree` is only supported by `flang -fc1`, `flang` will ignore it when used without `Xflang`. ## Why Do We Need Two Drivers? -As hinted above, `flang-new` and `flang-new -fc1` are two separate tools. The -fact that these tools are accessed through one binary, `flang-new`, is just an +As hinted above, `flang` and `flang -fc1` are two separate tools. The +fact that these tools are accessed through one binary, `flang`, is just an implementation detail. Each tool has a separate list of options, albeit defined in the same file: `clang/include/clang/Driver/Options.td`. The separation helps us split various tasks and allows us to implement more -specialised tools. In particular, `flang-new` is not aware of various +specialised tools. In particular, `flang` is not aware of various compilation phases within the frontend (e.g. scanning, parsing or semantic -checks). It does not have to be. Conversely, the frontend driver, `flang-new +checks). It does not have to be. Conversely, the frontend driver, `flang -fc1`, needs not to be concerned with linkers or other external tools like assemblers. 
Nor does it need to know where to look for various systems libraries, which is usually OS and platform specific. @@ -104,7 +100,7 @@ GCC](https://en.wikibooks.org/wiki/GNU_C_Compiler_Internals/GNU_C_Compiler_Archi In fact, Flang needs to adhere to this model in order to be able to re-use Clang's driver library. If you are more familiar with the [architecture of GFortran](https://gcc.gnu.org/onlinedocs/gcc-4.7.4/gfortran/About-GNU-Fortran.html) -than Clang, then `flang-new` corresponds to `gfortran` and `flang-new -fc1` to +than Clang, then `flang` corresponds to `gfortran` and `flang -fc1` to `f951`. ## Compiler Driver @@ -135,7 +131,7 @@ output from one action is the input for the subsequent one. You can use the `-ccc-print-phases` flag to see the sequence of actions that the driver will create for your compiler invocation: ```bash -flang-new -ccc-print-phases -E file.f +flang -ccc-print-phases -E file.f +- 0: input, "file.f", f95-cpp-input 1: preprocessor, {0}, f95 ``` @@ -143,7 +139,7 @@ As you can see, for `-E` the driver creates only two jobs and stops immediately after preprocessing. The first job simply prepares the input. For `-c`, the pipeline of the created jobs is more complex: ```bash -flang-new -ccc-print-phases -c file.f +flang -ccc-print-phases -c file.f +- 0: input, "file.f", f95-cpp-input +- 1: preprocessor, {0}, f95 +- 2: compiler, {1}, ir @@ -158,7 +154,7 @@ command to call the frontend driver is generated (more specifically, an instance of `clang::driver::Command`). Every command is bound to an instance of `clang::driver::Tool`. For Flang we introduced a specialisation of this class: `clang::driver::Flang`. This class implements the logic to either translate or -forward compiler options to the frontend driver, `flang-new -fc1`. +forward compiler options to the frontend driver, `flang -fc1`. You can read more on the design of `clangDriver` in Clang's [Driver Design & Internals](https://clang.llvm.org/docs/DriverInternals.html). 
@@ -232,12 +228,12 @@ driver, `clang -cc1` and consists of the following classes: This list is not exhaustive and only covers the main classes that implement the driver. The main entry point for the frontend driver, `fc1_main`, is implemented in `flang/tools/flang-driver/driver.cpp`. It can be accessed by -invoking the compiler driver, `flang-new`, with the `-fc1` flag. +invoking the compiler driver, `flang`, with the `-fc1` flag. The frontend driver will only run one action at a time. If you specify multiple action flags, only the last one will be taken into account. The default action is `ParseSyntaxOnlyAction`, which corresponds to `-fsyntax-only`. In other -words, `flang-new -fc1 ` is equivalent to `flang-new -fc1 -fsyntax-only +words, `flang -fc1 ` is equivalent to `flang -fc1 -fsyntax-only `. ## Adding new Compiler Options @@ -262,8 +258,8 @@ similar semantics to your new option and start by copying that. For every new option, you will also have to define the visibility of the new option. This is controlled through the `Visibility` field. You can use the following Flang specific visibility flags to control this: - * `FlangOption` - this option will be available in the `flang-new` compiler driver, - * `FC1Option` - this option will be available in the `flang-new -fc1` frontend driver, + * `FlangOption` - this option will be available in the `flang` compiler driver, + * `FC1Option` - this option will be available in the `flang -fc1` frontend driver, Options that are supported by clang should explicitly specify `ClangOption` in `Visibility`, and options that are only supported in Flang should not specify @@ -290,10 +286,10 @@ The parsing will depend on the semantics encoded in the TableGen definition. When adding a compiler driver option (i.e. 
an option that contains `FlangOption` among in it's `Visibility`) that you also intend to be understood -by the frontend, make sure that it is either forwarded to `flang-new -fc1` or +by the frontend, make sure that it is either forwarded to `flang -fc1` or translated into some other option that is accepted by the frontend driver. In the case of options that contain both `FlangOption` and `FC1Option` among its -flags, we usually just forward from `flang-new` to `flang-new -fc1`. This is +flags, we usually just forward from `flang` to `flang -fc1`. This is then tested in `flang/test/Driver/frontend-forward.F90`. What follows is usually very dependant on the meaning of the corresponding @@ -339,11 +335,11 @@ just added using your new frontend option. ## CMake Support As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246) -(and soon to be released CMake 3.24.0), `cmake` can detect `flang-new` as a +(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a supported Fortran compiler. You can configure your CMake projects to use -`flang-new` as follows: +`flang` as follows: ```bash -cmake -DCMAKE_Fortran_COMPILER= +cmake -DCMAKE_Fortran_COMPILER= ``` You should see the following in the output: ``` @@ -353,14 +349,14 @@ where `` corresponds to the LLVM Flang version. ## Testing In LIT, we define two variables that you can use to invoke Flang's drivers: -* `%flang` is expanded as `flang-new` (i.e. the compiler driver) -* `%flang_fc1` is expanded as `flang-new -fc1` (i.e. the frontend driver) +* `%flang` is expanded as `flang` (i.e. the compiler driver) +* `%flang_fc1` is expanded as `flang -fc1` (i.e. the frontend driver) For most regression tests for the frontend, you will want to use `%flang_fc1`. In some cases, the observable behaviour will be identical regardless of whether `%flang` or `%flang_fc1` is used. 
However, when you are using `%flang` instead of `%flang_fc1`, the compiler driver will add extra flags to the frontend -driver invocation (i.e. `flang-new -fc1 -`). In some cases that might +driver invocation (i.e. `flang -fc1 -`). In some cases that might be exactly what you want to test. In fact, you can check these additional flags by using the `-###` compiler driver command line option. @@ -380,7 +376,7 @@ plugins. The process for using plugins includes: * [Creating a plugin](#creating-a-plugin) * [Loading and running a plugin](#loading-and-running-a-plugin) -Flang plugins are limited to `flang-new -fc1` and are currently only available / +Flang plugins are limited to `flang -fc1` and are currently only available / been tested on Linux. ### Creating a Plugin @@ -465,14 +461,14 @@ static FrontendPluginRegistry::Add X( ### Loading and Running a Plugin In order to use plugins, there are 2 command line options made available to the -frontend driver, `flang-new -fc1`: +frontend driver, `flang -fc1`: * [`-load `](#the--load-dsopath-option) for loading the dynamic shared object of the plugin * [`-plugin `](#the--plugin-name-option) for calling the registered plugin Invocation of the example plugin is done through: ```bash -flang-new -fc1 -load flangPrintFunctionNames.so -plugin print-fns file.f90 +flang -fc1 -load flangPrintFunctionNames.so -plugin print-fns file.f90 ``` Both these options are parsed in `flang/lib/Frontend/CompilerInvocation.cpp` and @@ -493,7 +489,7 @@ reports an error diagnostic and returns `nullptr`. ### Enabling In-Tree Plugins For in-tree plugins, there is the CMake flag `FLANG_PLUGIN_SUPPORT`, enabled by -default, that controls the exporting of executable symbols from `flang-new`, +default, that controls the exporting of executable symbols from `flang`, which plugins need access to. Additionally, there is the CMake flag `LLVM_BUILD_EXAMPLES`, turned off by default, that is used to control if the example programs are built. 
This includes plugins that are in the @@ -526,7 +522,7 @@ invocations `invokeFIROptEarlyEPCallbacks`, `invokeFIRInlinerCallback`, and `invokeFIROptLastEPCallbacks` for Flang drivers to be able to insert additonal passes at different points of the default pass pipeline. An example use of these extension point callbacks is shown in `registerDefaultInlinerPass` to invoke the -default inliner pass in `flang-new`. +default inliner pass in `flang`. ## LLVM Pass Plugins @@ -539,7 +535,7 @@ documentation for [`llvm::PassBuilder`](https://llvm.org/doxygen/classllvm_1_1PassBuilder.html) for details. -The framework to enable pass plugins in `flang-new` uses the exact same +The framework to enable pass plugins in `flang` uses the exact same machinery as that used by `clang` and thus has the same capabilities and limitations. @@ -547,7 +543,7 @@ In order to use a pass plugin, the pass(es) must be compiled into a dynamic shared object which is then loaded using the `-fpass-plugin` option. ``` -flang-new -fpass-plugin=/path/to/plugin.so +flang -fpass-plugin=/path/to/plugin.so ``` This option is available in both the compiler driver and the frontend driver. @@ -559,7 +555,7 @@ Pass extensions are similar to plugins, except that they can also be linked statically. Setting `-DLLVM_${NAME}_LINK_INTO_TOOLS` to `ON` in the cmake command turns the project into a statically linked extension. An example would be Polly, e.g., using `-DLLVM_POLLY_LINK_INTO_TOOLS=ON` would link Polly passes -into `flang-new` as built-in middle-end passes. +into `flang` as built-in middle-end passes. 
See the [`WritingAnLLVMNewPMPass`](https://llvm.org/docs/WritingAnLLVMNewPMPass.html#id9) diff --git a/flang/docs/ImplementingASemanticCheck.md b/flang/docs/ImplementingASemanticCheck.md index 5b583d4f8031b8..598ef696ad14bf 100644 --- a/flang/docs/ImplementingASemanticCheck.md +++ b/flang/docs/ImplementingASemanticCheck.md @@ -68,7 +68,7 @@ of the call to `intentOutFunc()`: I also used this program to produce a parse tree for the program using the command: ```bash - flang-new -fc1 -fdebug-dump-parse-tree testfun.f90 + flang -fc1 -fdebug-dump-parse-tree testfun.f90 ``` Here's the relevant fragment of the parse tree produced by the compiler: @@ -296,7 +296,7 @@ In `lib/Semantics/check-do.cpp`, I added an (almost empty) implementation: I then built the compiler with these changes and ran it on my test program. This time, I made sure to invoke semantic checking. Here's the command I used: ```bash - flang-new -fc1 -fdebug-unparse-with-symbols testfun.f90 + flang -fc1 -fdebug-unparse-with-symbols testfun.f90 ``` This produced the output: diff --git a/flang/docs/Overview.md b/flang/docs/Overview.md index 6eba19ea3a3c0d..dfb4d89264a755 100644 --- a/flang/docs/Overview.md +++ b/flang/docs/Overview.md @@ -65,8 +65,8 @@ See [Preprocessing.md](Preprocessing.md). **Entry point:** `parser::Parsing::Prescan` **Commands:** - - `flang-new -fc1 -E src.f90` dumps the cooked character stream - - `flang-new -fc1 -fdebug-dump-provenance src.f90` dumps provenance + - `flang -fc1 -E src.f90` dumps the cooked character stream + - `flang -fc1 -fdebug-dump-provenance src.f90` dumps provenance information ### Parsing @@ -80,10 +80,10 @@ representing a syntactically correct program, rooted at the program unit. 
See: **Entry point:** `parser::Parsing::Parse` **Commands:** - - `flang-new -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree - - `flang-new -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran - - `flang-new -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log - - `flang-new -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree + - `flang -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree + - `flang -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran + - `flang -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log + - `flang -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree ### Semantic processing @@ -121,9 +121,9 @@ In the course of semantic analysis, the compiler: At the end of semantic processing, all validation of the user's program is complete. This is the last detailed phase of analysis processing. **Commands:** - - `flang-new -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis - - `flang-new -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table - - `flang-new -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table + - `flang -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis + - `flang -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table + - `flang -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table ## Lowering @@ -163,8 +163,8 @@ contain a list of evaluations. All of these contain pointers back into the parse tree. The compiler walks the PFT generating FIR. 
**Commands:** - - `flang-new -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree - - `flang-new -fc1 -emit-mlir src.f90` dumps the FIR to the files src.mlir + - `flang -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree + - `flang -fc1 -emit-mlir src.f90` dumps the FIR to the files src.mlir ### Transformation passes @@ -180,8 +180,8 @@ perform various optimizations and transformations. The final pass creates an LLVM IR representation of the program. **Commands:** - - `flang-new -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error - - `flang-new -fc1 -emit-llvm src.f90` dumps the LLVM IR to src.ll + - `flang -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error + - `flang -fc1 -emit-llvm src.f90` dumps the LLVM IR to src.ll ## Object code generation and linking diff --git a/flang/examples/FlangOmpReport/FlangOmpReport.cpp b/flang/examples/FlangOmpReport/FlangOmpReport.cpp index 9c1f304b9741e7..709c5c5d305e51 100644 --- a/flang/examples/FlangOmpReport/FlangOmpReport.cpp +++ b/flang/examples/FlangOmpReport/FlangOmpReport.cpp @@ -9,7 +9,7 @@ // all the OpenMP constructs and clauses and which line they're located on. 
// // The plugin may be invoked as: -// ./bin/flang-new -fc1 -load lib/flangOmpReport.so -plugin flang-omp-report +// ./bin/flang -fc1 -load lib/flangOmpReport.so -plugin flang-omp-report // -fopenmp // //===----------------------------------------------------------------------===// diff --git a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h index 9a70b7fbfad2b6..8ab5150cd7c812 100644 --- a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h +++ b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h @@ -67,7 +67,7 @@ struct AliasAnalysis { // end subroutine // ------------------------------------------------- // - // flang-new -fc1 -emit-fir test.f90 -o test.fir + // flang -fc1 -emit-fir test.f90 -o test.fir // // ------------------- test.fir -------------------- // fir.global @_QMtopEa : !fir.box>> diff --git a/flang/include/flang/Tools/CrossToolHelpers.h b/flang/include/flang/Tools/CrossToolHelpers.h index 3e703de545950c..df4b21ada058fe 100644 --- a/flang/include/flang/Tools/CrossToolHelpers.h +++ b/flang/include/flang/Tools/CrossToolHelpers.h @@ -7,7 +7,7 @@ //===----------------------------------------------------------------------===// // A header file for containing functionallity that is used across Flang tools, // such as helper functions which apply or generate information needed accross -// tools like bbc and flang-new. +// tools like bbc and flang. 
//===----------------------------------------------------------------------===// #ifndef FORTRAN_TOOLS_CROSS_TOOL_HELPERS_H diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp index 2154b9ab2fbf47..849c6f53614f63 100644 --- a/flang/lib/Frontend/CompilerInvocation.cpp +++ b/flang/lib/Frontend/CompilerInvocation.cpp @@ -66,8 +66,8 @@ CompilerInvocationBase::~CompilerInvocationBase() = default; static bool parseShowColorsArgs(const llvm::opt::ArgList &args, bool defaultColor = true) { // Color diagnostics default to auto ("on" if terminal supports) in the - // compiler driver `flang-new` but default to off in the frontend driver - // `flang-new -fc1`, needing an explicit OPT_fdiagnostics_color. + // compiler driver `flang` but default to off in the frontend driver + // `flang -fc1`, needing an explicit OPT_fdiagnostics_color. // Support both clang's -f[no-]color-diagnostics and gcc's // -f[no-]diagnostics-colors[=never|always|auto]. enum { @@ -900,7 +900,7 @@ static bool parseDiagArgs(CompilerInvocation &res, llvm::opt::ArgList &args, } } - // Default to off for `flang-new -fc1`. + // Default to off for `flang -fc1`. 
res.getFrontendOpts().showColors = parseShowColorsArgs(args, /*defaultDiagColor=*/false); diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp index 4a52edc436e0ed..8f882bff170909 100644 --- a/flang/lib/Frontend/FrontendActions.cpp +++ b/flang/lib/Frontend/FrontendActions.cpp @@ -233,7 +233,7 @@ bool CodeGenAction::beginSourceFileAction() { llvm::SMDiagnostic err; llvmModule = llvm::parseIRFile(getCurrentInput().getFile(), err, *llvmCtx); if (!llvmModule || llvm::verifyModule(*llvmModule, &llvm::errs())) { - err.print("flang-new", llvm::errs()); + err.print("flang", llvm::errs()); unsigned diagID = ci.getDiagnostics().getCustomDiagID( clang::DiagnosticsEngine::Error, "Could not parse IR"); ci.getDiagnostics().Report(diagID); diff --git a/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp b/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp index e2cbd5112d6ea5..09ac129d3e6893 100644 --- a/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp +++ b/flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp @@ -154,8 +154,7 @@ bool executeCompilerInvocation(CompilerInstance *flang) { // Honor -help. if (flang->getFrontendOpts().showHelp) { clang::driver::getDriverOptTable().printHelp( - llvm::outs(), "flang-new -fc1 [options] file...", - "LLVM 'Flang' Compiler", + llvm::outs(), "flang -fc1 [options] file...", "LLVM 'Flang' Compiler", /*ShowHidden=*/false, /*ShowAllAliases=*/false, llvm::opt::Visibility(clang::driver::options::FC1Option)); return true; diff --git a/flang/runtime/CMakeLists.txt b/flang/runtime/CMakeLists.txt index 0ad1b718d5875b..cdd2de541c6730 100644 --- a/flang/runtime/CMakeLists.txt +++ b/flang/runtime/CMakeLists.txt @@ -308,12 +308,12 @@ set_target_properties(FortranRuntime PROPERTIES FOLDER "Flang/Runtime Libraries" # If FortranRuntime is part of a Flang build (and not a separate build) then # add dependency to make sure that Fortran runtime library is being built after # we have the Flang compiler available. 
This also includes the MODULE files -# that compile when the 'flang-new' target is built. +# that compile when the 'flang' target is built. # # TODO: This is a workaround and should be updated when runtime build procedure # is changed to a regular runtime build. See discussion in PR #95388. -if (TARGET flang-new AND TARGET module_files) - add_dependencies(FortranRuntime flang-new module_files) +if (TARGET flang AND TARGET module_files) + add_dependencies(FortranRuntime flang module_files) endif() if (FLANG_CUF_RUNTIME) diff --git a/flang/test/CMakeLists.txt b/flang/test/CMakeLists.txt index a18a5c6519eda4..cab214c2ef4c8c 100644 --- a/flang/test/CMakeLists.txt +++ b/flang/test/CMakeLists.txt @@ -58,7 +58,7 @@ set(FLANG_TEST_PARAMS flang_site_config=${CMAKE_CURRENT_BINARY_DIR}/lit.site.cfg.py) set(FLANG_TEST_DEPENDS - flang-new + flang llvm-config FileCheck count diff --git a/flang/test/Driver/aarch64-outline-atomics.f90 b/flang/test/Driver/aarch64-outline-atomics.f90 index a1c874c20df5c7..530bfc8e962091 100644 --- a/flang/test/Driver/aarch64-outline-atomics.f90 +++ b/flang/test/Driver/aarch64-outline-atomics.f90 @@ -1,4 +1,4 @@ -! Test that flang-new forwards the -moutline-atomics and -mno-outline-atomics. +! Test that flang forwards the -moutline-atomics and -mno-outline-atomics. ! RUN: %flang -moutline-atomics --target=aarch64-none-none -### %s -o %t 2>&1 | FileCheck %s ! CHECK: "-target-feature" "+outline-atomics" diff --git a/flang/test/Driver/color-diagnostics-forwarding.f90 b/flang/test/Driver/color-diagnostics-forwarding.f90 index 368fa8834142ab..29061242cb0cbc 100644 --- a/flang/test/Driver/color-diagnostics-forwarding.f90 +++ b/flang/test/Driver/color-diagnostics-forwarding.f90 @@ -1,5 +1,5 @@ -! Test that flang-new forwards -f{no-}color-diagnostics and -! -f{no-}diagnostics-color options to flang-new -fc1 as expected. +! Test that flang forwards -f{no-}color-diagnostics and +! -f{no-}diagnostics-color options to flang -fc1 as expected. ! 
RUN: %flang -fsyntax-only -### %s -o %t 2>&1 -fcolor-diagnostics \
 ! RUN: | FileCheck %s --check-prefix=CHECK-CD
diff --git a/flang/test/Driver/compiler-options.f90 b/flang/test/Driver/compiler-options.f90
index 7ec29ce7ba7abf..cefa86836abd30 100644
--- a/flang/test/Driver/compiler-options.f90
+++ b/flang/test/Driver/compiler-options.f90
@@ -1,6 +1,6 @@
 ! RUN: %flang -S -emit-llvm -flang-deprecated-no-hlfir -o - %s | FileCheck %s
-! Test communication of COMPILER_OPTIONS from flang-new to flang-new -fc1.
-! CHECK: [[OPTSVAR:@_QQclX[0-9a-f]+]] = {{[a-z]+}} constant [[[OPTSLEN:[0-9]+]] x i8] c"{{.*}}flang-new{{(\.exe)?}} {{.*}}-S -emit-llvm -flang-deprecated-no-hlfir -o - {{.*}}compiler-options.f90"
+! Test communication of COMPILER_OPTIONS from flang to flang -fc1.
+! CHECK: [[OPTSVAR:@_QQclX[0-9a-f]+]] = {{[a-z]+}} constant [[[OPTSLEN:[0-9]+]] x i8] c"{{.*}}flang{{(\.exe)?}} {{.*}}-S -emit-llvm -flang-deprecated-no-hlfir -o - {{.*}}compiler-options.f90"
 program main
 use ISO_FORTRAN_ENV, only: compiler_options
 implicit none
diff --git a/flang/test/Driver/convert.f90 b/flang/test/Driver/convert.f90
index b2cf6c23efdb75..0ba31d2188cdf5 100755
--- a/flang/test/Driver/convert.f90
+++ b/flang/test/Driver/convert.f90
@@ -12,7 +12,7 @@
 ! RUN: not %flang -fconvert=foobar %s 2>&1 | FileCheck %s --check-prefix=INVALID
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: %flang_fc1 -emit-mlir -fconvert=unknown %s -o - | FileCheck %s --check-prefix=VALID_FC1
 ! RUN: %flang_fc1 -emit-mlir -fconvert=native %s -o - | FileCheck %s --check-prefix=VALID_FC1
diff --git a/flang/test/Driver/disable-ext-name-interop.f90 b/flang/test/Driver/disable-ext-name-interop.f90
index 0c59a5b4c980f8..1ade84b996d043 100644
--- a/flang/test/Driver/disable-ext-name-interop.f90
+++ b/flang/test/Driver/disable-ext-name-interop.f90
@@ -1,4 +1,4 @@
-!
Test that we can disable the ExternalNameConversion pass in flang-new.
+! Test that we can disable the ExternalNameConversion pass in flang.
 ! RUN: %flang_fc1 -S %s -o - 2>&1 | FileCheck %s --check-prefix=EXTNAMES
 ! RUN: %flang_fc1 -S -mmlir -disable-external-name-interop %s -o - 2>&1 | FileCheck %s --check-prefix=INTNAMES
diff --git a/flang/test/Driver/driver-version.f90 b/flang/test/Driver/driver-version.f90
index d1e1e1d90fe1f8..4c6aecb1c4fa7e 100644
--- a/flang/test/Driver/driver-version.f90
+++ b/flang/test/Driver/driver-version.f90
@@ -4,12 +4,12 @@
 ! RUN: %flang_fc1 -version 2>&1 | FileCheck %s --check-prefix=VERSION-FC1
 ! RUN: not %flang_fc1 --version 2>&1 | FileCheck %s --check-prefix=ERROR-FC1
-! VERSION: flang-new version
+! VERSION: flang version
 ! VERSION-NEXT: Target:
 ! VERSION-NEXT: Thread model:
 ! VERSION-NEXT: InstalledDir:
-! ERROR: flang-new: error: unknown argument '--versions'; did you mean '--version'?
+! ERROR: flang: error: unknown argument '--versions'; did you mean '--version'?
 ! VERSION-FC1: LLVM version
diff --git a/flang/test/Driver/escaped-backslash.f90 b/flang/test/Driver/escaped-backslash.f90
index ad07eae24e9fab..90dd1783dd1150 100644
--- a/flang/test/Driver/escaped-backslash.f90
+++ b/flang/test/Driver/escaped-backslash.f90
@@ -1,14 +1,14 @@
 ! Ensure argument -fbackslash works as expected.
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
 ! RUN: %flang -E %s 2>&1 | FileCheck %s --check-prefix=ESCAPED
 ! RUN: %flang -E -fbackslash -fno-backslash %s 2>&1 | FileCheck %s --check-prefix=ESCAPED
 ! RUN: %flang -E -fbackslash %s 2>&1 | FileCheck %s --check-prefix=UNESCAPED
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --check-prefix=ESCAPED
 !
RUN: %flang_fc1 -E -fbackslash -fno-backslash %s 2>&1 | FileCheck %s --check-prefix=ESCAPED
diff --git a/flang/test/Driver/fdefault.f90 b/flang/test/Driver/fdefault.f90
index 88592bfa3e87ee..7ce45b763a240f 100644
--- a/flang/test/Driver/fdefault.f90
+++ b/flang/test/Driver/fdefault.f90
@@ -2,25 +2,25 @@
 ! TODO: Add checks when actual codegen is possible for this family
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOOPTION
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=REAL8
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=DOUBLE8
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOOPTION
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=REAL8
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=DOUBLE8
 ! RUN: not %flang -fsyntax-only -fdefault-double-8 %s 2>&1 | FileCheck %s --check-prefix=ERROR
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
-!
RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOOPTION
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=REAL8
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=DOUBLE8
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOOPTION
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -fdefault-real-8 -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=REAL8
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -fdefault-real-8 -fdefault-double-8 -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=DOUBLE8
 ! RUN: not %flang_fc1 -fsyntax-only -fdefault-double-8 %s 2>&1 | FileCheck %s --check-prefix=ERROR
 ! NOOPTION: integer(4),parameter::real_kind=4_4
diff --git a/flang/test/Driver/flarge-sizes.f90 b/flang/test/Driver/flarge-sizes.f90
index 6ea5876676ed1f..6c41a03a830bfb 100644
--- a/flang/test/Driver/flarge-sizes.f90
+++ b/flang/test/Driver/flarge-sizes.f90
@@ -2,20 +2,20 @@
 ! TODO: Add checks when actual codegen is possible.
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOLARGE
-!
RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang -fsyntax-only -flarge-sizes -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=LARGE
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOLARGE
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang -fsyntax-only -flarge-sizes -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=LARGE
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=NOLARGE
-! RUN: rm -rf %t/dir-flang-new && mkdir -p %t/dir-flang-new && %flang_fc1 -fsyntax-only -flarge-sizes -module-dir %t/dir-flang-new %s 2>&1
-! RUN: cat %t/dir-flang-new/m.mod | FileCheck %s --check-prefix=LARGE
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=NOLARGE
+! RUN: rm -rf %t/dir-flang && mkdir -p %t/dir-flang && %flang_fc1 -fsyntax-only -flarge-sizes -module-dir %t/dir-flang %s 2>&1
+! RUN: cat %t/dir-flang/m.mod | FileCheck %s --check-prefix=LARGE
 ! NOLARGE: real(4)::z(1_8:10_8)
 ! NOLARGE-NEXT: integer(4),parameter::size_kind=4_4
diff --git a/flang/test/Driver/frame-pointer-forwarding.f90 b/flang/test/Driver/frame-pointer-forwarding.f90
index 751494cc6a6017..9fcbd6e12f98b7 100644
--- a/flang/test/Driver/frame-pointer-forwarding.f90
+++ b/flang/test/Driver/frame-pointer-forwarding.f90
@@ -1,4 +1,4 @@
-! Test that flang-new forwards -fno-omit-frame-pointer and -fomit-frame-pointer Flang frontend
+!
Test that flang forwards -fno-omit-frame-pointer and -fomit-frame-pointer Flang frontend
 ! RUN: %flang --target=aarch64-none-none -fsyntax-only -### %s -o %t 2>&1 | FileCheck %s --check-prefix=CHECK-NOVALUE
 ! CHECK-NOVALUE: "-fc1"{{.*}}"-mframe-pointer=non-leaf"
diff --git a/flang/test/Driver/frontend-forwarding.f90 b/flang/test/Driver/frontend-forwarding.f90
index 35adb47b56861e..0a56a1e3710d9d 100644
--- a/flang/test/Driver/frontend-forwarding.f90
+++ b/flang/test/Driver/frontend-forwarding.f90
@@ -1,5 +1,5 @@
-! Test that flang-new forwards Flang frontend
-! options to flang-new -fc1 as expected.
+! Test that flang forwards Flang frontend
+! options to flang -fc1 as expected.
 ! RUN: %flang -fsyntax-only -### %s -o %t 2>&1 \
 ! RUN: -finput-charset=utf-8 \
diff --git a/flang/test/Driver/hlfir-no-hlfir-error.f90 b/flang/test/Driver/hlfir-no-hlfir-error.f90
index 2410393b6cd9c1..59f8304db5c9ab 100644
--- a/flang/test/Driver/hlfir-no-hlfir-error.f90
+++ b/flang/test/Driver/hlfir-no-hlfir-error.f90
@@ -2,12 +2,12 @@
 ! options cannot be both used.
 !--------------------------
-! FLANG DRIVER (flang-new)
+! FLANG DRIVER (flang)
 !--------------------------
 ! RUN: not %flang -flang-experimental-hlfir -flang-deprecated-no-hlfir %s 2>&1 | FileCheck %s
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+! FRONTEND FLANG DRIVER (flang -fc1)
 !-----------------------------------------
 ! RUN: not %flang_fc1 -emit-llvm -flang-experimental-hlfir -flang-deprecated-no-hlfir %s 2>&1 | FileCheck %s
diff --git a/flang/test/Driver/intrinsic-module-path.f90 b/flang/test/Driver/intrinsic-module-path.f90
index 5523ed37b724cd..15d19dd83d963f 100644
--- a/flang/test/Driver/intrinsic-module-path.f90
+++ b/flang/test/Driver/intrinsic-module-path.f90
@@ -4,7 +4,7 @@
 ! default one, causing a CHECKSUM error.
 !-----------------------------------------
-! FRONTEND FLANG DRIVER (flang-new -fc1)
+!
FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -fsyntax-only %s 2>&1 | FileCheck %s --allow-empty --check-prefix=WITHOUT ! RUN: not %flang_fc1 -fsyntax-only -fintrinsic-modules-path %S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=GIVEN diff --git a/flang/test/Driver/large-data-threshold.f90 b/flang/test/Driver/large-data-threshold.f90 index 320566c4b2e43a..6a7eef79559d0b 100644 --- a/flang/test/Driver/large-data-threshold.f90 +++ b/flang/test/Driver/large-data-threshold.f90 @@ -7,11 +7,11 @@ ! RUN: not %flang -### -c --target=aarch64 -mcmodel=small -mlarge-data-threshold=32768 %s 2>&1 | FileCheck %s --check-prefix=NOT-SUPPORTED -! CHECK: "{{.*}}flang-new" "-fc1" +! CHECK: "{{.*}}flang" "-fc1" ! CHECK-SAME: "-mlarge-data-threshold=32768" -! CHECK-59000: "{{.*}}flang-new" "-fc1" +! CHECK-59000: "{{.*}}flang" "-fc1" ! CHECK-59000-SAME: "-mlarge-data-threshold=59000" -! CHECK-1M: "{{.*}}flang-new" "-fc1" +! CHECK-1M: "{{.*}}flang" "-fc1" ! CHECK-1M-SAME: "-mlarge-data-threshold=1048576" ! NO-MCMODEL: 'mlarge-data-threshold=' only applies to medium and large code models ! INVALID: error: invalid value 'nonsense' in '-mlarge-data-threshold=' diff --git a/flang/test/Driver/lto-flags.f90 b/flang/test/Driver/lto-flags.f90 index a51febc7009691..bad3d972e6bd6b 100644 --- a/flang/test/Driver/lto-flags.f90 +++ b/flang/test/Driver/lto-flags.f90 @@ -30,7 +30,7 @@ ! FULL-LTO: "-fc1" ! FULL-LTO-SAME: "-flto=full" -! THIN-LTO-ALL: flang-new: warning: the option '-flto=thin' is a work in progress +! THIN-LTO-ALL: flang: warning: the option '-flto=thin' is a work in progress ! THIN-LTO-ALL: "-fc1" ! THIN-LTO-ALL-SAME: "-flto=thin" ! THIN-LTO-LINKER-PLUGIN: "-plugin-opt=thinlto" diff --git a/flang/test/Driver/macro-def-undef.F90 b/flang/test/Driver/macro-def-undef.F90 index 1332c6d6c02708..b13a9040833dbf 100644 --- a/flang/test/Driver/macro-def-undef.F90 +++ b/flang/test/Driver/macro-def-undef.F90 @@ -1,14 +1,14 @@ ! 
Ensure arguments -D and -U work as expected. !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang -E -P %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED ! RUN: %flang -E -P -DX=A %s 2>&1 | FileCheck %s --check-prefix=DEFINED ! RUN: %flang -E -P -DX=A -UX %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -E -P %s 2>&1 | FileCheck %s --check-prefix=UNDEFINED ! RUN: %flang_fc1 -E -P -DX=A %s 2>&1 | FileCheck %s --check-prefix=DEFINED diff --git a/flang/test/Driver/missing-input.f90 b/flang/test/Driver/missing-input.f90 index 236325e3578f1d..51d37a718c542f 100644 --- a/flang/test/Driver/missing-input.f90 +++ b/flang/test/Driver/missing-input.f90 @@ -1,26 +1,26 @@ ! Test the behaviour of the driver when input is missing or is invalid. Note -! that with the compiler driver (flang-new), the input _has_ to be specified. +! that with the compiler driver (flang), the input _has_ to be specified. ! Indeed, the driver decides what "job/command" to create based on the input ! file's extension. No input file means that it doesn't know what to do -! (compile? preprocess? link?). The frontend driver (flang-new -fc1) simply +! (compile? preprocess? link?). The frontend driver (flang -fc1) simply ! assumes that "no explicit input == read from stdin" !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: not %flang 2>&1 | FileCheck %s --check-prefix=FLANG-NO-FILE ! RUN: not %flang %t.f90 2>&1 | FileCheck %s --check-prefix=FLANG-NONEXISTENT-FILE !----------------------------------------- -! FLANG FRONTEND DRIVER (flang-new -fc1) +! FLANG FRONTEND DRIVER (flang -fc1) !----------------------------------------- ! 
RUN: not %flang_fc1 %t.f90 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-NONEXISTENT-FILE ! RUN: not %flang_fc1 %S 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-DIR -! FLANG-NO-FILE: flang-new: error: no input files +! FLANG-NO-FILE: flang: error: no input files -! FLANG-NONEXISTENT-FILE: flang-new: error: no such file or directory: {{.*}} -! FLANG-NONEXISTENT-FILE: flang-new: error: no input files +! FLANG-NONEXISTENT-FILE: flang: error: no such file or directory: {{.*}} +! FLANG-NONEXISTENT-FILE: flang: error: no input files ! FLANG-FC1-NONEXISTENT-FILE: error: {{.*}} does not exist ! FLANG-FC1-DIR: error: {{.*}} is not a regular file diff --git a/flang/test/Driver/multiple-input-files.f90 b/flang/test/Driver/multiple-input-files.f90 index 6c86f23f2b21fa..64ec8679abf94f 100644 --- a/flang/test/Driver/multiple-input-files.f90 +++ b/flang/test/Driver/multiple-input-files.f90 @@ -39,7 +39,7 @@ ! FLANG-NEXT:end program hello ! TEST 2: `-o` does not when multiple input files are present -! ERROR: flang-new: error: cannot specify -o when generating multiple output files +! ERROR: flang: error: cannot specify -o when generating multiple output files ! TEST 3: The output file _was not_ specified - `flang_fc1` will process all ! input files and generate one output file for every input file. diff --git a/flang/test/Driver/omp-driver-offload.f90 b/flang/test/Driver/omp-driver-offload.f90 index b0b94ab1386a74..7c51656f0001af 100644 --- a/flang/test/Driver/omp-driver-offload.f90 +++ b/flang/test/Driver/omp-driver-offload.f90 @@ -1,6 +1,6 @@ -! Test that flang-new OpenMP and OpenMP offload related +! Test that flang OpenMP and OpenMP offload related ! commands forward or expand to the appropriate commands -! for flang-new -fc1 as expected. Assumes a gfx90a, aarch64, +! for flang -fc1 as expected. Assumes a gfx90a, aarch64, ! and sm_70 architecture, but doesn't require one to be ! installed or compiled for, just testing the appropriate ! 
generation of jobs are created with the correct @@ -8,8 +8,8 @@ ! Test regular -fopenmp with no offload ! RUN: %flang -### -fopenmp %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP %s -! CHECK-OPENMP: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}}.f90" -! CHECK-OPENMP-NOT: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90" +! CHECK-OPENMP: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}}.f90" +! CHECK-OPENMP-NOT: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90" ! Test regular -fopenmp with offload, and invocation filtering options ! RUN: %flang -S -### %s -o %t 2>&1 \ @@ -22,47 +22,47 @@ ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=OFFLOAD-HOST-AND-DEVICE -! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" -! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" -! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "nvptx64-nvidia-cuda" -! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! OFFLOAD-HOST-AND-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "nvptx64-nvidia-cuda" +! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! RUN: %flang -S -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-host-only \ ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=OFFLOAD-HOST -! OFFLOAD-HOST: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-HOST: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! OFFLOAD-HOST-NOT: "-triple" "amdgcn-amd-amdhsa" ! 
OFFLOAD-HOST-NOT: "-triple" "nvptx64-nvidia-cuda" -! OFFLOAD-HOST-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-HOST-NOT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! RUN: %flang -S -### %s 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-device-only \ ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=OFFLOAD-DEVICE -! OFFLOAD-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" -! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" -! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "nvptx64-nvidia-cuda" -! OFFLOAD-DEVICE-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-DEVICE: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! OFFLOAD-DEVICE-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "nvptx64-nvidia-cuda" +! OFFLOAD-DEVICE-NOT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! Test regular -fopenmp with offload for basic fopenmp-is-target-device flag addition and correct fopenmp ! RUN: %flang -### -fopenmp --offload-arch=gfx90a -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP-IS-TARGET-DEVICE %s -! CHECK-OPENMP-IS-TARGET-DEVICE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90" +! CHECK-OPENMP-IS-TARGET-DEVICE: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90" ! Testing appropriate flags are gnerated and appropriately assigned by the driver when offloading ! RUN: %flang -S -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ ! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=OPENMP-OFFLOAD-ARGS -! 
OPENMP-OFFLOAD-ARGS: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" {{.*}} "-fopenmp" {{.*}}.f90" -! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! OPENMP-OFFLOAD-ARGS: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" {{.*}} "-fopenmp" {{.*}}.f90" +! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp" ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp-host-ir-file-path" "{{.*}}.bc" "-fopenmp-is-target-device" ! OPENMP-OFFLOAD-ARGS-SAME: {{.*}}.f90" ! OPENMP-OFFLOAD-ARGS: "{{[^"]*}}clang-offload-packager{{.*}}" {{.*}} "--image=file={{.*}}.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp" -! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! OPENMP-OFFLOAD-ARGS-SAME: "-fopenmp" ! OPENMP-OFFLOAD-ARGS-SAME: "-fembed-offload-object={{.*}}.out" {{.*}}.bc" @@ -77,7 +77,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-threads-oversubscription \ ! RUN: | FileCheck %s --check-prefixes=CHECK-THREADS-OVS -! CHECK-THREADS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90" +! CHECK-THREADS-OVS: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -89,7 +89,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-teams-oversubscription \ ! RUN: | FileCheck %s --check-prefixes=CHECK-TEAMS-OVS -! CHECK-TEAMS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-teams-oversubscription" {{.*}}.f90" +! 
CHECK-TEAMS-OVS: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-teams-oversubscription" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -101,7 +101,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-no-nested-parallelism \ ! RUN: | FileCheck %s --check-prefixes=CHECK-NEST-PAR -! CHECK-NEST-PAR: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-nested-parallelism" {{.*}}.f90" +! CHECK-NEST-PAR: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-nested-parallelism" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -113,7 +113,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-assume-no-thread-state \ ! RUN: | FileCheck %s --check-prefixes=CHECK-THREAD-STATE -! CHECK-THREAD-STATE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-thread-state" {{.*}}.f90" +! CHECK-THREAD-STATE: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-thread-state" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -125,7 +125,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-target-debug \ ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG -! CHECK-TARGET-DEBUG: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" {{.*}}.f90" +! CHECK-TARGET-DEBUG: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" {{.*}}.f90" ! RUN: %flang -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -137,7 +137,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-target-debug \ ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG -! 
CHECK-TARGET-DEBUG-EQ: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug=111" {{.*}}.f90" +! CHECK-TARGET-DEBUG-EQ: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug=111" {{.*}}.f90" ! RUN: %flang -S -### %s -o %t 2>&1 \ ! RUN: -fopenmp --offload-arch=gfx90a \ @@ -153,7 +153,7 @@ ! RUN: -fopenmp-assume-teams-oversubscription -fopenmp-assume-no-nested-parallelism \ ! RUN: -fopenmp-assume-no-thread-state \ ! RUN: | FileCheck %s --check-prefixes=CHECK-RTL-ALL -! CHECK-RTL-ALL: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" "-fopenmp-assume-teams-oversubscription" +! CHECK-RTL-ALL: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" "-fopenmp-assume-teams-oversubscription" ! CHECK-RTL-ALL: "-fopenmp-assume-threads-oversubscription" "-fopenmp-assume-no-thread-state" "-fopenmp-assume-no-nested-parallelism" ! CHECK-RTL-ALL: {{.*}}.f90" @@ -167,7 +167,7 @@ ! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \ ! RUN: -fopenmp-version=45 \ ! RUN: | FileCheck %s --check-prefixes=CHECK-OPENMP-VERSION -! CHECK-OPENMP-VERSION: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" "-fopenmp-version=45" {{.*}}.f90" +! CHECK-OPENMP-VERSION: "{{[^"]*}}flang" "-fc1" {{.*}} "-fopenmp" "-fopenmp-version=45" {{.*}}.f90" ! Test diagnostic error when host IR file is non-existent ! RUN: not %flang_fc1 %s -o %t 2>&1 -fopenmp -fopenmp-is-target-device \ @@ -187,7 +187,7 @@ ! RUN: --target=aarch64-unknown-linux-gnu \ ! RUN: | FileCheck %s --check-prefix=FORCE-USM-NO-OFFLOAD -! FORCE-USM-NO-OFFLOAD: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! FORCE-USM-NO-OFFLOAD: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! FORCE-USM-NO-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm" ! Test -fopenmp-force-usm option with offload @@ -196,16 +196,16 @@ ! 
RUN: --target=aarch64-unknown-linux-gnu -nogpulib\ ! RUN: | FileCheck %s --check-prefix=FORCE-USM-OFFLOAD -! FORCE-USM-OFFLOAD: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" +! FORCE-USM-OFFLOAD: "{{[^"]*}}flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm" -! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm" ! RUN: %flang -### -v --target=x86_64-unknown-linux-gnu -fopenmp \ ! RUN: --offload-arch=gfx900 \ ! RUN: --rocm-path=%S/Inputs/rocm %s 2>&1 \ ! RUN: | FileCheck --check-prefix=MLINK-BUILTIN-BITCODE %s -! MLINK-BUILTIN-BITCODE: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! MLINK-BUILTIN-BITCODE: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! MLINK-BUILTIN-BITCODE-SAME: "-mlink-builtin-bitcode" {{.*Inputs.*rocm.*amdgcn.*bitcode.*}}oclc_isa_version_900.bc ! Test that the -fopenmp-targets option is added to host compilation invocations @@ -219,9 +219,9 @@ ! RUN: --target=x86_64-unknown-linux-gnu -nogpulib \ ! RUN: | FileCheck %s --check-prefix=OFFLOAD-TARGETS -! OFFLOAD-TARGETS: "{{[^"]*}}flang-new" "-fc1" "-triple" "x86_64-unknown-linux-gnu" +! OFFLOAD-TARGETS: "{{[^"]*}}flang" "-fc1" "-triple" "x86_64-unknown-linux-gnu" ! OFFLOAD-TARGETS-SAME: "-fopenmp-targets=amdgcn-amd-amdhsa" -! OFFLOAD-TARGETS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" +! OFFLOAD-TARGETS-NEXT: "{{[^"]*}}flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" ! OFFLOAD-TARGETS-NOT: -fopenmp-targets -! OFFLOAD-TARGETS: "{{[^"]*}}flang-new" "-fc1" "-triple" "x86_64-unknown-linux-gnu" +! OFFLOAD-TARGETS: "{{[^"]*}}flang" "-fc1" "-triple" "x86_64-unknown-linux-gnu" ! 
OFFLOAD-TARGETS-SAME: "-fopenmp-targets=amdgcn-amd-amdhsa" diff --git a/flang/test/Driver/predefined-macros-compiler-version.F90 b/flang/test/Driver/predefined-macros-compiler-version.F90 index 823a730f96845a..f6924479281562 100644 --- a/flang/test/Driver/predefined-macros-compiler-version.F90 +++ b/flang/test/Driver/predefined-macros-compiler-version.F90 @@ -1,12 +1,12 @@ ! Check that the driver correctly defines macros with the compiler version !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --ignore-case !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -E %s 2>&1 | FileCheck %s --ignore-case diff --git a/flang/test/Driver/std2018-wrong.f90 b/flang/test/Driver/std2018-wrong.f90 index 27ccc76bd39aad..93ba153d75f7f9 100644 --- a/flang/test/Driver/std2018-wrong.f90 +++ b/flang/test/Driver/std2018-wrong.f90 @@ -1,7 +1,7 @@ ! Ensure argument -std=f2018 works as expected. !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: not %flang_fc1 -std=90 %s 2>&1 | FileCheck %s --check-prefix=WRONG diff --git a/flang/test/Driver/std2018.f90 b/flang/test/Driver/std2018.f90 index cf461cf89e4e19..1727f92127b711 100644 --- a/flang/test/Driver/std2018.f90 +++ b/flang/test/Driver/std2018.f90 @@ -1,7 +1,7 @@ ! Ensure argument -std=f2018 works as expected. !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -fsyntax-only %s 2>&1 | FileCheck %s --allow-empty --check-prefix=WITHOUT ! 
RUN: %flang_fc1 -fsyntax-only -std=f2018 %s 2>&1 | FileCheck %s --check-prefix=GIVEN diff --git a/flang/test/Driver/supported-suffices/f03-suffix.f03 b/flang/test/Driver/supported-suffices/f03-suffix.f03 index 6e03f9f43fc602..1d850305cd040e 100644 --- a/flang/test/Driver/supported-suffices/f03-suffix.f03 +++ b/flang/test/Driver/supported-suffices/f03-suffix.f03 @@ -1,5 +1,5 @@ ! RUN: %flang -### %s 2>&1 | FileCheck %s -! CHECK: "{{.*}}flang-new" "-fc1" {{.*}} "-o" "{{.*}}.o" +! CHECK: "{{.*}}flang" "-fc1" {{.*}} "-o" "{{.*}}.o" program f03 end program f03 diff --git a/flang/test/Driver/supported-suffices/f08-suffix.f08 b/flang/test/Driver/supported-suffices/f08-suffix.f08 index d5bcf4ce1de1cc..2b31e4c21876ae 100644 --- a/flang/test/Driver/supported-suffices/f08-suffix.f08 +++ b/flang/test/Driver/supported-suffices/f08-suffix.f08 @@ -1,5 +1,5 @@ ! RUN: %flang -### %s 2>&1 | FileCheck %s -! CHECK: "{{.*}}flang-new" "-fc1" {{.*}} "-o" "{{.*}}.o" +! CHECK: "{{.*}}flang" "-fc1" {{.*}} "-o" "{{.*}}.o" program f08 end program f08 diff --git a/flang/test/Driver/use-module-error.f90 b/flang/test/Driver/use-module-error.f90 index 0b47b682d938c0..bb37f0275701b8 100644 --- a/flang/test/Driver/use-module-error.f90 +++ b/flang/test/Driver/use-module-error.f90 @@ -1,7 +1,7 @@ ! Ensure that multiple module directories are not allowed !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang -fsyntax-only -J %S/Inputs/ %s 2>&1 | FileCheck %s --allow-empty --check-prefix=SINGLEINCLUDE ! RUN: %flang -fsyntax-only -J %S/Inputs/ -J %S/Inputs/ %s 2>&1 | FileCheck %s --allow-empty --check-prefix=SINGLEINCLUDE @@ -13,7 +13,7 @@ ! RUN: not %flang -fsyntax-only -module-dir %S/Inputs/module-dir -J%S/Inputs/ %s 2>&1 | FileCheck %s --check-prefix=DOUBLEINCLUDE !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! 
RUN: %flang_fc1 -fsyntax-only -J %S/Inputs/ %s 2>&1 | FileCheck %s --allow-empty --check-prefix=SINGLEINCLUDE ! RUN: %flang_fc1 -fsyntax-only -J %S/Inputs/ -J %S/Inputs/ %s 2>&1 | FileCheck %s --allow-empty --check-prefix=SINGLEINCLUDE diff --git a/flang/test/Driver/use-module.f90 b/flang/test/Driver/use-module.f90 index 775c0424715883..2c3a38043fe16e 100644 --- a/flang/test/Driver/use-module.f90 +++ b/flang/test/Driver/use-module.f90 @@ -1,7 +1,7 @@ ! Checks that module search directories specified with `-J/-module-dir` and `-I` are handled correctly !-------------------------- -! FLANG DRIVER (flang-new) +! FLANG DRIVER (flang) !-------------------------- ! RUN: %flang -fsyntax-only -I %S/Inputs -I %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty ! RUN: %flang -fsyntax-only -I %S/Inputs -J %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty @@ -16,7 +16,7 @@ ! RUN: not %flang -fsyntax-only -module-dir %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=SINGLEINCLUDE !----------------------------------------- -! FRONTEND FLANG DRIVER (flang-new -fc1) +! FRONTEND FLANG DRIVER (flang -fc1) !----------------------------------------- ! RUN: %flang_fc1 -fsyntax-only -I %S/Inputs -I %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty ! RUN: %flang_fc1 -fsyntax-only -I %S/Inputs -J %S/Inputs/module-dir %s 2>&1 | FileCheck %s --check-prefix=INCLUDED --allow-empty diff --git a/flang/test/Driver/version-loops.f90 b/flang/test/Driver/version-loops.f90 index b0fa01d572512a..d206393a04f486 100644 --- a/flang/test/Driver/version-loops.f90 +++ b/flang/test/Driver/version-loops.f90 @@ -1,5 +1,5 @@ -! Test that flang-new forwards the -f{no-,}version-loops-for-stride -! options correctly to flang-new -fc1 for different variants of optimisation +! Test that flang forwards the -f{no-,}version-loops-for-stride +! options correctly to flang -fc1 for different variants of optimisation ! 
and explicit flags. ! RUN: %flang -### %s -o %t 2>&1 -O3 \ @@ -23,32 +23,32 @@ ! RUN: %flang -### %s -o %t 2>&1 -O3 -fno-version-loops-for-stride \ ! RUN: | FileCheck %s --check-prefix=CHECK-O3-no -! CHECK: "{{.*}}flang-new" "-fc1" +! CHECK: "{{.*}}flang" "-fc1" ! CHECK-SAME: "-fversion-loops-for-stride" ! CHECK-SAME: "-O3" -! CHECK-O2: "{{.*}}flang-new" "-fc1" +! CHECK-O2: "{{.*}}flang" "-fc1" ! CHECK-O2-NOT: "-fversion-loops-for-stride" ! CHECK-O2-SAME: "-O2" -! CHECK-O2-with: "{{.*}}flang-new" "-fc1" +! CHECK-O2-with: "{{.*}}flang" "-fc1" ! CHECK-O2-with-SAME: "-fversion-loops-for-stride" ! CHECK-O2-with-SAME: "-O2" -! CHECK-O4: "{{.*}}flang-new" "-fc1" +! CHECK-O4: "{{.*}}flang" "-fc1" ! CHECK-O4-SAME: "-fversion-loops-for-stride" ! CHECK-O4-SAME: "-O3" -! CHECK-Ofast: "{{.*}}flang-new" "-fc1" +! CHECK-Ofast: "{{.*}}flang" "-fc1" ! CHECK-Ofast-SAME: "-ffast-math" ! CHECK-Ofast-SAME: "-fversion-loops-for-stride" ! CHECK-Ofast-SAME: "-O3" -! CHECK-Ofast-no: "{{.*}}flang-new" "-fc1" +! CHECK-Ofast-no: "{{.*}}flang" "-fc1" ! CHECK-Ofast-no-SAME: "-ffast-math" ! CHECK-Ofast-no-NOT: "-fversion-loops-for-stride" ! CHECK-Ofast-no-SAME: "-O3" -! CHECK-O3-no: "{{.*}}flang-new" "-fc1" +! CHECK-O3-no: "{{.*}}flang" "-fc1" ! CHECK-O3-no-NOT: "-fversion-loops-for-stride" ! CHECK-O3-no-SAME: "-O3" diff --git a/flang/test/Driver/wextra-ok.f90 b/flang/test/Driver/wextra-ok.f90 index 6a38d9481a36b7..441029aa0af276 100644 --- a/flang/test/Driver/wextra-ok.f90 +++ b/flang/test/Driver/wextra-ok.f90 @@ -1,4 +1,4 @@ -! Ensure that supplying -Wextra into flang-new does not raise error +! Ensure that supplying -Wextra into flang does not raise error ! The first check should be changed if -Wextra is implemented ! 
RUN: %flang -std=f2018 -Wextra %s -c 2>&1 | FileCheck %s --check-prefix=CHECK-OK diff --git a/flang/test/HLFIR/hlfir-flags.f90 b/flang/test/HLFIR/hlfir-flags.f90 index b383a79d12c27b..0b1e80b1e3f636 100644 --- a/flang/test/HLFIR/hlfir-flags.f90 +++ b/flang/test/HLFIR/hlfir-flags.f90 @@ -1,4 +1,4 @@ -! Test -flang-deprecated-hlfir, -flang-experimental-hlfir (flang-new), and +! Test -flang-deprecated-hlfir, -flang-experimental-hlfir (flang), and ! -hlfir (bbc), -emit-hlfir, -emit-fir flags ! RUN: %flang_fc1 -emit-hlfir -o - %s | FileCheck --check-prefix HLFIR --check-prefix ALL %s ! RUN: bbc -emit-hlfir -o - %s | FileCheck --check-prefix HLFIR --check-prefix ALL %s diff --git a/flang/test/Lower/Intrinsics/command_argument_count.f90 b/flang/test/Lower/Intrinsics/command_argument_count.f90 index 0cf92d4444db98..a30b27d664fc0c 100644 --- a/flang/test/Lower/Intrinsics/command_argument_count.f90 +++ b/flang/test/Lower/Intrinsics/command_argument_count.f90 @@ -1,6 +1,6 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! bbc doesn't have a way to set the default kinds so we use flang-new driver -! RUN: flang-new -fc1 -fdefault-integer-8 -emit-fir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 %s +! bbc doesn't have a way to set the default kinds so we use flang driver +! RUN: flang -fc1 -fdefault-integer-8 -emit-fir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 %s ! CHECK-LABEL: argument_count_test subroutine argument_count_test() diff --git a/flang/test/Lower/Intrinsics/exit.f90 b/flang/test/Lower/Intrinsics/exit.f90 index c3110fcbec2b5a..bd551f7318a84a 100644 --- a/flang/test/Lower/Intrinsics/exit.f90 +++ b/flang/test/Lower/Intrinsics/exit.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir -hlfir=false %s -o - | FileCheck --check-prefixes=CHECK,CHECK-32 -DDEFAULT_INTEGER_SIZE=32 %s -! bbc doesn't have a way to set the default kinds so we use flang-new driver +! bbc doesn't have a way to set the default kinds so we use flang driver ! 
RUN: %flang_fc1 -fdefault-integer-8 -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck --check-prefixes=CHECK,CHECK-64 -DDEFAULT_INTEGER_SIZE=64 %s ! CHECK-LABEL: func @_QPexit_test1() { diff --git a/flang/test/Lower/Intrinsics/ieee_is_normal.f90 b/flang/test/Lower/Intrinsics/ieee_is_normal.f90 index f9ab01881d250d..9b864c9a9849c3 100644 --- a/flang/test/Lower/Intrinsics/ieee_is_normal.f90 +++ b/flang/test/Lower/Intrinsics/ieee_is_normal.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -emit-fir %s -o - | FileCheck %s +! RUN: flang -fc1 -emit-fir %s -o - | FileCheck %s ! CHECK-LABEL: ieee_is_normal_f16 subroutine ieee_is_normal_f16(r) diff --git a/flang/test/Lower/Intrinsics/isnan.f90 b/flang/test/Lower/Intrinsics/isnan.f90 index 700b2d1a67c656..62b98c8ea98bee 100644 --- a/flang/test/Lower/Intrinsics/isnan.f90 +++ b/flang/test/Lower/Intrinsics/isnan.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir %s -o - | FileCheck %s -! RUN: flang-new -fc1 -emit-fir %s -o - | FileCheck %s +! RUN: flang -fc1 -emit-fir %s -o - | FileCheck %s ! CHECK-LABEL: isnan_f32 subroutine isnan_f32(r) diff --git a/flang/test/Lower/Intrinsics/modulo.f90 b/flang/test/Lower/Intrinsics/modulo.f90 index ac18e59033a6b6..781ef8296a2b7d 100644 --- a/flang/test/Lower/Intrinsics/modulo.f90 +++ b/flang/test/Lower/Intrinsics/modulo.f90 @@ -1,5 +1,5 @@ ! RUN: bbc -emit-fir -hlfir=false %s -o - | FileCheck %s -check-prefixes=HONORINF,ALL -! RUN: flang-new -fc1 -menable-no-infs -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck %s -check-prefixes=CHECK,ALL +! RUN: flang -fc1 -menable-no-infs -emit-fir -flang-deprecated-no-hlfir %s -o - | FileCheck %s -check-prefixes=CHECK,ALL ! ALL-LABEL: func @_QPmodulo_testr( ! 
ALL-SAME: %[[arg0:.*]]: !fir.ref{{.*}}, %[[arg1:.*]]: !fir.ref{{.*}}, %[[arg2:.*]]: !fir.ref{{.*}}) { diff --git a/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 b/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 index f02884e5e92f38..425ccbc5dd56c5 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP allocate Directive. -// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s +// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s program main integer :: x, y diff --git a/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 b/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 index 3be61a1700ced3..7a7d28db8d6f5a 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP declare reduction Directive. -// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s +// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s subroutine declare_red() integer :: my_var diff --git a/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 b/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 index c6a0a8f2cd0d22..be1ac2db5dfa4a 100644 --- a/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 +++ b/flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90 @@ -1,6 +1,6 @@ ! This test checks lowering of OpenMP declare simd Directive. 
-// RUN: not flang-new -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s
+// RUN: not flang -fc1 -emit-fir -fopenmp %s 2>&1 | FileCheck %s
subroutine sub(x, y)
real, intent(inout) :: x, y
diff --git a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
index 62bc247a1456a1..bc5baf4e1cf604 100644
--- a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
+++ b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
@@ -1,7 +1,7 @@
! This test checks lowering of `LASTPRIVATE` clause for scalar types.
! RUN: bbc -fopenmp -emit-hlfir %s -o - | FileCheck %s
-! RUN: flang-new -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s
+! RUN: flang -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s
!CHECK: func @_QPlastprivate_character(%[[ARG1:.*]]: !fir.boxchar<1>{{.*}}) {
!CHECK-DAG: %[[ARG1_UNBOX:.*]]:2 = fir.unboxchar
diff --git a/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 b/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90
index 32caac39778dee..99c521406a7775 100644
--- a/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90
+++ b/flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90
@@ -1,7 +1,7 @@
! Check that for parallel do, reduction is only processed for the loop
! RUN: bbc -fopenmp --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s
-! RUN: flang-new -fc1 -fopenmp -mmlir --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s
+! RUN: flang -fc1 -fopenmp -mmlir --force-byref-reduction -emit-hlfir %s -o - | FileCheck %s
! CHECK: omp.parallel {
! CHECK: omp.wsloop reduction(byref @add_reduction_byref_i32
diff --git a/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 b/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90
index fdedbb06160761..cfeb5de83f4e82 100644
--- a/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90
+++ b/flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90
@@ -1,7 +1,7 @@
! Check that for parallel do, reduction is only processed for the loop
! RUN: bbc -fopenmp -emit-hlfir %s -o - | FileCheck %s
-! RUN: flang-new -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s
+! RUN: flang -fc1 -fopenmp -emit-hlfir %s -o - | FileCheck %s
! CHECK: omp.parallel {
! CHECK: omp.wsloop reduction(@add_reduction_i32
diff --git a/flang/test/lit.cfg.py b/flang/test/lit.cfg.py
index 4acbc0606d1977..f43234fb125b7e 100644
--- a/flang/test/lit.cfg.py
+++ b/flang/test/lit.cfg.py
@@ -132,13 +132,13 @@ tools = [
ToolSubst(
"%flang",
- command=FindTool("flang-new"),
+ command=FindTool("flang"),
extra_args=isysroot_flag,
unresolved="fatal",
),
ToolSubst(
"%flang_fc1",
- command=FindTool("flang-new"),
+ command=FindTool("flang"),
extra_args=["-fc1"],
unresolved="fatal",
),
diff --git a/flang/tools/f18/CMakeLists.txt b/flang/tools/f18/CMakeLists.txt
index 9d7b8633958cb7..4362fcf0537616 100644
--- a/flang/tools/f18/CMakeLists.txt
+++ b/flang/tools/f18/CMakeLists.txt
@@ -55,7 +55,7 @@ endif()
set(module_objects "")
# Create module files directly from the top-level module source directory.
-# If CMAKE_CROSSCOMPILING, then the newly built flang-new executable was
+# If CMAKE_CROSSCOMPILING, then the newly built flang executable was
# cross compiled, and thus can't be executed on the build system and thus
# can't be used for generating module files.
if (NOT CMAKE_CROSSCOMPILING)
@@ -115,9 +115,9 @@ if (NOT CMAKE_CROSSCOMPILING)
# TODO: We may need to flag this with conditional, in case Flang is built w/o OpenMP support
add_custom_command(OUTPUT ${base}.mod ${object_output}
COMMAND ${CMAKE_COMMAND} -E make_directory ${FLANG_INTRINSIC_MODULES_DIR}
- COMMAND flang-new ${opts} ${decls} -cpp ${compile_with} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
+ COMMAND flang ${opts} ${decls} -cpp ${compile_with} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
${FLANG_SOURCE_DIR}/module/${filename}.f90
- DEPENDS flang-new ${FLANG_SOURCE_DIR}/module/${filename}.f90 ${FLANG_SOURCE_DIR}/module/__fortran_builtins.f90 ${depends}
+ DEPENDS flang ${FLANG_SOURCE_DIR}/module/${filename}.f90 ${FLANG_SOURCE_DIR}/module/__fortran_builtins.f90 ${depends}
)
list(APPEND MODULE_FILES ${base}.mod)
install(FILES ${base}.mod DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/flang")
@@ -142,9 +142,9 @@ if (NOT CMAKE_CROSSCOMPILING)
set(base ${FLANG_INTRINSIC_MODULES_DIR}/omp_lib)
add_custom_command(OUTPUT ${base}.mod ${base}_kinds.mod
COMMAND ${CMAKE_COMMAND} -E make_directory ${FLANG_INTRINSIC_MODULES_DIR}
- COMMAND flang-new -cpp -fsyntax-only ${opts} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
+ COMMAND flang -cpp -fsyntax-only ${opts} -module-dir ${FLANG_INTRINSIC_MODULES_DIR}
${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90
- DEPENDS flang-new ${FLANG_INTRINSIC_MODULES_DIR}/iso_c_binding.mod ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90 ${depends}
+ DEPENDS flang ${FLANG_INTRINSIC_MODULES_DIR}/iso_c_binding.mod ${CMAKE_BINARY_DIR}/projects/openmp/runtime/src/omp_lib.F90 ${depends}
)
add_custom_command(OUTPUT ${base}.f18.mod
DEPENDS ${base}.mod
diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt
index 9f33cdfe3fa90f..615c673374faf4 100644
--- a/flang/tools/flang-driver/CMakeLists.txt
+++ b/flang/tools/flang-driver/CMakeLists.txt
@@ -11,18 +11,18 @@ set( LLVM_LINK_COMPONENTS
TargetParser
)
-add_flang_tool(flang-new
+add_flang_tool(flang
driver.cpp
fc1_main.cpp
)
-target_link_libraries(flang-new
+target_link_libraries(flang
PRIVATE
flangFrontend
flangFrontendTool
)
-clang_target_link_libraries(flang-new
+clang_target_link_libraries(flang
PRIVATE
clangDriver
clangBasic
@@ -30,9 +30,9 @@ clang_target_link_libraries(flang-new
option(FLANG_PLUGIN_SUPPORT "Build Flang with plugin support." ON)
-# Enable support for plugins, which need access to symbols from flang-new
+# Enable support for plugins, which need access to symbols from flang
if(FLANG_PLUGIN_SUPPORT)
- export_executable_symbols_for_plugins(flang-new)
+ export_executable_symbols_for_plugins(flang)
endif()
-install(TARGETS flang-new DESTINATION "${CMAKE_INSTALL_BINDIR}")
+install(TARGETS flang DESTINATION "${CMAKE_INSTALL_BINDIR}")
diff --git a/flang/tools/flang-driver/driver.cpp b/flang/tools/flang-driver/driver.cpp
index 52136df10c0b02..603aab4205836c 100644
--- a/flang/tools/flang-driver/driver.cpp
+++ b/flang/tools/flang-driver/driver.cpp
@@ -95,7 +95,7 @@ int main(int argc, const char **argv) {
llvm::StringSaver saver(a);
ExpandResponseFiles(saver, args);
- // Check if flang-new is in the frontend mode
+ // Check if flang is in the frontend mode
auto firstArg = std::find_if(args.begin() + 1, args.end(),
[](const char *a) { return a != nullptr; });
if (firstArg != args.end()) {
@@ -104,7 +104,7 @@ int main(int argc, const char **argv) {
<< "Valid tools include '-fc1'.\n";
return 1;
}
- // Call flang-new frontend
+ // Call flang frontend
if (llvm::StringRef(args[1]).starts_with("-fc1")) {
return executeFC1Tool(args);
}
@@ -140,7 +140,7 @@ int main(int argc, const char **argv) {
// Set the environment variable, FLANG_COMPILER_OPTIONS_STRING, to contain all
// the compiler options. This is intended for the frontend driver,
- // flang-new -fc1, to enable the implementation of the COMPILER_OPTIONS
+ // flang -fc1, to enable the implementation of the COMPILER_OPTIONS
// intrinsic. To this end, the frontend driver requires the list of the
// original compiler options, which is not available through other means.
// TODO: This way of passing information between the compiler and frontend
diff --git a/llvm/runtimes/CMakeLists.txt b/llvm/runtimes/CMakeLists.txt
index d948b7eb39b39c..9da1f926817a8b 100644
--- a/llvm/runtimes/CMakeLists.txt
+++ b/llvm/runtimes/CMakeLists.txt
@@ -504,15 +504,15 @@ if(build_runtimes)
if("openmp" IN_LIST LLVM_ENABLE_RUNTIMES)
if (${LLVM_TOOL_FLANG_BUILD})
- message(STATUS "Configuring build of omp_lib.mod and omp_lib_kinds.mod via flang-new")
- set(LIBOMP_FORTRAN_MODULES_COMPILER "${CMAKE_BINARY_DIR}/bin/flang-new")
+ message(STATUS "Configuring build of omp_lib.mod and omp_lib_kinds.mod via flang")
+ set(LIBOMP_FORTRAN_MODULES_COMPILER "${CMAKE_BINARY_DIR}/bin/flang")
set(LIBOMP_MODULES_INSTALL_PATH "${CMAKE_INSTALL_INCLUDEDIR}/flang")
# TODO: This is a workaround until flang becomes a first-class project
- # in llvm/CMakeList.txt. Until then, this line ensures that flang-new is
- # built before "openmp" is built as a runtime project. Besides "flang-new"
+ # in llvm/CMakeList.txt. Until then, this line ensures that flang is
+ # built before "openmp" is built as a runtime project. Besides "flang"
# to build the compiler, we also need to add "module_files" to make sure
# that all .mod files are also properly build.
- list(APPEND extra_deps "flang-new" "module_files")
+ list(APPEND extra_deps "flang" "module_files")
endif()
foreach(dep opt llvm-link llvm-extract clang clang-offload-packager)
if(TARGET ${dep})
diff --git a/offload/CMakeLists.txt b/offload/CMakeLists.txt
index 9ffe8f56b76e67..9b771d1116ee38 100644
--- a/offload/CMakeLists.txt
+++ b/offload/CMakeLists.txt
@@ -89,9 +89,9 @@ else()
# Check for flang
if (NOT MSVC)
- set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new)
+ set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang)
else()
- set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new.exe)
+ set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang.exe)
endif()
# Set fortran test compiler if flang is found
diff --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index 3b4259dfa380e8..c206386fa6b614 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -69,9 +69,9 @@ else()
# Check for flang
if (NOT MSVC)
- set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new)
+ set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang)
else()
- set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang-new.exe)
+ set(OPENMP_TEST_Fortran_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/flang.exe)
endif()
# Set fortran test compiler if flang is found

>From fdd11969f56f8ec81bfd5ee5a06e991840291747 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Thu, 26 Sep 2024 10:39:53 -0500
Subject: [PATCH 2/7] [flang][driver] restore flang-new as symlink

Restore flang-new as a symlink to flang for backwards compatibility

Co-authored-by: H. Vetinari
Co-authored-by: Andrzej Warzynski
---
clang/lib/Driver/ToolChain.cpp | 3 +++
flang/tools/flang-driver/CMakeLists.txt | 4 ++++
flang/tools/flang-driver/driver.cpp | 3 ++-
3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index de250322b3b34d..4df31770950858 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -386,6 +386,9 @@ static const DriverSuffix *FindDriverSuffix(StringRef ProgName, size_t &Pos) {
{"cl", "--driver-mode=cl"},
{"++", "--driver-mode=g++"},
{"flang", "--driver-mode=flang"},
+ // For backwards compatibility, we create a symlink for `flang` called
+ // `flang-new`. This will be removed in the future.
+ {"flang-new", "--driver-mode=flang"},
{"clang-dxc", "--driver-mode=dxc"},
};
diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt
index 615c673374faf4..063acdd7dfe57c 100644
--- a/flang/tools/flang-driver/CMakeLists.txt
+++ b/flang/tools/flang-driver/CMakeLists.txt
@@ -36,3 +36,7 @@ if(FLANG_PLUGIN_SUPPORT)
endif()
install(TARGETS flang DESTINATION "${CMAKE_INSTALL_BINDIR}")
+
+# Keep "flang-new" as a symlink for backwards compatibility. Remove once "flang"
+# is a widely adopted name.
+add_flang_symlink(flang-new flang)
diff --git a/flang/tools/flang-driver/driver.cpp b/flang/tools/flang-driver/driver.cpp
index 603aab4205836c..ed52988feaa59c 100644
--- a/flang/tools/flang-driver/driver.cpp
+++ b/flang/tools/flang-driver/driver.cpp
@@ -88,7 +88,8 @@ int main(int argc, const char **argv) {
llvm::InitLLVM x(argc, argv);
llvm::SmallVector args(argv, argv + argc);
- clang::driver::ParsedClangName targetandMode("flang", "--driver-mode=flang");
+ clang::driver::ParsedClangName targetandMode =
+ clang::driver::ToolChain::getTargetAndModeFromProgramName(argv[0]);
std::string driverPath = getExecutablePath(args[0]);
llvm::BumpPtrAllocator a;

>From 48d19d7c43591268e4ce434c3c2a28b588c9fe95 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Mon, 30 Sep 2024 10:16:59 -0500
Subject: [PATCH 3/7] [flang][driver] add version to flang executable

---
flang/tools/flang-driver/CMakeLists.txt | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/flang/tools/flang-driver/CMakeLists.txt b/flang/tools/flang-driver/CMakeLists.txt
index 063acdd7dfe57c..9a89a6185a3291 100644
--- a/flang/tools/flang-driver/CMakeLists.txt
+++ b/flang/tools/flang-driver/CMakeLists.txt
@@ -28,6 +28,12 @@ clang_target_link_libraries(flang
clangBasic
)
+# This creates the executable with a version appended
+# and creates a symlink to it without the version
+if(CYGWIN OR NOT WIN32) # but it doesn't work on Windows
+ set_target_properties(flang PROPERTIES VERSION ${FLANG_EXECUTABLE_VERSION})
+endif()
+
option(FLANG_PLUGIN_SUPPORT "Build Flang with plugin support." ON)
# Enable support for plugins, which need access to symbols from flang

>From 39fb4c728ab876148725e35fc02887702433eb65 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Thu, 3 Oct 2024 14:12:35 -0700
Subject: [PATCH 4/7] [flang][driver] add warning when using openmp

---
clang/include/clang/Basic/DiagnosticDriverKinds.td | 3 +++
clang/include/clang/Basic/DiagnosticGroups.td | 4 ++++
clang/lib/Driver/ToolChains/Flang.cpp | 3 +++
3 files changed, 10 insertions(+)

diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index 97573fcf20c1fb..68722ad9633120 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -147,6 +147,9 @@ def warn_drv_unsupported_option_for_processor : Warning<
def warn_drv_unsupported_openmp_library : Warning<
"the library '%0=%1' is not supported, OpenMP will not be enabled">,
InGroup;
+def warn_openmp_experimental : Warning<
+ "OpenMP support in flang is still experimental">,
+ InGroup<ExperimentalOption>;
def err_drv_invalid_thread_model_for_target : Error<
"invalid thread model '%0' in '%1' for this target">;
diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td
index 41e719d4d57816..8273701e7b0963 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -1583,3 +1583,7 @@ def ExtractAPIMisuse : DiagGroup<"extractapi-misuse">;
// Warnings about using the non-standard extension having an explicit specialization
// with a storage class specifier.
def ExplicitSpecializationStorageClass : DiagGroup<"explicit-specialization-storage-class">;
+
+// A warning for options that enable a feature that is not yet complete
+def ExperimentalOption : DiagGroup<"experimental-option">;
+
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 1ca12ff81389a3..19b43594b00815 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -787,6 +787,9 @@ void Flang::ConstructJob(Compilation &C, const JobAction &JA,
if (Args.hasArg(options::OPT_fopenmp_force_usm))
CmdArgs.push_back("-fopenmp-force-usm");
+ // TODO: OpenMP support isn't "done" yet, so for now we warn that it
+ // is experimental.
+ D.Diag(diag::warn_openmp_experimental);
// FIXME: Clang supports a whole bunch more flags here.
break;

>From c4f8e707de3f7c62ae323dfe821c5a3e075a27f7 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Fri, 4 Oct 2024 06:27:05 -0700
Subject: [PATCH 5/7] [flang][doc] update note about CMake support

---
flang/docs/FlangDriver.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/flang/docs/FlangDriver.md b/flang/docs/FlangDriver.md
index 47cf078cf2d0d4..23cbab30ee903e 100644
--- a/flang/docs/FlangDriver.md
+++ b/flang/docs/FlangDriver.md
@@ -335,7 +335,7 @@ just added using your new frontend option.
## CMake Support
As of [#7246](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7246)
-(and soon to be released CMake 3.24.0), `cmake` can detect `flang` as a
+(CMake 3.28.0), `cmake` can detect `flang` as a
supported Fortran compiler.
You can configure your CMake projects to use `flang` as follows: ```bash

>From 12c46452ac837d4bc3c592f635aff68e83545252 Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Fri, 4 Oct 2024 07:11:16 -0700
Subject: [PATCH 6/7] [flang][test] fix tests broken by rename

---
flang/test/Driver/driver-version.f90 | 2 +-
flang/test/Driver/lto-flags.f90 | 2 +-
flang/test/Driver/missing-input.f90 | 6 +++---
flang/test/Driver/multiple-input-files.f90 | 2 +-
4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/flang/test/Driver/driver-version.f90 b/flang/test/Driver/driver-version.f90
index 4c6aecb1c4fa7e..6daeb0e767c0e0 100644
--- a/flang/test/Driver/driver-version.f90
+++ b/flang/test/Driver/driver-version.f90
@@ -9,7 +9,7 @@
! VERSION-NEXT: Thread model:
! VERSION-NEXT: InstalledDir:
-! ERROR: flang: error: unknown argument '--versions'; did you mean '--version'?
+! ERROR: flang{{.*}}: error: unknown argument '--versions'; did you mean '--version'?
! VERSION-FC1: LLVM version
diff --git a/flang/test/Driver/lto-flags.f90 b/flang/test/Driver/lto-flags.f90
index bad3d972e6bd6b..be9416810716a9 100644
--- a/flang/test/Driver/lto-flags.f90
+++ b/flang/test/Driver/lto-flags.f90
@@ -30,7 +30,7 @@
! FULL-LTO: "-fc1"
! FULL-LTO-SAME: "-flto=full"
-! THIN-LTO-ALL: flang: warning: the option '-flto=thin' is a work in progress
+! THIN-LTO-ALL: flang{{.*}}: warning: the option '-flto=thin' is a work in progress
! THIN-LTO-ALL: "-fc1"
! THIN-LTO-ALL-SAME: "-flto=thin"
! THIN-LTO-LINKER-PLUGIN: "-plugin-opt=thinlto"
diff --git a/flang/test/Driver/missing-input.f90 b/flang/test/Driver/missing-input.f90
index 51d37a718c542f..aeefbe14c20563 100644
--- a/flang/test/Driver/missing-input.f90
+++ b/flang/test/Driver/missing-input.f90
@@ -17,10 +17,10 @@
! RUN: not %flang_fc1 %t.f90 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-NONEXISTENT-FILE
! RUN: not %flang_fc1 %S 2>&1 | FileCheck %s --check-prefix=FLANG-FC1-DIR
-! FLANG-NO-FILE: flang: error: no input files
+! FLANG-NO-FILE: flang{{.*}}: error: no input files
-! FLANG-NONEXISTENT-FILE: flang: error: no such file or directory: {{.*}}
-! FLANG-NONEXISTENT-FILE: flang: error: no input files
+! FLANG-NONEXISTENT-FILE: flang{{.*}}: error: no such file or directory: {{.*}}
+! FLANG-NONEXISTENT-FILE: flang{{.*}}: error: no input files
! FLANG-FC1-NONEXISTENT-FILE: error: {{.*}} does not exist
! FLANG-FC1-DIR: error: {{.*}} is not a regular file
diff --git a/flang/test/Driver/multiple-input-files.f90 b/flang/test/Driver/multiple-input-files.f90
index 64ec8679abf94f..0242db288babf2 100644
--- a/flang/test/Driver/multiple-input-files.f90
+++ b/flang/test/Driver/multiple-input-files.f90
@@ -39,7 +39,7 @@
! FLANG-NEXT:end program hello
! TEST 2: `-o` does not when multiple input files are present
-! ERROR: flang: error: cannot specify -o when generating multiple output files
+! ERROR: flang{{.*}}: error: cannot specify -o when generating multiple output files
! TEST 3: The output file _was not_ specified - `flang_fc1` will process all
! input files and generate one output file for every input file.

>From f2e919b65e9a8b833fd8a028a8c80c8c7439785c Mon Sep 17 00:00:00 2001
From: Brad Richardson
Date: Fri, 4 Oct 2024 07:56:02 -0700
Subject: [PATCH 7/7] [flang][test] add check for OpenMP experimental warning

---
flang/test/Driver/fopenmp.f90 | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/flang/test/Driver/fopenmp.f90 b/flang/test/Driver/fopenmp.f90
index 9b4dc5ffb1f690..b3c3547800bdba 100644
--- a/flang/test/Driver/fopenmp.f90
+++ b/flang/test/Driver/fopenmp.f90
@@ -73,3 +73,7 @@
!
! CHECK-LD-ANYMD: "{{.*}}ld{{(.exe)?}}"
! CHECK-LD-ANYMD: "-l{{(omp|gomp|iomp5md)}}"
+!
+! RUN: %flang -fopenmp -c %s -### 2>&1 | FileCheck %s --check-prefix=CHECK-EXPERIMENTAL
+!
+! CHECK-EXPERIMENTAL: flang{{.*}}: warning: OpenMP support in flang is still experimental

From openmp-commits at lists.llvm.org Wed Oct 9 19:49:03 2024
From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits)
Date: Wed, 09 Oct 2024 19:49:03 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <6707409f.050a0220.8500e.0637@mx.google.com>

everythingfunctional wrote:

@kiranchandramohan, I don't have commit access, so could you merge for me?

https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Thu Oct 10 01:26:19 2024
From: openmp-commits at lists.llvm.org (Kiran Chandramohan via Openmp-commits)
Date: Thu, 10 Oct 2024 01:26:19 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <67078fab.170a0220.21cde9.1cad@mx.google.com>

https://github.com/kiranchandramohan closed https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Thu Oct 10 02:21:50 2024
From: openmp-commits at lists.llvm.org (LLVM Continuous Integration via Openmp-commits)
Date: Thu, 10 Oct 2024 02:21:50 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <67079cae.050a0220.3489cb.18bc@mx.google.com>

llvm-ci wrote:

LLVM Buildbot has detected a new failure on builder `sanitizer-aarch64-linux-fuzzer` running on `sanitizer-buildbot11` while building `.github,clang,flang,llvm,offload,openmp` at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/159/builds/7879

Here is the relevant piece of the build log for the reference
```
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure) (timed out)
...
#6 0xacedca160438 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector>&) /home/b/sanitizer-aarch64-linux-fuzzer/build/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:910:5
#7 0xacedca14f828 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /home/b/sanitizer-aarch64-linux-fuzzer/build/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:915:6
#8 0xacedca178ec4 in main /home/b/sanitizer-aarch64-linux-fuzzer/build/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
#9 0xe71967a684c0 (/lib/aarch64-linux-gnu/libc.so.6+0x284c0) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)
#10 0xe71967a68594 in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x28594) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)
#11 0xacedca143cac in _start (/home/b/sanitizer-aarch64-linux-fuzzer/build/RUNDIR-c-ares-CVE-2016-5180/c-ares-CVE-2016-5180-fsanitize_fuzzer+0xb3cac)
SUMMARY: AddressSanitizer: heap-buffer-overflow ares_create_query.c in ares_create_query
@@@BUILD_STEP test openssl-1.0.1f fuzzer@@@
Cloning into 'SRC'...
command timed out: 1200 seconds without output running [b'python', b'../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=2985.777262
Step 13 (test openssl-1.0.1f fuzzer) failure: test openssl-1.0.1f fuzzer (failure)
@@@BUILD_STEP test openssl-1.0.1f fuzzer@@@
Cloning into 'SRC'...
command timed out: 1200 seconds without output running [b'python', b'../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=2985.777262
```

https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Thu Oct 10 02:47:20 2024
From: openmp-commits at lists.llvm.org (LLVM Continuous Integration via Openmp-commits)
Date: Thu, 10 Oct 2024 02:47:20 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <6707a2a8.170a0220.1416fb.1891@mx.google.com>

llvm-ci wrote:

LLVM Buildbot has detected a new failure on builder `sanitizer-x86_64-linux` running on `sanitizer-buildbot2` while building `.github,clang,flang,llvm,offload,openmp` at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/4738

Here is the relevant piece of the build log for the reference ``` Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure) ... [385/391] Generating FuzzerUtils-x86_64-Test [386/391] Generating MSAN_INST_GTEST.gtest-all.cc.x86_64.o [387/391] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64-with-call.o [388/391] Generating Msan-x86_64-with-call-Test [389/391] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64.o [390/391] Generating Msan-x86_64-Test [390/391] Running compiler_rt regression tests llvm-lit: /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds. -- Testing: 4686 of 10415 tests, 88 workers -- Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90. FAIL: libFuzzer-x86_64-static-libcxx-Linux :: reduce_inputs.test (4580 of 4686) ******************** TEST 'libFuzzer-x86_64-static-libcxx-Linux :: reduce_inputs.test' FAILED ******************** Exit Code: 1 Command Output (stderr): -- RUN: at line 3: rm -rf /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C + rm -rf /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C RUN: at line 4: mkdir -p /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C + mkdir -p /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C RUN: at line 5: /home/b/sanitizer-x86_64-linux/build/build_default/./bin/clang 
-Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/fuzzer -m64 /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/ShrinkControlFlowSimpleTest.cpp -o /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowSimpleTest + /home/b/sanitizer-x86_64-linux/build/build_default/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/fuzzer -m64 /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/ShrinkControlFlowSimpleTest.cpp -o /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowSimpleTest RUN: at line 6: /home/b/sanitizer-x86_64-linux/build/build_default/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/fuzzer -m64 /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/ShrinkControlFlowTest.cpp -o /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowTest + /home/b/sanitizer-x86_64-linux/build/build_default/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/fuzzer -m64 
/home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/ShrinkControlFlowTest.cpp -o /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowTest RUN: at line 7: /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowSimpleTest -exit_on_item=0eb8e4ed029b774d80f2b66408203801cb982a60 -runs=1000000 /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C 2>&1 | FileCheck /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/reduce_inputs.test + /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowSimpleTest -exit_on_item=0eb8e4ed029b774d80f2b66408203801cb982a60 -runs=1000000 /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C + FileCheck /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/reduce_inputs.test RUN: at line 11: /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowSimpleTest -runs=0 /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C 2>&1 | FileCheck /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/reduce_inputs.test --check-prefix=COUNT + 
/home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowSimpleTest -runs=0 /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C + FileCheck /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/reduce_inputs.test --check-prefix=COUNT /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/reduce_inputs.test:12:8: error: COUNT: expected string not found in input COUNT: seed corpus: files: 4 ^ :1:1: note: scanning from here INFO: Running with entropic power schedule (0xFF, 100). ^ :7:7: note: possible intended match here INFO: seed corpus: files: 3 min: 2b max: 3b total: 7b rss: 31Mb ^ Input file: Check file: /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/reduce_inputs.test -dump-input=help explains the following input dump. Input was: <<<<<< 1: INFO: Running with entropic power schedule (0xFF, 100). check:12'0 X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found Step 11 (test compiler-rt debug) failure: test compiler-rt debug (failure) ... [385/391] Generating FuzzerUtils-x86_64-Test [386/391] Generating MSAN_INST_GTEST.gtest-all.cc.x86_64.o [387/391] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64-with-call.o [388/391] Generating Msan-x86_64-with-call-Test [389/391] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64.o [390/391] Generating Msan-x86_64-Test [390/391] Running compiler_rt regression tests llvm-lit: /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds. -- Testing: 4686 of 10415 tests, 88 workers -- Testing: 0.. 10.. 20.. 
30.. 40.. 50.. 60.. 70.. 80.. 90. FAIL: libFuzzer-x86_64-static-libcxx-Linux :: reduce_inputs.test (4580 of 4686) ******************** TEST 'libFuzzer-x86_64-static-libcxx-Linux :: reduce_inputs.test' FAILED ******************** Exit Code: 1 Command Output (stderr): -- RUN: at line 3: rm -rf /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C + rm -rf /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C RUN: at line 4: mkdir -p /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C + mkdir -p /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C RUN: at line 5: /home/b/sanitizer-x86_64-linux/build/build_default/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/fuzzer -m64 /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/ShrinkControlFlowSimpleTest.cpp -o /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowSimpleTest + /home/b/sanitizer-x86_64-linux/build/build_default/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/fuzzer -m64 /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/ShrinkControlFlowSimpleTest.cpp -o 
/home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowSimpleTest RUN: at line 6: /home/b/sanitizer-x86_64-linux/build/build_default/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/fuzzer -m64 /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/ShrinkControlFlowTest.cpp -o /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowTest + /home/b/sanitizer-x86_64-linux/build/build_default/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/fuzzer -m64 /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/ShrinkControlFlowTest.cpp -o /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowTest RUN: at line 7: /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowSimpleTest -exit_on_item=0eb8e4ed029b774d80f2b66408203801cb982a60 -runs=1000000 /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C 2>&1 | FileCheck /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/reduce_inputs.test + 
/home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowSimpleTest -exit_on_item=0eb8e4ed029b774d80f2b66408203801cb982a60 -runs=1000000 /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C + FileCheck /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/reduce_inputs.test RUN: at line 11: /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowSimpleTest -runs=0 /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C 2>&1 | FileCheck /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/reduce_inputs.test --check-prefix=COUNT + /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp-ShrinkControlFlowSimpleTest -runs=0 /home/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64StaticLibcxxLinuxConfig/Output/reduce_inputs.test.tmp/C + FileCheck /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/reduce_inputs.test --check-prefix=COUNT /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/reduce_inputs.test:12:8: error: COUNT: expected string not found in input COUNT: seed corpus: files: 4 ^ :1:1: note: scanning from here INFO: Running with entropic power schedule (0xFF, 100). 
^ :7:7: note: possible intended match here INFO: seed corpus: files: 3 min: 2b max: 3b total: 7b rss: 31Mb ^ Input file: Check file: /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/reduce_inputs.test -dump-input=help explains the following input dump. Input was: <<<<<< 1: INFO: Running with entropic power schedule (0xFF, 100). check:12'0 X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found ```
https://github.com/llvm/llvm-project/pull/110023 From openmp-commits at lists.llvm.org Thu Oct 10 03:35:40 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Thu, 10 Oct 2024 03:35:40 -0700 (PDT) Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023) In-Reply-To: Message-ID: <6707adfc.170a0220.3ea1f.1f54@mx.google.com> h-vetinari wrote: Congratulations on this huge milestone to all involved in flang! 🍾 🥳 🚀 https://github.com/llvm/llvm-project/pull/110023 From openmp-commits at lists.llvm.org Thu Oct 10 05:53:11 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Thu, 10 Oct 2024 05:53:11 -0700 (PDT) Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023) In-Reply-To: Message-ID: <6707ce37.170a0220.311702.240a@mx.google.com> RichBarton-Arm wrote: +1000 Great to see this finally happen. Great job everyone! https://github.com/llvm/llvm-project/pull/110023 From openmp-commits at lists.llvm.org Thu Oct 10 05:54:43 2024 From: openmp-commits at lists.llvm.org (Nikita Popov via Openmp-commits) Date: Thu, 10 Oct 2024 05:54:43 -0700 (PDT) Subject: [Openmp-commits] [openmp] [openmp] Use core_siblings_list if physical_package_id not available (PR #111831) Message-ID: https://github.com/nikic created https://github.com/llvm/llvm-project/pull/111831 On powerpc, physical_package_id may not be available. Currently, this causes openmp to fall back to a flat topology, and various affinity tests fail. Fix this by parsing core_siblings_list to determine which CPUs belong to the same socket. This matches what the testing code does. The code to parse the CPU list format thankfully already exists. Fixes https://github.com/llvm/llvm-project/issues/111809.
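For readers unfamiliar with the sysfs CPU list format, the approach described above can be sketched roughly as follows. This is a standalone illustration and not the code from the patch: `parse_cpu_list` and `assign_package_ids` are hypothetical names, the input is an in-memory string rather than a sysfs file, and the sketch assumes that OS CPU ids coincide with array indices.

```cpp
#include <cassert>
#include <cctype>
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expand a CPU list string such as "1,2,4-7,9,11-15"
// into individual CPU ids. Ranges that fall outside [0, num_cpus) or are
// reversed are skipped; parsing stops at the first syntax problem.
std::vector<int> parse_cpu_list(const std::string &text, int num_cpus) {
  std::vector<int> cpus;
  std::size_t pos = 0;
  auto skip_ws = [&] {
    while (pos < text.size() &&
           std::isspace(static_cast<unsigned char>(text[pos])))
      ++pos;
  };
  // Reads a non-negative decimal integer; returns false if no digit is found.
  auto read_int = [&](int &out) {
    skip_ws();
    const std::size_t start = pos;
    int value = 0;
    while (pos < text.size() &&
           std::isdigit(static_cast<unsigned char>(text[pos]))) {
      // Stop accumulating once the value is far beyond any realistic CPU
      // count, so absurdly long digit runs cannot overflow.
      if (value < 1000000)
        value = value * 10 + (text[pos] - '0');
      ++pos;
    }
    out = value;
    return pos > start;
  };
  for (;;) {
    int begin = 0;
    if (!read_int(begin))
      break;
    int end = begin; // a single CPU is the range [begin, begin]
    skip_ws();
    if (pos < text.size() && text[pos] == '-') {
      ++pos;
      if (!read_int(end))
        break;
      skip_ws();
    }
    if (begin < num_cpus && end < num_cpus && begin <= end)
      for (int cpu = begin; cpu <= end; ++cpu)
        cpus.push_back(cpu);
    if (pos >= text.size() || text[pos] != ',')
      break; // end of input or syntax problem
    ++pos;
  }
  return cpus;
}

// Hypothetical helper: sibling_lists[i] holds the core_siblings_list text of
// CPU i, and pkg_id[i] is UINT_MAX when the package id is unknown. Every CPU
// in a sibling list receives the index of the first CPU that named it.
// Returns false if two sibling lists contradict each other.
bool assign_package_ids(const std::vector<std::string> &sibling_lists,
                        std::vector<unsigned> &pkg_id) {
  assert(pkg_id.size() == sibling_lists.size());
  const int n = static_cast<int>(sibling_lists.size());
  for (int idx = 0; idx < n; ++idx) {
    if (pkg_id[idx] != UINT_MAX)
      continue; // already known or already derived from a sibling
    for (int cpu : parse_cpu_list(sibling_lists[idx], n)) {
      if (pkg_id[cpu] == UINT_MAX)
        pkg_id[cpu] = static_cast<unsigned>(idx);
      else if (pkg_id[cpu] != static_cast<unsigned>(idx))
        return false; // contradictory sibling lists
    }
  }
  return true;
}
```

The derived id is arbitrary (the index of the first CPU seen in each group); what matters for topology detection is only that all siblings end up with the same value.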
>From fce625888fc87d6eb572e4cba39ef7cc72fef466 Mon Sep 17 00:00:00 2001 From: Nikita Popov Date: Thu, 10 Oct 2024 12:47:33 +0000 Subject: [PATCH] [openmp] Use core_siblings_list if physical_package_id not available On powerpc, physical_package_id may not be available. Currently, this causes openmp to fall back to flat topology and various affinity tests fail. Fix this by parsing core_siblings_list to deterimine which cpus belong to the same socket. This matches what the testing code does. The code to parse the CPU list format thankfully already exists. Fixes https://github.com/llvm/llvm-project/issues/111809. --- openmp/runtime/src/kmp_affinity.cpp | 93 ++++++++++++++------ openmp/runtime/test/affinity/kmp-hw-subset.c | 2 +- 2 files changed, 65 insertions(+), 30 deletions(-) diff --git a/openmp/runtime/src/kmp_affinity.cpp b/openmp/runtime/src/kmp_affinity.cpp index cf5cad04eb57d5..9074a00c0d9c4e 100644 --- a/openmp/runtime/src/kmp_affinity.cpp +++ b/openmp/runtime/src/kmp_affinity.cpp @@ -1589,15 +1589,13 @@ kmp_str_buf_t *__kmp_affinity_str_buf_mask(kmp_str_buf_t *buf, return buf; } -// Return (possibly empty) affinity mask representing the offline CPUs -// Caller must free the mask -kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { - kmp_affin_mask_t *offline; - KMP_CPU_ALLOC(offline); - KMP_CPU_ZERO(offline); +static kmp_affin_mask_t *__kmp_parse_cpu_list(const char *path) { + kmp_affin_mask_t *mask; + KMP_CPU_ALLOC(mask); + KMP_CPU_ZERO(mask); #if KMP_OS_LINUX int n, begin_cpu, end_cpu; - kmp_safe_raii_file_t offline_file; + kmp_safe_raii_file_t file; auto skip_ws = [](FILE *f) { int c; do { @@ -1606,29 +1604,29 @@ kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { if (c != EOF) ungetc(c, f); }; - // File contains CSV of integer ranges representing the offline CPUs + // File contains CSV of integer ranges representing the CPUs // e.g., 1,2,4-7,9,11-15 - int status = offline_file.try_open("/sys/devices/system/cpu/offline", "r"); + int status = 
file.try_open(path, "r"); if (status != 0) - return offline; - while (!feof(offline_file)) { - skip_ws(offline_file); - n = fscanf(offline_file, "%d", &begin_cpu); + return mask; + while (!feof(file)) { + skip_ws(file); + n = fscanf(file, "%d", &begin_cpu); if (n != 1) break; - skip_ws(offline_file); - int c = fgetc(offline_file); + skip_ws(file); + int c = fgetc(file); if (c == EOF || c == ',') { // Just single CPU end_cpu = begin_cpu; } else if (c == '-') { // Range of CPUs - skip_ws(offline_file); - n = fscanf(offline_file, "%d", &end_cpu); + skip_ws(file); + n = fscanf(file, "%d", &end_cpu); if (n != 1) break; - skip_ws(offline_file); - c = fgetc(offline_file); // skip ',' + skip_ws(file); + c = fgetc(file); // skip ',' } else { // Syntax problem break; @@ -1638,13 +1636,19 @@ kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { end_cpu >= __kmp_xproc || begin_cpu > end_cpu) { continue; } - // Insert [begin_cpu, end_cpu] into offline mask + // Insert [begin_cpu, end_cpu] into mask for (int cpu = begin_cpu; cpu <= end_cpu; ++cpu) { - KMP_CPU_SET(cpu, offline); + KMP_CPU_SET(cpu, mask); } } #endif - return offline; + return mask; +} + +// Return (possibly empty) affinity mask representing the offline CPUs +// Caller must free the mask +kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { + return __kmp_parse_cpu_list("/sys/devices/system/cpu/offline"); } // Return the number of available procs @@ -3175,6 +3179,30 @@ static inline const char *__kmp_cpuinfo_get_envvar() { return envvar; } +static bool __kmp_package_id_from_core_sibling_list(unsigned **threadInfo, + unsigned idx) { + char path[256]; + KMP_SNPRINTF( + path, sizeof(path), + "/sys/devices/system/cpu/cpu%u/topology/core_siblings_list", + threadInfo[idx][osIdIndex]); + kmp_affin_mask_t *siblings = __kmp_parse_cpu_list(path); + for (unsigned i = 0; i < __kmp_xproc; ++i) { + if (!KMP_CPU_ISSET(i, siblings)) + continue; + if (threadInfo[i][pkgIdIndex] == UINT_MAX) { + // Arbitrarily pick the first index 
we encounter, it only matters that + // the value is the same for all siblings. + threadInfo[i][pkgIdIndex] = idx; + } else if (threadInfo[i][pkgIdIndex] != idx) { + // Contradictory sibling lists. + return false; + } + } + KMP_CPU_FREE(siblings); + return true; +} + // Parse /proc/cpuinfo (or an alternate file in the same format) to obtain the // affinity map. On AIX, the map is obtained through system SRAD (Scheduler // Resource Allocation Domain). @@ -3550,18 +3578,13 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line, return false; } - // Check for missing fields. The osId field must be there, and we - // currently require that the physical id field is specified, also. + // Check for missing fields. The osId field must be there. The physical + // id field will be checked later. if (threadInfo[num_avail][osIdIndex] == UINT_MAX) { CLEANUP_THREAD_INFO; *msg_id = kmp_i18n_str_MissingProcField; return false; } - if (threadInfo[0][pkgIdIndex] == UINT_MAX) { - CLEANUP_THREAD_INFO; - *msg_id = kmp_i18n_str_MissingPhysicalIDField; - return false; - } // Skip this proc if it is not included in the machine model. if (KMP_AFFINITY_CAPABLE() && @@ -3591,6 +3614,18 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line, } *line = 0; + // At least on powerpc, Linux may return -1 for physical_package_id. Try + // to reconstruct topology from core_sibling_list in that case. 
+ for (i = 0; i < num_avail; ++i) { + if (threadInfo[i][pkgIdIndex] == UINT_MAX) { + if (!__kmp_package_id_from_core_sibling_list(threadInfo, i)) { + CLEANUP_THREAD_INFO; + *msg_id = kmp_i18n_str_MissingPhysicalIDField; + return false; + } + } + } + #if KMP_MIC && REDUCE_TEAM_SIZE unsigned teamSize = 0; #endif // KMP_MIC && REDUCE_TEAM_SIZE diff --git a/openmp/runtime/test/affinity/kmp-hw-subset.c b/openmp/runtime/test/affinity/kmp-hw-subset.c index 606fcdfbada95a..0b49969bd3b10c 100644 --- a/openmp/runtime/test/affinity/kmp-hw-subset.c +++ b/openmp/runtime/test/affinity/kmp-hw-subset.c @@ -25,7 +25,7 @@ static int compare_hw_subset_places(const place_list_t *openmp_places, expected_per_place = nthreads_per_core; } else { expected_total = nsockets; - expected_per_place = ncores_per_socket; + expected_per_place = ncores_per_socket * nthreads_per_core; } if (openmp_places->num_places != expected_total) { fprintf(stderr, "error: KMP_HW_SUBSET did not half each resource layer!\n"); From openmp-commits at lists.llvm.org Thu Oct 10 05:58:00 2024 From: openmp-commits at lists.llvm.org (Nikita Popov via Openmp-commits) Date: Thu, 10 Oct 2024 05:58:00 -0700 (PDT) Subject: [Openmp-commits] [openmp] [openmp] Use core_siblings_list if physical_package_id not available (PR #111831) In-Reply-To: Message-ID: <6707cf58.050a0220.3a9dc1.2f59@mx.google.com> ================ @@ -25,7 +25,7 @@ static int compare_hw_subset_places(const place_list_t *openmp_places, expected_per_place = nthreads_per_core; } else { expected_total = nsockets; - expected_per_place = ncores_per_socket; + expected_per_place = ncores_per_socket * nthreads_per_core; ---------------- nikic wrote: Unless I'm misunderstanding something, the count should always be in terms of threads. I think maybe this test has been getting away with it, because on x86 the number of threads per core is at most 2, so after halving it it is always 1 and this multiplication does not matter. 
On the ppc system I'm testing the number of threads per core is 6, so after halving it's 3 and the test would fail if we don't multiply here. https://github.com/llvm/llvm-project/pull/111831 From openmp-commits at lists.llvm.org Thu Oct 10 06:00:59 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Thu, 10 Oct 2024 06:00:59 -0700 (PDT) Subject: [Openmp-commits] [openmp] [openmp] Use core_siblings_list if physical_package_id not available (PR #111831) In-Reply-To: Message-ID: <6707d00b.630a0220.81d98.267a@mx.google.com> github-actions[bot] wrote: :warning: C/C++ code formatter, clang-format found issues in your code. :warning:
You can test this locally with the following command: ``````````bash git-clang-format --diff ced15cd418d96fc3d078e687bdcc5875656c71f6 fce625888fc87d6eb572e4cba39ef7cc72fef466 --extensions c,cpp -- openmp/runtime/src/kmp_affinity.cpp openmp/runtime/test/affinity/kmp-hw-subset.c ``````````
View the diff from clang-format here. ``````````diff diff --git a/openmp/runtime/src/kmp_affinity.cpp b/openmp/runtime/src/kmp_affinity.cpp index 9074a00c0d..099fe203e9 100644 --- a/openmp/runtime/src/kmp_affinity.cpp +++ b/openmp/runtime/src/kmp_affinity.cpp @@ -3182,10 +3182,9 @@ static inline const char *__kmp_cpuinfo_get_envvar() { static bool __kmp_package_id_from_core_sibling_list(unsigned **threadInfo, unsigned idx) { char path[256]; - KMP_SNPRINTF( - path, sizeof(path), - "/sys/devices/system/cpu/cpu%u/topology/core_siblings_list", - threadInfo[idx][osIdIndex]); + KMP_SNPRINTF(path, sizeof(path), + "/sys/devices/system/cpu/cpu%u/topology/core_siblings_list", + threadInfo[idx][osIdIndex]); kmp_affin_mask_t *siblings = __kmp_parse_cpu_list(path); for (unsigned i = 0; i < __kmp_xproc; ++i) { if (!KMP_CPU_ISSET(i, siblings)) ``````````
https://github.com/llvm/llvm-project/pull/111831 From openmp-commits at lists.llvm.org Thu Oct 10 06:05:55 2024 From: openmp-commits at lists.llvm.org (Nikita Popov via Openmp-commits) Date: Thu, 10 Oct 2024 06:05:55 -0700 (PDT) Subject: [Openmp-commits] [openmp] [openmp] Use core_siblings_list if physical_package_id not available (PR #111831) In-Reply-To: Message-ID: <6707d133.050a0220.aaf10.2cd7@mx.google.com> https://github.com/nikic updated https://github.com/llvm/llvm-project/pull/111831 >From 261649d5aeff3f09150ac1fb67cb1d8c09899be2 Mon Sep 17 00:00:00 2001 From: Nikita Popov Date: Thu, 10 Oct 2024 12:47:33 +0000 Subject: [PATCH] [openmp] Use core_siblings_list if physical_package_id not available On powerpc, physical_package_id may not be available. Currently, this causes openmp to fall back to flat topology and various affinity tests fail. Fix this by parsing core_siblings_list to determine which cpus belong to the same socket. This matches what the testing code does. The code to parse the CPU list format thankfully already exists. Fixes https://github.com/llvm/llvm-project/issues/111809.
--- openmp/runtime/src/kmp_affinity.cpp | 92 ++++++++++++++------ openmp/runtime/test/affinity/kmp-hw-subset.c | 2 +- 2 files changed, 64 insertions(+), 30 deletions(-) diff --git a/openmp/runtime/src/kmp_affinity.cpp b/openmp/runtime/src/kmp_affinity.cpp index cf5cad04eb57d5..099fe203e97b4b 100644 --- a/openmp/runtime/src/kmp_affinity.cpp +++ b/openmp/runtime/src/kmp_affinity.cpp @@ -1589,15 +1589,13 @@ kmp_str_buf_t *__kmp_affinity_str_buf_mask(kmp_str_buf_t *buf, return buf; } -// Return (possibly empty) affinity mask representing the offline CPUs -// Caller must free the mask -kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { - kmp_affin_mask_t *offline; - KMP_CPU_ALLOC(offline); - KMP_CPU_ZERO(offline); +static kmp_affin_mask_t *__kmp_parse_cpu_list(const char *path) { + kmp_affin_mask_t *mask; + KMP_CPU_ALLOC(mask); + KMP_CPU_ZERO(mask); #if KMP_OS_LINUX int n, begin_cpu, end_cpu; - kmp_safe_raii_file_t offline_file; + kmp_safe_raii_file_t file; auto skip_ws = [](FILE *f) { int c; do { @@ -1606,29 +1604,29 @@ kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { if (c != EOF) ungetc(c, f); }; - // File contains CSV of integer ranges representing the offline CPUs + // File contains CSV of integer ranges representing the CPUs // e.g., 1,2,4-7,9,11-15 - int status = offline_file.try_open("/sys/devices/system/cpu/offline", "r"); + int status = file.try_open(path, "r"); if (status != 0) - return offline; - while (!feof(offline_file)) { - skip_ws(offline_file); - n = fscanf(offline_file, "%d", &begin_cpu); + return mask; + while (!feof(file)) { + skip_ws(file); + n = fscanf(file, "%d", &begin_cpu); if (n != 1) break; - skip_ws(offline_file); - int c = fgetc(offline_file); + skip_ws(file); + int c = fgetc(file); if (c == EOF || c == ',') { // Just single CPU end_cpu = begin_cpu; } else if (c == '-') { // Range of CPUs - skip_ws(offline_file); - n = fscanf(offline_file, "%d", &end_cpu); + skip_ws(file); + n = fscanf(file, "%d", &end_cpu); if (n != 1) break; - 
skip_ws(offline_file); - c = fgetc(offline_file); // skip ',' + skip_ws(file); + c = fgetc(file); // skip ',' } else { // Syntax problem break; @@ -1638,13 +1636,19 @@ kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { end_cpu >= __kmp_xproc || begin_cpu > end_cpu) { continue; } - // Insert [begin_cpu, end_cpu] into offline mask + // Insert [begin_cpu, end_cpu] into mask for (int cpu = begin_cpu; cpu <= end_cpu; ++cpu) { - KMP_CPU_SET(cpu, offline); + KMP_CPU_SET(cpu, mask); } } #endif - return offline; + return mask; +} + +// Return (possibly empty) affinity mask representing the offline CPUs +// Caller must free the mask +kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { + return __kmp_parse_cpu_list("/sys/devices/system/cpu/offline"); } // Return the number of available procs @@ -3175,6 +3179,29 @@ static inline const char *__kmp_cpuinfo_get_envvar() { return envvar; } +static bool __kmp_package_id_from_core_sibling_list(unsigned **threadInfo, + unsigned idx) { + char path[256]; + KMP_SNPRINTF(path, sizeof(path), + "/sys/devices/system/cpu/cpu%u/topology/core_siblings_list", + threadInfo[idx][osIdIndex]); + kmp_affin_mask_t *siblings = __kmp_parse_cpu_list(path); + for (unsigned i = 0; i < __kmp_xproc; ++i) { + if (!KMP_CPU_ISSET(i, siblings)) + continue; + if (threadInfo[i][pkgIdIndex] == UINT_MAX) { + // Arbitrarily pick the first index we encounter, it only matters that + // the value is the same for all siblings. + threadInfo[i][pkgIdIndex] = idx; + } else if (threadInfo[i][pkgIdIndex] != idx) { + // Contradictory sibling lists. + return false; + } + } + KMP_CPU_FREE(siblings); + return true; +} + // Parse /proc/cpuinfo (or an alternate file in the same format) to obtain the // affinity map. On AIX, the map is obtained through system SRAD (Scheduler // Resource Allocation Domain). @@ -3550,18 +3577,13 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line, return false; } - // Check for missing fields. 
The osId field must be there, and we - // currently require that the physical id field is specified, also. + // Check for missing fields. The osId field must be there. The physical + // id field will be checked later. if (threadInfo[num_avail][osIdIndex] == UINT_MAX) { CLEANUP_THREAD_INFO; *msg_id = kmp_i18n_str_MissingProcField; return false; } - if (threadInfo[0][pkgIdIndex] == UINT_MAX) { - CLEANUP_THREAD_INFO; - *msg_id = kmp_i18n_str_MissingPhysicalIDField; - return false; - } // Skip this proc if it is not included in the machine model. if (KMP_AFFINITY_CAPABLE() && @@ -3591,6 +3613,18 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line, } *line = 0; + // At least on powerpc, Linux may return -1 for physical_package_id. Try + // to reconstruct topology from core_sibling_list in that case. + for (i = 0; i < num_avail; ++i) { + if (threadInfo[i][pkgIdIndex] == UINT_MAX) { + if (!__kmp_package_id_from_core_sibling_list(threadInfo, i)) { + CLEANUP_THREAD_INFO; + *msg_id = kmp_i18n_str_MissingPhysicalIDField; + return false; + } + } + } + #if KMP_MIC && REDUCE_TEAM_SIZE unsigned teamSize = 0; #endif // KMP_MIC && REDUCE_TEAM_SIZE diff --git a/openmp/runtime/test/affinity/kmp-hw-subset.c b/openmp/runtime/test/affinity/kmp-hw-subset.c index 606fcdfbada95a..0b49969bd3b10c 100644 --- a/openmp/runtime/test/affinity/kmp-hw-subset.c +++ b/openmp/runtime/test/affinity/kmp-hw-subset.c @@ -25,7 +25,7 @@ static int compare_hw_subset_places(const place_list_t *openmp_places, expected_per_place = nthreads_per_core; } else { expected_total = nsockets; - expected_per_place = ncores_per_socket; + expected_per_place = ncores_per_socket * nthreads_per_core; } if (openmp_places->num_places != expected_total) { fprintf(stderr, "error: KMP_HW_SUBSET did not half each resource layer!\n"); From openmp-commits at lists.llvm.org Thu Oct 10 06:17:32 2024 From: openmp-commits at lists.llvm.org (Nikita Popov via Openmp-commits) Date: Thu, 10 Oct 2024 06:17:32 -0700 (PDT) Subject: 
[Openmp-commits] [openmp] [openmp] Use core_siblings_list if physical_package_id not available (PR #111831) In-Reply-To: Message-ID: <6707d3ec.170a0220.dbcb2.35ae@mx.google.com> https://github.com/nikic updated https://github.com/llvm/llvm-project/pull/111831 >From c4cedb269c5dd82bb5d1c7a3cd24b74b1c8ea1f7 Mon Sep 17 00:00:00 2001 From: Nikita Popov Date: Thu, 10 Oct 2024 12:47:33 +0000 Subject: [PATCH] [openmp] Use core_siblings_list if physical_package_id not available On powerpc, physical_package_id may not be available. Currently, this causes openmp to fall back to flat topology and various affinity tests fail. Fix this by parsing core_siblings_list to deterimine which cpus belong to the same socket. This matches what the testing code does. The code to parse the CPU list format thankfully already exists. Fixes https://github.com/llvm/llvm-project/issues/111809. --- openmp/runtime/src/kmp_affinity.cpp | 93 ++++++++++++++------ openmp/runtime/test/affinity/kmp-hw-subset.c | 2 +- 2 files changed, 65 insertions(+), 30 deletions(-) diff --git a/openmp/runtime/src/kmp_affinity.cpp b/openmp/runtime/src/kmp_affinity.cpp index cf5cad04eb57d5..d4de843812172f 100644 --- a/openmp/runtime/src/kmp_affinity.cpp +++ b/openmp/runtime/src/kmp_affinity.cpp @@ -1589,15 +1589,13 @@ kmp_str_buf_t *__kmp_affinity_str_buf_mask(kmp_str_buf_t *buf, return buf; } -// Return (possibly empty) affinity mask representing the offline CPUs -// Caller must free the mask -kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { - kmp_affin_mask_t *offline; - KMP_CPU_ALLOC(offline); - KMP_CPU_ZERO(offline); +static kmp_affin_mask_t *__kmp_parse_cpu_list(const char *path) { + kmp_affin_mask_t *mask; + KMP_CPU_ALLOC(mask); + KMP_CPU_ZERO(mask); #if KMP_OS_LINUX int n, begin_cpu, end_cpu; - kmp_safe_raii_file_t offline_file; + kmp_safe_raii_file_t file; auto skip_ws = [](FILE *f) { int c; do { @@ -1606,29 +1604,29 @@ kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { if (c != EOF) ungetc(c, f); }; - 
// File contains CSV of integer ranges representing the offline CPUs + // File contains CSV of integer ranges representing the CPUs // e.g., 1,2,4-7,9,11-15 - int status = offline_file.try_open("/sys/devices/system/cpu/offline", "r"); + int status = file.try_open(path, "r"); if (status != 0) - return offline; - while (!feof(offline_file)) { - skip_ws(offline_file); - n = fscanf(offline_file, "%d", &begin_cpu); + return mask; + while (!feof(file)) { + skip_ws(file); + n = fscanf(file, "%d", &begin_cpu); if (n != 1) break; - skip_ws(offline_file); - int c = fgetc(offline_file); + skip_ws(file); + int c = fgetc(file); if (c == EOF || c == ',') { // Just single CPU end_cpu = begin_cpu; } else if (c == '-') { // Range of CPUs - skip_ws(offline_file); - n = fscanf(offline_file, "%d", &end_cpu); + skip_ws(file); + n = fscanf(file, "%d", &end_cpu); if (n != 1) break; - skip_ws(offline_file); - c = fgetc(offline_file); // skip ',' + skip_ws(file); + c = fgetc(file); // skip ',' } else { // Syntax problem break; @@ -1638,13 +1636,19 @@ kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { end_cpu >= __kmp_xproc || begin_cpu > end_cpu) { continue; } - // Insert [begin_cpu, end_cpu] into offline mask + // Insert [begin_cpu, end_cpu] into mask for (int cpu = begin_cpu; cpu <= end_cpu; ++cpu) { - KMP_CPU_SET(cpu, offline); + KMP_CPU_SET(cpu, mask); } } #endif - return offline; + return mask; +} + +// Return (possibly empty) affinity mask representing the offline CPUs +// Caller must free the mask +kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { + return __kmp_parse_cpu_list("/sys/devices/system/cpu/offline"); } // Return the number of available procs @@ -3175,6 +3179,30 @@ static inline const char *__kmp_cpuinfo_get_envvar() { return envvar; } +static bool __kmp_package_id_from_core_sibling_list(unsigned **threadInfo, + unsigned idx) { + char path[256]; + KMP_SNPRINTF(path, sizeof(path), + "/sys/devices/system/cpu/cpu%u/topology/core_siblings_list", + 
threadInfo[idx][osIdIndex]); + kmp_affin_mask_t *siblings = __kmp_parse_cpu_list(path); + for (unsigned i = 0; i < __kmp_xproc; ++i) { + if (!KMP_CPU_ISSET(i, siblings)) + continue; + if (threadInfo[i][pkgIdIndex] == UINT_MAX) { + // Arbitrarily pick the first index we encounter, it only matters that + // the value is the same for all siblings. + threadInfo[i][pkgIdIndex] = idx; + } else if (threadInfo[i][pkgIdIndex] != idx) { + // Contradictory sibling lists. + KMP_CPU_FREE(siblings); + return false; + } + } + KMP_CPU_FREE(siblings); + return true; +} + // Parse /proc/cpuinfo (or an alternate file in the same format) to obtain the // affinity map. On AIX, the map is obtained through system SRAD (Scheduler // Resource Allocation Domain). @@ -3550,18 +3578,13 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line, return false; } - // Check for missing fields. The osId field must be there, and we - // currently require that the physical id field is specified, also. + // Check for missing fields. The osId field must be there. The physical + // id field will be checked later. if (threadInfo[num_avail][osIdIndex] == UINT_MAX) { CLEANUP_THREAD_INFO; *msg_id = kmp_i18n_str_MissingProcField; return false; } - if (threadInfo[0][pkgIdIndex] == UINT_MAX) { - CLEANUP_THREAD_INFO; - *msg_id = kmp_i18n_str_MissingPhysicalIDField; - return false; - } // Skip this proc if it is not included in the machine model. if (KMP_AFFINITY_CAPABLE() && @@ -3591,6 +3614,18 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line, } *line = 0; + // At least on powerpc, Linux may return -1 for physical_package_id. Try + // to reconstruct topology from core_sibling_list in that case. 
+ for (i = 0; i < num_avail; ++i) { + if (threadInfo[i][pkgIdIndex] == UINT_MAX) { + if (!__kmp_package_id_from_core_sibling_list(threadInfo, i)) { + CLEANUP_THREAD_INFO; + *msg_id = kmp_i18n_str_MissingPhysicalIDField; + return false; + } + } + } + #if KMP_MIC && REDUCE_TEAM_SIZE unsigned teamSize = 0; #endif // KMP_MIC && REDUCE_TEAM_SIZE diff --git a/openmp/runtime/test/affinity/kmp-hw-subset.c b/openmp/runtime/test/affinity/kmp-hw-subset.c index 606fcdfbada95a..0b49969bd3b10c 100644 --- a/openmp/runtime/test/affinity/kmp-hw-subset.c +++ b/openmp/runtime/test/affinity/kmp-hw-subset.c @@ -25,7 +25,7 @@ static int compare_hw_subset_places(const place_list_t *openmp_places, expected_per_place = nthreads_per_core; } else { expected_total = nsockets; - expected_per_place = ncores_per_socket; + expected_per_place = ncores_per_socket * nthreads_per_core; } if (openmp_places->num_places != expected_total) { fprintf(stderr, "error: KMP_HW_SUBSET did not half each resource layer!\n"); From openmp-commits at lists.llvm.org Thu Oct 10 11:01:19 2024 From: openmp-commits at lists.llvm.org (Peter Klausler via Openmp-commits) Date: Thu, 10 Oct 2024 11:01:19 -0700 (PDT) Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023) In-Reply-To: Message-ID: <6708166f.170a0220.569b3.009c@mx.google.com> klausler wrote: There are multiple weird failures in llvm-test-suite where clang++ complains about unknown options that look like they're flang-only options. Did you run llvm-test-suite before merging? Please check. These failures are going to cause build bot failures shortly in the clang-*-2stage builds if not repaired or reverted.
https://github.com/llvm/llvm-project/pull/110023 From openmp-commits at lists.llvm.org Thu Oct 10 11:09:46 2024 From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits) Date: Thu, 10 Oct 2024 11:09:46 -0700 (PDT) Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023) In-Reply-To: Message-ID: <6708186a.170a0220.122b03.5115@mx.google.com> everythingfunctional wrote: @klausler , can you point me specifically at the failure messages? The failures I've seen so far from the merge did not look related to my (admittedly untrained) eye. I ran the tests in the repo and they passed locally. Are there other tests somewhere that need to be updated as well? https://github.com/llvm/llvm-project/pull/110023 From openmp-commits at lists.llvm.org Thu Oct 10 11:11:11 2024 From: openmp-commits at lists.llvm.org (Peter Klausler via Openmp-commits) Date: Thu, 10 Oct 2024 11:11:11 -0700 (PDT) Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023) In-Reply-To: Message-ID: <670818bf.170a0220.17ab24.4ce6@mx.google.com> klausler wrote: > @klausler , can you point me specifically at the failure messages? The failures I've seen so far from the merge did not look related to my (admittedly untrained) eye. I ran the tests in the repo and they passed locally. Are there other tests somewhere that need to be updated as well? They haven't shown up in the build bots, so it must be something in my local environment that I've missed. If you ran llvm-test-suite's Fortran tests cleanly, then don't worry. 
https://github.com/llvm/llvm-project/pull/110023 From openmp-commits at lists.llvm.org Thu Oct 10 11:15:12 2024 From: openmp-commits at lists.llvm.org (Kiran Chandramohan via Openmp-commits) Date: Thu, 10 Oct 2024 11:15:12 -0700 (PDT) Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023) In-Reply-To: Message-ID: <670819b0.170a0220.1918f7.4df3@mx.google.com> kiranchandramohan wrote: The 2 stage buildbot (https://lab.llvm.org/buildbot/#/builders/41) with the Fortran tests passed after this merge. Its latest run is also a pass. https://github.com/llvm/llvm-project/pull/110023 From openmp-commits at lists.llvm.org Thu Oct 10 11:18:27 2024 From: openmp-commits at lists.llvm.org (Kiran Chandramohan via Openmp-commits) Date: Thu, 10 Oct 2024 11:18:27 -0700 (PDT) Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023) In-Reply-To: Message-ID: <67081a73.170a0220.a11fc.4d93@mx.google.com> kiranchandramohan wrote: @everythingfunctional Following are the steps to run the testsuite. Please check and confirm if you did not run it before. ``` git clone https://github.com/llvm/llvm-test-suite.git cd llvm-test-suite mkdir build cd build cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=$HOME/llvm-project/build_release/bin/clang -DCMAKE_CXX_COMPILER=$HOME/llvm-project/build_release/bin/clang++ -DCMAKE_Fortran_COMPILER=$HOME/llvm-project/build_release/bin/flang-new -DTEST_SUITE_FORTRAN=On -DTEST_SUITE_SUBDIRS=Fortran -DTEST_SUITE_FORTRAN_ISO_C_HEADER_DIR=$HOME/llvm-project/flang/include/flang ../ make -j48 NO_STOP_MESSAGE=1 $HOME/llvm-project/build_release/bin/llvm-lit -v . 
``` https://github.com/llvm/llvm-project/pull/110023 From openmp-commits at lists.llvm.org Thu Oct 10 11:49:12 2024 From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits) Date: Thu, 10 Oct 2024 11:49:12 -0700 (PDT) Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023) In-Reply-To: Message-ID: <670821a8.170a0220.26f3f9.05e8@mx.google.com> everythingfunctional wrote: > @everythingfunctional Following are the steps to run the testsuite. Please check and confirm if you did not run it before. > > ``` > git clone https://github.com/llvm/llvm-test-suite.git > cd llvm-test-suite > mkdir build > cd build > cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=$HOME/llvm-project/build_release/bin/clang -DCMAKE_CXX_COMPILER=$HOME/llvm-project/build_release/bin/clang++ -DCMAKE_Fortran_COMPILER=$HOME/llvm-project/build_release/bin/flang-new -DTEST_SUITE_FORTRAN=On -DTEST_SUITE_SUBDIRS=Fortran -DTEST_SUITE_FORTRAN_ISO_C_HEADER_DIR=$HOME/llvm-project/flang/include/flang ../ > make -j48 > NO_STOP_MESSAGE=1 $HOME/llvm-project/build_release/bin/llvm-lit -v . > ``` Only failure I got was: ```text Failed Tests (1): test-suite :: Fortran/gfortran/regression/gomp/gfortran-regression-compile-regression__gomp__proc_ptr_1_f90.test ``` Which I don't think the rename is at fault for. https://github.com/llvm/llvm-project/pull/110023 From openmp-commits at lists.llvm.org Thu Oct 10 11:51:29 2024 From: openmp-commits at lists.llvm.org (Peter Klausler via Openmp-commits) Date: Thu, 10 Oct 2024 11:51:29 -0700 (PDT) Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023) In-Reply-To: Message-ID: <67082231.050a0220.56e0.57ce@mx.google.com> klausler wrote: > Which I don't think the rename is at fault for. If you rebase to current llvm-project/main, that test will (or should) work; it got broken yesterday and was fixed later. 
https://github.com/llvm/llvm-project/pull/110023 From openmp-commits at lists.llvm.org Fri Oct 11 05:04:05 2024 From: openmp-commits at lists.llvm.org (Josep Pinot via Openmp-commits) Date: Fri, 11 Oct 2024 05:04:05 -0700 (PDT) Subject: [Openmp-commits] [openmp] [OpenMP] Fix missing gtid argument in __kmp_print_tdg_dot function (PR #111986) Message-ID: https://github.com/jpinot created https://github.com/llvm/llvm-project/pull/111986 This patch modifies the signature of the `__kmp_print_tdg_dot` function in `kmp_tasking.cpp` to include the global thread ID (gtid) as an argument. The gtid is now correctly passed to the function. - Updated the function declaration to accept the gtid parameter. - Modified all calls to `__kmp_print_tdg_dot` to pass the correct gtid value. This change addresses issues encountered when compiling with `OMPX_TASKGRAPH` enabled. No functional changes are expected beyond successful compilation. >From 6c6d334497bf01d2f72261092a0ee00409ce01b6 Mon Sep 17 00:00:00 2001 From: jpinot Date: Fri, 11 Oct 2024 13:47:46 +0200 Subject: [PATCH] [OpenMP] Fix missing input argument in __kmp_print_tdg_dot function Modified __kmp_print_tdg_dot to accept the global thread ID (gtid). Updated all calls to pass the gtid, ensuring correct task identification when printing the task dependency graph (TDG). 
--- openmp/runtime/src/kmp_tasking.cpp | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp index 7edaa8e127e52c..932799e133b45b 100644 --- a/openmp/runtime/src/kmp_tasking.cpp +++ b/openmp/runtime/src/kmp_tasking.cpp @@ -5491,7 +5491,8 @@ static kmp_tdg_info_t *__kmp_find_tdg(kmp_int32 tdg_id) { // __kmp_print_tdg_dot: prints the TDG to a dot file // tdg: ID of the TDG -void __kmp_print_tdg_dot(kmp_tdg_info_t *tdg) { +// gtid: Global Thread ID +void __kmp_print_tdg_dot(kmp_tdg_info_t *tdg, kmp_int32 gtid) { kmp_int32 tdg_id = tdg->tdg_id; KA_TRACE(10, ("__kmp_print_tdg_dot(enter): T#%d tdg_id=%d \n", gtid, tdg_id)); @@ -5693,7 +5694,7 @@ void __kmp_end_record(kmp_int32 gtid, kmp_tdg_info_t *tdg) { KMP_ATOMIC_ST_RLX(&__kmp_tdg_task_id, 0); if (__kmp_tdg_dot) - __kmp_print_tdg_dot(tdg); + __kmp_print_tdg_dot(tdg, gtid); } // __kmpc_end_record_task: wrapper around __kmp_end_record to mark From openmp-commits at lists.llvm.org Fri Oct 11 05:04:25 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Fri, 11 Oct 2024 05:04:25 -0700 (PDT) Subject: [Openmp-commits] [openmp] [OpenMP] Fix missing gtid argument in __kmp_print_tdg_dot function (PR #111986) In-Reply-To: Message-ID: <67091449.a70a0220.2c877a.8ba5@mx.google.com> github-actions[bot] wrote: Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using `@` followed by their GitHub username. If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. 
Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the [LLVM GitHub User Guide](https://llvm.org/docs/GitHub.html). You can also ask questions in a comment on this PR, on the [LLVM Discord](https://discord.com/invite/xS7Z362) or on the [forums](https://discourse.llvm.org/). https://github.com/llvm/llvm-project/pull/111986 From openmp-commits at lists.llvm.org Fri Oct 11 09:41:25 2024 From: openmp-commits at lists.llvm.org (Nikita Popov via Openmp-commits) Date: Fri, 11 Oct 2024 09:41:25 -0700 (PDT) Subject: [Openmp-commits] [openmp] [openmp] Use core_siblings_list if physical_package_id not available (PR #111831) In-Reply-To: Message-ID: <67095535.050a0220.103e21.a0ec@mx.google.com> https://github.com/nikic updated https://github.com/llvm/llvm-project/pull/111831 >From 5fb4d7f6079a76b2907ccc8c53c7c509c30a3dca Mon Sep 17 00:00:00 2001 From: Nikita Popov Date: Thu, 10 Oct 2024 12:47:33 +0000 Subject: [PATCH] [openmp] Use core_siblings_list if physical_package_id not available On powerpc, physical_package_id may not be available. Currently, this causes openmp to fall back to flat topology and various affinity tests fail. Fix this by parsing core_siblings_list to determine which cpus belong to the same socket. This matches what the testing code does. The code to parse the CPU list format thankfully already exists. Fixes https://github.com/llvm/llvm-project/issues/111809. 
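Editorial sketch: the CPU list format referred to in the commit message is a comma-separated list of CPU ids and inclusive ranges (for example `1,2,4-7,9,11-15`). The Python below is only an illustration of that parsing idea, not the runtime's C++ code. The names `parse_cpu_list` and `num_cpus` are hypothetical, and the choice to skip out-of-range entries and stop at the first syntax error is an assumption modelled on the parser shown in the patch.

```python
def parse_cpu_list(text, num_cpus):
    """Parse a sysfs-style CPU list such as "1,2,4-7,9,11-15" into a set of ids.

    Assumptions (illustrative only): entries that are reversed or fall outside
    [0, num_cpus) are skipped, and parsing stops at the first syntax error,
    keeping whatever was read up to that point.
    """
    cpus = set()
    for item in text.strip().split(","):
        item = item.strip()
        if not item:
            break  # empty input or trailing comma: nothing more to read
        first, sep, last = item.partition("-")
        try:
            begin = int(first)
            end = int(last) if sep else begin  # single CPU when there is no '-'
        except ValueError:
            break  # syntax problem: stop parsing
        if begin < 0 or end < 0 or begin >= num_cpus or end >= num_cpus or begin > end:
            continue  # ignore entries that do not describe valid CPUs
        cpus.update(range(begin, end + 1))  # ranges are inclusive
    return cpus


if __name__ == "__main__":
    # Prints [1, 2, 4, 5, 6, 7, 9, 11, 12, 13, 14, 15]
    print(sorted(parse_cpu_list("1,2,4-7,9,11-15", 16)))
```

Returning a plain set keeps the sketch short; the runtime instead fills an affinity mask that the caller must free.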
--- openmp/runtime/src/kmp_affinity.cpp | 100 +++++++++++++------ openmp/runtime/test/affinity/kmp-hw-subset.c | 2 +- 2 files changed, 72 insertions(+), 30 deletions(-) diff --git a/openmp/runtime/src/kmp_affinity.cpp b/openmp/runtime/src/kmp_affinity.cpp index cf5cad04eb57d5..c3d5ecf1345e89 100644 --- a/openmp/runtime/src/kmp_affinity.cpp +++ b/openmp/runtime/src/kmp_affinity.cpp @@ -1589,15 +1589,13 @@ kmp_str_buf_t *__kmp_affinity_str_buf_mask(kmp_str_buf_t *buf, return buf; } -// Return (possibly empty) affinity mask representing the offline CPUs -// Caller must free the mask -kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { - kmp_affin_mask_t *offline; - KMP_CPU_ALLOC(offline); - KMP_CPU_ZERO(offline); +static kmp_affin_mask_t *__kmp_parse_cpu_list(const char *path) { + kmp_affin_mask_t *mask; + KMP_CPU_ALLOC(mask); + KMP_CPU_ZERO(mask); #if KMP_OS_LINUX int n, begin_cpu, end_cpu; - kmp_safe_raii_file_t offline_file; + kmp_safe_raii_file_t file; auto skip_ws = [](FILE *f) { int c; do { @@ -1606,29 +1604,29 @@ kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { if (c != EOF) ungetc(c, f); }; - // File contains CSV of integer ranges representing the offline CPUs + // File contains CSV of integer ranges representing the CPUs // e.g., 1,2,4-7,9,11-15 - int status = offline_file.try_open("/sys/devices/system/cpu/offline", "r"); + int status = file.try_open(path, "r"); if (status != 0) - return offline; - while (!feof(offline_file)) { - skip_ws(offline_file); - n = fscanf(offline_file, "%d", &begin_cpu); + return mask; + while (!feof(file)) { + skip_ws(file); + n = fscanf(file, "%d", &begin_cpu); if (n != 1) break; - skip_ws(offline_file); - int c = fgetc(offline_file); + skip_ws(file); + int c = fgetc(file); if (c == EOF || c == ',') { // Just single CPU end_cpu = begin_cpu; } else if (c == '-') { // Range of CPUs - skip_ws(offline_file); - n = fscanf(offline_file, "%d", &end_cpu); + skip_ws(file); + n = fscanf(file, "%d", &end_cpu); if (n != 1) break; - 
skip_ws(offline_file); - c = fgetc(offline_file); // skip ',' + skip_ws(file); + c = fgetc(file); // skip ',' } else { // Syntax problem break; @@ -1638,13 +1636,19 @@ kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { end_cpu >= __kmp_xproc || begin_cpu > end_cpu) { continue; } - // Insert [begin_cpu, end_cpu] into offline mask + // Insert [begin_cpu, end_cpu] into mask for (int cpu = begin_cpu; cpu <= end_cpu; ++cpu) { - KMP_CPU_SET(cpu, offline); + KMP_CPU_SET(cpu, mask); } } #endif - return offline; + return mask; +} + +// Return (possibly empty) affinity mask representing the offline CPUs +// Caller must free the mask +kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { + return __kmp_parse_cpu_list("/sys/devices/system/cpu/offline"); } // Return the number of available procs @@ -3175,6 +3179,37 @@ static inline const char *__kmp_cpuinfo_get_envvar() { return envvar; } +static bool __kmp_package_id_from_core_siblings_list(unsigned **threadInfo, + unsigned num_avail, + unsigned idx) { + if (!KMP_AFFINITY_CAPABLE()) + return false; + + char path[256]; + KMP_SNPRINTF(path, sizeof(path), + "/sys/devices/system/cpu/cpu%u/topology/core_siblings_list", + threadInfo[idx][osIdIndex]); + kmp_affin_mask_t *siblings = __kmp_parse_cpu_list(path); + for (unsigned i = 0; i < num_avail; ++i) { + unsigned cpu_id = threadInfo[i][osIdIndex]; + KMP_ASSERT(cpu_id < __kmp_affin_mask_size * CHAR_BIT); + if (!KMP_CPU_ISSET(cpu_id, siblings)) + continue; + if (threadInfo[i][pkgIdIndex] == UINT_MAX) { + // Arbitrarily pick the first index we encounter, it only matters that + // the value is the same for all siblings. + threadInfo[i][pkgIdIndex] = idx; + } else if (threadInfo[i][pkgIdIndex] != idx) { + // Contradictory sibling lists. + KMP_CPU_FREE(siblings); + return false; + } + } + KMP_ASSERT(threadInfo[idx][pkgIdIndex] != UINT_MAX); + KMP_CPU_FREE(siblings); + return true; +} + // Parse /proc/cpuinfo (or an alternate file in the same format) to obtain the // affinity map. 
On AIX, the map is obtained through system SRAD (Scheduler // Resource Allocation Domain). @@ -3550,18 +3585,13 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line, return false; } - // Check for missing fields. The osId field must be there, and we - // currently require that the physical id field is specified, also. + // Check for missing fields. The osId field must be there. The physical + // id field will be checked later. if (threadInfo[num_avail][osIdIndex] == UINT_MAX) { CLEANUP_THREAD_INFO; *msg_id = kmp_i18n_str_MissingProcField; return false; } - if (threadInfo[0][pkgIdIndex] == UINT_MAX) { - CLEANUP_THREAD_INFO; - *msg_id = kmp_i18n_str_MissingPhysicalIDField; - return false; - } // Skip this proc if it is not included in the machine model. if (KMP_AFFINITY_CAPABLE() && @@ -3591,6 +3621,18 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line, } *line = 0; + // At least on powerpc, Linux may return -1 for physical_package_id. Try + // to reconstruct topology from core_siblings_list in that case. 
+ for (i = 0; i < num_avail; ++i) { + if (threadInfo[i][pkgIdIndex] == UINT_MAX) { + if (!__kmp_package_id_from_core_siblings_list(threadInfo, num_avail, i)) { + CLEANUP_THREAD_INFO; + *msg_id = kmp_i18n_str_MissingPhysicalIDField; + return false; + } + } + } + #if KMP_MIC && REDUCE_TEAM_SIZE unsigned teamSize = 0; #endif // KMP_MIC && REDUCE_TEAM_SIZE diff --git a/openmp/runtime/test/affinity/kmp-hw-subset.c b/openmp/runtime/test/affinity/kmp-hw-subset.c index 606fcdfbada95a..0b49969bd3b10c 100644 --- a/openmp/runtime/test/affinity/kmp-hw-subset.c +++ b/openmp/runtime/test/affinity/kmp-hw-subset.c @@ -25,7 +25,7 @@ static int compare_hw_subset_places(const place_list_t *openmp_places, expected_per_place = nthreads_per_core; } else { expected_total = nsockets; - expected_per_place = ncores_per_socket; + expected_per_place = ncores_per_socket * nthreads_per_core; } if (openmp_places->num_places != expected_total) { fprintf(stderr, "error: KMP_HW_SUBSET did not half each resource layer!\n"); From openmp-commits at lists.llvm.org Fri Oct 11 12:18:25 2024 From: openmp-commits at lists.llvm.org (Jonathan Peyton via Openmp-commits) Date: Fri, 11 Oct 2024 12:18:25 -0700 (PDT) Subject: [Openmp-commits] [openmp] [openmp] Use core_siblings_list if physical_package_id not available (PR #111831) In-Reply-To: Message-ID: <67097a01.050a0220.c49f9.bd4f@mx.google.com> https://github.com/jpeyton52 approved this pull request. 
LGTM https://github.com/llvm/llvm-project/pull/111831 From openmp-commits at lists.llvm.org Mon Oct 14 00:23:45 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Mon, 14 Oct 2024 00:23:45 -0700 (PDT) Subject: [Openmp-commits] [openmp] 4722c6b - [openmp] Use core_siblings_list if physical_package_id not available (#111831) Message-ID: <670cc701.050a0220.5a090.45c8@mx.google.com> Author: Nikita Popov Date: 2024-10-14T09:23:41+02:00 New Revision: 4722c6b87ca87fb87c9f522cb9decf70cc8b8c2b URL: https://github.com/llvm/llvm-project/commit/4722c6b87ca87fb87c9f522cb9decf70cc8b8c2b DIFF: https://github.com/llvm/llvm-project/commit/4722c6b87ca87fb87c9f522cb9decf70cc8b8c2b.diff LOG: [openmp] Use core_siblings_list if physical_package_id not available (#111831) On powerpc, physical_package_id may not be available. Currently, this causes openmp to fall back to flat topology and various affinity tests fail. Fix this by parsing core_siblings_list to determine which cpus belong to the same socket. This matches what the testing code does. The code to parse the CPU list format thankfully already exists. Fixes https://github.com/llvm/llvm-project/issues/111809. 
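Editorial sketch: the fallback described in the log message groups CPUs into sockets by their sibling lists when no package id is reported. The Python below is a rough model of that idea under stated assumptions, not the runtime's implementation. `assign_package_ids`, `siblings_of`, and the `UNKNOWN` marker (standing in for the runtime's `UINT_MAX`) are hypothetical names, and returning `None` where the C++ code asserts is a simplification.

```python
UNKNOWN = None  # stand-in for the "field missing" marker (UINT_MAX in the C++ code)


def assign_package_ids(os_ids, pkg_ids, siblings_of):
    """Fill in missing package ids from per-CPU sibling sets.

    os_ids:      OS CPU ids, one per available hardware thread
    pkg_ids:     package ids, UNKNOWN where none was reported
    siblings_of: hypothetical callable mapping an OS CPU id to the set of
                 CPU ids in the same socket (a parsed core_siblings_list)
    Returns the completed list, or None if the sibling sets are inconsistent.
    """
    pkg_ids = list(pkg_ids)  # work on a copy
    for idx, os_id in enumerate(os_ids):
        if pkg_ids[idx] is not UNKNOWN:
            continue  # already has a package id
        siblings = siblings_of(os_id)
        for i, cpu in enumerate(os_ids):
            if cpu not in siblings:
                continue
            if pkg_ids[i] is UNKNOWN:
                # The value is arbitrary; it only has to match across siblings.
                pkg_ids[i] = idx
            elif pkg_ids[i] != idx:
                return None  # contradictory sibling lists
        if pkg_ids[idx] is UNKNOWN:
            return None  # a CPU missing from its own sibling list
    return pkg_ids


if __name__ == "__main__":
    table = {0: {0, 1}, 1: {0, 1}, 2: {2, 3}, 3: {2, 3}}
    # Prints [0, 0, 2, 2]
    print(assign_package_ids([0, 1, 2, 3], [UNKNOWN] * 4, table.__getitem__))
```

Using the index of the first CPU seen in each socket as the package id mirrors the comment in the patch that the value is arbitrary as long as all siblings share it.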
Added: Modified: openmp/runtime/src/kmp_affinity.cpp openmp/runtime/test/affinity/kmp-hw-subset.c Removed: ################################################################################ diff --git a/openmp/runtime/src/kmp_affinity.cpp b/openmp/runtime/src/kmp_affinity.cpp index cf5cad04eb57d5..c3d5ecf1345e89 100644 --- a/openmp/runtime/src/kmp_affinity.cpp +++ b/openmp/runtime/src/kmp_affinity.cpp @@ -1589,15 +1589,13 @@ kmp_str_buf_t *__kmp_affinity_str_buf_mask(kmp_str_buf_t *buf, return buf; } -// Return (possibly empty) affinity mask representing the offline CPUs -// Caller must free the mask -kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { - kmp_affin_mask_t *offline; - KMP_CPU_ALLOC(offline); - KMP_CPU_ZERO(offline); +static kmp_affin_mask_t *__kmp_parse_cpu_list(const char *path) { + kmp_affin_mask_t *mask; + KMP_CPU_ALLOC(mask); + KMP_CPU_ZERO(mask); #if KMP_OS_LINUX int n, begin_cpu, end_cpu; - kmp_safe_raii_file_t offline_file; + kmp_safe_raii_file_t file; auto skip_ws = [](FILE *f) { int c; do { @@ -1606,29 +1604,29 @@ kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { if (c != EOF) ungetc(c, f); }; - // File contains CSV of integer ranges representing the offline CPUs + // File contains CSV of integer ranges representing the CPUs // e.g., 1,2,4-7,9,11-15 - int status = offline_file.try_open("/sys/devices/system/cpu/offline", "r"); + int status = file.try_open(path, "r"); if (status != 0) - return offline; - while (!feof(offline_file)) { - skip_ws(offline_file); - n = fscanf(offline_file, "%d", &begin_cpu); + return mask; + while (!feof(file)) { + skip_ws(file); + n = fscanf(file, "%d", &begin_cpu); if (n != 1) break; - skip_ws(offline_file); - int c = fgetc(offline_file); + skip_ws(file); + int c = fgetc(file); if (c == EOF || c == ',') { // Just single CPU end_cpu = begin_cpu; } else if (c == '-') { // Range of CPUs - skip_ws(offline_file); - n = fscanf(offline_file, "%d", &end_cpu); + skip_ws(file); + n = fscanf(file, "%d", &end_cpu); if (n 
!= 1) break; - skip_ws(offline_file); - c = fgetc(offline_file); // skip ',' + skip_ws(file); + c = fgetc(file); // skip ',' } else { // Syntax problem break; @@ -1638,13 +1636,19 @@ kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { end_cpu >= __kmp_xproc || begin_cpu > end_cpu) { continue; } - // Insert [begin_cpu, end_cpu] into offline mask + // Insert [begin_cpu, end_cpu] into mask for (int cpu = begin_cpu; cpu <= end_cpu; ++cpu) { - KMP_CPU_SET(cpu, offline); + KMP_CPU_SET(cpu, mask); } } #endif - return offline; + return mask; +} + +// Return (possibly empty) affinity mask representing the offline CPUs +// Caller must free the mask +kmp_affin_mask_t *__kmp_affinity_get_offline_cpus() { + return __kmp_parse_cpu_list("/sys/devices/system/cpu/offline"); } // Return the number of available procs @@ -3175,6 +3179,37 @@ static inline const char *__kmp_cpuinfo_get_envvar() { return envvar; } +static bool __kmp_package_id_from_core_siblings_list(unsigned **threadInfo, + unsigned num_avail, + unsigned idx) { + if (!KMP_AFFINITY_CAPABLE()) + return false; + + char path[256]; + KMP_SNPRINTF(path, sizeof(path), + "/sys/devices/system/cpu/cpu%u/topology/core_siblings_list", + threadInfo[idx][osIdIndex]); + kmp_affin_mask_t *siblings = __kmp_parse_cpu_list(path); + for (unsigned i = 0; i < num_avail; ++i) { + unsigned cpu_id = threadInfo[i][osIdIndex]; + KMP_ASSERT(cpu_id < __kmp_affin_mask_size * CHAR_BIT); + if (!KMP_CPU_ISSET(cpu_id, siblings)) + continue; + if (threadInfo[i][pkgIdIndex] == UINT_MAX) { + // Arbitrarily pick the first index we encounter, it only matters that + // the value is the same for all siblings. + threadInfo[i][pkgIdIndex] = idx; + } else if (threadInfo[i][pkgIdIndex] != idx) { + // Contradictory sibling lists. 
+ KMP_CPU_FREE(siblings); + return false; + } + } + KMP_ASSERT(threadInfo[idx][pkgIdIndex] != UINT_MAX); + KMP_CPU_FREE(siblings); + return true; +} + // Parse /proc/cpuinfo (or an alternate file in the same format) to obtain the // affinity map. On AIX, the map is obtained through system SRAD (Scheduler // Resource Allocation Domain). @@ -3550,18 +3585,13 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line, return false; } - // Check for missing fields. The osId field must be there, and we - // currently require that the physical id field is specified, also. + // Check for missing fields. The osId field must be there. The physical + // id field will be checked later. if (threadInfo[num_avail][osIdIndex] == UINT_MAX) { CLEANUP_THREAD_INFO; *msg_id = kmp_i18n_str_MissingProcField; return false; } - if (threadInfo[0][pkgIdIndex] == UINT_MAX) { - CLEANUP_THREAD_INFO; - *msg_id = kmp_i18n_str_MissingPhysicalIDField; - return false; - } // Skip this proc if it is not included in the machine model. if (KMP_AFFINITY_CAPABLE() && @@ -3591,6 +3621,18 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line, } *line = 0; + // At least on powerpc, Linux may return -1 for physical_package_id. Try + // to reconstruct topology from core_siblings_list in that case. 
+ for (i = 0; i < num_avail; ++i) { + if (threadInfo[i][pkgIdIndex] == UINT_MAX) { + if (!__kmp_package_id_from_core_siblings_list(threadInfo, num_avail, i)) { + CLEANUP_THREAD_INFO; + *msg_id = kmp_i18n_str_MissingPhysicalIDField; + return false; + } + } + } + #if KMP_MIC && REDUCE_TEAM_SIZE unsigned teamSize = 0; #endif // KMP_MIC && REDUCE_TEAM_SIZE diff --git a/openmp/runtime/test/affinity/kmp-hw-subset.c b/openmp/runtime/test/affinity/kmp-hw-subset.c index 606fcdfbada95a..0b49969bd3b10c 100644 --- a/openmp/runtime/test/affinity/kmp-hw-subset.c +++ b/openmp/runtime/test/affinity/kmp-hw-subset.c @@ -25,7 +25,7 @@ static int compare_hw_subset_places(const place_list_t *openmp_places, expected_per_place = nthreads_per_core; } else { expected_total = nsockets; - expected_per_place = ncores_per_socket; + expected_per_place = ncores_per_socket * nthreads_per_core; } if (openmp_places->num_places != expected_total) { fprintf(stderr, "error: KMP_HW_SUBSET did not half each resource layer!\n"); From openmp-commits at lists.llvm.org Mon Oct 14 00:23:47 2024 From: openmp-commits at lists.llvm.org (Nikita Popov via Openmp-commits) Date: Mon, 14 Oct 2024 00:23:47 -0700 (PDT) Subject: [Openmp-commits] [openmp] [openmp] Use core_siblings_list if physical_package_id not available (PR #111831) In-Reply-To: Message-ID: <670cc703.170a0220.2ff4ee.171b@mx.google.com> https://github.com/nikic closed https://github.com/llvm/llvm-project/pull/111831 From openmp-commits at lists.llvm.org Mon Oct 14 00:40:14 2024 From: openmp-commits at lists.llvm.org (LLVM Continuous Integration via Openmp-commits) Date: Mon, 14 Oct 2024 00:40:14 -0700 (PDT) Subject: [Openmp-commits] [openmp] [openmp] Use core_siblings_list if physical_package_id not available (PR #111831) In-Reply-To: Message-ID: <670ccade.170a0220.193763.8728@mx.google.com> llvm-ci wrote: LLVM Buildbot has detected a new failure on builder `openmp-offload-libc-amdgpu-runtime` running on `omp-vega20-1` while building `openmp` at step 
10 "Add check check-offload". Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/7004
Here is the relevant piece of the build log for the reference ``` Step 10 (Add check check-offload) failure: test (failure) ******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/test_libc.cpp' FAILED ******************** Exit Code: 2 Command Output (stdout): -- # RUN: at line 1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang++ -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/test_libc.cpp -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/test_libc.cpp.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a && /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/test_libc.cpp.tmp | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/test_libc.cpp # executed 
command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang++ -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/test_libc.cpp -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/test_libc.cpp.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a # executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/test_libc.cpp.tmp # .---command stderr------------ # | AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events. # | AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. 
This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events. # | "PluginInterface" error: Failure to allocate device memory for global memory pool: Failed to allocate from memory manager # | Display only launched kernel: # | Kernel 'omp target in test_memcpy() @ 10 (__omp_offloading_802_d82835d__Z11test_memcpyv_l10)' # | OFFLOAD ERROR: Memory access fault by GPU 1 (agent 0x55e844cdced0) at virtual address (nil). Reasons: Page not present or supervisor privilege, Write access to a read-only page # | Use 'OFFLOAD_TRACK_ALLOCATION_TRACES=true' to track device allocations # `----------------------------- # error: command failed with exit status: -6 # executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/test_libc.cpp # .---command stderr------------ # | FileCheck error: '' is empty. # | FileCheck command line: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/test_libc.cpp # `----------------------------- # error: command failed with exit status: 2 -- ******************** ```
https://github.com/llvm/llvm-project/pull/111831 From openmp-commits at lists.llvm.org Tue Oct 15 01:22:45 2024 From: openmp-commits at lists.llvm.org (Christian von Elm via Openmp-commits) Date: Tue, 15 Oct 2024 01:22:45 -0700 (PDT) Subject: [Openmp-commits] [openmp] [OpenMP] [OMPD] gdb plugin: remove 'imp' import (PR #112331) Message-ID: https://github.com/cvonelm created https://github.com/llvm/llvm-project/pull/112331 The 'imp' library was removed in Python 3.12. As the code never uses the imp library, the import is simply removed. >From d5513b6b752c1dc2db03c74e3ec2ac52304906ff Mon Sep 17 00:00:00 2001 From: Christian von Elm Date: Tue, 15 Oct 2024 10:09:48 +0200 Subject: [PATCH] [OpenMP] [OMPD] gdb plugin: remove 'imp' import The 'imp' library was removed in Python 3.12. As the code never uses the imp library, the import is simply removed. --- openmp/libompd/gdb-plugin/ompd/ompd_handles.py | 1 - 1 file changed, 1 deletion(-) diff --git a/openmp/libompd/gdb-plugin/ompd/ompd_handles.py b/openmp/libompd/gdb-plugin/ompd/ompd_handles.py index 1929a926174156..da97a4086eee6b 100644 --- a/openmp/libompd/gdb-plugin/ompd/ompd_handles.py +++ b/openmp/libompd/gdb-plugin/ompd/ompd_handles.py @@ -1,5 +1,4 @@ import ompdModule -import imp class ompd_parallel(object): 
From openmp-commits at lists.llvm.org Wed Oct 16 10:08:00 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Wed, 16 Oct 2024 10:08:00 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <670ff2f0.050a0220.2103d2.6e92@mx.google.com> https://github.com/ldrumm updated https://github.com/llvm/llvm-project/pull/86318 >From 99bf443a80d6927cf1b91942199046414597bd75 Mon Sep 17 00:00:00 2001 From: Luke Drummond Date: Fri, 22 Mar 2024 17:09:54 +0000 Subject: [PATCH 1/2] Finally formalise our defacto line-ending policy Historically, we've not automatically enforced how git tracks line endings, but there are many, many commits that "undo" unintended CRLFs getting into history. `git log --pretty=oneline --grep=CRLF` shows nearly 100 commits involving reverts of CRLF making its way into the index and then history. As far as I can tell, there are none the other way round except for specific cases like `.bat` files or tests for parsers that need to accept such sequences. 
Of note, one of the earliest of those listed in that output is: ``` commit 9795860250734e5c2a879546c534e35d9edd5944 Author: NAKAMURA Takumi Date: Thu Feb 3 11:41:27 2011 +0000 cmake/*: Add svn:eol-style=native and fix CRLF. llvm-svn: 124793 ``` ...which introduced such a defacto policy for subversion. With old versions of git, it's been a bit of a crapshoot whether enforcing storing line endings in the history will upset checkouts on machines where such line endings are the norm. Indeed many users have enforced that git checks out the working copy according to a global or per-user config via core crlf, or core autocrlf. For ~8 years now[1], however, git has supported the ability to "do as the Romans do" on checkout, but internally store subsets of text files with line-endings specified via a system of patterns in the gitattributes file. Since we now have this ability, and we've been specifying attributes for various binary files, I think it makes sense to rid us of all that work converting things "back", and just let git handle the local checkout. Thus the new toplevel policy here is * text=auto In simple terms this means "unless otherwise specified, convert all files considered "text" files to LF in the project history, but check them out as expected on the local machine. What is "expected on the local machine" is dependent on configuration and default. For those files in the repository that *do* need CRLF endings, I've adopted a policy of `eol=crlf` which means that git will store them in history with LF, but regardless of user config, they'll be checked out in tree with CRLF. Finally, existing files have been "corrected" in history via `git add --renormalize .` [1]: git 2.10 was released with fixed support for fine-grained line-ending tracking that respects user-config *and* repo policy. This can be considered the point at which git will respect both the user's local working tree preference *and* the history as specified by the maintainers. 
See https://github.com/git/git/blob/master/Documentation/RelNotes/2.10.0.txt#L248 for the release note. --- .gitattributes | 7 +++++++ clang-tools-extra/clangd/test/.gitattributes | 3 +++ clang/test/.gitattributes | 4 ++++ llvm/docs/TestingGuide.rst | 6 ++++++ llvm/test/FileCheck/.gitattributes | 1 + llvm/test/tools/llvm-ar/Inputs/.gitattributes | 1 + llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes | 1 + 7 files changed, 23 insertions(+) create mode 100644 clang-tools-extra/clangd/test/.gitattributes create mode 100644 clang/test/.gitattributes create mode 100644 llvm/test/FileCheck/.gitattributes create mode 100644 llvm/test/tools/llvm-ar/Inputs/.gitattributes create mode 100644 llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes diff --git a/.gitattributes b/.gitattributes index 6b281f33f737db..aced01d485c181 100644 --- a/.gitattributes +++ b/.gitattributes @@ -1,3 +1,10 @@ +# Checkout as native, commit as LF except in specific circumstances +* text=auto +*.bat text eol=crlf +*.rc text eol=crlf +*.sln text eol=crlf +*.natvis text eol=crlf + libcxx/src/**/*.cpp merge=libcxx-reformat libcxx/include/**/*.h merge=libcxx-reformat diff --git a/clang-tools-extra/clangd/test/.gitattributes b/clang-tools-extra/clangd/test/.gitattributes new file mode 100644 index 00000000000000..20971adc2b5d03 --- /dev/null +++ b/clang-tools-extra/clangd/test/.gitattributes @@ -0,0 +1,3 @@ +input-mirror.test text eol=crlf +too_large.test text eol=crlf +protocol.test text eol=crlf diff --git a/clang/test/.gitattributes b/clang/test/.gitattributes new file mode 100644 index 00000000000000..160fc6cf561751 --- /dev/null +++ b/clang/test/.gitattributes @@ -0,0 +1,4 @@ +FixIt/fixit-newline-style.c text eol=crlf +Frontend/system-header-line-directive-ms-lineendings.c text eol=crlf +Frontend/rewrite-includes-mixed-eol-crlf.* text eol=crlf +clang/test/Frontend/rewrite-includes-mixed-eol-lf.h text eolf=lf diff --git a/llvm/docs/TestingGuide.rst b/llvm/docs/TestingGuide.rst index 
08617933519fdb..344a295226f6ae 100644 --- a/llvm/docs/TestingGuide.rst +++ b/llvm/docs/TestingGuide.rst @@ -360,6 +360,12 @@ Best practices for regression tests - Try to give values (including variables, blocks and functions) meaningful names, and avoid retaining complex names generated by the optimization pipeline (such as ``%foo.0.0.0.0.0.0``). +- If your tests depend on specific input file encodings, beware of line-ending + issues across different platforms, and in the project's history. Before you + commit tests that depend on explicit encodings, consider adding filetype or + specific line-ending annotations to a `<.gitattributes + https://git-scm.com/docs/gitattributes#_effects>`_ file in the appropriate + directory in the repository. Extra files ----------- diff --git a/llvm/test/FileCheck/.gitattributes b/llvm/test/FileCheck/.gitattributes new file mode 100644 index 00000000000000..ba27d7fad76d50 --- /dev/null +++ b/llvm/test/FileCheck/.gitattributes @@ -0,0 +1 @@ +dos-style-eol.txt text eol=crlf diff --git a/llvm/test/tools/llvm-ar/Inputs/.gitattributes b/llvm/test/tools/llvm-ar/Inputs/.gitattributes new file mode 100644 index 00000000000000..6c8a26285daf7f --- /dev/null +++ b/llvm/test/tools/llvm-ar/Inputs/.gitattributes @@ -0,0 +1 @@ +mri-crlf.mri text eol=crlf diff --git a/llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes b/llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes new file mode 100644 index 00000000000000..2df17345df5b87 --- /dev/null +++ b/llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes @@ -0,0 +1 @@ +*.dos text eol=crlf >From ed0c7d0da0ff2bcbb56ce477ee4be0ab4bae0b63 Mon Sep 17 00:00:00 2001 From: Luke Drummond Date: Wed, 16 Oct 2024 18:06:37 +0100 Subject: [PATCH 2/2] Renormalize line endings whitespace only after 99bf443a80d6 Line ending policies were changed in the parent, 99bf443a80d6. 
To make it easier to resolve downstream merge conflicts after line-ending policies are adjusted this is a separate whitespace-only commit. If you have merge conflicts as a result, you can simply `git add --renormalize -u && git merge --continue` or `git add --renormalize -u && git rebase --continue` --- .../clangd/test/input-mirror.test | 34 +- clang-tools-extra/clangd/test/protocol.test | 226 +- clang-tools-extra/clangd/test/too_large.test | 14 +- clang/test/AST/HLSL/StructuredBuffer-AST.hlsl | 128 +- clang/test/C/C2y/n3262.c | 40 +- clang/test/C/C2y/n3274.c | 36 +- .../StructuredBuffer-annotations.hlsl | 44 +- .../StructuredBuffer-constructor.hlsl | 38 +- .../StructuredBuffer-elementtype.hlsl | 140 +- .../builtins/StructuredBuffer-subscript.hlsl | 34 +- clang/test/CodeGenHLSL/builtins/atan2.hlsl | 118 +- clang/test/CodeGenHLSL/builtins/cross.hlsl | 74 +- clang/test/CodeGenHLSL/builtins/length.hlsl | 146 +- .../test/CodeGenHLSL/builtins/normalize.hlsl | 170 +- clang/test/CodeGenHLSL/builtins/step.hlsl | 168 +- clang/test/Driver/flang/msvc-link.f90 | 10 +- clang/test/FixIt/fixit-newline-style.c | 22 +- .../rewrite-includes-mixed-eol-crlf.c | 16 +- .../rewrite-includes-mixed-eol-crlf.h | 22 +- ...tem-header-line-directive-ms-lineendings.c | 42 +- clang/test/ParserHLSL/bitfields.hlsl | 60 +- .../hlsl_annotations_on_struct_members.hlsl | 42 +- .../ParserHLSL/hlsl_contained_type_attr.hlsl | 50 +- .../hlsl_contained_type_attr_error.hlsl | 56 +- clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl | 44 +- .../ParserHLSL/hlsl_is_rov_attr_error.hlsl | 40 +- .../test/ParserHLSL/hlsl_raw_buffer_attr.hlsl | 44 +- .../hlsl_raw_buffer_attr_error.hlsl | 34 +- .../ParserHLSL/hlsl_resource_class_attr.hlsl | 74 +- .../hlsl_resource_class_attr_error.hlsl | 44 +- .../hlsl_resource_handle_attrs.hlsl | 42 +- clang/test/Sema/aarch64-sve-vector-trig-ops.c | 130 +- clang/test/Sema/riscv-rvv-vector-trig-ops.c | 134 +- .../avail-diag-default-compute.hlsl | 238 +- 
.../Availability/avail-diag-default-lib.hlsl | 360 +- .../avail-diag-relaxed-compute.hlsl | 238 +- .../Availability/avail-diag-relaxed-lib.hlsl | 324 +- .../avail-diag-strict-compute.hlsl | 256 +- .../Availability/avail-diag-strict-lib.hlsl | 384 +- .../avail-lib-multiple-stages.hlsl | 114 +- .../SemaHLSL/BuiltIns/StructuredBuffers.hlsl | 38 +- .../test/SemaHLSL/BuiltIns/cross-errors.hlsl | 86 +- .../BuiltIns/half-float-only-errors2.hlsl | 26 +- .../test/SemaHLSL/BuiltIns/length-errors.hlsl | 64 +- .../SemaHLSL/BuiltIns/normalize-errors.hlsl | 62 +- clang/test/SemaHLSL/BuiltIns/step-errors.hlsl | 62 +- .../Types/Traits/IsIntangibleType.hlsl | 162 +- .../Types/Traits/IsIntangibleTypeErrors.hlsl | 24 +- .../resource_binding_attr_error_basic.hlsl | 84 +- .../resource_binding_attr_error_other.hlsl | 18 +- .../resource_binding_attr_error_resource.hlsl | 98 +- ...urce_binding_attr_error_silence_diags.hlsl | 54 +- .../resource_binding_attr_error_space.hlsl | 124 +- .../resource_binding_attr_error_udt.hlsl | 270 +- clang/tools/scan-build/bin/scan-build.bat | 2 +- .../tools/scan-build/libexec/c++-analyzer.bat | 2 +- .../tools/scan-build/libexec/ccc-analyzer.bat | 2 +- clang/utils/ClangVisualizers/clang.natvis | 2178 ++--- .../test/Driver/msvc-dependent-lib-flags.f90 | 72 +- .../ir-interpreter-phi-nodes/Makefile | 8 +- .../postmortem/minidump/fizzbuzz.syms | 4 +- .../target-new-solib-notifications/Makefile | 46 +- .../target-new-solib-notifications/a.cpp | 6 +- .../target-new-solib-notifications/b.cpp | 2 +- .../target-new-solib-notifications/c.cpp | 2 +- .../target-new-solib-notifications/d.cpp | 2 +- .../target-new-solib-notifications/main.cpp | 32 +- .../unwind/zeroth_frame/Makefile | 6 +- .../unwind/zeroth_frame/TestZerothFrame.py | 176 +- lldb/test/API/python_api/debugger/Makefile | 6 +- lldb/test/Shell/BuildScript/modes.test | 70 +- lldb/test/Shell/BuildScript/script-args.test | 64 +- .../Shell/BuildScript/toolchain-clang-cl.test | 98 +- 
.../Windows/Sigsegv/Inputs/sigsegv.cpp | 80 +- .../NativePDB/Inputs/inline_sites.s | 1244 +-- .../Inputs/inline_sites_live.lldbinit | 14 +- .../Inputs/local-variables-registers.lldbinit | 70 +- .../NativePDB/Inputs/lookup-by-types.lldbinit | 6 +- .../subfield_register_simple_type.lldbinit | 4 +- .../NativePDB/function-types-classes.cpp | 12 +- .../NativePDB/inline_sites_live.cpp | 68 +- .../SymbolFile/NativePDB/lookup-by-types.cpp | 92 +- lldb/unittests/Breakpoint/CMakeLists.txt | 20 +- llvm/benchmarks/FormatVariadicBM.cpp | 126 +- .../GetIntrinsicForClangBuiltin.cpp | 100 +- .../GetIntrinsicInfoTableEntriesBM.cpp | 60 +- llvm/docs/_static/LoopOptWG_invite.ics | 160 +- llvm/lib/Support/rpmalloc/CACHE.md | 38 +- llvm/lib/Support/rpmalloc/README.md | 440 +- llvm/lib/Support/rpmalloc/malloc.c | 1448 +-- llvm/lib/Support/rpmalloc/rpmalloc.c | 7984 ++++++++--------- llvm/lib/Support/rpmalloc/rpmalloc.h | 856 +- llvm/lib/Support/rpmalloc/rpnew.h | 226 +- .../Target/DirectX/DXILFinalizeLinkage.cpp | 130 +- .../DirectX/DirectXTargetTransformInfo.cpp | 70 +- llvm/test/CodeGen/DirectX/atan2.ll | 174 +- llvm/test/CodeGen/DirectX/atan2_error.ll | 22 +- llvm/test/CodeGen/DirectX/cross.ll | 112 +- llvm/test/CodeGen/DirectX/finalize_linkage.ll | 128 +- llvm/test/CodeGen/DirectX/normalize.ll | 224 +- llvm/test/CodeGen/DirectX/normalize_error.ll | 20 +- llvm/test/CodeGen/DirectX/step.ll | 156 +- .../CodeGen/SPIRV/hlsl-intrinsics/atan2.ll | 98 +- .../CodeGen/SPIRV/hlsl-intrinsics/cross.ll | 66 +- .../CodeGen/SPIRV/hlsl-intrinsics/length.ll | 58 +- .../SPIRV/hlsl-intrinsics/normalize.ll | 62 +- .../CodeGen/SPIRV/hlsl-intrinsics/step.ll | 66 +- .../Demangle/ms-placeholder-return-type.test | 36 +- llvm/test/FileCheck/dos-style-eol.txt | 20 +- llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri | 8 +- .../tools/llvm-cvtres/Inputs/languages.rc | 72 +- .../tools/llvm-cvtres/Inputs/test_resource.rc | 98 +- .../tools/llvm-rc/Inputs/dialog-with-menu.rc | 32 +- 
.../COFF/Inputs/resources/test_resource.rc | 88 +- llvm/unittests/Support/ModRefTest.cpp | 54 +- llvm/utils/LLVMVisualizers/llvm.natvis | 816 +- .../lit/tests/Inputs/shtest-shell/diff-in.dos | 6 +- llvm/utils/release/build_llvm_release.bat | 1030 +-- openmp/runtime/doc/doxygen/config | 3644 ++++---- pstl/CREDITS.txt | 42 +- 120 files changed, 14280 insertions(+), 14280 deletions(-) diff --git a/clang-tools-extra/clangd/test/input-mirror.test b/clang-tools-extra/clangd/test/input-mirror.test index a34a4a08cf60cf..bce3f9923a3b90 100644 --- a/clang-tools-extra/clangd/test/input-mirror.test +++ b/clang-tools-extra/clangd/test/input-mirror.test @@ -1,17 +1,17 @@ -# RUN: clangd -pretty -sync -input-mirror-file %t < %s -# Note that we have to use '-b' as -input-mirror-file does not have a newline at the end of file. -# RUN: diff -b %t %s -# It is absolutely vital that this file has CRLF line endings. -# -Content-Length: 125 - -{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":123,"rootPath":"clangd","capabilities":{},"trace":"off"}} -Content-Length: 172 - -{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"test:///main.cpp","languageId":"cpp","version":1,"text":"int main() {\nint a;\na;\n}\n"}}} -Content-Length: 44 - -{"jsonrpc":"2.0","id":3,"method":"shutdown"} -Content-Length: 33 - -{"jsonrpc":"2.0","method":"exit"} +# RUN: clangd -pretty -sync -input-mirror-file %t < %s +# Note that we have to use '-b' as -input-mirror-file does not have a newline at the end of file. +# RUN: diff -b %t %s +# It is absolutely vital that this file has CRLF line endings. 
+# +Content-Length: 125 + +{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":123,"rootPath":"clangd","capabilities":{},"trace":"off"}} +Content-Length: 172 + +{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"test:///main.cpp","languageId":"cpp","version":1,"text":"int main() {\nint a;\na;\n}\n"}}} +Content-Length: 44 + +{"jsonrpc":"2.0","id":3,"method":"shutdown"} +Content-Length: 33 + +{"jsonrpc":"2.0","method":"exit"} diff --git a/clang-tools-extra/clangd/test/protocol.test b/clang-tools-extra/clangd/test/protocol.test index 5e852d1d9deebc..64ccfaef189111 100644 --- a/clang-tools-extra/clangd/test/protocol.test +++ b/clang-tools-extra/clangd/test/protocol.test @@ -1,113 +1,113 @@ -# RUN: not clangd -pretty -sync -enable-test-uri-scheme < %s | FileCheck -strict-whitespace %s -# RUN: not clangd -pretty -sync -enable-test-uri-scheme < %s 2>&1 | FileCheck -check-prefix=STDERR %s -# vim: fileformat=dos -# It is absolutely vital that this file has CRLF line endings. -# -# Note that we invert the test because we intent to let clangd exit prematurely. 
-# -# Test protocol parsing -Content-Length: 125 -Content-Type: application/vscode-jsonrpc; charset-utf-8 - -{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":123,"rootPath":"clangd","capabilities":{},"trace":"off"}} -# Test message with Content-Type after Content-Length -# -# CHECK: "jsonrpc": "2.0", -# CHECK-NEXT: "result": { -# CHECK: } -Content-Length: 246 - -{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"test:///main.cpp","languageId":"cpp","version":1,"text":"struct fake { int a, bb, ccc; int f(int i, const float f) const; };\nint main() {\n fake f;\n f.\n}\n"}}} - -Content-Length: 104 - -{"jsonrpc":"2.0","method":"textDocument/didChange","params":{"textDocument":{"uri":"test:///main.cpp"}}} - -Content-Type: application/vscode-jsonrpc; charset-utf-8 -Content-Length: 146 - -{"jsonrpc":"2.0","id":1,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message with Content-Type before Content-Length -# -# CHECK: "id": 1, -# CHECK-NEXT: "jsonrpc": "2.0", -# CHECK-NEXT: "result": { -# CHECK-NEXT: "isIncomplete": false, -# CHECK-NEXT: "items": [ -# CHECK: "filterText": "a", -# CHECK-NEXT: "insertText": "a", -# CHECK-NEXT: "insertTextFormat": 1, -# CHECK-NEXT: "kind": 5, -# CHECK-NEXT: "label": " a", -# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, -# CHECK-NEXT: "sortText": "{{.*}}" -# CHECK: ] -# CHECK-NEXT: } - -X-Test: Testing -Content-Type: application/vscode-jsonrpc; charset-utf-8 -Content-Length: 146 -Content-Type: application/vscode-jsonrpc; charset-utf-8 -X-Testing: Test - -{"jsonrpc":"2.0","id":2,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} - -Content-Type: application/vscode-jsonrpc; charset-utf-8 -Content-Length: 10 -Content-Length: 146 - 
-{"jsonrpc":"2.0","id":3,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message with duplicate Content-Length headers -# -# CHECK: "id": 3, -# CHECK-NEXT: "jsonrpc": "2.0", -# CHECK-NEXT: "result": { -# CHECK-NEXT: "isIncomplete": false, -# CHECK-NEXT: "items": [ -# CHECK: "filterText": "a", -# CHECK-NEXT: "insertText": "a", -# CHECK-NEXT: "insertTextFormat": 1, -# CHECK-NEXT: "kind": 5, -# CHECK-NEXT: "label": " a", -# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, -# CHECK-NEXT: "sortText": "{{.*}}" -# CHECK: ] -# CHECK-NEXT: } -# STDERR: Warning: Duplicate Content-Length header received. The previous value for this message (10) was ignored. - -Content-Type: application/vscode-jsonrpc; charset-utf-8 -Content-Length: 10 - -{"jsonrpc":"2.0","id":4,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message with malformed Content-Length -# -# STDERR: JSON parse error -# Ensure we recover by sending another (valid) message - -Content-Length: 146 - -{"jsonrpc":"2.0","id":5,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message with Content-Type before Content-Length -# -# CHECK: "id": 5, -# CHECK-NEXT: "jsonrpc": "2.0", -# CHECK-NEXT: "result": { -# CHECK-NEXT: "isIncomplete": false, -# CHECK-NEXT: "items": [ -# CHECK: "filterText": "a", -# CHECK-NEXT: "insertText": "a", -# CHECK-NEXT: "insertTextFormat": 1, -# CHECK-NEXT: "kind": 5, -# CHECK-NEXT: "label": " a", -# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, -# CHECK-NEXT: "sortText": "{{.*}}" -# CHECK: ] -# CHECK-NEXT: } -Content-Length: 1024 - -{"jsonrpc":"2.0","id":5,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message which reads beyond the end of the stream. 
-# -# Ensure this is the last test in the file! -# STDERR: Input was aborted. Read only {{[0-9]+}} bytes of expected {{[0-9]+}}. - +# RUN: not clangd -pretty -sync -enable-test-uri-scheme < %s | FileCheck -strict-whitespace %s +# RUN: not clangd -pretty -sync -enable-test-uri-scheme < %s 2>&1 | FileCheck -check-prefix=STDERR %s +# vim: fileformat=dos +# It is absolutely vital that this file has CRLF line endings. +# +# Note that we invert the test because we intent to let clangd exit prematurely. +# +# Test protocol parsing +Content-Length: 125 +Content-Type: application/vscode-jsonrpc; charset-utf-8 + +{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":123,"rootPath":"clangd","capabilities":{},"trace":"off"}} +# Test message with Content-Type after Content-Length +# +# CHECK: "jsonrpc": "2.0", +# CHECK-NEXT: "result": { +# CHECK: } +Content-Length: 246 + +{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"test:///main.cpp","languageId":"cpp","version":1,"text":"struct fake { int a, bb, ccc; int f(int i, const float f) const; };\nint main() {\n fake f;\n f.\n}\n"}}} + +Content-Length: 104 + +{"jsonrpc":"2.0","method":"textDocument/didChange","params":{"textDocument":{"uri":"test:///main.cpp"}}} + +Content-Type: application/vscode-jsonrpc; charset-utf-8 +Content-Length: 146 + +{"jsonrpc":"2.0","id":1,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message with Content-Type before Content-Length +# +# CHECK: "id": 1, +# CHECK-NEXT: "jsonrpc": "2.0", +# CHECK-NEXT: "result": { +# CHECK-NEXT: "isIncomplete": false, +# CHECK-NEXT: "items": [ +# CHECK: "filterText": "a", +# CHECK-NEXT: "insertText": "a", +# CHECK-NEXT: "insertTextFormat": 1, +# CHECK-NEXT: "kind": 5, +# CHECK-NEXT: "label": " a", +# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, +# CHECK-NEXT: "sortText": "{{.*}}" +# CHECK: ] +# CHECK-NEXT: } + +X-Test: Testing +Content-Type: 
application/vscode-jsonrpc; charset-utf-8 +Content-Length: 146 +Content-Type: application/vscode-jsonrpc; charset-utf-8 +X-Testing: Test + +{"jsonrpc":"2.0","id":2,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} + +Content-Type: application/vscode-jsonrpc; charset-utf-8 +Content-Length: 10 +Content-Length: 146 + +{"jsonrpc":"2.0","id":3,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message with duplicate Content-Length headers +# +# CHECK: "id": 3, +# CHECK-NEXT: "jsonrpc": "2.0", +# CHECK-NEXT: "result": { +# CHECK-NEXT: "isIncomplete": false, +# CHECK-NEXT: "items": [ +# CHECK: "filterText": "a", +# CHECK-NEXT: "insertText": "a", +# CHECK-NEXT: "insertTextFormat": 1, +# CHECK-NEXT: "kind": 5, +# CHECK-NEXT: "label": " a", +# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, +# CHECK-NEXT: "sortText": "{{.*}}" +# CHECK: ] +# CHECK-NEXT: } +# STDERR: Warning: Duplicate Content-Length header received. The previous value for this message (10) was ignored. 
+ +Content-Type: application/vscode-jsonrpc; charset-utf-8 +Content-Length: 10 + +{"jsonrpc":"2.0","id":4,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message with malformed Content-Length +# +# STDERR: JSON parse error +# Ensure we recover by sending another (valid) message + +Content-Length: 146 + +{"jsonrpc":"2.0","id":5,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message with Content-Type before Content-Length +# +# CHECK: "id": 5, +# CHECK-NEXT: "jsonrpc": "2.0", +# CHECK-NEXT: "result": { +# CHECK-NEXT: "isIncomplete": false, +# CHECK-NEXT: "items": [ +# CHECK: "filterText": "a", +# CHECK-NEXT: "insertText": "a", +# CHECK-NEXT: "insertTextFormat": 1, +# CHECK-NEXT: "kind": 5, +# CHECK-NEXT: "label": " a", +# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, +# CHECK-NEXT: "sortText": "{{.*}}" +# CHECK: ] +# CHECK-NEXT: } +Content-Length: 1024 + +{"jsonrpc":"2.0","id":5,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message which reads beyond the end of the stream. +# +# Ensure this is the last test in the file! +# STDERR: Input was aborted. Read only {{[0-9]+}} bytes of expected {{[0-9]+}}. + diff --git a/clang-tools-extra/clangd/test/too_large.test b/clang-tools-extra/clangd/test/too_large.test index 7df981e7942073..6986bd5e258e87 100644 --- a/clang-tools-extra/clangd/test/too_large.test +++ b/clang-tools-extra/clangd/test/too_large.test @@ -1,7 +1,7 @@ -# RUN: not clangd -sync < %s 2>&1 | FileCheck -check-prefix=STDERR %s -# vim: fileformat=dos -# It is absolutely vital that this file has CRLF line endings. 
-# -Content-Length: 2147483648 - -# STDERR: Refusing to read message +# RUN: not clangd -sync < %s 2>&1 | FileCheck -check-prefix=STDERR %s +# vim: fileformat=dos +# It is absolutely vital that this file has CRLF line endings. +# +Content-Length: 2147483648 + +# STDERR: Refusing to read message diff --git a/clang/test/AST/HLSL/StructuredBuffer-AST.hlsl b/clang/test/AST/HLSL/StructuredBuffer-AST.hlsl index 030fcfc31691dc..9c1630f6f570aa 100644 --- a/clang/test/AST/HLSL/StructuredBuffer-AST.hlsl +++ b/clang/test/AST/HLSL/StructuredBuffer-AST.hlsl @@ -1,64 +1,64 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library -x hlsl -ast-dump -DEMPTY %s | FileCheck -check-prefix=EMPTY %s -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library -x hlsl -ast-dump %s | FileCheck %s - - -// This test tests two different AST generations. The "EMPTY" test mode verifies -// the AST generated by forward declaration of the HLSL types which happens on -// initializing the HLSL external AST with an AST Context. - -// The non-empty mode has a use that requires the StructuredBuffer type be complete, -// which results in the AST being populated by the external AST source. That -// case covers the full implementation of the template declaration and the -// instantiated specialization. 
- -// EMPTY: ClassTemplateDecl 0x{{[0-9A-Fa-f]+}} <> implicit StructuredBuffer -// EMPTY-NEXT: TemplateTypeParmDecl 0x{{[0-9A-Fa-f]+}} <> class depth 0 index 0 element_type -// EMPTY-NEXT: CXXRecordDecl 0x{{[0-9A-Fa-f]+}} <> implicit class StructuredBuffer -// EMPTY-NEXT: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final - -// There should be no more occurrances of StructuredBuffer -// EMPTY-NOT: StructuredBuffer - -#ifndef EMPTY - -StructuredBuffer Buffer; - -#endif - -// CHECK: ClassTemplateDecl 0x{{[0-9A-Fa-f]+}} <> implicit StructuredBuffer -// CHECK-NEXT: TemplateTypeParmDecl 0x{{[0-9A-Fa-f]+}} <> class depth 0 index 0 element_type -// CHECK-NEXT: CXXRecordDecl 0x{{[0-9A-Fa-f]+}} <> implicit class StructuredBuffer definition - -// CHECK: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final -// CHECK-NEXT: FieldDecl 0x{{[0-9A-Fa-f]+}} <> implicit h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(element_type)]] -// CHECK-NEXT: HLSLResourceAttr 0x{{[0-9A-Fa-f]+}} <> Implicit TypedBuffer - -// CHECK: CXXMethodDecl 0x{{[0-9A-Fa-f]+}} <> operator[] 'element_type &const (unsigned int) const' -// CHECK-NEXT: ParmVarDecl 0x{{[0-9A-Fa-f]+}} <> Idx 'unsigned int' -// CHECK-NEXT: CompoundStmt 0x{{[0-9A-Fa-f]+}} <> -// CHECK-NEXT: ReturnStmt 0x{{[0-9A-Fa-f]+}} <> -// CHECK-NEXT: MemberExpr 0x{{[0-9A-Fa-f]+}} <> 'element_type' lvalue .e 0x{{[0-9A-Fa-f]+}} -// CHECK-NEXT: CXXThisExpr 0x{{[0-9A-Fa-f]+}} <> 'const StructuredBuffer' lvalue implicit this -// CHECK-NEXT: AlwaysInlineAttr 0x{{[0-9A-Fa-f]+}} <> Implicit always_inline - -// CHECK-NEXT: CXXMethodDecl 0x{{[0-9A-Fa-f]+}} <> operator[] 'element_type &(unsigned int)' -// CHECK-NEXT: ParmVarDecl 0x{{[0-9A-Fa-f]+}} <> Idx 'unsigned int' -// CHECK-NEXT: CompoundStmt 0x{{[0-9A-Fa-f]+}} <> -// CHECK-NEXT: ReturnStmt 0x{{[0-9A-Fa-f]+}} <> -// CHECK-NEXT: MemberExpr 0x{{[0-9A-Fa-f]+}} <> 'element_type' lvalue .e 
0x{{[0-9A-Fa-f]+}} -// CHECK-NEXT: CXXThisExpr 0x{{[0-9A-Fa-f]+}} <> 'StructuredBuffer' lvalue implicit this -// CHECK-NEXT: AlwaysInlineAttr 0x{{[0-9A-Fa-f]+}} <> Implicit always_inline - -// CHECK: ClassTemplateSpecializationDecl 0x{{[0-9A-Fa-f]+}} <> class StructuredBuffer definition - -// CHECK: TemplateArgument type 'float' -// CHECK-NEXT: BuiltinType 0x{{[0-9A-Fa-f]+}} 'float' -// CHECK-NEXT: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final -// CHECK-NEXT: FieldDecl 0x{{[0-9A-Fa-f]+}} <> implicit h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float)]] -// CHECK-NEXT: HLSLResourceAttr 0x{{[0-9A-Fa-f]+}} <> Implicit TypedBuffer +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library -x hlsl -ast-dump -DEMPTY %s | FileCheck -check-prefix=EMPTY %s +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library -x hlsl -ast-dump %s | FileCheck %s + + +// This test tests two different AST generations. The "EMPTY" test mode verifies +// the AST generated by forward declaration of the HLSL types which happens on +// initializing the HLSL external AST with an AST Context. + +// The non-empty mode has a use that requires the StructuredBuffer type be complete, +// which results in the AST being populated by the external AST source. That +// case covers the full implementation of the template declaration and the +// instantiated specialization. 
+ +// EMPTY: ClassTemplateDecl 0x{{[0-9A-Fa-f]+}} <> implicit StructuredBuffer +// EMPTY-NEXT: TemplateTypeParmDecl 0x{{[0-9A-Fa-f]+}} <> class depth 0 index 0 element_type +// EMPTY-NEXT: CXXRecordDecl 0x{{[0-9A-Fa-f]+}} <> implicit class StructuredBuffer +// EMPTY-NEXT: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final + +// There should be no more occurrances of StructuredBuffer +// EMPTY-NOT: StructuredBuffer + +#ifndef EMPTY + +StructuredBuffer Buffer; + +#endif + +// CHECK: ClassTemplateDecl 0x{{[0-9A-Fa-f]+}} <> implicit StructuredBuffer +// CHECK-NEXT: TemplateTypeParmDecl 0x{{[0-9A-Fa-f]+}} <> class depth 0 index 0 element_type +// CHECK-NEXT: CXXRecordDecl 0x{{[0-9A-Fa-f]+}} <> implicit class StructuredBuffer definition + +// CHECK: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final +// CHECK-NEXT: FieldDecl 0x{{[0-9A-Fa-f]+}} <> implicit h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(element_type)]] +// CHECK-NEXT: HLSLResourceAttr 0x{{[0-9A-Fa-f]+}} <> Implicit TypedBuffer + +// CHECK: CXXMethodDecl 0x{{[0-9A-Fa-f]+}} <> operator[] 'element_type &const (unsigned int) const' +// CHECK-NEXT: ParmVarDecl 0x{{[0-9A-Fa-f]+}} <> Idx 'unsigned int' +// CHECK-NEXT: CompoundStmt 0x{{[0-9A-Fa-f]+}} <> +// CHECK-NEXT: ReturnStmt 0x{{[0-9A-Fa-f]+}} <> +// CHECK-NEXT: MemberExpr 0x{{[0-9A-Fa-f]+}} <> 'element_type' lvalue .e 0x{{[0-9A-Fa-f]+}} +// CHECK-NEXT: CXXThisExpr 0x{{[0-9A-Fa-f]+}} <> 'const StructuredBuffer' lvalue implicit this +// CHECK-NEXT: AlwaysInlineAttr 0x{{[0-9A-Fa-f]+}} <> Implicit always_inline + +// CHECK-NEXT: CXXMethodDecl 0x{{[0-9A-Fa-f]+}} <> operator[] 'element_type &(unsigned int)' +// CHECK-NEXT: ParmVarDecl 0x{{[0-9A-Fa-f]+}} <> Idx 'unsigned int' +// CHECK-NEXT: CompoundStmt 0x{{[0-9A-Fa-f]+}} <> +// CHECK-NEXT: ReturnStmt 0x{{[0-9A-Fa-f]+}} <> +// CHECK-NEXT: MemberExpr 0x{{[0-9A-Fa-f]+}} <> 'element_type' lvalue .e 
0x{{[0-9A-Fa-f]+}} +// CHECK-NEXT: CXXThisExpr 0x{{[0-9A-Fa-f]+}} <> 'StructuredBuffer' lvalue implicit this +// CHECK-NEXT: AlwaysInlineAttr 0x{{[0-9A-Fa-f]+}} <> Implicit always_inline + +// CHECK: ClassTemplateSpecializationDecl 0x{{[0-9A-Fa-f]+}} <> class StructuredBuffer definition + +// CHECK: TemplateArgument type 'float' +// CHECK-NEXT: BuiltinType 0x{{[0-9A-Fa-f]+}} 'float' +// CHECK-NEXT: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final +// CHECK-NEXT: FieldDecl 0x{{[0-9A-Fa-f]+}} <> implicit h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float)]] +// CHECK-NEXT: HLSLResourceAttr 0x{{[0-9A-Fa-f]+}} <> Implicit TypedBuffer diff --git a/clang/test/C/C2y/n3262.c b/clang/test/C/C2y/n3262.c index 3ff2062d88dde8..864ab351bdbc23 100644 --- a/clang/test/C/C2y/n3262.c +++ b/clang/test/C/C2y/n3262.c @@ -1,20 +1,20 @@ -// RUN: %clang_cc1 -verify -std=c2y -Wall -pedantic %s -// expected-no-diagnostics - -/* WG14 N3262: Yes - * Usability of a byte-wise copy of va_list - * - * NB: Clang explicitly documents this as being undefined behavior. A - * diagnostic is produced for some targets but not for others for assignment or - * initialization, but no diagnostic is possible to produce for use with memcpy - * in the general case, nor with a manual bytewise copy via a for loop. - * - * Therefore, nothing is tested in this file; it serves as a reminder that we - * validated our documentation against the paper. See - * clang/docs/LanguageExtensions.rst for more details. - * - * FIXME: it would be nice to add ubsan support for recognizing when an invalid - * copy is made and diagnosing on copy (or on use of the copied va_list). 
- */ - -int main() {} +// RUN: %clang_cc1 -verify -std=c2y -Wall -pedantic %s +// expected-no-diagnostics + +/* WG14 N3262: Yes + * Usability of a byte-wise copy of va_list + * + * NB: Clang explicitly documents this as being undefined behavior. A + * diagnostic is produced for some targets but not for others for assignment or + * initialization, but no diagnostic is possible to produce for use with memcpy + * in the general case, nor with a manual bytewise copy via a for loop. + * + * Therefore, nothing is tested in this file; it serves as a reminder that we + * validated our documentation against the paper. See + * clang/docs/LanguageExtensions.rst for more details. + * + * FIXME: it would be nice to add ubsan support for recognizing when an invalid + * copy is made and diagnosing on copy (or on use of the copied va_list). + */ + +int main() {} diff --git a/clang/test/C/C2y/n3274.c b/clang/test/C/C2y/n3274.c index ccdb89f4069ded..6bf8d72d0f3319 100644 --- a/clang/test/C/C2y/n3274.c +++ b/clang/test/C/C2y/n3274.c @@ -1,18 +1,18 @@ -// RUN: %clang_cc1 -verify -std=c23 -Wall -pedantic %s -// RUN: %clang_cc1 -verify -std=c2y -Wall -pedantic %s - -/* WG14 N3274: Yes - * Remove imaginary types - */ - -// Clang has never supported _Imaginary. -#ifdef __STDC_IEC_559_COMPLEX__ -#error "When did this happen?" -#endif - -_Imaginary float i; // expected-error {{imaginary types are not supported}} - -// _Imaginary is a keyword in older language modes, but doesn't need to be one -// in C2y or later. However, to improve diagnostic behavior, we retain it as a -// keyword in all language modes -- it is not available as an identifier. -static_assert(!__is_identifier(_Imaginary)); +// RUN: %clang_cc1 -verify -std=c23 -Wall -pedantic %s +// RUN: %clang_cc1 -verify -std=c2y -Wall -pedantic %s + +/* WG14 N3274: Yes + * Remove imaginary types + */ + +// Clang has never supported _Imaginary. +#ifdef __STDC_IEC_559_COMPLEX__ +#error "When did this happen?" 
+#endif + +_Imaginary float i; // expected-error {{imaginary types are not supported}} + +// _Imaginary is a keyword in older language modes, but doesn't need to be one +// in C2y or later. However, to improve diagnostic behavior, we retain it as a +// keyword in all language modes -- it is not available as an identifier. +static_assert(!__is_identifier(_Imaginary)); diff --git a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-annotations.hlsl b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-annotations.hlsl index 4d3d4908c396e6..81c5837d8f2077 100644 --- a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-annotations.hlsl +++ b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-annotations.hlsl @@ -1,22 +1,22 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s - -StructuredBuffer Buffer1; -StructuredBuffer > BufferArray[4]; - -StructuredBuffer Buffer2 : register(u3); -StructuredBuffer > BufferArray2[4] : register(u4); - -StructuredBuffer Buffer3 : register(u3, space1); -StructuredBuffer > BufferArray3[4] : register(u4, space1); - -[numthreads(1,1,1)] -void main() { -} - -// CHECK: !hlsl.uavs = !{![[Single:[0-9]+]], ![[Array:[0-9]+]], ![[SingleAllocated:[0-9]+]], ![[ArrayAllocated:[0-9]+]], ![[SingleSpace:[0-9]+]], ![[ArraySpace:[0-9]+]]} -// CHECK-DAG: ![[Single]] = !{ptr @Buffer1, i32 10, i32 9, i1 false, i32 -1, i32 0} -// CHECK-DAG: ![[Array]] = !{ptr @BufferArray, i32 10, i32 9, i1 false, i32 -1, i32 0} -// CHECK-DAG: ![[SingleAllocated]] = !{ptr @Buffer2, i32 10, i32 9, i1 false, i32 3, i32 0} -// CHECK-DAG: ![[ArrayAllocated]] = !{ptr @BufferArray2, i32 10, i32 9, i1 false, i32 4, i32 0} -// CHECK-DAG: ![[SingleSpace]] = !{ptr @Buffer3, i32 10, i32 9, i1 false, i32 3, i32 1} -// CHECK-DAG: ![[ArraySpace]] = !{ptr @BufferArray3, i32 10, i32 9, i1 false, i32 4, i32 1} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s 
+ +StructuredBuffer Buffer1; +StructuredBuffer > BufferArray[4]; + +StructuredBuffer Buffer2 : register(u3); +StructuredBuffer > BufferArray2[4] : register(u4); + +StructuredBuffer Buffer3 : register(u3, space1); +StructuredBuffer > BufferArray3[4] : register(u4, space1); + +[numthreads(1,1,1)] +void main() { +} + +// CHECK: !hlsl.uavs = !{![[Single:[0-9]+]], ![[Array:[0-9]+]], ![[SingleAllocated:[0-9]+]], ![[ArrayAllocated:[0-9]+]], ![[SingleSpace:[0-9]+]], ![[ArraySpace:[0-9]+]]} +// CHECK-DAG: ![[Single]] = !{ptr @Buffer1, i32 10, i32 9, i1 false, i32 -1, i32 0} +// CHECK-DAG: ![[Array]] = !{ptr @BufferArray, i32 10, i32 9, i1 false, i32 -1, i32 0} +// CHECK-DAG: ![[SingleAllocated]] = !{ptr @Buffer2, i32 10, i32 9, i1 false, i32 3, i32 0} +// CHECK-DAG: ![[ArrayAllocated]] = !{ptr @BufferArray2, i32 10, i32 9, i1 false, i32 4, i32 0} +// CHECK-DAG: ![[SingleSpace]] = !{ptr @Buffer3, i32 10, i32 9, i1 false, i32 3, i32 1} +// CHECK-DAG: ![[ArraySpace]] = !{ptr @BufferArray3, i32 10, i32 9, i1 false, i32 4, i32 1} diff --git a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-constructor.hlsl b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-constructor.hlsl index 178332d03e6404..f65090410ce66f 100644 --- a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-constructor.hlsl +++ b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-constructor.hlsl @@ -1,19 +1,19 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s -// RUN: %clang_cc1 -triple spirv-vulkan-library -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s --check-prefix=CHECK-SPIRV - -// XFAIL: * -// This expectedly fails because create.handle is no longer invoked -// from StructuredBuffer constructor and the replacement has not been -// implemented yet. This test should be updated to expect -// dx.create.handleFromBinding as part of issue #105076. 
- -StructuredBuffer Buf; - -// CHECK: define linkonce_odr noundef ptr @"??0?$StructuredBuffer@M@hlsl@@QAA@XZ" -// CHECK-NEXT: entry: - -// CHECK: %[[HandleRes:[0-9]+]] = call ptr @llvm.dx.create.handle(i8 1) -// CHECK: store ptr %[[HandleRes]], ptr %h, align 4 - -// CHECK-SPIRV: %[[HandleRes:[0-9]+]] = call ptr @llvm.spv.create.handle(i8 1) -// CHECK-SPIRV: store ptr %[[HandleRes]], ptr %h, align 8 +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple spirv-vulkan-library -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s --check-prefix=CHECK-SPIRV + +// XFAIL: * +// This expectedly fails because create.handle is no longer invoked +// from StructuredBuffer constructor and the replacement has not been +// implemented yet. This test should be updated to expect +// dx.create.handleFromBinding as part of issue #105076. + +StructuredBuffer Buf; + +// CHECK: define linkonce_odr noundef ptr @"??0?$StructuredBuffer@M@hlsl@@QAA@XZ" +// CHECK-NEXT: entry: + +// CHECK: %[[HandleRes:[0-9]+]] = call ptr @llvm.dx.create.handle(i8 1) +// CHECK: store ptr %[[HandleRes]], ptr %h, align 4 + +// CHECK-SPIRV: %[[HandleRes:[0-9]+]] = call ptr @llvm.spv.create.handle(i8 1) +// CHECK-SPIRV: store ptr %[[HandleRes]], ptr %h, align 8 diff --git a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-elementtype.hlsl b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-elementtype.hlsl index a99c7f98a1afb6..435a904327a26a 100644 --- a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-elementtype.hlsl +++ b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-elementtype.hlsl @@ -1,70 +1,70 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.2-compute -finclude-default-header -fnative-half-type -emit-llvm -o - %s | FileCheck %s - -// NOTE: The number in type name and whether the struct is packed or not will mostly -// likely change once subscript operators are properly implemented 
(llvm/llvm-project#95956) -// and theinterim field of the contained type is removed. - -// CHECK: %"class.hlsl::StructuredBuffer" = type <{ target("dx.RawBuffer", i16, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.0" = type <{ target("dx.RawBuffer", i16, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.2" = type { target("dx.RawBuffer", i32, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.3" = type { target("dx.RawBuffer", i32, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.4" = type { target("dx.RawBuffer", i64, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.5" = type { target("dx.RawBuffer", i64, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.6" = type <{ target("dx.RawBuffer", half, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.8" = type { target("dx.RawBuffer", float, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.9" = type { target("dx.RawBuffer", double, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.10" = type { target("dx.RawBuffer", <4 x i16>, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.11" = type { target("dx.RawBuffer", <3 x i32>, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.12" = type { target("dx.RawBuffer", <2 x half>, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.13" = type { target("dx.RawBuffer", <3 x float>, 1, 0) - -StructuredBuffer BufI16; -StructuredBuffer BufU16; -StructuredBuffer BufI32; -StructuredBuffer BufU32; -StructuredBuffer BufI64; -StructuredBuffer BufU64; -StructuredBuffer BufF16; -StructuredBuffer BufF32; -StructuredBuffer BufF64; -StructuredBuffer< vector > BufI16x4; -StructuredBuffer< vector > BufU32x3; -StructuredBuffer BufF16x2; -StructuredBuffer BufF32x3; -// TODO: StructuredBuffer BufSNormF16; -> 11 -// TODO: StructuredBuffer BufUNormF16; -> 12 -// TODO: StructuredBuffer BufSNormF32; -> 13 -// TODO: StructuredBuffer BufUNormF32; -> 14 -// TODO: StructuredBuffer BufSNormF64; -> 15 -// TODO: StructuredBuffer BufUNormF64; -> 16 - -[numthreads(1,1,1)] -void main(int GI : SV_GroupIndex) { - BufI16[GI] = 0; - 
BufU16[GI] = 0; - BufI32[GI] = 0; - BufU32[GI] = 0; - BufI64[GI] = 0; - BufU64[GI] = 0; - BufF16[GI] = 0; - BufF32[GI] = 0; - BufF64[GI] = 0; - BufI16x4[GI] = 0; - BufU32x3[GI] = 0; - BufF16x2[GI] = 0; - BufF32x3[GI] = 0; -} - -// CHECK: !{{[0-9]+}} = !{ptr @BufI16, i32 10, i32 2, -// CHECK: !{{[0-9]+}} = !{ptr @BufU16, i32 10, i32 3, -// CHECK: !{{[0-9]+}} = !{ptr @BufI32, i32 10, i32 4, -// CHECK: !{{[0-9]+}} = !{ptr @BufU32, i32 10, i32 5, -// CHECK: !{{[0-9]+}} = !{ptr @BufI64, i32 10, i32 6, -// CHECK: !{{[0-9]+}} = !{ptr @BufU64, i32 10, i32 7, -// CHECK: !{{[0-9]+}} = !{ptr @BufF16, i32 10, i32 8, -// CHECK: !{{[0-9]+}} = !{ptr @BufF32, i32 10, i32 9, -// CHECK: !{{[0-9]+}} = !{ptr @BufF64, i32 10, i32 10, -// CHECK: !{{[0-9]+}} = !{ptr @BufI16x4, i32 10, i32 2, -// CHECK: !{{[0-9]+}} = !{ptr @BufU32x3, i32 10, i32 5, -// CHECK: !{{[0-9]+}} = !{ptr @BufF16x2, i32 10, i32 8, -// CHECK: !{{[0-9]+}} = !{ptr @BufF32x3, i32 10, i32 9, +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.2-compute -finclude-default-header -fnative-half-type -emit-llvm -o - %s | FileCheck %s + +// NOTE: The number in type name and whether the struct is packed or not will mostly +// likely change once subscript operators are properly implemented (llvm/llvm-project#95956) +// and theinterim field of the contained type is removed. 
+ +// CHECK: %"class.hlsl::StructuredBuffer" = type <{ target("dx.RawBuffer", i16, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.0" = type <{ target("dx.RawBuffer", i16, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.2" = type { target("dx.RawBuffer", i32, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.3" = type { target("dx.RawBuffer", i32, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.4" = type { target("dx.RawBuffer", i64, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.5" = type { target("dx.RawBuffer", i64, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.6" = type <{ target("dx.RawBuffer", half, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.8" = type { target("dx.RawBuffer", float, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.9" = type { target("dx.RawBuffer", double, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.10" = type { target("dx.RawBuffer", <4 x i16>, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.11" = type { target("dx.RawBuffer", <3 x i32>, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.12" = type { target("dx.RawBuffer", <2 x half>, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.13" = type { target("dx.RawBuffer", <3 x float>, 1, 0) + +StructuredBuffer BufI16; +StructuredBuffer BufU16; +StructuredBuffer BufI32; +StructuredBuffer BufU32; +StructuredBuffer BufI64; +StructuredBuffer BufU64; +StructuredBuffer BufF16; +StructuredBuffer BufF32; +StructuredBuffer BufF64; +StructuredBuffer< vector > BufI16x4; +StructuredBuffer< vector > BufU32x3; +StructuredBuffer BufF16x2; +StructuredBuffer BufF32x3; +// TODO: StructuredBuffer BufSNormF16; -> 11 +// TODO: StructuredBuffer BufUNormF16; -> 12 +// TODO: StructuredBuffer BufSNormF32; -> 13 +// TODO: StructuredBuffer BufUNormF32; -> 14 +// TODO: StructuredBuffer BufSNormF64; -> 15 +// TODO: StructuredBuffer BufUNormF64; -> 16 + +[numthreads(1,1,1)] +void main(int GI : SV_GroupIndex) { + BufI16[GI] = 0; + BufU16[GI] = 0; + BufI32[GI] = 0; + BufU32[GI] = 0; + BufI64[GI] = 0; + BufU64[GI] = 0; + 
BufF16[GI] = 0; + BufF32[GI] = 0; + BufF64[GI] = 0; + BufI16x4[GI] = 0; + BufU32x3[GI] = 0; + BufF16x2[GI] = 0; + BufF32x3[GI] = 0; +} + +// CHECK: !{{[0-9]+}} = !{ptr @BufI16, i32 10, i32 2, +// CHECK: !{{[0-9]+}} = !{ptr @BufU16, i32 10, i32 3, +// CHECK: !{{[0-9]+}} = !{ptr @BufI32, i32 10, i32 4, +// CHECK: !{{[0-9]+}} = !{ptr @BufU32, i32 10, i32 5, +// CHECK: !{{[0-9]+}} = !{ptr @BufI64, i32 10, i32 6, +// CHECK: !{{[0-9]+}} = !{ptr @BufU64, i32 10, i32 7, +// CHECK: !{{[0-9]+}} = !{ptr @BufF16, i32 10, i32 8, +// CHECK: !{{[0-9]+}} = !{ptr @BufF32, i32 10, i32 9, +// CHECK: !{{[0-9]+}} = !{ptr @BufF64, i32 10, i32 10, +// CHECK: !{{[0-9]+}} = !{ptr @BufI16x4, i32 10, i32 2, +// CHECK: !{{[0-9]+}} = !{ptr @BufU32x3, i32 10, i32 5, +// CHECK: !{{[0-9]+}} = !{ptr @BufF16x2, i32 10, i32 8, +// CHECK: !{{[0-9]+}} = !{ptr @BufF32x3, i32 10, i32 9, diff --git a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-subscript.hlsl b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-subscript.hlsl index 155749ec4f94a9..89bde9236288fc 100644 --- a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-subscript.hlsl +++ b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-subscript.hlsl @@ -1,17 +1,17 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -emit-llvm -o - -O0 %s | FileCheck %s - -StructuredBuffer In; -StructuredBuffer Out; - -[numthreads(1,1,1)] -void main(unsigned GI : SV_GroupIndex) { - Out[GI] = In[GI]; -} - -// Even at -O0 the subscript operators get inlined. The -O0 IR is a bit messy -// and confusing to follow so the match here is pretty weak. - -// CHECK: define void @main() -// Verify inlining leaves only calls to "llvm." 
intrinsics -// CHECK-NOT: call {{[^@]*}} @{{[^l][^l][^v][^m][^\.]}} -// CHECK: ret void +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -emit-llvm -o - -O0 %s | FileCheck %s + +StructuredBuffer In; +StructuredBuffer Out; + +[numthreads(1,1,1)] +void main(unsigned GI : SV_GroupIndex) { + Out[GI] = In[GI]; +} + +// Even at -O0 the subscript operators get inlined. The -O0 IR is a bit messy +// and confusing to follow so the match here is pretty weak. + +// CHECK: define void @main() +// Verify inlining leaves only calls to "llvm." intrinsics +// CHECK-NOT: call {{[^@]*}} @{{[^l][^l][^v][^m][^\.]}} +// CHECK: ret void diff --git a/clang/test/CodeGenHLSL/builtins/atan2.hlsl b/clang/test/CodeGenHLSL/builtins/atan2.hlsl index 40796052e608fe..ada269db2f00d3 100644 --- a/clang/test/CodeGenHLSL/builtins/atan2.hlsl +++ b/clang/test/CodeGenHLSL/builtins/atan2.hlsl @@ -1,59 +1,59 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF - -// CHECK-LABEL: test_atan2_half -// NATIVE_HALF: call half @llvm.atan2.f16 -// NO_HALF: call float @llvm.atan2.f32 -half test_atan2_half (half p0, half p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_half2 -// NATIVE_HALF: call <2 x half> @llvm.atan2.v2f16 -// NO_HALF: call <2 x float> @llvm.atan2.v2f32 -half2 test_atan2_half2 (half2 p0, half2 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_half3 -// NATIVE_HALF: call <3 x half> @llvm.atan2.v3f16 -// NO_HALF: call <3 x float> @llvm.atan2.v3f32 -half3 test_atan2_half3 (half3 p0, half3 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_half4 -// NATIVE_HALF: call <4 x 
half> @llvm.atan2.v4f16 -// NO_HALF: call <4 x float> @llvm.atan2.v4f32 -half4 test_atan2_half4 (half4 p0, half4 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_float -// CHECK: call float @llvm.atan2.f32 -float test_atan2_float (float p0, float p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_float2 -// CHECK: call <2 x float> @llvm.atan2.v2f32 -float2 test_atan2_float2 (float2 p0, float2 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_float3 -// CHECK: call <3 x float> @llvm.atan2.v3f32 -float3 test_atan2_float3 (float3 p0, float3 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_float4 -// CHECK: call <4 x float> @llvm.atan2.v4f32 -float4 test_atan2_float4 (float4 p0, float4 p1) { - return atan2(p0, p1); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF + +// CHECK-LABEL: test_atan2_half +// NATIVE_HALF: call half @llvm.atan2.f16 +// NO_HALF: call float @llvm.atan2.f32 +half test_atan2_half (half p0, half p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_half2 +// NATIVE_HALF: call <2 x half> @llvm.atan2.v2f16 +// NO_HALF: call <2 x float> @llvm.atan2.v2f32 +half2 test_atan2_half2 (half2 p0, half2 p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_half3 +// NATIVE_HALF: call <3 x half> @llvm.atan2.v3f16 +// NO_HALF: call <3 x float> @llvm.atan2.v3f32 +half3 test_atan2_half3 (half3 p0, half3 p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_half4 +// NATIVE_HALF: call <4 x half> @llvm.atan2.v4f16 +// NO_HALF: call <4 x float> @llvm.atan2.v4f32 +half4 test_atan2_half4 (half4 p0, half4 p1) { + 
return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_float +// CHECK: call float @llvm.atan2.f32 +float test_atan2_float (float p0, float p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_float2 +// CHECK: call <2 x float> @llvm.atan2.v2f32 +float2 test_atan2_float2 (float2 p0, float2 p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_float3 +// CHECK: call <3 x float> @llvm.atan2.v3f32 +float3 test_atan2_float3 (float3 p0, float3 p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_float4 +// CHECK: call <4 x float> @llvm.atan2.v4f32 +float4 test_atan2_float4 (float4 p0, float4 p1) { + return atan2(p0, p1); +} diff --git a/clang/test/CodeGenHLSL/builtins/cross.hlsl b/clang/test/CodeGenHLSL/builtins/cross.hlsl index 514e57d36b2016..eba710c905bf46 100644 --- a/clang/test/CodeGenHLSL/builtins/cross.hlsl +++ b/clang/test/CodeGenHLSL/builtins/cross.hlsl @@ -1,37 +1,37 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv - -// NATIVE_HALF: define 
[[FNATTRS]] <3 x half> @ -// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].cross.v3f16(<3 x half> -// NATIVE_HALF: ret <3 x half> %hlsl.cross -// NO_HALF: define [[FNATTRS]] <3 x float> @ -// NO_HALF: call <3 x float> @llvm.[[TARGET]].cross.v3f32(<3 x float> -// NO_HALF: ret <3 x float> %hlsl.cross -half3 test_cross_half3(half3 p0, half3 p1) -{ - return cross(p0, p1); -} - -// CHECK: define [[FNATTRS]] <3 x float> @ -// CHECK: %hlsl.cross = call <3 x float> @llvm.[[TARGET]].cross.v3f32( -// CHECK: ret <3 x float> %hlsl.cross -float3 test_cross_float3(float3 p0, float3 p1) -{ - return cross(p0, p1); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv + +// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ +// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].cross.v3f16(<3 x half> +// NATIVE_HALF: ret <3 x half> %hlsl.cross +// NO_HALF: define [[FNATTRS]] <3 x float> @ +// NO_HALF: call <3 x float> @llvm.[[TARGET]].cross.v3f32(<3 x float> +// NO_HALF: ret <3 x float> %hlsl.cross +half3 
test_cross_half3(half3 p0, half3 p1) +{ + return cross(p0, p1); +} + +// CHECK: define [[FNATTRS]] <3 x float> @ +// CHECK: %hlsl.cross = call <3 x float> @llvm.[[TARGET]].cross.v3f32( +// CHECK: ret <3 x float> %hlsl.cross +float3 test_cross_float3(float3 p0, float3 p1) +{ + return cross(p0, p1); +} diff --git a/clang/test/CodeGenHLSL/builtins/length.hlsl b/clang/test/CodeGenHLSL/builtins/length.hlsl index 1c23b0df04df98..9b0293c218a5de 100644 --- a/clang/test/CodeGenHLSL/builtins/length.hlsl +++ b/clang/test/CodeGenHLSL/builtins/length.hlsl @@ -1,73 +1,73 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF - -// NATIVE_HALF: define noundef half @ -// NATIVE_HALF: call half @llvm.fabs.f16(half -// NO_HALF: call float @llvm.fabs.f32(float -// NATIVE_HALF: ret half -// NO_HALF: ret float -half test_length_half(half p0) -{ - return length(p0); -} -// NATIVE_HALF: define noundef half @ -// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v2f16 -// NO_HALF: %hlsl.length = call float @llvm.dx.length.v2f32( -// NATIVE_HALF: ret half %hlsl.length -// NO_HALF: ret float %hlsl.length -half test_length_half2(half2 p0) -{ - return length(p0); -} -// NATIVE_HALF: define noundef half @ -// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v3f16 -// NO_HALF: %hlsl.length = call float @llvm.dx.length.v3f32( -// NATIVE_HALF: ret half %hlsl.length -// NO_HALF: ret float %hlsl.length -half test_length_half3(half3 p0) -{ - return length(p0); -} -// NATIVE_HALF: define noundef half @ -// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v4f16 -// NO_HALF: %hlsl.length = call float 
@llvm.dx.length.v4f32( -// NATIVE_HALF: ret half %hlsl.length -// NO_HALF: ret float %hlsl.length -half test_length_half4(half4 p0) -{ - return length(p0); -} - -// CHECK: define noundef float @ -// CHECK: call float @llvm.fabs.f32(float -// CHECK: ret float -float test_length_float(float p0) -{ - return length(p0); -} -// CHECK: define noundef float @ -// CHECK: %hlsl.length = call float @llvm.dx.length.v2f32( -// CHECK: ret float %hlsl.length -float test_length_float2(float2 p0) -{ - return length(p0); -} -// CHECK: define noundef float @ -// CHECK: %hlsl.length = call float @llvm.dx.length.v3f32( -// CHECK: ret float %hlsl.length -float test_length_float3(float3 p0) -{ - return length(p0); -} -// CHECK: define noundef float @ -// CHECK: %hlsl.length = call float @llvm.dx.length.v4f32( -// CHECK: ret float %hlsl.length -float test_length_float4(float4 p0) -{ - return length(p0); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF + +// NATIVE_HALF: define noundef half @ +// NATIVE_HALF: call half @llvm.fabs.f16(half +// NO_HALF: call float @llvm.fabs.f32(float +// NATIVE_HALF: ret half +// NO_HALF: ret float +half test_length_half(half p0) +{ + return length(p0); +} +// NATIVE_HALF: define noundef half @ +// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v2f16 +// NO_HALF: %hlsl.length = call float @llvm.dx.length.v2f32( +// NATIVE_HALF: ret half %hlsl.length +// NO_HALF: ret float %hlsl.length +half test_length_half2(half2 p0) +{ + return length(p0); +} +// NATIVE_HALF: define noundef half @ +// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v3f16 +// NO_HALF: 
%hlsl.length = call float @llvm.dx.length.v3f32( +// NATIVE_HALF: ret half %hlsl.length +// NO_HALF: ret float %hlsl.length +half test_length_half3(half3 p0) +{ + return length(p0); +} +// NATIVE_HALF: define noundef half @ +// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v4f16 +// NO_HALF: %hlsl.length = call float @llvm.dx.length.v4f32( +// NATIVE_HALF: ret half %hlsl.length +// NO_HALF: ret float %hlsl.length +half test_length_half4(half4 p0) +{ + return length(p0); +} + +// CHECK: define noundef float @ +// CHECK: call float @llvm.fabs.f32(float +// CHECK: ret float +float test_length_float(float p0) +{ + return length(p0); +} +// CHECK: define noundef float @ +// CHECK: %hlsl.length = call float @llvm.dx.length.v2f32( +// CHECK: ret float %hlsl.length +float test_length_float2(float2 p0) +{ + return length(p0); +} +// CHECK: define noundef float @ +// CHECK: %hlsl.length = call float @llvm.dx.length.v3f32( +// CHECK: ret float %hlsl.length +float test_length_float3(float3 p0) +{ + return length(p0); +} +// CHECK: define noundef float @ +// CHECK: %hlsl.length = call float @llvm.dx.length.v4f32( +// CHECK: ret float %hlsl.length +float test_length_float4(float4 p0) +{ + return length(p0); +} diff --git a/clang/test/CodeGenHLSL/builtins/normalize.hlsl b/clang/test/CodeGenHLSL/builtins/normalize.hlsl index 83ad607c14a607..d14e7c70ce0653 100644 --- a/clang/test/CodeGenHLSL/builtins/normalize.hlsl +++ b/clang/test/CodeGenHLSL/builtins/normalize.hlsl @@ -1,85 +1,85 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: 
-DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv - -// NATIVE_HALF: define [[FNATTRS]] half @ -// NATIVE_HALF: call half @llvm.[[TARGET]].normalize.f16(half -// NO_HALF: call float @llvm.[[TARGET]].normalize.f32(float -// NATIVE_HALF: ret half -// NO_HALF: ret float -half test_normalize_half(half p0) -{ - return normalize(p0); -} -// NATIVE_HALF: define [[FNATTRS]] <2 x half> @ -// NATIVE_HALF: call <2 x half> @llvm.[[TARGET]].normalize.v2f16(<2 x half> -// NO_HALF: call <2 x float> @llvm.[[TARGET]].normalize.v2f32(<2 x float> -// NATIVE_HALF: ret <2 x half> %hlsl.normalize -// NO_HALF: ret <2 x float> %hlsl.normalize -half2 test_normalize_half2(half2 p0) -{ - return normalize(p0); -} -// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ -// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].normalize.v3f16(<3 x half> -// NO_HALF: call <3 x float> @llvm.[[TARGET]].normalize.v3f32(<3 x float> -// NATIVE_HALF: ret <3 x half> %hlsl.normalize -// NO_HALF: ret <3 x float> %hlsl.normalize -half3 test_normalize_half3(half3 p0) -{ - return normalize(p0); -} -// NATIVE_HALF: define [[FNATTRS]] <4 x half> @ -// NATIVE_HALF: call <4 x half> @llvm.[[TARGET]].normalize.v4f16(<4 x half> -// NO_HALF: call <4 x float> @llvm.[[TARGET]].normalize.v4f32(<4 x float> -// NATIVE_HALF: ret <4 x half> %hlsl.normalize -// NO_HALF: ret <4 x float> %hlsl.normalize -half4 test_normalize_half4(half4 p0) -{ - return normalize(p0); -} - -// CHECK: define [[FNATTRS]] float @ -// CHECK: call float 
@llvm.[[TARGET]].normalize.f32(float -// CHECK: ret float -float test_normalize_float(float p0) -{ - return normalize(p0); -} -// CHECK: define [[FNATTRS]] <2 x float> @ -// CHECK: %hlsl.normalize = call <2 x float> @llvm.[[TARGET]].normalize.v2f32(<2 x float> - -// CHECK: ret <2 x float> %hlsl.normalize -float2 test_normalize_float2(float2 p0) -{ - return normalize(p0); -} -// CHECK: define [[FNATTRS]] <3 x float> @ -// CHECK: %hlsl.normalize = call <3 x float> @llvm.[[TARGET]].normalize.v3f32( -// CHECK: ret <3 x float> %hlsl.normalize -float3 test_normalize_float3(float3 p0) -{ - return normalize(p0); -} -// CHECK: define [[FNATTRS]] <4 x float> @ -// CHECK: %hlsl.normalize = call <4 x float> @llvm.[[TARGET]].normalize.v4f32( -// CHECK: ret <4 x float> %hlsl.normalize -float4 test_length_float4(float4 p0) -{ - return normalize(p0); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv + +// NATIVE_HALF: define [[FNATTRS]] half @ +// NATIVE_HALF: call half 
@llvm.[[TARGET]].normalize.f16(half +// NO_HALF: call float @llvm.[[TARGET]].normalize.f32(float +// NATIVE_HALF: ret half +// NO_HALF: ret float +half test_normalize_half(half p0) +{ + return normalize(p0); +} +// NATIVE_HALF: define [[FNATTRS]] <2 x half> @ +// NATIVE_HALF: call <2 x half> @llvm.[[TARGET]].normalize.v2f16(<2 x half> +// NO_HALF: call <2 x float> @llvm.[[TARGET]].normalize.v2f32(<2 x float> +// NATIVE_HALF: ret <2 x half> %hlsl.normalize +// NO_HALF: ret <2 x float> %hlsl.normalize +half2 test_normalize_half2(half2 p0) +{ + return normalize(p0); +} +// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ +// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].normalize.v3f16(<3 x half> +// NO_HALF: call <3 x float> @llvm.[[TARGET]].normalize.v3f32(<3 x float> +// NATIVE_HALF: ret <3 x half> %hlsl.normalize +// NO_HALF: ret <3 x float> %hlsl.normalize +half3 test_normalize_half3(half3 p0) +{ + return normalize(p0); +} +// NATIVE_HALF: define [[FNATTRS]] <4 x half> @ +// NATIVE_HALF: call <4 x half> @llvm.[[TARGET]].normalize.v4f16(<4 x half> +// NO_HALF: call <4 x float> @llvm.[[TARGET]].normalize.v4f32(<4 x float> +// NATIVE_HALF: ret <4 x half> %hlsl.normalize +// NO_HALF: ret <4 x float> %hlsl.normalize +half4 test_normalize_half4(half4 p0) +{ + return normalize(p0); +} + +// CHECK: define [[FNATTRS]] float @ +// CHECK: call float @llvm.[[TARGET]].normalize.f32(float +// CHECK: ret float +float test_normalize_float(float p0) +{ + return normalize(p0); +} +// CHECK: define [[FNATTRS]] <2 x float> @ +// CHECK: %hlsl.normalize = call <2 x float> @llvm.[[TARGET]].normalize.v2f32(<2 x float> + +// CHECK: ret <2 x float> %hlsl.normalize +float2 test_normalize_float2(float2 p0) +{ + return normalize(p0); +} +// CHECK: define [[FNATTRS]] <3 x float> @ +// CHECK: %hlsl.normalize = call <3 x float> @llvm.[[TARGET]].normalize.v3f32( +// CHECK: ret <3 x float> %hlsl.normalize +float3 test_normalize_float3(float3 p0) +{ + return normalize(p0); +} +// CHECK: define 
[[FNATTRS]] <4 x float> @ +// CHECK: %hlsl.normalize = call <4 x float> @llvm.[[TARGET]].normalize.v4f32( +// CHECK: ret <4 x float> %hlsl.normalize +float4 test_length_float4(float4 p0) +{ + return normalize(p0); +} diff --git a/clang/test/CodeGenHLSL/builtins/step.hlsl b/clang/test/CodeGenHLSL/builtins/step.hlsl index 442f4930ca579c..8ef52794a3be5d 100644 --- a/clang/test/CodeGenHLSL/builtins/step.hlsl +++ b/clang/test/CodeGenHLSL/builtins/step.hlsl @@ -1,84 +1,84 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv - -// NATIVE_HALF: define [[FNATTRS]] half @ -// NATIVE_HALF: call half @llvm.[[TARGET]].step.f16(half -// NO_HALF: call float @llvm.[[TARGET]].step.f32(float -// NATIVE_HALF: ret half -// NO_HALF: ret float -half test_step_half(half p0, half p1) -{ - return step(p0, p1); -} -// NATIVE_HALF: define [[FNATTRS]] <2 x half> @ -// NATIVE_HALF: call <2 x half> @llvm.[[TARGET]].step.v2f16(<2 x half> -// NO_HALF: call <2 x float> @llvm.[[TARGET]].step.v2f32(<2 x float> -// 
NATIVE_HALF: ret <2 x half> %hlsl.step -// NO_HALF: ret <2 x float> %hlsl.step -half2 test_step_half2(half2 p0, half2 p1) -{ - return step(p0, p1); -} -// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ -// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].step.v3f16(<3 x half> -// NO_HALF: call <3 x float> @llvm.[[TARGET]].step.v3f32(<3 x float> -// NATIVE_HALF: ret <3 x half> %hlsl.step -// NO_HALF: ret <3 x float> %hlsl.step -half3 test_step_half3(half3 p0, half3 p1) -{ - return step(p0, p1); -} -// NATIVE_HALF: define [[FNATTRS]] <4 x half> @ -// NATIVE_HALF: call <4 x half> @llvm.[[TARGET]].step.v4f16(<4 x half> -// NO_HALF: call <4 x float> @llvm.[[TARGET]].step.v4f32(<4 x float> -// NATIVE_HALF: ret <4 x half> %hlsl.step -// NO_HALF: ret <4 x float> %hlsl.step -half4 test_step_half4(half4 p0, half4 p1) -{ - return step(p0, p1); -} - -// CHECK: define [[FNATTRS]] float @ -// CHECK: call float @llvm.[[TARGET]].step.f32(float -// CHECK: ret float -float test_step_float(float p0, float p1) -{ - return step(p0, p1); -} -// CHECK: define [[FNATTRS]] <2 x float> @ -// CHECK: %hlsl.step = call <2 x float> @llvm.[[TARGET]].step.v2f32( -// CHECK: ret <2 x float> %hlsl.step -float2 test_step_float2(float2 p0, float2 p1) -{ - return step(p0, p1); -} -// CHECK: define [[FNATTRS]] <3 x float> @ -// CHECK: %hlsl.step = call <3 x float> @llvm.[[TARGET]].step.v3f32( -// CHECK: ret <3 x float> %hlsl.step -float3 test_step_float3(float3 p0, float3 p1) -{ - return step(p0, p1); -} -// CHECK: define [[FNATTRS]] <4 x float> @ -// CHECK: %hlsl.step = call <4 x float> @llvm.[[TARGET]].step.v4f32( -// CHECK: ret <4 x float> %hlsl.step -float4 test_step_float4(float4 p0, float4 p1) -{ - return step(p0, p1); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: 
%clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv + +// NATIVE_HALF: define [[FNATTRS]] half @ +// NATIVE_HALF: call half @llvm.[[TARGET]].step.f16(half +// NO_HALF: call float @llvm.[[TARGET]].step.f32(float +// NATIVE_HALF: ret half +// NO_HALF: ret float +half test_step_half(half p0, half p1) +{ + return step(p0, p1); +} +// NATIVE_HALF: define [[FNATTRS]] <2 x half> @ +// NATIVE_HALF: call <2 x half> @llvm.[[TARGET]].step.v2f16(<2 x half> +// NO_HALF: call <2 x float> @llvm.[[TARGET]].step.v2f32(<2 x float> +// NATIVE_HALF: ret <2 x half> %hlsl.step +// NO_HALF: ret <2 x float> %hlsl.step +half2 test_step_half2(half2 p0, half2 p1) +{ + return step(p0, p1); +} +// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ +// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].step.v3f16(<3 x half> +// NO_HALF: call <3 x float> @llvm.[[TARGET]].step.v3f32(<3 x float> +// NATIVE_HALF: ret <3 x half> %hlsl.step +// NO_HALF: ret <3 x float> %hlsl.step +half3 test_step_half3(half3 p0, half3 p1) +{ + return step(p0, p1); +} +// NATIVE_HALF: define [[FNATTRS]] <4 x half> @ +// NATIVE_HALF: call <4 x half> @llvm.[[TARGET]].step.v4f16(<4 x half> +// NO_HALF: call <4 x float> @llvm.[[TARGET]].step.v4f32(<4 x float> +// NATIVE_HALF: ret <4 x half> %hlsl.step +// NO_HALF: ret <4 x 
float> %hlsl.step +half4 test_step_half4(half4 p0, half4 p1) +{ + return step(p0, p1); +} + +// CHECK: define [[FNATTRS]] float @ +// CHECK: call float @llvm.[[TARGET]].step.f32(float +// CHECK: ret float +float test_step_float(float p0, float p1) +{ + return step(p0, p1); +} +// CHECK: define [[FNATTRS]] <2 x float> @ +// CHECK: %hlsl.step = call <2 x float> @llvm.[[TARGET]].step.v2f32( +// CHECK: ret <2 x float> %hlsl.step +float2 test_step_float2(float2 p0, float2 p1) +{ + return step(p0, p1); +} +// CHECK: define [[FNATTRS]] <3 x float> @ +// CHECK: %hlsl.step = call <3 x float> @llvm.[[TARGET]].step.v3f32( +// CHECK: ret <3 x float> %hlsl.step +float3 test_step_float3(float3 p0, float3 p1) +{ + return step(p0, p1); +} +// CHECK: define [[FNATTRS]] <4 x float> @ +// CHECK: %hlsl.step = call <4 x float> @llvm.[[TARGET]].step.v4f32( +// CHECK: ret <4 x float> %hlsl.step +float4 test_step_float4(float4 p0, float4 p1) +{ + return step(p0, p1); +} diff --git a/clang/test/Driver/flang/msvc-link.f90 b/clang/test/Driver/flang/msvc-link.f90 index 463749510eb5f8..3f7e162a9a6116 100644 --- a/clang/test/Driver/flang/msvc-link.f90 +++ b/clang/test/Driver/flang/msvc-link.f90 @@ -1,5 +1,5 @@ -! RUN: %clang --driver-mode=flang --target=x86_64-pc-windows-msvc -### %s -Ltest 2>&1 | FileCheck %s -! -! Test that user provided paths come before the Flang runtimes -! CHECK: "-libpath:test" -! CHECK: "-libpath:{{.*(\\|/)}}lib" +! RUN: %clang --driver-mode=flang --target=x86_64-pc-windows-msvc -### %s -Ltest 2>&1 | FileCheck %s +! +! Test that user provided paths come before the Flang runtimes +! CHECK: "-libpath:test" +! 
CHECK: "-libpath:{{.*(\\|/)}}lib" diff --git a/clang/test/FixIt/fixit-newline-style.c b/clang/test/FixIt/fixit-newline-style.c index 61e4df67e85bac..2aac143d4d753e 100644 --- a/clang/test/FixIt/fixit-newline-style.c +++ b/clang/test/FixIt/fixit-newline-style.c @@ -1,11 +1,11 @@ -// RUN: %clang_cc1 -pedantic -Wunused-label -fno-diagnostics-show-line-numbers -x c %s 2>&1 | FileCheck %s -strict-whitespace - -// This file intentionally uses a CRLF newline style -// CHECK: warning: unused label 'ddd' -// CHECK-NEXT: {{^ ddd:}} -// CHECK-NEXT: {{^ \^~~~$}} -// CHECK-NOT: {{^ ;}} -void f(void) { - ddd: - ; -} +// RUN: %clang_cc1 -pedantic -Wunused-label -fno-diagnostics-show-line-numbers -x c %s 2>&1 | FileCheck %s -strict-whitespace + +// This file intentionally uses a CRLF newline style +// CHECK: warning: unused label 'ddd' +// CHECK-NEXT: {{^ ddd:}} +// CHECK-NEXT: {{^ \^~~~$}} +// CHECK-NOT: {{^ ;}} +void f(void) { + ddd: + ; +} diff --git a/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.c b/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.c index d6724444c06676..2faeaba3229218 100644 --- a/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.c +++ b/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.c @@ -1,8 +1,8 @@ -// RUN: %clang_cc1 -E -frewrite-includes %s | %clang_cc1 - -// expected-no-diagnostics -// Note: This source file has CRLF line endings. -// This test validates that -frewrite-includes translates the end of line (EOL) -// form used in header files to the EOL form used in the the primary source -// file when the files use different EOL forms. -#include "rewrite-includes-mixed-eol-crlf.h" -#include "rewrite-includes-mixed-eol-lf.h" +// RUN: %clang_cc1 -E -frewrite-includes %s | %clang_cc1 - +// expected-no-diagnostics +// Note: This source file has CRLF line endings. 
+// This test validates that -frewrite-includes translates the end of line (EOL) +// form used in header files to the EOL form used in the the primary source +// file when the files use different EOL forms. +#include "rewrite-includes-mixed-eol-crlf.h" +#include "rewrite-includes-mixed-eol-lf.h" diff --git a/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.h b/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.h index 0439b88b75e2cf..baedc282296bd7 100644 --- a/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.h +++ b/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.h @@ -1,11 +1,11 @@ -// Note: This header file has CRLF line endings. -// The indentation in some of the conditional inclusion directives below is -// intentional and is required for this test to function as a regression test -// for GH59736. -_Static_assert(__LINE__ == 5, ""); -#if 1 -_Static_assert(__LINE__ == 7, ""); - #if 1 - _Static_assert(__LINE__ == 9, ""); - #endif -#endif +// Note: This header file has CRLF line endings. +// The indentation in some of the conditional inclusion directives below is +// intentional and is required for this test to function as a regression test +// for GH59736. +_Static_assert(__LINE__ == 5, ""); +#if 1 +_Static_assert(__LINE__ == 7, ""); + #if 1 + _Static_assert(__LINE__ == 9, ""); + #endif +#endif diff --git a/clang/test/Frontend/system-header-line-directive-ms-lineendings.c b/clang/test/Frontend/system-header-line-directive-ms-lineendings.c index 92fc07f65e0d4d..dffdd5cf1959ae 100644 --- a/clang/test/Frontend/system-header-line-directive-ms-lineendings.c +++ b/clang/test/Frontend/system-header-line-directive-ms-lineendings.c @@ -1,21 +1,21 @@ -// RUN: %clang_cc1 %s -E -o - -I %S/Inputs -isystem %S/Inputs/SystemHeaderPrefix | FileCheck %s -#include -#include - -#include "line-directive.h" - -// This tests that the line numbers for the current file are correctly outputted -// for the include-file-completed test case. This file should be CRLF. 
- -// CHECK: # 1 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 -// CHECK: # 1 "{{.*}}noline.h" 1 3 -// CHECK: foo(void); -// CHECK: # 3 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 -// CHECK: # 1 "{{.*}}line-directive-in-system.h" 1 3 -// The "3" below indicates that "foo.h" is considered a system header. -// CHECK: # 1 "foo.h" 3 -// CHECK: foo(void); -// CHECK: # 4 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 -// CHECK: # 1 "{{.*}}line-directive.h" 1 -// CHECK: # 10 "foo.h"{{$}} -// CHECK: # 6 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 +// RUN: %clang_cc1 %s -E -o - -I %S/Inputs -isystem %S/Inputs/SystemHeaderPrefix | FileCheck %s +#include +#include + +#include "line-directive.h" + +// This tests that the line numbers for the current file are correctly outputted +// for the include-file-completed test case. This file should be CRLF. + +// CHECK: # 1 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 +// CHECK: # 1 "{{.*}}noline.h" 1 3 +// CHECK: foo(void); +// CHECK: # 3 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 +// CHECK: # 1 "{{.*}}line-directive-in-system.h" 1 3 +// The "3" below indicates that "foo.h" is considered a system header. 
+// CHECK: # 1 "foo.h" 3 +// CHECK: foo(void); +// CHECK: # 4 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 +// CHECK: # 1 "{{.*}}line-directive.h" 1 +// CHECK: # 10 "foo.h"{{$}} +// CHECK: # 6 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 diff --git a/clang/test/ParserHLSL/bitfields.hlsl b/clang/test/ParserHLSL/bitfields.hlsl index 307d1143a068e2..57b6705babdc12 100644 --- a/clang/test/ParserHLSL/bitfields.hlsl +++ b/clang/test/ParserHLSL/bitfields.hlsl @@ -1,31 +1,31 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -ast-dump -x hlsl -o - %s | FileCheck %s - - -struct MyBitFields { - // CHECK: FieldDecl 0x{{[0-9a-f]+}} col:16 referenced field1 'unsigned int' - // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' - // CHECK:-value: Int 3 - // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 3 - unsigned int field1 : 3; // 3 bits for field1 - - // CHECK:FieldDecl 0x{{[0-9a-f]+}} col:16 referenced field2 'unsigned int' - // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' - // CHECK:-value: Int 4 - // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 4 - unsigned int field2 : 4; // 4 bits for field2 - - // CHECK:FieldDecl 0x{{[0-9a-f]+}} col:7 field3 'int' - // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' - // CHECK:-value: Int 5 - // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 5 - int field3 : 5; // 5 bits for field3 (signed) -}; - - - -[numthreads(1,1,1)] -void main() { - MyBitFields m; - m.field1 = 4; - m.field2 = m.field1*2; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -ast-dump -x hlsl -o - %s | FileCheck %s + + +struct MyBitFields { + // CHECK: FieldDecl 0x{{[0-9a-f]+}} col:16 referenced field1 'unsigned int' + // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' + // CHECK:-value: Int 3 + // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 3 + unsigned int field1 : 3; // 3 bits for field1 + + // CHECK:FieldDecl 0x{{[0-9a-f]+}} col:16 referenced field2 'unsigned int' + // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' + // CHECK:-value: Int 4 + // 
CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 4 + unsigned int field2 : 4; // 4 bits for field2 + + // CHECK:FieldDecl 0x{{[0-9a-f]+}} col:7 field3 'int' + // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' + // CHECK:-value: Int 5 + // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 5 + int field3 : 5; // 5 bits for field3 (signed) +}; + + + +[numthreads(1,1,1)] +void main() { + MyBitFields m; + m.field1 = 4; + m.field2 = m.field1*2; } \ No newline at end of file diff --git a/clang/test/ParserHLSL/hlsl_annotations_on_struct_members.hlsl b/clang/test/ParserHLSL/hlsl_annotations_on_struct_members.hlsl index 2eebc920388b5b..5b228d039345e1 100644 --- a/clang/test/ParserHLSL/hlsl_annotations_on_struct_members.hlsl +++ b/clang/test/ParserHLSL/hlsl_annotations_on_struct_members.hlsl @@ -1,21 +1,21 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s - -// tests that hlsl annotations are properly parsed when applied on field decls, -// and that the annotation gets properly placed on the AST. - -struct Eg9{ - // CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:8 implicit struct Eg9 - // CHECK: FieldDecl 0x{{[0-9a-f]+}} col:16 referenced a 'unsigned int' - // CHECK: -HLSLSV_DispatchThreadIDAttr 0x{{[0-9a-f]+}} - unsigned int a : SV_DispatchThreadID; -}; -Eg9 e9; - - -RWBuffer In : register(u1); - - -[numthreads(1,1,1)] -void main() { - In[0] = e9.a; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s + +// tests that hlsl annotations are properly parsed when applied on field decls, +// and that the annotation gets properly placed on the AST. 
+ +struct Eg9{ + // CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:8 implicit struct Eg9 + // CHECK: FieldDecl 0x{{[0-9a-f]+}} col:16 referenced a 'unsigned int' + // CHECK: -HLSLSV_DispatchThreadIDAttr 0x{{[0-9a-f]+}} + unsigned int a : SV_DispatchThreadID; +}; +Eg9 e9; + + +RWBuffer In : register(u1); + + +[numthreads(1,1,1)] +void main() { + In[0] = e9.a; +} diff --git a/clang/test/ParserHLSL/hlsl_contained_type_attr.hlsl b/clang/test/ParserHLSL/hlsl_contained_type_attr.hlsl index 5a72aa242e581d..476ec39e14da98 100644 --- a/clang/test/ParserHLSL/hlsl_contained_type_attr.hlsl +++ b/clang/test/ParserHLSL/hlsl_contained_type_attr.hlsl @@ -1,25 +1,25 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -std=hlsl202x -x hlsl -ast-dump -o - %s | FileCheck %s - -typedef vector float4; - -// CHECK: -TypeAliasDecl 0x{{[0-9a-f]+}} -// CHECK: -HLSLAttributedResourceType 0x{{[0-9a-f]+}} '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(int)]] -using ResourceIntAliasT = __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(int)]]; -ResourceIntAliasT h1; - -// CHECK: -VarDecl 0x{{[0-9a-f]+}} col:82 h2 '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float4)]] -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float4)]] h2; - -// CHECK: ClassTemplateDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 6]]:30 S -// CHECK: TemplateTypeParmDecl 0x{{[0-9a-f]+}} col:20 referenced typename depth 0 index 0 T -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:30 struct S definition -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:79 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(T)]] -template struct S { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(T)]] h; -}; +// RUN: %clang_cc1 -triple 
dxil-pc-shadermodel6.0-compute -std=hlsl202x -x hlsl -ast-dump -o - %s | FileCheck %s + +typedef vector float4; + +// CHECK: -TypeAliasDecl 0x{{[0-9a-f]+}} +// CHECK: -HLSLAttributedResourceType 0x{{[0-9a-f]+}} '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(int)]] +using ResourceIntAliasT = __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(int)]]; +ResourceIntAliasT h1; + +// CHECK: -VarDecl 0x{{[0-9a-f]+}} col:82 h2 '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float4)]] +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float4)]] h2; + +// CHECK: ClassTemplateDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 6]]:30 S +// CHECK: TemplateTypeParmDecl 0x{{[0-9a-f]+}} col:20 referenced typename depth 0 index 0 T +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:30 struct S definition +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:79 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(T)]] +template struct S { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(T)]] h; +}; diff --git a/clang/test/ParserHLSL/hlsl_contained_type_attr_error.hlsl b/clang/test/ParserHLSL/hlsl_contained_type_attr_error.hlsl index b2d492d95945c1..673ff8693b83b8 100644 --- a/clang/test/ParserHLSL/hlsl_contained_type_attr_error.hlsl +++ b/clang/test/ParserHLSL/hlsl_contained_type_attr_error.hlsl @@ -1,28 +1,28 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -std=hlsl202x -x hlsl -o - %s -verify - -typedef vector float4; - -// expected-error@+1{{'contained_type' attribute cannot be applied to a declaration}} -[[hlsl::contained_type(float4)]] __hlsl_resource_t h1; - -// expected-error@+1{{'contained_type' attribute takes one argument}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]]
[[hlsl::contained_type()]] h3; - -// expected-error@+1{{expected a type}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(0)]] h4; - -// expected-error@+1{{unknown type name 'a'}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(a)]] h5; - -// expected-error@+1{{expected a type}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type("b", c)]] h6; - -// expected-warning@+1{{attribute 'contained_type' is already applied}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] [[hlsl::contained_type(float)]] h7; - -// expected-warning@+1{{attribute 'contained_type' is already applied with different arguments}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] [[hlsl::contained_type(int)]] h8; - -// expected-error@+2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} -// expected-error@+1{{attribute 'contained_type' can be used only on HLSL intangible type '__hlsl_resource_t'}} -float [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] res5; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -std=hlsl202x -x hlsl -o - %s -verify + +typedef vector float4; + +// expected-error@+1{{'contained_type' attribute cannot be applied to a declaration}} +[[hlsl::contained_type(float4)]] __hlsl_resource_t h1; + +// expected-error@+1{{'contained_type' attribute takes one argument}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type()]] h3; + +// expected-error@+1{{expected a type}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(0)]] h4; + +// expected-error@+1{{unknown type name 'a'}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(a)]] h5; + +// expected-error@+1{{expected a type}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type("b", c)]] h6; +
+// expected-warning@+1{{attribute 'contained_type' is already applied}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] [[hlsl::contained_type(float)]] h7; + +// expected-warning@+1{{attribute 'contained_type' is already applied with different arguments}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] [[hlsl::contained_type(int)]] h8; + +// expected-error@+2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} +// expected-error@+1{{attribute 'contained_type' can be used only on HLSL intangible type '__hlsl_resource_t'}} +float [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] res5; diff --git a/clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl b/clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl index 836d129c8d0002..487dc32413032d 100644 --- a/clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl +++ b/clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl @@ -1,22 +1,22 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s - -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:68 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] -struct MyBuffer { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] h; -}; - -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:66 res '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] -// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] -__hlsl_resource_t [[hlsl::is_rov]] [[hlsl::resource_class(SRV)]] res; - -// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:6 f 'void () -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:72 r '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(Sampler)]] -// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] -void f() { - __hlsl_resource_t [[hlsl::resource_class(Sampler)]] [[hlsl::is_rov]] r; -} +// RUN: %clang_cc1 -triple
dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s + +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:68 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] +struct MyBuffer { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] h; +}; + +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:66 res '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] +// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] +__hlsl_resource_t [[hlsl::is_rov]] [[hlsl::resource_class(SRV)]] res; + +// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:6 f 'void () +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:72 r '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(Sampler)]] +// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] +void f() { + __hlsl_resource_t [[hlsl::resource_class(Sampler)]] [[hlsl::is_rov]] r; +} diff --git a/clang/test/ParserHLSL/hlsl_is_rov_attr_error.hlsl b/clang/test/ParserHLSL/hlsl_is_rov_attr_error.hlsl index 3b2c12e7a96c5c..9bb64ea990e284 100644 --- a/clang/test/ParserHLSL/hlsl_is_rov_attr_error.hlsl +++ b/clang/test/ParserHLSL/hlsl_is_rov_attr_error.hlsl @@ -1,20 +1,20 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify - -// expected-error@+1{{'is_rov' attribute cannot be applied to a declaration}} -[[hlsl::is_rov]] __hlsl_resource_t res0; - -// expected-error@+1{{HLSL resource needs to have [[hlsl::resource_class()]] attribute}} -__hlsl_resource_t [[hlsl::is_rov]] res1; - -// expected-error@+1{{'is_rov' attribute takes no arguments}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov(3)]] res2; - -// expected-error@+1{{use of undeclared identifier 'gibberish'}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov(gibberish)]] res3; - -// expected-warning@+1{{attribute 'is_rov' is already applied}} -__hlsl_resource_t
[[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] [[hlsl::is_rov]] res4; - -// expected-error@+2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} -// expected-error@+1{{attribute 'is_rov' can be used only on HLSL intangible type '__hlsl_resource_t'}} -float [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] res5; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify + +// expected-error@+1{{'is_rov' attribute cannot be applied to a declaration}} +[[hlsl::is_rov]] __hlsl_resource_t res0; + +// expected-error@+1{{HLSL resource needs to have [[hlsl::resource_class()]] attribute}} +__hlsl_resource_t [[hlsl::is_rov]] res1; + +// expected-error@+1{{'is_rov' attribute takes no arguments}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov(3)]] res2; + +// expected-error@+1{{use of undeclared identifier 'gibberish'}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov(gibberish)]] res3; + +// expected-warning@+1{{attribute 'is_rov' is already applied}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] [[hlsl::is_rov]] res4; + +// expected-error@+2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} +// expected-error@+1{{attribute 'is_rov' can be used only on HLSL intangible type '__hlsl_resource_t'}} +float [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] res5; diff --git a/clang/test/ParserHLSL/hlsl_raw_buffer_attr.hlsl b/clang/test/ParserHLSL/hlsl_raw_buffer_attr.hlsl index 84c924eec24efc..e09ed5586c1025 100644 --- a/clang/test/ParserHLSL/hlsl_raw_buffer_attr.hlsl +++ b/clang/test/ParserHLSL/hlsl_raw_buffer_attr.hlsl @@ -1,22 +1,22 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s - -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:72 h1 '__hlsl_resource_t -// CHECK-SAME{LITERAL}:
[[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -struct MyBuffer { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] h1; -}; - -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:70 h2 '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -__hlsl_resource_t [[hlsl::raw_buffer]] [[hlsl::resource_class(SRV)]] h2; - -// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:6 f 'void () -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:72 h3 '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -void f() { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] h3; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s + +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:72 h1 '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +struct MyBuffer { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] h1; +}; + +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:70 h2 '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +__hlsl_resource_t [[hlsl::raw_buffer]] [[hlsl::resource_class(SRV)]] h2; + +// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:6 f 'void () +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:72 h3 '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +void f() { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] h3; +} diff --git a/clang/test/ParserHLSL/hlsl_raw_buffer_attr_error.hlsl b/clang/test/ParserHLSL/hlsl_raw_buffer_attr_error.hlsl index 77530cbf9e4d92..a10aca4e96fc53 100644 --- a/clang/test/ParserHLSL/hlsl_raw_buffer_attr_error.hlsl +++ 
b/clang/test/ParserHLSL/hlsl_raw_buffer_attr_error.hlsl @@ -1,17 +1,17 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify - -// expected-error@+1{{'raw_buffer' attribute cannot be applied to a declaration}} -[[hlsl::raw_buffer]] __hlsl_resource_t res0; - -// expected-error@+1{{'raw_buffer' attribute takes no arguments}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer(3)]] res2; - -// expected-error@+1{{use of undeclared identifier 'gibberish'}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer(gibberish)]] res3; - -// expected-warning@+1{{attribute 'raw_buffer' is already applied}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] [[hlsl::raw_buffer]] res4; - -// expected-error@+2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} -// expected-error@+1{{attribute 'raw_buffer' can be used only on HLSL intangible type '__hlsl_resource_t'}} -float [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] res5; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify + +// expected-error@+1{{'raw_buffer' attribute cannot be applied to a declaration}} +[[hlsl::raw_buffer]] __hlsl_resource_t res0; + +// expected-error@+1{{'raw_buffer' attribute takes no arguments}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer(3)]] res2; + +// expected-error@+1{{use of undeclared identifier 'gibberish'}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer(gibberish)]] res3; + +// expected-warning@+1{{attribute 'raw_buffer' is already applied}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] [[hlsl::raw_buffer]] res4; + +// expected-error@+2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} +// expected-error@+1{{attribute 'raw_buffer' can be used only on HLSL intangible type '__hlsl_resource_t'}} +float
[[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] res5; diff --git a/clang/test/ParserHLSL/hlsl_resource_class_attr.hlsl b/clang/test/ParserHLSL/hlsl_resource_class_attr.hlsl index fbada8b4b99f75..9fee9edddf619a 100644 --- a/clang/test/ParserHLSL/hlsl_resource_class_attr.hlsl +++ b/clang/test/ParserHLSL/hlsl_resource_class_attr.hlsl @@ -1,37 +1,37 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s - -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -struct MyBuffer { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] h; -}; - -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:49 res '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] -__hlsl_resource_t [[hlsl::resource_class(SRV)]] res; - -// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 3]]:6 f 'void () -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:55 r '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(Sampler)]] -void f() { - __hlsl_resource_t [[hlsl::resource_class(Sampler)]] r; -} - -// CHECK: ClassTemplateDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 6]]:29 MyBuffer2 -// CHECK: TemplateTypeParmDecl 0x{{[0-9a-f]+}} col:19 typename depth 0 index 0 T -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:29 struct MyBuffer2 definition -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:29 implicit struct MyBuffer2 -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -template struct MyBuffer2 { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] h; -}; - -// CHECK: ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} line:[[# @LINE - 4]]:29 struct MyBuffer2 definition implicit_instantiation -// CHECK: TemplateArgument type 'float' -// CHECK: BuiltinType 0x{{[0-9a-f]+}} 'float' -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:29 implicit struct 
MyBuffer2 -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -MyBuffer2 myBuffer2; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s + +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +struct MyBuffer { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] h; +}; + +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:49 res '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] +__hlsl_resource_t [[hlsl::resource_class(SRV)]] res; + +// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 3]]:6 f 'void () +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:55 r '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(Sampler)]] +void f() { + __hlsl_resource_t [[hlsl::resource_class(Sampler)]] r; +} + +// CHECK: ClassTemplateDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 6]]:29 MyBuffer2 +// CHECK: TemplateTypeParmDecl 0x{{[0-9a-f]+}} col:19 typename depth 0 index 0 T +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:29 struct MyBuffer2 definition +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:29 implicit struct MyBuffer2 +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +template struct MyBuffer2 { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] h; +}; + +// CHECK: ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} line:[[# @LINE - 4]]:29 struct MyBuffer2 definition implicit_instantiation +// CHECK: TemplateArgument type 'float' +// CHECK: BuiltinType 0x{{[0-9a-f]+}} 'float' +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:29 implicit struct MyBuffer2 +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +MyBuffer2 myBuffer2; diff --git 
a/clang/test/ParserHLSL/hlsl_resource_class_attr_error.hlsl b/clang/test/ParserHLSL/hlsl_resource_class_attr_error.hlsl index 63e39daff949b4..a0a4da1dc2bf44 100644 --- a/clang/test/ParserHLSL/hlsl_resource_class_attr_error.hlsl +++ b/clang/test/ParserHLSL/hlsl_resource_class_attr_error.hlsl @@ -1,22 +1,22 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify - -// expected-error at +1{{'resource_class' attribute cannot be applied to a declaration}} -[[hlsl::resource_class(UAV)]] __hlsl_resource_t e0; - -// expected-error at +1{{'resource_class' attribute takes one argument}} -__hlsl_resource_t [[hlsl::resource_class()]] e1; - -// expected-warning at +1{{ResourceClass attribute argument not supported: gibberish}} -__hlsl_resource_t [[hlsl::resource_class(gibberish)]] e2; - -// expected-warning at +1{{attribute 'resource_class' is already applied with different arguments}} -__hlsl_resource_t [[hlsl::resource_class(SRV)]] [[hlsl::resource_class(UAV)]] e3; - -// expected-warning at +1{{attribute 'resource_class' is already applied}} -__hlsl_resource_t [[hlsl::resource_class(SRV)]] [[hlsl::resource_class(SRV)]] e4; - -// expected-error at +1{{'resource_class' attribute takes one argument}} -__hlsl_resource_t [[hlsl::resource_class(SRV, "aa")]] e5; - -// expected-error at +1{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} -float [[hlsl::resource_class(UAV)]] e6; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify + +// expected-error at +1{{'resource_class' attribute cannot be applied to a declaration}} +[[hlsl::resource_class(UAV)]] __hlsl_resource_t e0; + +// expected-error at +1{{'resource_class' attribute takes one argument}} +__hlsl_resource_t [[hlsl::resource_class()]] e1; + +// expected-warning at +1{{ResourceClass attribute argument not supported: gibberish}} +__hlsl_resource_t [[hlsl::resource_class(gibberish)]] e2; + +// expected-warning at +1{{attribute 
'resource_class' is already applied with different arguments}} +__hlsl_resource_t [[hlsl::resource_class(SRV)]] [[hlsl::resource_class(UAV)]] e3; + +// expected-warning at +1{{attribute 'resource_class' is already applied}} +__hlsl_resource_t [[hlsl::resource_class(SRV)]] [[hlsl::resource_class(SRV)]] e4; + +// expected-error at +1{{'resource_class' attribute takes one argument}} +__hlsl_resource_t [[hlsl::resource_class(SRV, "aa")]] e5; + +// expected-error at +1{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} +float [[hlsl::resource_class(UAV)]] e6; diff --git a/clang/test/ParserHLSL/hlsl_resource_handle_attrs.hlsl b/clang/test/ParserHLSL/hlsl_resource_handle_attrs.hlsl index 38d27bc21e4aa8..8885e39237357d 100644 --- a/clang/test/ParserHLSL/hlsl_resource_handle_attrs.hlsl +++ b/clang/test/ParserHLSL/hlsl_resource_handle_attrs.hlsl @@ -1,21 +1,21 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -ast-dump -o - %s | FileCheck %s - -// CHECK: -ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} <> class RWBuffer definition implicit_instantiation -// CHECK: -TemplateArgument type 'float' -// CHECK: `-BuiltinType 0x{{[0-9a-f]+}} 'float' -// CHECK: -FieldDecl 0x{{[0-9a-f]+}} <> implicit h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float)]] -// CHECK: -HLSLResourceAttr 0x{{[0-9a-f]+}} <> Implicit TypedBuffer -RWBuffer Buffer1; - -// CHECK: -ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} <> class RasterizerOrderedBuffer definition implicit_instantiation -// CHECK: -TemplateArgument type 'vector' -// CHECK: `-ExtVectorType 0x{{[0-9a-f]+}} 'vector' 4 -// CHECK: `-BuiltinType 0x{{[0-9a-f]+}} 'float' -// CHECK: -FieldDecl 0x{{[0-9a-f]+}} <> implicit h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)] -// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(vector)]] -// CHECK: 
-HLSLResourceAttr 0x{{[0-9a-f]+}} <> Implicit TypedBuffer -RasterizerOrderedBuffer > BufferArray3[4]; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -ast-dump -o - %s | FileCheck %s + +// CHECK: -ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} <> class RWBuffer definition implicit_instantiation +// CHECK: -TemplateArgument type 'float' +// CHECK: `-BuiltinType 0x{{[0-9a-f]+}} 'float' +// CHECK: -FieldDecl 0x{{[0-9a-f]+}} <> implicit h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float)]] +// CHECK: -HLSLResourceAttr 0x{{[0-9a-f]+}} <> Implicit TypedBuffer +RWBuffer Buffer1; + +// CHECK: -ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} <> class RasterizerOrderedBuffer definition implicit_instantiation +// CHECK: -TemplateArgument type 'vector' +// CHECK: `-ExtVectorType 0x{{[0-9a-f]+}} 'vector' 4 +// CHECK: `-BuiltinType 0x{{[0-9a-f]+}} 'float' +// CHECK: -FieldDecl 0x{{[0-9a-f]+}} <> implicit h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)] +// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(vector)]] +// CHECK: -HLSLResourceAttr 0x{{[0-9a-f]+}} <> Implicit TypedBuffer +RasterizerOrderedBuffer > BufferArray3[4]; diff --git a/clang/test/Sema/aarch64-sve-vector-trig-ops.c b/clang/test/Sema/aarch64-sve-vector-trig-ops.c index 3fe6834be2e0b7..f853abcd3379fa 100644 --- a/clang/test/Sema/aarch64-sve-vector-trig-ops.c +++ b/clang/test/Sema/aarch64-sve-vector-trig-ops.c @@ -1,65 +1,65 @@ -// RUN: %clang_cc1 -triple aarch64 -target-feature +sve \ -// RUN: -disable-O0-optnone -o - -fsyntax-only %s -verify -// REQUIRES: aarch64-registered-target - -#include - -svfloat32_t test_asin_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_asin(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_acos_vv_i8mf8(svfloat32_t v) { - - return 
__builtin_elementwise_acos(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_atan_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_atan(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_atan2_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_atan2(v, v); - // expected-error at -1 {{1st argument must be a floating point type}} -} - -svfloat32_t test_sin_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_sin(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_cos_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_cos(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_tan_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_tan(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_sinh_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_sinh(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_cosh_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_cosh(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_tanh_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_tanh(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} +// RUN: %clang_cc1 -triple aarch64 -target-feature +sve \ +// RUN: -disable-O0-optnone -o - -fsyntax-only %s -verify +// REQUIRES: aarch64-registered-target + +#include + +svfloat32_t test_asin_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_asin(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t 
test_acos_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_acos(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_atan_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_atan(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_atan2_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_atan2(v, v); + // expected-error at -1 {{1st argument must be a floating point type}} +} + +svfloat32_t test_sin_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_sin(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_cos_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_cos(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_tan_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_tan(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_sinh_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_sinh(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_cosh_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_cosh(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_tanh_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_tanh(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} diff --git a/clang/test/Sema/riscv-rvv-vector-trig-ops.c b/clang/test/Sema/riscv-rvv-vector-trig-ops.c index 0aed1b2a099865..006c136f80332c 100644 --- a/clang/test/Sema/riscv-rvv-vector-trig-ops.c +++ b/clang/test/Sema/riscv-rvv-vector-trig-ops.c @@ -1,67 +1,67 @@ -// RUN: %clang_cc1 -triple riscv64 -target-feature +f -target-feature +d \ 
-// RUN: -target-feature +v -target-feature +zfh -target-feature +zvfh \ -// RUN: -disable-O0-optnone -o - -fsyntax-only %s -verify -// REQUIRES: riscv-registered-target - -#include - -vfloat32mf2_t test_asin_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_asin(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} - } - - vfloat32mf2_t test_acos_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_acos(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} - } - - vfloat32mf2_t test_atan_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_atan(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} - } - -vfloat32mf2_t test_atan2_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_atan2(v, v); - // expected-error at -1 {{1st argument must be a floating point type}} -} - -vfloat32mf2_t test_sin_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_sin(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -vfloat32mf2_t test_cos_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_cos(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -vfloat32mf2_t test_tan_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_tan(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -vfloat32mf2_t test_sinh_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_sinh(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} - } - - vfloat32mf2_t test_cosh_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_cosh(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} - } - - vfloat32mf2_t test_tanh_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_tanh(v); - 
// expected-error at -1 {{1st argument must be a vector, integer or floating point type}} - } - +// RUN: %clang_cc1 -triple riscv64 -target-feature +f -target-feature +d \ +// RUN: -target-feature +v -target-feature +zfh -target-feature +zvfh \ +// RUN: -disable-O0-optnone -o - -fsyntax-only %s -verify +// REQUIRES: riscv-registered-target + +#include + +vfloat32mf2_t test_asin_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_asin(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} + } + + vfloat32mf2_t test_acos_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_acos(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} + } + + vfloat32mf2_t test_atan_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_atan(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} + } + +vfloat32mf2_t test_atan2_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_atan2(v, v); + // expected-error at -1 {{1st argument must be a floating point type}} +} + +vfloat32mf2_t test_sin_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_sin(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +vfloat32mf2_t test_cos_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_cos(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +vfloat32mf2_t test_tan_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_tan(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +vfloat32mf2_t test_sinh_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_sinh(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} + } + + vfloat32mf2_t test_cosh_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_cosh(v); + // expected-error at -1 
{{1st argument must be a vector, integer or floating point type}} + } + + vfloat32mf2_t test_tanh_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_tanh(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} + } + diff --git a/clang/test/SemaHLSL/Availability/avail-diag-default-compute.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-default-compute.hlsl index 764b9e843f7f1c..b60fba62bdb000 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-default-compute.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-default-compute.hlsl @@ -1,119 +1,119 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ -// RUN: -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - // expected-error@#also_alive_fz_call {{'fz' is unavailable}} - 
// expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - return 0; -} - -float alive(float f) { - // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - // expected-error@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return 0; -} - -float dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute 
environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); -} - -float test(float x) { - return aliveTemp2(x); -} - -class MyClass -{ - float F; - float makeF() { - // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute 
environment}} - float C = fz(F); // #MyClass_makeF_fz_call - return 0; - } -}; - -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ +// RUN: -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + // expected-error@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + return 0; +} + +float alive(float f) { + // 
expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + // expected-error@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return 0; +} + +float dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + + return also_dead(f); +} + +template +T aliveTemp(T f) { + // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 
in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template T aliveTemp2(T f) { + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); +} + +float test(float x) { + return aliveTemp2(x); +} + +class MyClass +{ + float F; + float makeF() { + // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + return 0; + } +}; + +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + 
float e = test((half)1.0); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-diag-default-lib.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-default-lib.hlsl index 6bfc8577670cc7..35b7c384f26cdd 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-default-lib.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-default-lib.hlsl @@ -1,180 +1,180 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ -// RUN: -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - - // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - - // expected-error@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - - 
return 0; -} - -float alive(float f) { - // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - - // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - - // expected-error@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return 0; -} - -float dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been 
marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); -} - -float test(float x) { - return aliveTemp2(x); -} - -class MyClass -{ - float F; - float makeF() { - // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - return 0; - } -}; - -// Exported function without body, not used -export void exportedFunctionUnused(float f); - -// Exported function with body, without export, not used 
-void exportedFunctionUnused(float f) { - // expected-error@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUnused_fx_call - - // API with shader-stage-specific availability in unused exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(f); - float C = fz(f); -} - -// Exported function with body - called from main() which is a compute shader entry point -export void exportedFunctionUsed(float f) { - // expected-error@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUsed_fx_call - - // expected-error@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #exportedFunctionUsed_fy_call - - // expected-error@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #exportedFunctionUsed_fz_call -} - -namespace A { - namespace B { - export { - void exportedFunctionInNS(float x) { - // expected-error@#exportedFunctionInNS_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment 
target is Shader Model 6.0}} - float A = fx(x); // #exportedFunctionInNS_fx_call - - // API with shader-stage-specific availability in exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(x); - float C = fz(x); - } - } - } -} - -// Shader entry point without body -[shader("compute")] -[numthreads(4,1,1)] -float main(); - -// Shader entry point with body -[shader("compute")] -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - exportedFunctionUsed(1.0f); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ +// RUN: -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + + // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is 
Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + + // expected-error@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + + return 0; +} + +float alive(float f) { + // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + + // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + + // expected-error@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return 0; +} + +float dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return also_dead(f); +} + +template +T aliveTemp(T f) { + // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-error@#aliveTemp_fy_call {{'fy' is only available in 
compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template T aliveTemp2(T f) { + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); +} + +float test(float x) { + return aliveTemp2(x); +} + +class MyClass +{ + float F; + float makeF() { + // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz 
{{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + return 0; + } +}; + +// Exported function without body, not used +export void exportedFunctionUnused(float f); + +// Exported function with body, without export, not used +void exportedFunctionUnused(float f) { + // expected-error@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUnused_fx_call + + // API with shader-stage-specific availability in unused exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(f); + float C = fz(f); +} + +// Exported function with body - called from main() which is a compute shader entry point +export void exportedFunctionUsed(float f) { + // expected-error@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUsed_fx_call + + // expected-error@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #exportedFunctionUsed_fy_call + + // expected-error@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 
compute environment}} + float C = fz(f); // #exportedFunctionUsed_fz_call +} + +namespace A { + namespace B { + export { + void exportedFunctionInNS(float x) { + // expected-error@#exportedFunctionInNS_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(x); // #exportedFunctionInNS_fx_call + + // API with shader-stage-specific availability in exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(x); + float C = fz(x); + } + } + } +} + +// Shader entry point without body +[shader("compute")] +[numthreads(4,1,1)] +float main(); + +// Shader entry point with body +[shader("compute")] +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + exportedFunctionUsed(1.0f); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-diag-relaxed-compute.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-relaxed-compute.hlsl index 65836c55821d77..40687983839303 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-relaxed-compute.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-relaxed-compute.hlsl @@ -1,119 +1,119 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ -// RUN: -fsyntax-only -Wno-error=hlsl-availability -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - 
-__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-warning@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - // expected-warning@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - // expected-warning@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - return 0; -} - -float alive(float f) { - // expected-warning@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - // expected-warning@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - // expected-warning@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 
6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return 0; -} - -float dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-warning@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-warning@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-warning@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); -} - -float test(float x) { - return aliveTemp2(x); 
-} - -class MyClass -{ - float F; - float makeF() { - // expected-warning@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-warning@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-warning@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - return 0; - } -}; - -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ +// RUN: -fsyntax-only -Wno-error=hlsl-availability -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // 
expected-warning@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + // expected-warning@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + // expected-warning@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + return 0; +} + +float alive(float f) { + // expected-warning@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + // expected-warning@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + // expected-warning@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + 
return 0; +} + +float dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + + return also_dead(f); +} + +template +T aliveTemp(T f) { + // expected-warning@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-warning@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-warning@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template T aliveTemp2(T f) { + // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); +} + +float test(float x) { + return aliveTemp2(x); +} + +class MyClass +{ + float F; + float makeF() { + // expected-warning@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in 
Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-warning@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-warning@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + return 0; + } +}; + +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-diag-relaxed-lib.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-relaxed-lib.hlsl index 4c9783138f6701..a23e91a546b167 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-relaxed-lib.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-relaxed-lib.hlsl @@ -1,162 +1,162 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ -// RUN: -fsyntax-only -Wno-error=hlsl-availability -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, 
introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-warning@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - - // expected-warning@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - - // expected-warning@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - - return 0; -} - -float alive(float f) { - // expected-warning@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - - // expected-warning@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - - // expected-warning@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float 
f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return 0; -} - -float dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-warning@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-warning@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-warning@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); -} - -float test(float x) { - return aliveTemp2(x); -} - -class MyClass -{ - float F; - float makeF() { - // expected-warning@#MyClass_makeF_fx_call {{'fx' is only 
available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-warning@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-warning@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - return 0; - } -}; - -// Exported function without body, not used -export void exportedFunctionUnused(float f); - -// Exported function with body, without export, not used -void exportedFunctionUnused(float f) { - // expected-warning@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUnused_fx_call - - // API with shader-stage-specific availability in unused exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(f); - float C = fz(f); -} - -// Exported function with body - called from main() which is a compute shader entry point -export void exportedFunctionUsed(float f) { - // expected-warning@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is 
Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUsed_fx_call - - // expected-warning@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #exportedFunctionUsed_fy_call - - // expected-warning@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #exportedFunctionUsed_fz_call -} - -// Shader entry point without body -[shader("compute")] -[numthreads(4,1,1)] -float main(); - -// Shader entry point with body -[shader("compute")] -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - exportedFunctionUsed(1.0f); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ +// RUN: -fsyntax-only -Wno-error=hlsl-availability -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // expected-warning@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or 
newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + + // expected-warning@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + + // expected-warning@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + + return 0; +} + +float alive(float f) { + // expected-warning@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + + // expected-warning@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + + // expected-warning@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return 0; +} + +float dead(float f) { + // unreachable code - no errors 
expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return also_dead(f); +} + +template<typename T> +T aliveTemp(T f) { + // expected-warning@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-warning@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-warning@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template<typename T> T aliveTemp2(T f) { + // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); +} + +float test(float x) { + return aliveTemp2(x); +} + +class MyClass +{ + float F; + float makeF() { + // expected-warning@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + 
float A = fx(F); // #MyClass_makeF_fx_call + // expected-warning@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-warning@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + return 0; + } +}; + +// Exported function without body, not used +export void exportedFunctionUnused(float f); + +// Exported function with body, without export, not used +void exportedFunctionUnused(float f) { + // expected-warning@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUnused_fx_call + + // API with shader-stage-specific availability in unused exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(f); + float C = fz(f); +} + +// Exported function with body - called from main() which is a compute shader entry point +export void exportedFunctionUsed(float f) { + // expected-warning@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUsed_fx_call + + // expected-warning@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader 
Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #exportedFunctionUsed_fy_call + + // expected-warning@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #exportedFunctionUsed_fz_call +} + +// Shader entry point without body +[shader("compute")] +[numthreads(4,1,1)] +float main(); + +// Shader entry point with body +[shader("compute")] +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + exportedFunctionUsed(1.0f); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-diag-strict-compute.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-strict-compute.hlsl index b67e10c9a9017a..a8783c10cbabca 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-strict-compute.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-strict-compute.hlsl @@ -1,129 +1,129 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ -// RUN: -fhlsl-strict-availability -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment 
= mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - // expected-error@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - return 0; -} - -float alive(float f) { - // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - // expected-error@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // expected-error@#also_dead_fx_call {{'fx' 
is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_dead_fx_call - // expected-error@#also_dead_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_dead_fy_call - // expected-error@#also_dead_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_dead_fz_call - return 0; -} - -float dead(float f) { - // expected-error@#dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #dead_fx_call - // expected-error@#dead_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #dead_fy_call - // expected-error@#dead_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #dead_fz_call - - return also_dead(f); -} - -template<typename T> -T aliveTemp(T f) { - // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#aliveTemp_inst {{in instantiation of function template 
specialization 'aliveTemp<float>' requested here}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template<typename T> T aliveTemp2(T f) { - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2<half>' requested here}} -} - -float test(float x) { - return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2<float>' requested here}} -} - -class MyClass -{ - float F; - float makeF() { - // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader 
Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - } -}; - -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - return a * b * c; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ +// RUN: -fhlsl-strict-availability -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + // expected-error@#also_alive_fy_call {{'fy' 
is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + // expected-error@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + return 0; +} + +float alive(float f) { + // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + // expected-error@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // expected-error@#also_dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_dead_fx_call + // expected-error@#also_dead_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been 
marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_dead_fy_call + // expected-error@#also_dead_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_dead_fz_call + return 0; +} + +float dead(float f) { + // expected-error@#dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #dead_fx_call + // expected-error@#dead_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #dead_fy_call + // expected-error@#dead_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #dead_fz_call + + return also_dead(f); +} + +template<typename T> +T aliveTemp(T f) { + // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#aliveTemp_inst {{in instantiation of function template specialization 'aliveTemp<float>' requested here}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked 
as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template<typename T> T aliveTemp2(T f) { + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2<half>' requested here}} +} + +float test(float x) { + return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2<float>' requested here}} +} + +class MyClass +{ + float F; + float makeF() { + // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // 
#MyClass_makeF_fy_call + // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + } +}; + +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + return a * b * c; } \ No newline at end of file diff --git a/clang/test/SemaHLSL/Availability/avail-diag-strict-lib.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-strict-lib.hlsl index c7be5afbc2d22f..0fffbc96dac194 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-strict-lib.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-strict-lib.hlsl @@ -1,192 +1,192 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ -// RUN: -fhlsl-strict-availability -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -// FIXME: all diagnostics marked as FUTURE will come alive when HLSL default -// diagnostic mode is implemented in a future PR which will verify calls in -// all functions that are reachable from the shader library entry points - -float also_alive(float f) { - // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // 
expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - - // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - - // expected-error@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - - return 0; -} - -float alive(float f) { - // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - - // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - - // expected-error@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // expected-error@#also_dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the 
deployment target is Shader Model 6.0}} - float A = fx(f); // #also_dead_fx_call - - // Call to environment-specific function from an unreachable function - // in a shader library - no diagnostic expected. - float B = fy(f); // #also_dead_fy_call - - // Call to environment-specific function from an unreachable function - // in a shader library - no diagnostic expected. - float C = fz(f); // #also_dead_fz_call - return 0; -} - -float dead(float f) { - // expected-error@#dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #dead_fx_call - - // Call to environment-specific function from an unreachable function - // in a shader library - no diagnostic expected. - float B = fy(f); // #dead_fy_call - - // Call to environment-specific function from an unreachable function - // in a shader library - no diagnostic expected. 
- float C = fz(f); // #dead_fz_call - - return also_dead(f); -} - -template<typename T> -T aliveTemp(T f) { - // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#aliveTemp_inst {{in instantiation of function template specialization 'aliveTemp<float>' requested here}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template<typename T> T aliveTemp2(T f) { - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2<half>' requested here}} -} - -float test(float x) { - return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2<float>' requested here}} -} - -class 
MyClass -{ - float F; - float makeF() { - // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - } -}; - -// Exported function without body, not used -export void exportedFunctionUnused(float f); - -// Exported function with body, without export, not used -void exportedFunctionUnused(float f) { - // expected-error@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUnused_fx_call - - // API with shader-stage-specific availability in unused exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(f); - float C = fz(f); -} - -// Exported function with body - called from main() which is a compute shader entry point -export void exportedFunctionUsed(float f) { - // expected-error@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been 
marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUsed_fx_call - - // expected-error@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #exportedFunctionUsed_fy_call - - // expected-error@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #exportedFunctionUsed_fz_call -} - -namespace A { - namespace B { - export { - void exportedFunctionInNS(float x) { - // expected-error@#exportedFunctionInNS_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(x); // #exportedFunctionInNS_fx_call - - // API with shader-stage-specific availability in exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(x); - float C = fz(x); - } - } - } -} - -[shader("compute")] -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f);float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - exportedFunctionUsed(1.0f); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ +// RUN: -fhlsl-strict-availability -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + 
+__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +// FIXME: all diagnostics marked as FUTURE will come alive when HLSL default +// diagnostic mode is implemented in a future PR which will verify calls in +// all functions that are reachable from the shader library entry points + +float also_alive(float f) { + // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + + // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + + // expected-error@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + + return 0; +} + +float alive(float f) { + // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + + // expected-error@#alive_fy_call 
{{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + + // expected-error@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // expected-error@#also_dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_dead_fx_call + + // Call to environment-specific function from an unreachable function + // in a shader library - no diagnostic expected. + float B = fy(f); // #also_dead_fy_call + + // Call to environment-specific function from an unreachable function + // in a shader library - no diagnostic expected. + float C = fz(f); // #also_dead_fz_call + return 0; +} + +float dead(float f) { + // expected-error@#dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #dead_fx_call + + // Call to environment-specific function from an unreachable function + // in a shader library - no diagnostic expected. + float B = fy(f); // #dead_fy_call + + // Call to environment-specific function from an unreachable function + // in a shader library - no diagnostic expected. 
+ float C = fz(f); // #dead_fz_call + + return also_dead(f); +} + +template +T aliveTemp(T f) { + // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#aliveTemp_inst {{in instantiation of function template specialization 'aliveTemp' requested here}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template T aliveTemp2(T f) { + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} +} + +float test(float x) { + return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} +} + +class 
MyClass +{ + float F; + float makeF() { + // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + } +}; + +// Exported function without body, not used +export void exportedFunctionUnused(float f); + +// Exported function with body, without export, not used +void exportedFunctionUnused(float f) { + // expected-error@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUnused_fx_call + + // API with shader-stage-specific availability in unused exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(f); + float C = fz(f); +} + +// Exported function with body - called from main() which is a compute shader entry point +export void exportedFunctionUsed(float f) { + // expected-error@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been 
marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUsed_fx_call + + // expected-error@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #exportedFunctionUsed_fy_call + + // expected-error@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #exportedFunctionUsed_fz_call +} + +namespace A { + namespace B { + export { + void exportedFunctionInNS(float x) { + // expected-error@#exportedFunctionInNS_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(x); // #exportedFunctionInNS_fx_call + + // API with shader-stage-specific availability in exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(x); + float C = fz(x); + } + } + } +} + +[shader("compute")] +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f);float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + exportedFunctionUsed(1.0f); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-lib-multiple-stages.hlsl b/clang/test/SemaHLSL/Availability/avail-lib-multiple-stages.hlsl index b56ab8fe4526ba..bfefc9b116a64f 100644 --- 
a/clang/test/SemaHLSL/Availability/avail-lib-multiple-stages.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-lib-multiple-stages.hlsl @@ -1,57 +1,57 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ -// RUN: -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = compute))) -float fz(float); // #fz - - -void F(float f) { - // Make sure we only get this error once, even though this function is scanned twice - once - // in compute shader context and once in pixel shader context. - // expected-error@#fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #fx_call - - // expected-error@#fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #fy_call - - // expected-error@#fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 5.0 in compute environment here, but the deployment target is Shader Model 6.0 pixel environment}} - float X = fz(f); // #fz_call -} - -void deadCode(float f) { - // no diagnostics expected under default diagnostic mode - float A = fx(f); - float B = fy(f); - float X = fz(f); -} - -// Pixel shader -[shader("pixel")] -void mainPixel() { - F(1.0); -} - -// First Compute shader -[shader("compute")] -[numthreads(4,1,1)] -void mainCompute1() { - F(2.0); -} 
- -// Second compute shader to make sure we do not get duplicate messages if F is called -// from multiple entry points. -[shader("compute")] -[numthreads(4,1,1)] -void mainCompute2() { - F(3.0); -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ +// RUN: -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = compute))) +float fz(float); // #fz + + +void F(float f) { + // Make sure we only get this error once, even though this function is scanned twice - once + // in compute shader context and once in pixel shader context. + // expected-error@#fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #fx_call + + // expected-error@#fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #fy_call + + // expected-error@#fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 5.0 in compute environment here, but the deployment target is Shader Model 6.0 pixel environment}} + float X = fz(f); // #fz_call +} + +void deadCode(float f) { + // no diagnostics expected under default diagnostic mode + float A = fx(f); + float B = fy(f); + float X = fz(f); +} + +// Pixel shader +[shader("pixel")] +void mainPixel() { + F(1.0); +} + +// First Compute shader +[shader("compute")] 
+[numthreads(4,1,1)] +void mainCompute1() { + F(2.0); +} + +// Second compute shader to make sure we do not get duplicate messages if F is called +// from multiple entry points. +[shader("compute")] +[numthreads(4,1,1)] +void mainCompute2() { + F(3.0); +} diff --git a/clang/test/SemaHLSL/BuiltIns/StructuredBuffers.hlsl b/clang/test/SemaHLSL/BuiltIns/StructuredBuffers.hlsl index a472d5519dc51f..1ec56542113d90 100644 --- a/clang/test/SemaHLSL/BuiltIns/StructuredBuffers.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/StructuredBuffers.hlsl @@ -1,19 +1,19 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -fsyntax-only -verify %s - -typedef vector float3; - -StructuredBuffer Buffer; - -// expected-error at +2 {{class template 'StructuredBuffer' requires template arguments}} -// expected-note@*:* {{template declaration from hidden source: template class StructuredBuffer}} -StructuredBuffer BufferErr1; - -// expected-error at +2 {{too few template arguments for class template 'StructuredBuffer'}} -// expected-note@*:* {{template declaration from hidden source: template class StructuredBuffer}} -StructuredBuffer<> BufferErr2; - -[numthreads(1,1,1)] -void main() { - (void)Buffer.h; // expected-error {{'h' is a private member of 'hlsl::StructuredBuffer>'}} - // expected-note@* {{implicitly declared private here}} -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -fsyntax-only -verify %s + +typedef vector float3; + +StructuredBuffer Buffer; + +// expected-error at +2 {{class template 'StructuredBuffer' requires template arguments}} +// expected-note@*:* {{template declaration from hidden source: template class StructuredBuffer}} +StructuredBuffer BufferErr1; + +// expected-error at +2 {{too few template arguments for class template 'StructuredBuffer'}} +// expected-note@*:* {{template declaration from hidden source: template class StructuredBuffer}} +StructuredBuffer<> BufferErr2; + +[numthreads(1,1,1)] +void main() { + (void)Buffer.h; // 
expected-error {{'h' is a private member of 'hlsl::StructuredBuffer>'}} + // expected-note@* {{implicitly declared private here}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/cross-errors.hlsl b/clang/test/SemaHLSL/BuiltIns/cross-errors.hlsl index 423f5bac9471f4..354e7abb8a31eb 100644 --- a/clang/test/SemaHLSL/BuiltIns/cross-errors.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/cross-errors.hlsl @@ -1,43 +1,43 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify - -void test_too_few_arg() -{ - return __builtin_hlsl_cross(); - // expected-error at -1 {{too few arguments to function call, expected 2, have 0}} -} - -void test_too_many_arg(float3 p0) -{ - return __builtin_hlsl_cross(p0, p0, p0); - // expected-error at -1 {{too many arguments to function call, expected 2, have 3}} -} - -bool builtin_bool_to_float_type_promotion(bool p1) -{ - return __builtin_hlsl_cross(p1, p1); - // expected-error at -1 {{passing 'bool' to parameter of incompatible type 'float'}} -} - -bool builtin_cross_int_to_float_promotion(int p1) -{ - return __builtin_hlsl_cross(p1, p1); - // expected-error at -1 {{passing 'int' to parameter of incompatible type 'float'}} -} - -bool2 builtin_cross_int2_to_float2_promotion(int2 p1) -{ - return __builtin_hlsl_cross(p1, p1); - // expected-error at -1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} - -float2 builtin_cross_float2(float2 p1, float2 p2) -{ - return __builtin_hlsl_cross(p1, p2); - // expected-error at -1 {{too many elements in vector operand (expected 3 elements, have 2)}} -} - -float3 builtin_cross_float3_int3(float3 p1, int3 p2) -{ - return __builtin_hlsl_cross(p1, p2); - // expected-error at -1 {{all arguments to '__builtin_hlsl_cross' must have the same type}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s 
-fnative-half-type -disable-llvm-passes -verify + +void test_too_few_arg() +{ + return __builtin_hlsl_cross(); + // expected-error at -1 {{too few arguments to function call, expected 2, have 0}} +} + +void test_too_many_arg(float3 p0) +{ + return __builtin_hlsl_cross(p0, p0, p0); + // expected-error at -1 {{too many arguments to function call, expected 2, have 3}} +} + +bool builtin_bool_to_float_type_promotion(bool p1) +{ + return __builtin_hlsl_cross(p1, p1); + // expected-error at -1 {{passing 'bool' to parameter of incompatible type 'float'}} +} + +bool builtin_cross_int_to_float_promotion(int p1) +{ + return __builtin_hlsl_cross(p1, p1); + // expected-error at -1 {{passing 'int' to parameter of incompatible type 'float'}} +} + +bool2 builtin_cross_int2_to_float2_promotion(int2 p1) +{ + return __builtin_hlsl_cross(p1, p1); + // expected-error at -1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} + +float2 builtin_cross_float2(float2 p1, float2 p2) +{ + return __builtin_hlsl_cross(p1, p2); + // expected-error at -1 {{too many elements in vector operand (expected 3 elements, have 2)}} +} + +float3 builtin_cross_float3_int3(float3 p1, int3 p2) +{ + return __builtin_hlsl_cross(p1, p2); + // expected-error at -1 {{all arguments to '__builtin_hlsl_cross' must have the same type}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/half-float-only-errors2.hlsl b/clang/test/SemaHLSL/BuiltIns/half-float-only-errors2.hlsl index bfbd8b28257a3b..b876a8e84cb3ac 100644 --- a/clang/test/SemaHLSL/BuiltIns/half-float-only-errors2.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/half-float-only-errors2.hlsl @@ -1,13 +1,13 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_atan2 -// RUN: %clang_cc1 -finclude-default-header -triple 
dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_fmod -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_pow - -double test_double_builtin(double p0, double p1) { - return TEST_FUNC(p0, p1); - // expected-error@-1 {{passing 'double' to parameter of incompatible type 'float'}} -} - -double2 test_vec_double_builtin(double2 p0, double2 p1) { - return TEST_FUNC(p0, p1); - // expected-error@-1 {{passing 'double2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_atan2 +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_fmod +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_pow + +double test_double_builtin(double p0, double p1) { + return TEST_FUNC(p0, p1); + // expected-error@-1 {{passing 'double' to parameter of incompatible type 'float'}} +} + +double2 test_vec_double_builtin(double2 p0, double2 p1) { + return TEST_FUNC(p0, p1); + // expected-error@-1 {{passing 'double2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/length-errors.hlsl b/clang/test/SemaHLSL/BuiltIns/length-errors.hlsl index 281faada6f5e94..c5e2ac0b502dc4 100644 --- 
a/clang/test/SemaHLSL/BuiltIns/length-errors.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/length-errors.hlsl @@ -1,32 +1,32 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected - - -void test_too_few_arg() -{ - return __builtin_hlsl_length(); - // expected-error at -1 {{too few arguments to function call, expected 1, have 0}} -} - -void test_too_many_arg(float2 p0) -{ - return __builtin_hlsl_length(p0, p0); - // expected-error at -1 {{too many arguments to function call, expected 1, have 2}} -} - -bool builtin_bool_to_float_type_promotion(bool p1) -{ - return __builtin_hlsl_length(p1); - // expected-error at -1 {passing 'bool' to parameter of incompatible type 'float'}} -} - -bool builtin_length_int_to_float_promotion(int p1) -{ - return __builtin_hlsl_length(p1); - // expected-error at -1 {{passing 'int' to parameter of incompatible type 'float'}} -} - -bool2 builtin_length_int2_to_float2_promotion(int2 p1) -{ - return __builtin_hlsl_length(p1); - // expected-error at -1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected + + +void test_too_few_arg() +{ + return __builtin_hlsl_length(); + // expected-error at -1 {{too few arguments to function call, expected 1, have 0}} +} + +void test_too_many_arg(float2 p0) +{ + return __builtin_hlsl_length(p0, p0); + // expected-error at -1 {{too many arguments to function call, expected 1, have 2}} +} + +bool builtin_bool_to_float_type_promotion(bool p1) +{ + return __builtin_hlsl_length(p1); + // expected-error at -1 {passing 'bool' to parameter of incompatible type 'float'}} +} + +bool builtin_length_int_to_float_promotion(int p1) +{ + return 
__builtin_hlsl_length(p1); + // expected-error at -1 {{passing 'int' to parameter of incompatible type 'float'}} +} + +bool2 builtin_length_int2_to_float2_promotion(int2 p1) +{ + return __builtin_hlsl_length(p1); + // expected-error at -1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/normalize-errors.hlsl b/clang/test/SemaHLSL/BuiltIns/normalize-errors.hlsl index fc48c9b2589f7e..3720dca9b88a12 100644 --- a/clang/test/SemaHLSL/BuiltIns/normalize-errors.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/normalize-errors.hlsl @@ -1,31 +1,31 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected - -void test_too_few_arg() -{ - return __builtin_hlsl_normalize(); - // expected-error at -1 {{too few arguments to function call, expected 1, have 0}} -} - -void test_too_many_arg(float2 p0) -{ - return __builtin_hlsl_normalize(p0, p0); - // expected-error at -1 {{too many arguments to function call, expected 1, have 2}} -} - -bool builtin_bool_to_float_type_promotion(bool p1) -{ - return __builtin_hlsl_normalize(p1); - // expected-error at -1 {passing 'bool' to parameter of incompatible type 'float'}} -} - -bool builtin_normalize_int_to_float_promotion(int p1) -{ - return __builtin_hlsl_normalize(p1); - // expected-error at -1 {{passing 'int' to parameter of incompatible type 'float'}} -} - -bool2 builtin_normalize_int2_to_float2_promotion(int2 p1) -{ - return __builtin_hlsl_normalize(p1); - // expected-error at -1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify 
-verify-ignore-unexpected + +void test_too_few_arg() +{ + return __builtin_hlsl_normalize(); + // expected-error at -1 {{too few arguments to function call, expected 1, have 0}} +} + +void test_too_many_arg(float2 p0) +{ + return __builtin_hlsl_normalize(p0, p0); + // expected-error at -1 {{too many arguments to function call, expected 1, have 2}} +} + +bool builtin_bool_to_float_type_promotion(bool p1) +{ + return __builtin_hlsl_normalize(p1); + // expected-error at -1 {passing 'bool' to parameter of incompatible type 'float'}} +} + +bool builtin_normalize_int_to_float_promotion(int p1) +{ + return __builtin_hlsl_normalize(p1); + // expected-error at -1 {{passing 'int' to parameter of incompatible type 'float'}} +} + +bool2 builtin_normalize_int2_to_float2_promotion(int2 p1) +{ + return __builtin_hlsl_normalize(p1); + // expected-error at -1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/step-errors.hlsl b/clang/test/SemaHLSL/BuiltIns/step-errors.hlsl index 823585201ca62d..a76c5ff5dbd2ba 100644 --- a/clang/test/SemaHLSL/BuiltIns/step-errors.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/step-errors.hlsl @@ -1,31 +1,31 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected - -void test_too_few_arg() -{ - return __builtin_hlsl_step(); - // expected-error at -1 {{too few arguments to function call, expected 2, have 0}} -} - -void test_too_many_arg(float2 p0) -{ - return __builtin_hlsl_step(p0, p0, p0); - // expected-error at -1 {{too many arguments to function call, expected 2, have 3}} -} - -bool builtin_bool_to_float_type_promotion(bool p1) -{ - return __builtin_hlsl_step(p1, p1); - // expected-error at -1 {passing 'bool' to parameter of incompatible type 'float'}} -} - -bool 
builtin_step_int_to_float_promotion(int p1) -{ - return __builtin_hlsl_step(p1, p1); - // expected-error at -1 {{passing 'int' to parameter of incompatible type 'float'}} -} - -bool2 builtin_step_int2_to_float2_promotion(int2 p1) -{ - return __builtin_hlsl_step(p1, p1); - // expected-error at -1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected + +void test_too_few_arg() +{ + return __builtin_hlsl_step(); + // expected-error at -1 {{too few arguments to function call, expected 2, have 0}} +} + +void test_too_many_arg(float2 p0) +{ + return __builtin_hlsl_step(p0, p0, p0); + // expected-error at -1 {{too many arguments to function call, expected 2, have 3}} +} + +bool builtin_bool_to_float_type_promotion(bool p1) +{ + return __builtin_hlsl_step(p1, p1); + // expected-error at -1 {passing 'bool' to parameter of incompatible type 'float'}} +} + +bool builtin_step_int_to_float_promotion(int p1) +{ + return __builtin_hlsl_step(p1, p1); + // expected-error at -1 {{passing 'int' to parameter of incompatible type 'float'}} +} + +bool2 builtin_step_int2_to_float2_promotion(int2 p1) +{ + return __builtin_hlsl_step(p1, p1); + // expected-error at -1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} diff --git a/clang/test/SemaHLSL/Types/Traits/IsIntangibleType.hlsl b/clang/test/SemaHLSL/Types/Traits/IsIntangibleType.hlsl index 8c0f8d6f271dbd..1223a131af35c4 100644 --- a/clang/test/SemaHLSL/Types/Traits/IsIntangibleType.hlsl +++ b/clang/test/SemaHLSL/Types/Traits/IsIntangibleType.hlsl @@ -1,81 +1,81 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -verify %s -// 
RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -fnative-half-type -verify %s -// expected-no-diagnostics - -_Static_assert(__builtin_hlsl_is_intangible(__hlsl_resource_t), ""); -// no need to check array of __hlsl_resource_t, arrays of sizeless types are not supported - -_Static_assert(!__builtin_hlsl_is_intangible(int), ""); -_Static_assert(!__builtin_hlsl_is_intangible(float3), ""); -_Static_assert(!__builtin_hlsl_is_intangible(half[4]), ""); - -typedef __hlsl_resource_t Res; -_Static_assert(__builtin_hlsl_is_intangible(const Res), ""); -// no need to check array of Res, arrays of sizeless types are not supported - -struct ABuffer { - const int i[10]; - __hlsl_resource_t h; -}; -_Static_assert(__builtin_hlsl_is_intangible(ABuffer), ""); -_Static_assert(__builtin_hlsl_is_intangible(ABuffer[10]), ""); - -struct MyStruct { - half2 h2; - int3 i3; -}; -_Static_assert(!__builtin_hlsl_is_intangible(MyStruct), ""); -_Static_assert(!__builtin_hlsl_is_intangible(MyStruct[10]), ""); - -class MyClass { - int3 ivec; - float farray[12]; - MyStruct ms; - ABuffer buf; -}; -_Static_assert(__builtin_hlsl_is_intangible(MyClass), ""); -_Static_assert(__builtin_hlsl_is_intangible(MyClass[2]), ""); - -union U { - double d[4]; - Res buf; -}; -_Static_assert(__builtin_hlsl_is_intangible(U), ""); -_Static_assert(__builtin_hlsl_is_intangible(U[100]), ""); - -class MyClass2 { - int3 ivec; - float farray[12]; - U u; -}; -_Static_assert(__builtin_hlsl_is_intangible(MyClass2), ""); -_Static_assert(__builtin_hlsl_is_intangible(MyClass2[5]), ""); - -class Simple { - int a; -}; - -template struct TemplatedBuffer { - T a; - __hlsl_resource_t h; -}; -_Static_assert(__builtin_hlsl_is_intangible(TemplatedBuffer), ""); - -struct MyStruct2 : TemplatedBuffer { - float x; -}; -_Static_assert(__builtin_hlsl_is_intangible(MyStruct2), ""); - -struct MyStruct3 { - const TemplatedBuffer TB[10]; -}; -_Static_assert(__builtin_hlsl_is_intangible(MyStruct3), ""); - -template 
struct SimpleTemplate { - T a; -}; -_Static_assert(__builtin_hlsl_is_intangible(SimpleTemplate<__hlsl_resource_t>), ""); -_Static_assert(!__builtin_hlsl_is_intangible(SimpleTemplate), ""); - -_Static_assert(__builtin_hlsl_is_intangible(RWBuffer), ""); -_Static_assert(__builtin_hlsl_is_intangible(StructuredBuffer), ""); +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -verify %s +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -fnative-half-type -verify %s +// expected-no-diagnostics + +_Static_assert(__builtin_hlsl_is_intangible(__hlsl_resource_t), ""); +// no need to check array of __hlsl_resource_t, arrays of sizeless types are not supported + +_Static_assert(!__builtin_hlsl_is_intangible(int), ""); +_Static_assert(!__builtin_hlsl_is_intangible(float3), ""); +_Static_assert(!__builtin_hlsl_is_intangible(half[4]), ""); + +typedef __hlsl_resource_t Res; +_Static_assert(__builtin_hlsl_is_intangible(const Res), ""); +// no need to check array of Res, arrays of sizeless types are not supported + +struct ABuffer { + const int i[10]; + __hlsl_resource_t h; +}; +_Static_assert(__builtin_hlsl_is_intangible(ABuffer), ""); +_Static_assert(__builtin_hlsl_is_intangible(ABuffer[10]), ""); + +struct MyStruct { + half2 h2; + int3 i3; +}; +_Static_assert(!__builtin_hlsl_is_intangible(MyStruct), ""); +_Static_assert(!__builtin_hlsl_is_intangible(MyStruct[10]), ""); + +class MyClass { + int3 ivec; + float farray[12]; + MyStruct ms; + ABuffer buf; +}; +_Static_assert(__builtin_hlsl_is_intangible(MyClass), ""); +_Static_assert(__builtin_hlsl_is_intangible(MyClass[2]), ""); + +union U { + double d[4]; + Res buf; +}; +_Static_assert(__builtin_hlsl_is_intangible(U), ""); +_Static_assert(__builtin_hlsl_is_intangible(U[100]), ""); + +class MyClass2 { + int3 ivec; + float farray[12]; + U u; +}; +_Static_assert(__builtin_hlsl_is_intangible(MyClass2), ""); +_Static_assert(__builtin_hlsl_is_intangible(MyClass2[5]), ""); + 
+class Simple { + int a; +}; + +template struct TemplatedBuffer { + T a; + __hlsl_resource_t h; +}; +_Static_assert(__builtin_hlsl_is_intangible(TemplatedBuffer), ""); + +struct MyStruct2 : TemplatedBuffer { + float x; +}; +_Static_assert(__builtin_hlsl_is_intangible(MyStruct2), ""); + +struct MyStruct3 { + const TemplatedBuffer TB[10]; +}; +_Static_assert(__builtin_hlsl_is_intangible(MyStruct3), ""); + +template struct SimpleTemplate { + T a; +}; +_Static_assert(__builtin_hlsl_is_intangible(SimpleTemplate<__hlsl_resource_t>), ""); +_Static_assert(!__builtin_hlsl_is_intangible(SimpleTemplate), ""); + +_Static_assert(__builtin_hlsl_is_intangible(RWBuffer), ""); +_Static_assert(__builtin_hlsl_is_intangible(StructuredBuffer), ""); diff --git a/clang/test/SemaHLSL/Types/Traits/IsIntangibleTypeErrors.hlsl b/clang/test/SemaHLSL/Types/Traits/IsIntangibleTypeErrors.hlsl index de9ac90b895fc6..33614e87640dad 100644 --- a/clang/test/SemaHLSL/Types/Traits/IsIntangibleTypeErrors.hlsl +++ b/clang/test/SemaHLSL/Types/Traits/IsIntangibleTypeErrors.hlsl @@ -1,12 +1,12 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -verify %s - -struct Undefined; // expected-note {{forward declaration of 'Undefined'}} -_Static_assert(!__builtin_hlsl_is_intangible(Undefined), ""); // expected-error{{incomplete type 'Undefined' used in type trait expression}} - -void fn(int X) { // expected-note {{declared here}} - // expected-error@#vla {{variable length arrays are not supported for the current target}} - // expected-error@#vla {{variable length arrays are not supported in '__builtin_hlsl_is_intangible'}} - // expected-warning@#vla {{variable length arrays in C++ are a Clang extension}} - // expected-note@#vla {{function parameter 'X' with unknown value cannot be used in a constant expression}} - _Static_assert(!__builtin_hlsl_is_intangible(int[X]), ""); // #vla -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -verify %s + 
+struct Undefined; // expected-note {{forward declaration of 'Undefined'}}
+_Static_assert(!__builtin_hlsl_is_intangible(Undefined), ""); // expected-error{{incomplete type 'Undefined' used in type trait expression}}
+
+void fn(int X) { // expected-note {{declared here}}
+ // expected-error@#vla {{variable length arrays are not supported for the current target}}
+ // expected-error@#vla {{variable length arrays are not supported in '__builtin_hlsl_is_intangible'}}
+ // expected-warning@#vla {{variable length arrays in C++ are a Clang extension}}
+ // expected-note@#vla {{function parameter 'X' with unknown value cannot be used in a constant expression}}
+ _Static_assert(!__builtin_hlsl_is_intangible(int[X]), ""); // #vla
+}
diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_basic.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_basic.hlsl
index 760c057630a7fa..4e50f70952ad13 100644
--- a/clang/test/SemaHLSL/resource_binding_attr_error_basic.hlsl
+++ b/clang/test/SemaHLSL/resource_binding_attr_error_basic.hlsl
@@ -1,42 +1,42 @@
-// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify
-
-// expected-error@+1{{binding type 't' only applies to SRV resources}}
-float f1 : register(t0);
-
-// expected-error@+1 {{binding type 'u' only applies to UAV resources}}
-float f2 : register(u0);
-
-// expected-error@+1{{binding type 'b' only applies to constant buffers. The 'bool constant' binding type is no longer supported}}
-float f3 : register(b9);
-
-// expected-error@+1 {{binding type 's' only applies to sampler state}}
-float f4 : register(s0);
-
-// expected-error@+1{{binding type 'i' ignored. The 'integer constant' binding type is no longer supported}}
-float f5 : register(i9);
-
-// expected-error@+1{{binding type 'x' is invalid}}
-float f6 : register(x9);
-
-cbuffer g_cbuffer1 {
-// expected-error@+1{{binding type 'c' ignored in buffer declaration.
Did you mean 'packoffset'?}}
- float f7 : register(c2);
-};
-
-tbuffer g_tbuffer1 {
-// expected-error@+1{{binding type 'c' ignored in buffer declaration. Did you mean 'packoffset'?}}
- float f8 : register(c2);
-};
-
-cbuffer g_cbuffer2 {
-// expected-error@+1{{binding type 'b' only applies to constant buffer resources}}
- float f9 : register(b2);
-};
-
-tbuffer g_tbuffer2 {
-// expected-error@+1{{binding type 'i' ignored. The 'integer constant' binding type is no longer supported}}
- float f10 : register(i2);
-};
-
-// expected-error@+1{{binding type 'c' only applies to numeric variables in the global scope}}
-RWBuffer f11 : register(c3);
+// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify
+
+// expected-error@+1{{binding type 't' only applies to SRV resources}}
+float f1 : register(t0);
+
+// expected-error@+1 {{binding type 'u' only applies to UAV resources}}
+float f2 : register(u0);
+
+// expected-error@+1{{binding type 'b' only applies to constant buffers. The 'bool constant' binding type is no longer supported}}
+float f3 : register(b9);
+
+// expected-error@+1 {{binding type 's' only applies to sampler state}}
+float f4 : register(s0);
+
+// expected-error@+1{{binding type 'i' ignored. The 'integer constant' binding type is no longer supported}}
+float f5 : register(i9);
+
+// expected-error@+1{{binding type 'x' is invalid}}
+float f6 : register(x9);
+
+cbuffer g_cbuffer1 {
+// expected-error@+1{{binding type 'c' ignored in buffer declaration. Did you mean 'packoffset'?}}
+ float f7 : register(c2);
+};
+
+tbuffer g_tbuffer1 {
+// expected-error@+1{{binding type 'c' ignored in buffer declaration. Did you mean 'packoffset'?}}
+ float f8 : register(c2);
+};
+
+cbuffer g_cbuffer2 {
+// expected-error@+1{{binding type 'b' only applies to constant buffer resources}}
+ float f9 : register(b2);
+};
+
+tbuffer g_tbuffer2 {
+// expected-error@+1{{binding type 'i' ignored.
The 'integer constant' binding type is no longer supported}}
+ float f10 : register(i2);
+};
+
+// expected-error@+1{{binding type 'c' only applies to numeric variables in the global scope}}
+RWBuffer f11 : register(c3);
diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_other.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_other.hlsl
index 4c9e9a6b44c928..503c8469666f3b 100644
--- a/clang/test/SemaHLSL/resource_binding_attr_error_other.hlsl
+++ b/clang/test/SemaHLSL/resource_binding_attr_error_other.hlsl
@@ -1,9 +1,9 @@
-// RUN: not %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s | FileCheck %s
-
-// XFAIL: *
-// This expectedly fails because RayQuery is an unsupported type.
-// When it becomes supported, we should expect an error due to
-// the variable type being classified as "other", and according
-// to the spec, err_hlsl_unsupported_register_type_and_variable_type
-// should be emitted.
-RayQuery<0> r1: register(t0);
+// RUN: not %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s | FileCheck %s
+
+// XFAIL: *
+// This expectedly fails because RayQuery is an unsupported type.
+// When it becomes supported, we should expect an error due to
+// the variable type being classified as "other", and according
+// to the spec, err_hlsl_unsupported_register_type_and_variable_type
+// should be emitted.
+RayQuery<0> r1: register(t0);
diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_resource.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_resource.hlsl
index 4b6af47c0ab725..ea43e27b5b5ac1 100644
--- a/clang/test/SemaHLSL/resource_binding_attr_error_resource.hlsl
+++ b/clang/test/SemaHLSL/resource_binding_attr_error_resource.hlsl
@@ -1,49 +1,49 @@
-// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify
-
-// This test validates the diagnostics that are emitted when a variable with a "resource" type
-// is bound to a register using the register annotation
-
-
-template
-struct MyTemplatedSRV {
- __hlsl_resource_t [[hlsl::resource_class(SRV)]] x;
-};
-
-struct MySRV {
- __hlsl_resource_t [[hlsl::resource_class(SRV)]] x;
-};
-
-struct MySampler {
- __hlsl_resource_t [[hlsl::resource_class(Sampler)]] x;
-};
-
-struct MyUAV {
- __hlsl_resource_t [[hlsl::resource_class(UAV)]] x;
-};
-
-struct MyCBuffer {
- __hlsl_resource_t [[hlsl::resource_class(CBuffer)]] x;
-};
-
-
-// expected-error@+1 {{binding type 'i' ignored.
The 'integer constant' binding type is no longer supported}}
-MySRV invalid : register(i2);
-
-// expected-error@+1 {{binding type 't' only applies to SRV resources}}
-MyUAV a : register(t2, space1);
-
-// expected-error@+1 {{binding type 'u' only applies to UAV resources}}
-MySampler b : register(u2, space1);
-
-// expected-error@+1 {{binding type 'b' only applies to constant buffer resources}}
-MyTemplatedSRV c : register(b2);
-
-// expected-error@+1 {{binding type 's' only applies to sampler state}}
-MyUAV d : register(s2, space1);
-
-// empty binding prefix cases:
-// expected-error@+1 {{expected identifier}}
-MyTemplatedSRV e: register();
-
-// expected-error@+1 {{expected identifier}}
-MyTemplatedSRV f: register("");
+// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify
+
+// This test validates the diagnostics that are emitted when a variable with a "resource" type
+// is bound to a register using the register annotation
+
+
+template
+struct MyTemplatedSRV {
+ __hlsl_resource_t [[hlsl::resource_class(SRV)]] x;
+};
+
+struct MySRV {
+ __hlsl_resource_t [[hlsl::resource_class(SRV)]] x;
+};
+
+struct MySampler {
+ __hlsl_resource_t [[hlsl::resource_class(Sampler)]] x;
+};
+
+struct MyUAV {
+ __hlsl_resource_t [[hlsl::resource_class(UAV)]] x;
+};
+
+struct MyCBuffer {
+ __hlsl_resource_t [[hlsl::resource_class(CBuffer)]] x;
+};
+
+
+// expected-error@+1 {{binding type 'i' ignored.
The 'integer constant' binding type is no longer supported}}
+MySRV invalid : register(i2);
+
+// expected-error@+1 {{binding type 't' only applies to SRV resources}}
+MyUAV a : register(t2, space1);
+
+// expected-error@+1 {{binding type 'u' only applies to UAV resources}}
+MySampler b : register(u2, space1);
+
+// expected-error@+1 {{binding type 'b' only applies to constant buffer resources}}
+MyTemplatedSRV c : register(b2);
+
+// expected-error@+1 {{binding type 's' only applies to sampler state}}
+MyUAV d : register(s2, space1);
+
+// empty binding prefix cases:
+// expected-error@+1 {{expected identifier}}
+MyTemplatedSRV e: register();
+
+// expected-error@+1 {{expected identifier}}
+MyTemplatedSRV f: register("");
diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_silence_diags.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_silence_diags.hlsl
index e63f264452da79..7f248e30c07096 100644
--- a/clang/test/SemaHLSL/resource_binding_attr_error_silence_diags.hlsl
+++ b/clang/test/SemaHLSL/resource_binding_attr_error_silence_diags.hlsl
@@ -1,27 +1,27 @@
-// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only -Wno-legacy-constant-register-binding %s -verify
-
-// expected-no-diagnostics
-float f2 : register(b9);
-
-float f3 : register(i9);
-
-cbuffer g_cbuffer1 {
- float f4 : register(c2);
-};
-
-
-struct Eg12{
- RWBuffer a;
-};
-
-Eg12 e12 : register(c9);
-
-Eg12 bar : register(i1);
-
-struct Eg7 {
- struct Bar {
- float f;
- };
- Bar b;
-};
-Eg7 e7 : register(t0);
+// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only -Wno-legacy-constant-register-binding %s -verify
+
+// expected-no-diagnostics
+float f2 : register(b9);
+
+float f3 : register(i9);
+
+cbuffer g_cbuffer1 {
+ float f4 : register(c2);
+};
+
+
+struct Eg12{
+ RWBuffer a;
+};
+
+Eg12 e12 : register(c9);
+
+Eg12 bar : register(i1);
+
+struct Eg7 {
+ struct Bar {
+ float f;
+ };
+ Bar b;
+};
+Eg7 e7 :
register(t0);
diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_space.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_space.hlsl
index 70e64e6ca75280..3001dbb1e3ec96 100644
--- a/clang/test/SemaHLSL/resource_binding_attr_error_space.hlsl
+++ b/clang/test/SemaHLSL/resource_binding_attr_error_space.hlsl
@@ -1,62 +1,62 @@
-// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify
-
-// valid
-cbuffer cbuf {
- RWBuffer r : register(u0, space0);
-}
-
-cbuffer cbuf2 {
- struct x {
- // this test validates that no diagnostic is emitted on the space parameter, because
- // this register annotation is not in the global scope.
- // expected-error@+1 {{'register' attribute only applies to cbuffer/tbuffer and external global variables}}
- RWBuffer E : register(u2, space3);
- };
-}
-
-struct MyStruct {
- RWBuffer E;
-};
-
-cbuffer cbuf3 {
- // valid
- MyStruct E : register(u2, space3);
-}
-
-// valid
-MyStruct F : register(u3, space4);
-
-cbuffer cbuf4 {
- // this test validates that no diagnostic is emitted on the space parameter, because
- // this register annotation is not in the global scope.
- // expected-error@+1 {{binding type 'u' only applies to UAV resources}}
- float a : register(u2, space3);
-}
-
-// expected-error@+1 {{invalid space specifier 's2' used; expected 'space' followed by an integer, like space1}}
-cbuffer a : register(b0, s2) {
-
-}
-
-// expected-error@+1 {{invalid space specifier 'spaces' used; expected 'space' followed by an integer, like space1}}
-cbuffer b : register(b2, spaces) {
-
-}
-
-// expected-error@+1 {{wrong argument format for hlsl attribute, use space3 instead}}
-cbuffer c : register(b2, space 3) {}
-
-// expected-error@+1 {{register space cannot be specified on global constants}}
-int d : register(c2, space3);
-
-// expected-error@+1 {{register space cannot be specified on global constants}}
-int e : register(c2, space0);
-
-// expected-error@+1 {{register space cannot be specified on global constants}}
-int f : register(c2, space00);
-
-// valid
-RWBuffer g : register(u2, space0);
-
-// valid
-RWBuffer h : register(u2, space0);
+// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify
+
+// valid
+cbuffer cbuf {
+ RWBuffer r : register(u0, space0);
+}
+
+cbuffer cbuf2 {
+ struct x {
+ // this test validates that no diagnostic is emitted on the space parameter, because
+ // this register annotation is not in the global scope.
+ // expected-error@+1 {{'register' attribute only applies to cbuffer/tbuffer and external global variables}}
+ RWBuffer E : register(u2, space3);
+ };
+}
+
+struct MyStruct {
+ RWBuffer E;
+};
+
+cbuffer cbuf3 {
+ // valid
+ MyStruct E : register(u2, space3);
+}
+
+// valid
+MyStruct F : register(u3, space4);
+
+cbuffer cbuf4 {
+ // this test validates that no diagnostic is emitted on the space parameter, because
+ // this register annotation is not in the global scope.
+ // expected-error@+1 {{binding type 'u' only applies to UAV resources}}
+ float a : register(u2, space3);
+}
+
+// expected-error@+1 {{invalid space specifier 's2' used; expected 'space' followed by an integer, like space1}}
+cbuffer a : register(b0, s2) {
+
+}
+
+// expected-error@+1 {{invalid space specifier 'spaces' used; expected 'space' followed by an integer, like space1}}
+cbuffer b : register(b2, spaces) {
+
+}
+
+// expected-error@+1 {{wrong argument format for hlsl attribute, use space3 instead}}
+cbuffer c : register(b2, space 3) {}
+
+// expected-error@+1 {{register space cannot be specified on global constants}}
+int d : register(c2, space3);
+
+// expected-error@+1 {{register space cannot be specified on global constants}}
+int e : register(c2, space0);
+
+// expected-error@+1 {{register space cannot be specified on global constants}}
+int f : register(c2, space00);
+
+// valid
+RWBuffer g : register(u2, space0);
+
+// valid
+RWBuffer h : register(u2, space0);
diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_udt.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_udt.hlsl
index ea2d576e4cca55..1ae072a6f4db68 100644
--- a/clang/test/SemaHLSL/resource_binding_attr_error_udt.hlsl
+++ b/clang/test/SemaHLSL/resource_binding_attr_error_udt.hlsl
@@ -1,135 +1,135 @@
-// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify
-
-template
-struct MyTemplatedUAV {
- __hlsl_resource_t [[hlsl::resource_class(UAV)]] x;
-};
-
-struct MySRV {
- __hlsl_resource_t [[hlsl::resource_class(SRV)]] x;
-};
-
-struct MySampler {
- __hlsl_resource_t [[hlsl::resource_class(Sampler)]] x;
-};
-
-struct MyUAV {
- __hlsl_resource_t [[hlsl::resource_class(UAV)]] x;
-};
-
-struct MyCBuffer {
- __hlsl_resource_t [[hlsl::resource_class(CBuffer)]] x;
-};
-
-// Valid: f is skipped, SRVBuf is bound to t0, UAVBuf is bound to u0
-struct Eg1 {
- float f;
- MySRV SRVBuf;
- MyUAV UAVBuf;
- };
-Eg1 e1 : register(t0) :
register(u0);
-
-// Valid: f is skipped, SRVBuf is bound to t0, UAVBuf is bound to u0.
-// UAVBuf2 gets automatically assigned to u1 even though there is no explicit binding for u1.
-struct Eg2 {
- float f;
- MySRV SRVBuf;
- MyUAV UAVBuf;
- MyUAV UAVBuf2;
- };
-Eg2 e2 : register(t0) : register(u0);
-
-// Valid: Bar, the struct within Eg3, has a valid resource that can be bound to t0.
-struct Eg3 {
- struct Bar {
- MyUAV a;
- };
- Bar b;
-};
-Eg3 e3 : register(u0);
-
-// Valid: the first sampler state object within 's' is bound to slot 5
-struct Eg4 {
- MySampler s[3];
-};
-
-Eg4 e4 : register(s5);
-
-
-struct Eg5 {
- float f;
-};
-// expected-warning@+1{{binding type 't' only applies to types containing SRV resources}}
-Eg5 e5 : register(t0);
-
-struct Eg6 {
- float f;
-};
-// expected-warning@+1{{binding type 'u' only applies to types containing UAV resources}}
-Eg6 e6 : register(u0);
-
-struct Eg7 {
- float f;
-};
-// expected-warning@+1{{binding type 'b' only applies to types containing constant buffer resources}}
-Eg7 e7 : register(b0);
-
-struct Eg8 {
- float f;
-};
-// expected-warning@+1{{binding type 's' only applies to types containing sampler state}}
-Eg8 e8 : register(s0);
-
-struct Eg9 {
- MySRV s;
-};
-// expected-warning@+1{{binding type 'c' only applies to types containing numeric types}}
-Eg9 e9 : register(c0);
-
-struct Eg10{
- // expected-error@+1{{'register' attribute only applies to cbuffer/tbuffer and external global variables}}
- MyTemplatedUAV a : register(u9);
-};
-Eg10 e10;
-
-
-template
-struct Eg11 {
- R b;
-};
-// expected-warning@+1{{binding type 'u' only applies to types containing UAV resources}}
-Eg11 e11 : register(u0);
-// invalid because after template expansion, there are no valid resources inside Eg11 to bind as a UAV, only an SRV
-
-
-struct Eg12{
- MySRV s1;
- MySRV s2;
-};
-// expected-warning@+3{{binding type 'u' only applies to types containing UAV resources}}
-// expected-warning@+2{{binding type
'u' only applies to types containing UAV resources}}
-// expected-error@+1{{binding type 'u' cannot be applied more than once}}
-Eg12 e12 : register(u9) : register(u10);
-
-struct Eg13{
- MySRV s1;
- MySRV s2;
-};
-// expected-warning@+4{{binding type 'u' only applies to types containing UAV resources}}
-// expected-warning@+3{{binding type 'u' only applies to types containing UAV resources}}
-// expected-warning@+2{{binding type 'u' only applies to types containing UAV resources}}
-// expected-error@+1{{binding type 'u' cannot be applied more than once}}
-Eg13 e13 : register(u9) : register(u10) : register(u11);
-
-struct Eg14{
- MyTemplatedUAV r1;
-};
-// expected-warning@+1{{binding type 't' only applies to types containing SRV resources}}
-Eg14 e14 : register(t9);
-
-struct Eg15 {
- float f[4];
-};
-// expected no error
-Eg15 e15 : register(c0);
-
+// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify
+
+template
+struct MyTemplatedUAV {
+ __hlsl_resource_t [[hlsl::resource_class(UAV)]] x;
+};
+
+struct MySRV {
+ __hlsl_resource_t [[hlsl::resource_class(SRV)]] x;
+};
+
+struct MySampler {
+ __hlsl_resource_t [[hlsl::resource_class(Sampler)]] x;
+};
+
+struct MyUAV {
+ __hlsl_resource_t [[hlsl::resource_class(UAV)]] x;
+};
+
+struct MyCBuffer {
+ __hlsl_resource_t [[hlsl::resource_class(CBuffer)]] x;
+};
+
+// Valid: f is skipped, SRVBuf is bound to t0, UAVBuf is bound to u0
+struct Eg1 {
+ float f;
+ MySRV SRVBuf;
+ MyUAV UAVBuf;
+ };
+Eg1 e1 : register(t0) : register(u0);
+
+// Valid: f is skipped, SRVBuf is bound to t0, UAVBuf is bound to u0.
+// UAVBuf2 gets automatically assigned to u1 even though there is no explicit binding for u1.
+struct Eg2 {
+ float f;
+ MySRV SRVBuf;
+ MyUAV UAVBuf;
+ MyUAV UAVBuf2;
+ };
+Eg2 e2 : register(t0) : register(u0);
+
+// Valid: Bar, the struct within Eg3, has a valid resource that can be bound to t0.
+struct Eg3 {
+ struct Bar {
+ MyUAV a;
+ };
+ Bar b;
+};
+Eg3 e3 : register(u0);
+
+// Valid: the first sampler state object within 's' is bound to slot 5
+struct Eg4 {
+ MySampler s[3];
+};
+
+Eg4 e4 : register(s5);
+
+
+struct Eg5 {
+ float f;
+};
+// expected-warning@+1{{binding type 't' only applies to types containing SRV resources}}
+Eg5 e5 : register(t0);
+
+struct Eg6 {
+ float f;
+};
+// expected-warning@+1{{binding type 'u' only applies to types containing UAV resources}}
+Eg6 e6 : register(u0);
+
+struct Eg7 {
+ float f;
+};
+// expected-warning@+1{{binding type 'b' only applies to types containing constant buffer resources}}
+Eg7 e7 : register(b0);
+
+struct Eg8 {
+ float f;
+};
+// expected-warning@+1{{binding type 's' only applies to types containing sampler state}}
+Eg8 e8 : register(s0);
+
+struct Eg9 {
+ MySRV s;
+};
+// expected-warning@+1{{binding type 'c' only applies to types containing numeric types}}
+Eg9 e9 : register(c0);
+
+struct Eg10{
+ // expected-error@+1{{'register' attribute only applies to cbuffer/tbuffer and external global variables}}
+ MyTemplatedUAV a : register(u9);
+};
+Eg10 e10;
+
+
+template
+struct Eg11 {
+ R b;
+};
+// expected-warning@+1{{binding type 'u' only applies to types containing UAV resources}}
+Eg11 e11 : register(u0);
+// invalid because after template expansion, there are no valid resources inside Eg11 to bind as a UAV, only an SRV
+
+
+struct Eg12{
+ MySRV s1;
+ MySRV s2;
+};
+// expected-warning@+3{{binding type 'u' only applies to types containing UAV resources}}
+// expected-warning@+2{{binding type 'u' only applies to types containing UAV resources}}
+// expected-error@+1{{binding type 'u' cannot be applied more than once}}
+Eg12 e12 : register(u9) : register(u10);
+
+struct Eg13{
+ MySRV s1;
+ MySRV s2;
+};
+// expected-warning@+4{{binding type 'u' only applies to types containing UAV resources}}
+// expected-warning@+3{{binding type 'u' only applies to types
containing UAV resources}}
+// expected-warning@+2{{binding type 'u' only applies to types containing UAV resources}}
+// expected-error@+1{{binding type 'u' cannot be applied more than once}}
+Eg13 e13 : register(u9) : register(u10) : register(u11);
+
+struct Eg14{
+ MyTemplatedUAV r1;
+};
+// expected-warning@+1{{binding type 't' only applies to types containing SRV resources}}
+Eg14 e14 : register(t9);
+
+struct Eg15 {
+ float f[4];
+};
+// expected no error
+Eg15 e15 : register(c0);
+
diff --git a/clang/tools/scan-build/bin/scan-build.bat b/clang/tools/scan-build/bin/scan-build.bat
index 77be6746318f11..f765f205b8ec50 100644
--- a/clang/tools/scan-build/bin/scan-build.bat
+++ b/clang/tools/scan-build/bin/scan-build.bat
@@ -1 +1 @@
-perl -S scan-build %*
+perl -S scan-build %*
diff --git a/clang/tools/scan-build/libexec/c++-analyzer.bat b/clang/tools/scan-build/libexec/c++-analyzer.bat
index 69f048a91671f0..83c7172456a51a 100644
--- a/clang/tools/scan-build/libexec/c++-analyzer.bat
+++ b/clang/tools/scan-build/libexec/c++-analyzer.bat
@@ -1 +1 @@
-perl -S c++-analyzer %*
+perl -S c++-analyzer %*
diff --git a/clang/tools/scan-build/libexec/ccc-analyzer.bat b/clang/tools/scan-build/libexec/ccc-analyzer.bat
index 2a85376eb82b16..fdd36f3bdd0437 100644
--- a/clang/tools/scan-build/libexec/ccc-analyzer.bat
+++ b/clang/tools/scan-build/libexec/ccc-analyzer.bat
@@ -1 +1 @@
-perl -S ccc-analyzer %*
+perl -S ccc-analyzer %*
diff --git a/clang/utils/ClangVisualizers/clang.natvis b/clang/utils/ClangVisualizers/clang.natvis
index a7c70186bc46de..611c20dacce176 100644
--- a/clang/utils/ClangVisualizers/clang.natvis
+++ b/clang/utils/ClangVisualizers/clang.natvis
@@ -1,1089 +1,1089 @@
- - - - - - - LocInfoType - {(clang::Type::TypeClass)TypeBits.TC, en}Type - - {*(clang::BuiltinType *)this} - {*(clang::PointerType *)this} - {*(clang::ParenType *)this} - {(clang::BitIntType *)this} - {*(clang::LValueReferenceType *)this} - {*(clang::RValueReferenceType *)this} -
{(clang::ConstantArrayType *)this,na} - {(clang::ConstantArrayType *)this,view(left)na} - {(clang::ConstantArrayType *)this,view(right)na} - {(clang::VariableArrayType *)this,na} - {(clang::VariableArrayType *)this,view(left)na} - {(clang::VariableArrayType *)this,view(right)na} - {(clang::IncompleteArrayType *)this,na} - {(clang::IncompleteArrayType *)this,view(left)na} - {(clang::IncompleteArrayType *)this,view(right)na} - {(clang::TypedefType *)this,na} - {(clang::TypedefType *)this,view(cpp)na} - {*(clang::AttributedType *)this} - {(clang::DecayedType *)this,na} - {(clang::DecayedType *)this,view(left)na} - {(clang::DecayedType *)this,view(right)na} - {(clang::ElaboratedType *)this,na} - {(clang::ElaboratedType *)this,view(left)na} - {(clang::ElaboratedType *)this,view(right)na} - {*(clang::TemplateTypeParmType *)this} - {*(clang::TemplateTypeParmType *)this,view(cpp)} - {*(clang::SubstTemplateTypeParmType *)this} - {*(clang::RecordType *)this} - {*(clang::RecordType *)this,view(cpp)} - {(clang::FunctionProtoType *)this,na} - {(clang::FunctionProtoType *)this,view(left)na} - {(clang::FunctionProtoType *)this,view(right)na} - {*(clang::TemplateSpecializationType *)this} - {*(clang::DeducedTemplateSpecializationType *)this} - {*(clang::DeducedTemplateSpecializationType *)this,view(cpp)} - {*(clang::InjectedClassNameType *)this} - {*(clang::DependentNameType *)this} - {*(clang::PackExpansionType *)this} - {(clang::LocInfoType *)this,na} - {(clang::LocInfoType *)this,view(cpp)na} - {this,view(poly)na} - {*this,view(cpp)} - - No visualizer yet for {(clang::Type::TypeClass)TypeBits.TC,en}Type - Dependence{" ",en} - - CachedLinkage: {(clang::Linkage)TypeBits.CachedLinkage,en} CachedLocalOrUnnamed - CachedLinkage: {(clang::Linkage)TypeBits.CachedLinkage,en}{" ",sb} - - FromAST - - - No TypeBits set beyond TypeClass - - {*this, view(Dependence)}{*this, view(Cache)}{*this, view(FromAST)} - {*this,view(cmn)} {{{*this,view(poly)}}} - - (clang::Type::TypeClass)TypeBits.TC - 
this,view(flags)na - CanonicalType - *(clang::BuiltinType *)this - *(clang::PointerType *)this - *(clang::ParenType*)this - *(clang::BitIntType*)this - *(clang::LValueReferenceType *)this - *(clang::RValueReferenceType *)this - (clang::ConstantArrayType *)this - (clang::VariableArrayType *)this - (clang::IncompleteArrayType *)this - *(clang::AttributedType *)this - (clang::DecayedType *)this - (clang::ElaboratedType *)this - (clang::TemplateTypeParmType *)this - (clang::SubstTemplateTypeParmType *)this - (clang::RecordType *)this - (clang::FunctionProtoType *)this - (clang::TemplateSpecializationType *)this - (clang::DeducedTemplateSpecializationType *)this - (clang::InjectedClassNameType *)this - (clang::DependentNameType *)this - (clang::PackExpansionType *)this - (clang::LocInfoType *)this - - - - - ElementType - - - - {ElementType,view(cpp)} - [{Size}] - {ElementType,view(cpp)}[{Size}] - - Size - (clang::ArrayType *)this - - - - {ElementType,view(cpp)} - [] - {ElementType,view(cpp)}[] - - (clang::ArrayType *)this - - - - {ElementType,view(cpp)} - [*] - {ElementType,view(cpp)}[*] - - (clang::Expr *)SizeExpr - (clang::ArrayType *)this - - - - {Decl,view(name)nd} - {Decl} - - Decl - *(clang::Type *)this, view(cmn) - - - - {PointeeType, view(cpp)} * - - PointeeType - *(clang::Type *)this, view(cmn) - - - - {Inner, view(cpp)} - - Inner - *(clang::Type *)this, view(cmn) - - - - signed _BitInt({NumBits}) - unsigned _BitInt({NumBits})( - - NumBits - (clang::Type *)this, view(cmn) - - - - - {((clang::ReferenceType *)this)->PointeeType,view(cpp)} & - - *(clang::Type *)this, view(cmn) - PointeeType - - - - {((clang::ReferenceType *)this)->PointeeType,view(cpp)} && - - *(clang::Type *)this, view(cmn) - PointeeType - - - - {ModifiedType} Attribute={(clang::AttributedType::Kind)AttributedTypeBits.AttrKind} - - - - - {(clang::Decl::Kind)DeclContextBits.DeclKind,en}Decl - - (clang::Decl::Kind)DeclContextBits.DeclKind,en - - - - - FirstDecl - (clang::Decl *)(*(intptr_t 
*)NextInContextAndBits.Value.Data & ~3) - *this - - - - - - - Field {{{*(clang::DeclaratorDecl *)this,view(cpp)nd}}} - - - {*(clang::FunctionDecl *)this,nd} - Method {{{*this,view(cpp)}}} - - - Constructor {{{Name,view(cpp)}({*(clang::FunctionDecl *)this,view(parm0)nd})}} - - - Destructor {{~{Name,view(cpp)}()}} - - - typename - class - (not yet known if parameter pack) - ... - - {(TypeSourceInfo *)(*(uintptr_t *)DefaultArgument.ValueOrInherited.Val.Value.Data&~3LL),view(cpp)} - {{InheritedInitializer}} - = {this,view(DefaultArg)na} - - {*this,view(TorC)} {*this,view(MaybeEllipses)}{Name,view(cpp)} {this,view(Initializer)na} - - - {*TemplatedDecl,view(cpp)} - template{TemplateParams,na} {*TemplatedDecl}; - - TemplateParams,na - TemplatedDecl,na - - - - - {(clang::TypeSourceInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL),view(cpp)na} - {(clang::TypedefNameDecl::ModedTInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL),view(cpp)na} - {(TypeDecl *)this,view(cpp)nand} - typedef {this,view(type)na} {this,view(name)na}; - - "Not yet calculated",sb - (bool)(*(uintptr_t *)MaybeModedTInfo.Value.Data & 2) - (clang::TypeSourceInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL) - (clang::TypedefNameDecl::ModedTInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL) - (TypeDecl *)this,nd - - - - {(TypedefNameDecl *)this,view(name)nand} - using {(TypedefNameDecl *)this,view(name)nand} = {(TypedefNameDecl *)this,view(type)nand} - - - {Name} - - - Kind={(UncommonTemplateNameStorage::Kind)Kind,en}, Size={Size} - - (UncommonTemplateNameStorage::Kind)Kind - Size - - - - {Bits}, - {this,view(cmn)na},{(OverloadedTemplateStorage*)this,na} - {this,view(cmn)na},{(AssumedTemplateStorage*)this,na} - {this,view(cmn)na},{(SubstTemplateTemplateParmStorage*)this,na} - {this,view(cmn)na},{(SubstTemplateTemplateParmPackStorage*)this,na} - {this,view(cmn)na} - - Bits - (OverloadedTemplateStorage*)this - (AssumedTemplateStorage*)this - (SubstTemplateTemplateParmStorage*)this - 
(SubstTemplateTemplateParmPackStorage*)this - - - - - - - {(clang::TemplateDecl *)(Val.Value & ~3LL),view(cpp)na} - - - {(clang::TemplateDecl *)(Val.Value & ~3LL),na} - - - {(clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL),view(cpp)na} - - - {(clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL),na} - - - {(clang::QualifiedTemplateName *)(Val.Value & ~3LL),view(cpp)na} - - - {(clang::QualifiedTemplateName *)(Val.Value & ~3LL),na} - - - {(clang::DependentTemplateName *)(Val.Value & ~3LL),view(cpp)na} - - - {(clang::DependentTemplateName *)(Val.Value & ~3LL),na} - - - "TemplateDecl",s8b - - (clang::TemplateDecl *)(Val.Value & ~3LL) - - "UncommonTemplateNameStorage",s8b - - (clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL) - - "QualifiedTemplateName",s8b - - (clang::QualifiedTemplateName *)(Val.Value & ~3LL) - - "DependentTemplateName",s8b - - (clang::DependentTemplateName *)(Val.Value & ~3LL) - - Val - - - - - {Storage,view(cpp)na} - {Storage,na} - - Storage - - - - {Name,view(cpp)} - {Name} - - - implicit{" ",sb} - - {*this,view(implicit)nd} - {*this,view(modifiers)}{Name,view(cpp)} - {*this,view(modifiers)nd}struct {Name,view(cpp)} - {*this,view(modifiers)nd}interface {Name,view(cpp)} - {*this,view(modifiers)nd}union {Name,view(cpp)} - {*this,view(modifiers)nd}class {Name,view(cpp)} - {*this,view(modifiers)nd}enum {Name,view(cpp)} - - (clang::DeclContext *)this - - - - {decl,view(cpp)na} - {*decl} - - *(clang::Type *)this, view(cmn) - decl - - - - {(clang::TagType *)this,view(cpp)na} - {(clang::TagType *)this,na} - - *(clang::TagType *)this - - - - {{{*Replaced,view(cpp)} <= {CanonicalType,view(cpp)}}} - - *(clang::Type *)this, view(cmn) - *Replaced - - - - - - {ResultType,view(cpp)} - - {*(clang::QualType *)(this+1),view(cpp)}{*this,view(parm1)} - - , {*((clang::QualType *)(this+1)+1),view(cpp)}{*this,view(parm2)} - - , {*((clang::QualType *)(this+1)+2),view(cpp)}{*this,view(parm3)} - - , {*((clang::QualType 
*)(this+1)+3),view(cpp)}{*this,view(parm4)} - - , {*((clang::QualType *)(this+1)+4),view(cpp)}{*this,view(parm5)} - - , /* expand for more params */ - ({*this,view(parm0)}) -> {ResultType,view(cpp)} - ({*this,view(parm0)}) - {this,view(left)na}{this,view(right)na} - - ResultType - - {*this,view(parm0)} - - - FunctionTypeBits.NumParams - (clang::QualType *)(this+1) - - - - *(clang::Type *)this, view(cmn) - - - - - {OriginalTy} adjusted to {AdjustedTy} - - OriginalTy - AdjustedTy - - - - {OriginalTy,view(left)} - {OriginalTy,view(right)} - {OriginalTy} - - (clang::AdjustedType *)this - - - - {NamedType,view(left)} - {NamedType,view(right)} - {NamedType} - - (clang::ElaboratedTypeKeyword)TypeWithKeywordBits.Keyword - NNS - NamedType,view(cmn) - - - - {TTPDecl->Name,view(cpp)} - Non-canonical: {*TTPDecl} - Canonical: {CanTTPTInfo} - - *(clang::Type *)this, view(cmn) - - - - {Decl,view(cpp)} - - Decl - InjectedType - *(clang::Type *)this, view(cmn) - - - - {NNS}{Name,view(cpp)na} - - NNS - Name - *(clang::Type *)this, view(cmn) - - - - - {(IdentifierInfo*)Specifier,view(cpp)na}:: - {(NamedDecl*)Specifier,view(cpp)na}:: - {(Type*)Specifier,view(cpp)na}:: - - (NestedNameSpecifier::StoredSpecifierKind)((*(uintptr_t *)Prefix.Value.Data>>1)&3) - - - - {Pattern} - - Pattern - NumExpansions - *(clang::Type *)this, view(cmn) - - - - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(poly)}{*this,view(fastQuals)} - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(cpp)}{*this,view(fastQuals)} - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(left)}{*this,view(fastQuals)} - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 
1U)))->BaseType,view(right)}{*this,view(fastQuals)} - - - {" ",sb}const - {" ",sb}restrict - {" ",sb}const restrict - {" ",sb}volatile - {" ",sb}const volatile - {" ",sb}volatile restrict - {" ",sb}const volatile restrict - Cannot visualize non-fast qualifiers - Null - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,na}{*this,view(fastQuals)} - - *this,view(fastQuals) - ((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType - - - - - {DeclInfo,view(cpp)na} - {DeclInfo,na} - - DeclInfo - *(clang::Type *)this, view(cmn) - - - - {Ty,view(cpp)} - {Ty} - - Ty - - - - {(QualType *)&Ty,na} - - (QualType *)&Ty - Data - - - - Not building anything - Building a {LastTy} - - - {Argument,view(cpp)} - {Argument} - - - {*(clang::QualType *)&TypeOrValue.V,view(cpp)} - {(clang::TemplateArgument::ArgKind)TypeOrValue.Kind,en} template argument: {*(clang::QualType *)&TypeOrValue.V} - - {Args.Args[0]}{*this,view(arg1)} - - , {Args.Args[1]}{*this,view(arg2)} - - , {Args.Args[2]}, ... - - {Args.Args[0],view(cpp)}{*this,view(arg1cpp)} - - , {Args.Args[1],view(cpp)}{*this,view(arg2cpp)} - - , {Args.Args[2],view(cpp)}, ... - {*this,view(arg0cpp)} - {*this,view(arg0)} - {(clang::Expr *)TypeOrValue.V,view(cpp)na} - {(clang::TemplateArgument::ArgKind)TypeOrValue.Kind,en} - - *(clang::QualType *)&TypeOrValue.V - (clang::Expr *)TypeOrValue.V - - Args.NumArgs - Args.Args - - - - - - - {((TemplateArgumentLoc*)Arguments.BeginX)[0],view(cpp)}{*this,view(elt1)} - - , {((TemplateArgumentLoc*)Arguments.BeginX)[1],view(cpp)}{*this,view(elt2)} - - , {((TemplateArgumentLoc*)Arguments.BeginX)[2],view(cpp)}{*this,view(elt3)} - - , {((TemplateArgumentLoc*)Arguments.BeginX)[3],view(cpp)}{*this,view(elt4)} - - , ... 
- empty - <{*this,view(elt0)}> - Uninitialized - - - - {Arguments[0],view(cpp)}{*this,view(arg1)} - - , {Arguments[1],view(cpp)}{*this,view(arg2)} - - , {Arguments[1],view(cpp)}, ... - <{*this,view(arg0)}> - - NumArguments - - NumArguments - Arguments - - - - - - {Data[0],view(cpp)}{*this,view(arg1)} - - , {Data[1],view(cpp)}{*this,view(arg2)} - - , {Data[2],view(cpp)}, ... - <{*this,view(arg0)}> - - Length - - - - Length - Data - - - - - - - - {((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[0],view(cpp)}{*this,view(level1)} - - ::{((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[1],view(cpp)}{*this,view(level2)} - - ::{((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[2],view(cpp)}, ... - {*this,view(level0)} - - TemplateArgumentLists - - - - {(clang::QualType *)Arg,view(cpp)na} - Type template argument: {*(clang::QualType *)Arg} - Non-type template argument: {*(clang::Expr *)Arg} - Template template argument: {*(clang::TemplateName *)Arg - - Kind,en - (clang::QualType *)Arg - (clang::Expr *)Arg - (clang::TemplateName *)Arg - - - - - void - bool - char - unsigned char - wchar_t - char16_t - char32_t - unsigned short - unsigned int - unsigned long - unsigned long long - __uint128_t - char - signed char - wchar_t - short - int - long - long long - __int128_t - __fp16 - float - double - long double - nullptr_t - {(clang::BuiltinType::Kind)BuiltinTypeBits.Kind, en} - - (clang::BuiltinType::Kind)BuiltinTypeBits.Kind - - - - - - {((clang::TemplateArgument *)(this+1))[0],view(cpp)}{*this,view(arg1)} - - , {((clang::TemplateArgument *)(this+1))[1],view(cpp)}{*this,view(arg2)} - - , {((clang::TemplateArgument *)(this+1))[2],view(cpp)}{*this,view(arg3)} - - {*((clang::TemplateDecl *)(Template.Storage.Val.Value))->TemplatedDecl,view(cpp)}<{*this,view(arg0)}> - - Can't visualize this TemplateSpecializationType - - Template.Storage - - TemplateSpecializationTypeBits.NumArgs - (clang::TemplateArgument 
*)(this+1) - - *(clang::Type *)this, view(cmn) - - - - - (CanonicalType.Value.Value != this) || TypeBits.Dependent - *(clang::Type *)this,view(cmn) - - - - {CanonicalType,view(cpp)} - {Template,view(cpp)} - {Template} - - Template - CanonicalType,view(cpp) - (clang::DeducedType *)this - Template - - - - {*(CXXRecordDecl *)this,nd}{*TemplateArgs} - - (CXXRecordDecl *)this,nd - TemplateArgs - - - - {((llvm::StringMapEntry<clang::IdentifierInfo *>*)Entry)+1,sb} - - ((llvm::StringMapEntry<clang::IdentifierInfo *>*)Entry)+1,s - (clang::tok::TokenKind)TokenID - - - - - Empty - {*(clang::IdentifierInfo *)(Ptr & ~PtrMask)} - {{Identifier ({*(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} - {{ObjC Zero Arg Selector (*{(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} - {{ObjC One Arg Selector (*{(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} - {(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),view(cpp)na} - C++ Constructor {{{(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),view(cpp)na}}} - C++ Destructor {{*(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask)}} - C++ Conversion function {{*(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask)}} - C++ Operator {{*(clang::detail::CXXOperatorIdName *)(Ptr & ~PtrMask)}} - {*(clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask),view(cpp)} - {{Extra ({*(clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask)})}} - - StoredNameKind(Ptr & PtrMask),en - *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na - *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na - *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na - *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na - *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na - *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na - *(clang::detail::CXXOperatorIdName *)(Ptr & ~PtrMask),na - (clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask),na - - - - - {(CXXDeductionGuideNameExtra *)this,view(cpp)nand} - - - {(CXXDeductionGuideNameExtra *)this,nand} - - C++ Literal operator - 
C++ Using directive - Objective-C MultiArg selector - {(clang::detail::DeclarationNameExtra::ExtraKind)ExtraKindOrNumArgs,en}{" ",sb}{*this,view(cpp)} - - (CXXDeductionGuideNameExtra *)this - ExtraKindOrNumArgs - - - - {Template->TemplatedDecl,view(cpp)} - C++ Deduction guide for {Template->TemplatedDecl,view(cpp)na} - - - {Type,view(cpp)} - {Type} - - - {Name} - - - - {(ParsedTemplateArgument *)(this+1),view(cpp)na}{this,view(arg1)na} - - , {((ParsedTemplateArgument *)(this+1))+1,view(cpp)na}{this,view(arg2)na} - - , ... - {Name,na}<{this,view(arg0)na}> - - Name - - {this,view(arg0)na} - - - NumArgs - (ParsedTemplateArgument *)(this+1) - - - - Operator - - - - {{annot_template_id ({(clang::TemplateIdAnnotation *)(PtrData),na})}} - {{Identifier ({(clang::IdentifierInfo *)(PtrData),na})}} - {(clang::tok::TokenKind)Kind,en} - - - {BufferPtr,nasb} - - - {TheLexer._Mypair._Myval2,na} - Expanding Macro: {TheTokenLexer._Mypair._Myval2,na} - - - - - [{(Token *)(CachedTokens.BeginX) + CachedLexPos,na}] {IncludeMacroStack._Mypair._Myval2._Mylast - 1,na} - - {IncludeMacroStack._Mypair._Myval2._Mylast - 1,na} - {CurLexer._Mypair._Myval2,na} - Expanding Macro: {CurTokenLexer._Mypair._Myval2,na} - - - {this,view(cached)} - - CLK_LexAfterModuleImport - - - [{Tok}] {PP,na} - - - this - *this - {Id} - &{Id} - No visualizer for {Kind} - - - - =, - &, - - {(LambdaCapture *)(Captures.BeginX),na}{this,view(capture1)na} - - ,{(LambdaCapture *)(Captures.BeginX)+1,na}{this,view(capture2)na} - - ,{(LambdaCapture *)(Captures.BeginX)+2,na}{this,view(capture3)na} - - ,... 
- [{this,view(default)na}{this,view(capture0)na}] - - - - , [{TypeRep}] - - - , [{ExprRep}] - - - , [{DeclRep}] - - - [{(clang::DeclSpec::SCS)StorageClassSpec,en}], [{(clang::TypeSpecifierType)TypeSpecType,en}]{this,view(extra)na} - - (clang::DeclSpec::SCS)StorageClassSpec - (clang::TypeSpecifierType)TypeSpecType - - TypeRep - - - ExprRep - - - DeclRep - - - - - - {Name,s} - - - {RealPathName,s} - - - {Name,s} - - - - (clang::StorageClass)SClass - (clang::ThreadStorageClassSpecifier)TSCSpec - (clang::VarDecl::InitializationStyle)InitStyle - - - - {DeclType,view(left)} {Name,view(cpp)}{DeclType,view(right)} - - Name - DeclType - - - - {(DeclaratorDecl*)this,nand} - - (DeclaratorDecl*)this,nd - Init - VarDeclBits - - - - {*(VarDecl*)this,nd} - - ParmVarDeclBits - *(VarDecl*)this,nd - - - - {"explicit ",sb} - - explicit({ExplicitSpec,view(ptr)na}) - {ExplicitSpec,view(int)en} - {ExplicitSpec,view(int)en} : {ExplicitSpec,view(ptr)na} - - - {ExplicitSpec,view(cpp)}{Name,view(cpp)nd}({(FunctionDecl*)this,view(parm0)nand}) -> {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)(((uintptr_t)DeclType.Value.Value) & ~15))->BaseType)->ResultType,view(cpp)} - - ExplicitSpec - (bool)FunctionDeclBits.IsCopyDeductionCandidate - (FunctionDecl*)this,nd - - - - {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType,view(cpp)} - - {ParamInfo[0],na}{*this,view(parm1)nd} - - , {ParamInfo[1],na}{*this,view(parm2)nd} - - , {ParamInfo[2],na}{*this,view(parm3)nd} - - , {ParamInfo[3],na}{*this,view(parm4)nd} - - , {ParamInfo[4],na}{*this,view(parm5)nd} - - , /* expand for more params */ - - auto {Name,view(cpp)nd}({*this,view(parm0)nd}) -> {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType,view(cpp)} - - {this,view(retType)nand} {Name,view(cpp)nd}({*this,view(parm0)nd}) - - (clang::DeclaratorDecl *)this,nd - 
((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType - - {*this,view(parm0)nd} - - - ((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->FunctionTypeBits.NumParams - ParamInfo - - - - TemplateOrSpecialization - - - - {*($T1*)&Ptr} - - ($T1*)&Ptr - - - - {($T1 *)Ptr} - - ($T1 *)Ptr - - - - - {*((NamedDecl **)(this+1))[0],view(cpp)}{*this,view(parm1)} - - , {*((NamedDecl **)(this+1))[1],view(cpp)}{*this,view(parm2)} - - , {*((NamedDecl **)(this+1))[2],view(cpp)}{*this,view(parm3)} - - , {*((NamedDecl **)(this+1))[3],view(cpp)}{*this,view(parm4)} - - , {*((NamedDecl **)(this+1))[4],view(cpp)}{*this,view(parm5)} - - , /* Expand for more params */ - <{*this,view(parm0)}> - - - NumParams - (NamedDecl **)(this+1) - - - - - {(clang::Stmt::StmtClass)StmtBits.sClass,en} - - (clang::Stmt::StmtClass)StmtBits.sClass,en - - - - {*(clang::StringLiteral *)this} - Expression of class {(clang::Stmt::StmtClass)StmtBits.sClass,en} and type {TR,view(cpp)} - - - - *(unsigned *)(((clang::StringLiteral *)this)+1) - (const char *)(((clang::StringLiteral *)this)+1)+4+4,[*(unsigned *)(((clang::StringLiteral *)this)+1)]s8 - - - - public - protected - private - - {*(clang::NamedDecl *)(Ptr&~Mask)} - {*this,view(access)} {*this,view(decl)} - - (clang::AccessSpecifier)(Ptr&Mask),en - *(clang::NamedDecl *)(Ptr&~Mask) - - - - [IK_Identifier] {*Identifier} - [IK_OperatorFunctionId] {OperatorFunctionId} - [IK_ConversionFunctionId] {ConversionFunctionId} - [IK_ConstructorName] {ConstructorName} - [IK_DestructorName] {DestructorName} - [IK_DeductionGuideName] {TemplateName} - [IK_TemplateId] {TemplateId} - [IK_ConstructorTemplateId] {TemplateId} - Kind - - Identifier - OperatorFunctionId - ConversionFunctionId - ConstructorName - DestructorName - TemplateName - TemplateId - TemplateId - - - - NumDecls={NumDecls} - - - NumDecls - (Decl **)(this+1) 
- - - - - {*D} - {*(DeclGroup *)((uintptr_t)D&~1)} - - D - (DeclGroup *)((uintptr_t)D&~1) - - - - {DS} {Name} - - - {Decls} - - Decls - - - - {Ambiguity,en}: {Decls} - {ResultKind,en}: {Decls} - - - Invalid - Unset - {Val} - - - Invalid - Unset - {($T1)(Value&~1)} - - (bool)(Value&1) - ($T1)(Value&~1) - - - + + + + + + + LocInfoType + {(clang::Type::TypeClass)TypeBits.TC, en}Type + + {*(clang::BuiltinType *)this} + {*(clang::PointerType *)this} + {*(clang::ParenType *)this} + {(clang::BitIntType *)this} + {*(clang::LValueReferenceType *)this} + {*(clang::RValueReferenceType *)this} + {(clang::ConstantArrayType *)this,na} + {(clang::ConstantArrayType *)this,view(left)na} + {(clang::ConstantArrayType *)this,view(right)na} + {(clang::VariableArrayType *)this,na} + {(clang::VariableArrayType *)this,view(left)na} + {(clang::VariableArrayType *)this,view(right)na} + {(clang::IncompleteArrayType *)this,na} + {(clang::IncompleteArrayType *)this,view(left)na} + {(clang::IncompleteArrayType *)this,view(right)na} + {(clang::TypedefType *)this,na} + {(clang::TypedefType *)this,view(cpp)na} + {*(clang::AttributedType *)this} + {(clang::DecayedType *)this,na} + {(clang::DecayedType *)this,view(left)na} + {(clang::DecayedType *)this,view(right)na} + {(clang::ElaboratedType *)this,na} + {(clang::ElaboratedType *)this,view(left)na} + {(clang::ElaboratedType *)this,view(right)na} + {*(clang::TemplateTypeParmType *)this} + {*(clang::TemplateTypeParmType *)this,view(cpp)} + {*(clang::SubstTemplateTypeParmType *)this} + {*(clang::RecordType *)this} + {*(clang::RecordType *)this,view(cpp)} + {(clang::FunctionProtoType *)this,na} + {(clang::FunctionProtoType *)this,view(left)na} + {(clang::FunctionProtoType *)this,view(right)na} + {*(clang::TemplateSpecializationType *)this} + {*(clang::DeducedTemplateSpecializationType *)this} + {*(clang::DeducedTemplateSpecializationType *)this,view(cpp)} + {*(clang::InjectedClassNameType *)this} + {*(clang::DependentNameType *)this} + 
{*(clang::PackExpansionType *)this} + {(clang::LocInfoType *)this,na} + {(clang::LocInfoType *)this,view(cpp)na} + {this,view(poly)na} + {*this,view(cpp)} + + No visualizer yet for {(clang::Type::TypeClass)TypeBits.TC,en}Type + Dependence{" ",en} + + CachedLinkage: {(clang::Linkage)TypeBits.CachedLinkage,en} CachedLocalOrUnnamed + CachedLinkage: {(clang::Linkage)TypeBits.CachedLinkage,en}{" ",sb} + + FromAST + + + No TypeBits set beyond TypeClass + + {*this, view(Dependence)}{*this, view(Cache)}{*this, view(FromAST)} + {*this,view(cmn)} {{{*this,view(poly)}}} + + (clang::Type::TypeClass)TypeBits.TC + this,view(flags)na + CanonicalType + *(clang::BuiltinType *)this + *(clang::PointerType *)this + *(clang::ParenType*)this + *(clang::BitIntType*)this + *(clang::LValueReferenceType *)this + *(clang::RValueReferenceType *)this + (clang::ConstantArrayType *)this + (clang::VariableArrayType *)this + (clang::IncompleteArrayType *)this + *(clang::AttributedType *)this + (clang::DecayedType *)this + (clang::ElaboratedType *)this + (clang::TemplateTypeParmType *)this + (clang::SubstTemplateTypeParmType *)this + (clang::RecordType *)this + (clang::FunctionProtoType *)this + (clang::TemplateSpecializationType *)this + (clang::DeducedTemplateSpecializationType *)this + (clang::InjectedClassNameType *)this + (clang::DependentNameType *)this + (clang::PackExpansionType *)this + (clang::LocInfoType *)this + + + + + ElementType + + + + {ElementType,view(cpp)} + [{Size}] + {ElementType,view(cpp)}[{Size}] + + Size + (clang::ArrayType *)this + + + + {ElementType,view(cpp)} + [] + {ElementType,view(cpp)}[] + + (clang::ArrayType *)this + + + + {ElementType,view(cpp)} + [*] + {ElementType,view(cpp)}[*] + + (clang::Expr *)SizeExpr + (clang::ArrayType *)this + + + + {Decl,view(name)nd} + {Decl} + + Decl + *(clang::Type *)this, view(cmn) + + + + {PointeeType, view(cpp)} * + + PointeeType + *(clang::Type *)this, view(cmn) + + + + {Inner, view(cpp)} + + Inner + *(clang::Type *)this, view(cmn) 
+ + + + signed _BitInt({NumBits}) + unsigned _BitInt({NumBits})( + + NumBits + (clang::Type *)this, view(cmn) + + + + + {((clang::ReferenceType *)this)->PointeeType,view(cpp)} & + + *(clang::Type *)this, view(cmn) + PointeeType + + + + {((clang::ReferenceType *)this)->PointeeType,view(cpp)} && + + *(clang::Type *)this, view(cmn) + PointeeType + + + + {ModifiedType} Attribute={(clang::AttributedType::Kind)AttributedTypeBits.AttrKind} + + + + + {(clang::Decl::Kind)DeclContextBits.DeclKind,en}Decl + + (clang::Decl::Kind)DeclContextBits.DeclKind,en + + + + + FirstDecl + (clang::Decl *)(*(intptr_t *)NextInContextAndBits.Value.Data & ~3) + *this + + + + + + + Field {{{*(clang::DeclaratorDecl *)this,view(cpp)nd}}} + + + {*(clang::FunctionDecl *)this,nd} + Method {{{*this,view(cpp)}}} + + + Constructor {{{Name,view(cpp)}({*(clang::FunctionDecl *)this,view(parm0)nd})}} + + + Destructor {{~{Name,view(cpp)}()}} + + + typename + class + (not yet known if parameter pack) + ... + + {(TypeSourceInfo *)(*(uintptr_t *)DefaultArgument.ValueOrInherited.Val.Value.Data&~3LL),view(cpp)} + {{InheritedInitializer}} + = {this,view(DefaultArg)na} + + {*this,view(TorC)} {*this,view(MaybeEllipses)}{Name,view(cpp)} {this,view(Initializer)na} + + + {*TemplatedDecl,view(cpp)} + template{TemplateParams,na} {*TemplatedDecl}; + + TemplateParams,na + TemplatedDecl,na + + + + + {(clang::TypeSourceInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL),view(cpp)na} + {(clang::TypedefNameDecl::ModedTInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL),view(cpp)na} + {(TypeDecl *)this,view(cpp)nand} + typedef {this,view(type)na} {this,view(name)na}; + + "Not yet calculated",sb + (bool)(*(uintptr_t *)MaybeModedTInfo.Value.Data & 2) + (clang::TypeSourceInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL) + (clang::TypedefNameDecl::ModedTInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL) + (TypeDecl *)this,nd + + + + {(TypedefNameDecl *)this,view(name)nand} + using {(TypedefNameDecl 
*)this,view(name)nand} = {(TypedefNameDecl *)this,view(type)nand} + + + {Name} + + + Kind={(UncommonTemplateNameStorage::Kind)Kind,en}, Size={Size} + + (UncommonTemplateNameStorage::Kind)Kind + Size + + + + {Bits}, + {this,view(cmn)na},{(OverloadedTemplateStorage*)this,na} + {this,view(cmn)na},{(AssumedTemplateStorage*)this,na} + {this,view(cmn)na},{(SubstTemplateTemplateParmStorage*)this,na} + {this,view(cmn)na},{(SubstTemplateTemplateParmPackStorage*)this,na} + {this,view(cmn)na} + + Bits + (OverloadedTemplateStorage*)this + (AssumedTemplateStorage*)this + (SubstTemplateTemplateParmStorage*)this + (SubstTemplateTemplateParmPackStorage*)this + + + + + + + {(clang::TemplateDecl *)(Val.Value & ~3LL),view(cpp)na} + + + {(clang::TemplateDecl *)(Val.Value & ~3LL),na} + + + {(clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL),view(cpp)na} + + + {(clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL),na} + + + {(clang::QualifiedTemplateName *)(Val.Value & ~3LL),view(cpp)na} + + + {(clang::QualifiedTemplateName *)(Val.Value & ~3LL),na} + + + {(clang::DependentTemplateName *)(Val.Value & ~3LL),view(cpp)na} + + + {(clang::DependentTemplateName *)(Val.Value & ~3LL),na} + + + "TemplateDecl",s8b + + (clang::TemplateDecl *)(Val.Value & ~3LL) + + "UncommonTemplateNameStorage",s8b + + (clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL) + + "QualifiedTemplateName",s8b + + (clang::QualifiedTemplateName *)(Val.Value & ~3LL) + + "DependentTemplateName",s8b + + (clang::DependentTemplateName *)(Val.Value & ~3LL) + + Val + + + + + {Storage,view(cpp)na} + {Storage,na} + + Storage + + + + {Name,view(cpp)} + {Name} + + + implicit{" ",sb} + + {*this,view(implicit)nd} + {*this,view(modifiers)}{Name,view(cpp)} + {*this,view(modifiers)nd}struct {Name,view(cpp)} + {*this,view(modifiers)nd}interface {Name,view(cpp)} + {*this,view(modifiers)nd}union {Name,view(cpp)} + {*this,view(modifiers)nd}class {Name,view(cpp)} + {*this,view(modifiers)nd}enum {Name,view(cpp)} + + 
(clang::DeclContext *)this + + + + {decl,view(cpp)na} + {*decl} + + *(clang::Type *)this, view(cmn) + decl + + + + {(clang::TagType *)this,view(cpp)na} + {(clang::TagType *)this,na} + + *(clang::TagType *)this + + + + {{{*Replaced,view(cpp)} <= {CanonicalType,view(cpp)}}} + + *(clang::Type *)this, view(cmn) + *Replaced + + + + + + {ResultType,view(cpp)} + + {*(clang::QualType *)(this+1),view(cpp)}{*this,view(parm1)} + + , {*((clang::QualType *)(this+1)+1),view(cpp)}{*this,view(parm2)} + + , {*((clang::QualType *)(this+1)+2),view(cpp)}{*this,view(parm3)} + + , {*((clang::QualType *)(this+1)+3),view(cpp)}{*this,view(parm4)} + + , {*((clang::QualType *)(this+1)+4),view(cpp)}{*this,view(parm5)} + + , /* expand for more params */ + ({*this,view(parm0)}) -> {ResultType,view(cpp)} + ({*this,view(parm0)}) + {this,view(left)na}{this,view(right)na} + + ResultType + + {*this,view(parm0)} + + + FunctionTypeBits.NumParams + (clang::QualType *)(this+1) + + + + *(clang::Type *)this, view(cmn) + + + + + {OriginalTy} adjusted to {AdjustedTy} + + OriginalTy + AdjustedTy + + + + {OriginalTy,view(left)} + {OriginalTy,view(right)} + {OriginalTy} + + (clang::AdjustedType *)this + + + + {NamedType,view(left)} + {NamedType,view(right)} + {NamedType} + + (clang::ElaboratedTypeKeyword)TypeWithKeywordBits.Keyword + NNS + NamedType,view(cmn) + + + + {TTPDecl->Name,view(cpp)} + Non-canonical: {*TTPDecl} + Canonical: {CanTTPTInfo} + + *(clang::Type *)this, view(cmn) + + + + {Decl,view(cpp)} + + Decl + InjectedType + *(clang::Type *)this, view(cmn) + + + + {NNS}{Name,view(cpp)na} + + NNS + Name + *(clang::Type *)this, view(cmn) + + + + + {(IdentifierInfo*)Specifier,view(cpp)na}:: + {(NamedDecl*)Specifier,view(cpp)na}:: + {(Type*)Specifier,view(cpp)na}:: + + (NestedNameSpecifier::StoredSpecifierKind)((*(uintptr_t *)Prefix.Value.Data>>1)&3) + + + + {Pattern} + + Pattern + NumExpansions + *(clang::Type *)this, view(cmn) + + + + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & 
~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(poly)}{*this,view(fastQuals)} + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(cpp)}{*this,view(fastQuals)} + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(left)}{*this,view(fastQuals)} + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(right)}{*this,view(fastQuals)} + + + {" ",sb}const + {" ",sb}restrict + {" ",sb}const restrict + {" ",sb}volatile + {" ",sb}const volatile + {" ",sb}volatile restrict + {" ",sb}const volatile restrict + Cannot visualize non-fast qualifiers + Null + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,na}{*this,view(fastQuals)} + + *this,view(fastQuals) + ((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType + + + + + {DeclInfo,view(cpp)na} + {DeclInfo,na} + + DeclInfo + *(clang::Type *)this, view(cmn) + + + + {Ty,view(cpp)} + {Ty} + + Ty + + + + {(QualType *)&Ty,na} + + (QualType *)&Ty + Data + + + + Not building anything + Building a {LastTy} + + + {Argument,view(cpp)} + {Argument} + + + {*(clang::QualType *)&TypeOrValue.V,view(cpp)} + {(clang::TemplateArgument::ArgKind)TypeOrValue.Kind,en} template argument: {*(clang::QualType *)&TypeOrValue.V} + + {Args.Args[0]}{*this,view(arg1)} + + , {Args.Args[1]}{*this,view(arg2)} + + , {Args.Args[2]}, ... + + {Args.Args[0],view(cpp)}{*this,view(arg1cpp)} + + , {Args.Args[1],view(cpp)}{*this,view(arg2cpp)} + + , {Args.Args[2],view(cpp)}, ... 
+ {*this,view(arg0cpp)} + {*this,view(arg0)} + {(clang::Expr *)TypeOrValue.V,view(cpp)na} + {(clang::TemplateArgument::ArgKind)TypeOrValue.Kind,en} + + *(clang::QualType *)&TypeOrValue.V + (clang::Expr *)TypeOrValue.V + + Args.NumArgs + Args.Args + + + + + + + {((TemplateArgumentLoc*)Arguments.BeginX)[0],view(cpp)}{*this,view(elt1)} + + , {((TemplateArgumentLoc*)Arguments.BeginX)[1],view(cpp)}{*this,view(elt2)} + + , {((TemplateArgumentLoc*)Arguments.BeginX)[2],view(cpp)}{*this,view(elt3)} + + , {((TemplateArgumentLoc*)Arguments.BeginX)[3],view(cpp)}{*this,view(elt4)} + + , ... + empty + <{*this,view(elt0)}> + Uninitialized + + + + {Arguments[0],view(cpp)}{*this,view(arg1)} + + , {Arguments[1],view(cpp)}{*this,view(arg2)} + + , {Arguments[1],view(cpp)}, ... + <{*this,view(arg0)}> + + NumArguments + + NumArguments + Arguments + + + + + + {Data[0],view(cpp)}{*this,view(arg1)} + + , {Data[1],view(cpp)}{*this,view(arg2)} + + , {Data[2],view(cpp)}, ... + <{*this,view(arg0)}> + + Length + + + + Length + Data + + + + + + + + {((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[0],view(cpp)}{*this,view(level1)} + + ::{((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[1],view(cpp)}{*this,view(level2)} + + ::{((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[2],view(cpp)}, ... 
+ {*this,view(level0)} + + TemplateArgumentLists + + + + {(clang::QualType *)Arg,view(cpp)na} + Type template argument: {*(clang::QualType *)Arg} + Non-type template argument: {*(clang::Expr *)Arg} + Template template argument: {*(clang::TemplateName *)Arg + + Kind,en + (clang::QualType *)Arg + (clang::Expr *)Arg + (clang::TemplateName *)Arg + + + + + void + bool + char + unsigned char + wchar_t + char16_t + char32_t + unsigned short + unsigned int + unsigned long + unsigned long long + __uint128_t + char + signed char + wchar_t + short + int + long + long long + __int128_t + __fp16 + float + double + long double + nullptr_t + {(clang::BuiltinType::Kind)BuiltinTypeBits.Kind, en} + + (clang::BuiltinType::Kind)BuiltinTypeBits.Kind + + + + + + {((clang::TemplateArgument *)(this+1))[0],view(cpp)}{*this,view(arg1)} + + , {((clang::TemplateArgument *)(this+1))[1],view(cpp)}{*this,view(arg2)} + + , {((clang::TemplateArgument *)(this+1))[2],view(cpp)}{*this,view(arg3)} + + {*((clang::TemplateDecl *)(Template.Storage.Val.Value))->TemplatedDecl,view(cpp)}<{*this,view(arg0)}> + + Can't visualize this TemplateSpecializationType + + Template.Storage + + TemplateSpecializationTypeBits.NumArgs + (clang::TemplateArgument *)(this+1) + + *(clang::Type *)this, view(cmn) + + + + + (CanonicalType.Value.Value != this) || TypeBits.Dependent + *(clang::Type *)this,view(cmn) + + + + {CanonicalType,view(cpp)} + {Template,view(cpp)} + {Template} + + Template + CanonicalType,view(cpp) + (clang::DeducedType *)this + Template + + + + {*(CXXRecordDecl *)this,nd}{*TemplateArgs} + + (CXXRecordDecl *)this,nd + TemplateArgs + + + + {((llvm::StringMapEntry<clang::IdentifierInfo *>*)Entry)+1,sb} + + ((llvm::StringMapEntry<clang::IdentifierInfo *>*)Entry)+1,s + (clang::tok::TokenKind)TokenID + + + + + Empty + {*(clang::IdentifierInfo *)(Ptr & ~PtrMask)} + {{Identifier ({*(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} + {{ObjC Zero Arg Selector (*{(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} + {{ObjC 
One Arg Selector (*{(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} + {(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),view(cpp)na} + C++ Constructor {{{(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),view(cpp)na}}} + C++ Destructor {{*(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask)}} + C++ Conversion function {{*(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask)}} + C++ Operator {{*(clang::detail::CXXOperatorIdName *)(Ptr & ~PtrMask)}} + {*(clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask),view(cpp)} + {{Extra ({*(clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask)})}} + + StoredNameKind(Ptr & PtrMask),en + *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na + *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na + *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na + *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na + *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na + *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na + *(clang::detail::CXXOperatorIdName *)(Ptr & ~PtrMask),na + (clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask),na + + + + + {(CXXDeductionGuideNameExtra *)this,view(cpp)nand} + + + {(CXXDeductionGuideNameExtra *)this,nand} + + C++ Literal operator + C++ Using directive + Objective-C MultiArg selector + {(clang::detail::DeclarationNameExtra::ExtraKind)ExtraKindOrNumArgs,en}{" ",sb}{*this,view(cpp)} + + (CXXDeductionGuideNameExtra *)this + ExtraKindOrNumArgs + + + + {Template->TemplatedDecl,view(cpp)} + C++ Deduction guide for {Template->TemplatedDecl,view(cpp)na} + + + {Type,view(cpp)} + {Type} + + + {Name} + + + + {(ParsedTemplateArgument *)(this+1),view(cpp)na}{this,view(arg1)na} + + , {((ParsedTemplateArgument *)(this+1))+1,view(cpp)na}{this,view(arg2)na} + + , ... 
+ {Name,na}<{this,view(arg0)na}> + + Name + + {this,view(arg0)na} + + + NumArgs + (ParsedTemplateArgument *)(this+1) + + + + Operator + + + + {{annot_template_id ({(clang::TemplateIdAnnotation *)(PtrData),na})}} + {{Identifier ({(clang::IdentifierInfo *)(PtrData),na})}} + {(clang::tok::TokenKind)Kind,en} + + + {BufferPtr,nasb} + + + {TheLexer._Mypair._Myval2,na} + Expanding Macro: {TheTokenLexer._Mypair._Myval2,na} + + + + + [{(Token *)(CachedTokens.BeginX) + CachedLexPos,na}] {IncludeMacroStack._Mypair._Myval2._Mylast - 1,na} + + {IncludeMacroStack._Mypair._Myval2._Mylast - 1,na} + {CurLexer._Mypair._Myval2,na} + Expanding Macro: {CurTokenLexer._Mypair._Myval2,na} + + + {this,view(cached)} + + CLK_LexAfterModuleImport + + + [{Tok}] {PP,na} + + + this + *this + {Id} + &{Id} + No visualizer for {Kind} + + + + =, + &, + + {(LambdaCapture *)(Captures.BeginX),na}{this,view(capture1)na} + + ,{(LambdaCapture *)(Captures.BeginX)+1,na}{this,view(capture2)na} + + ,{(LambdaCapture *)(Captures.BeginX)+2,na}{this,view(capture3)na} + + ,... 
+ [{this,view(default)na}{this,view(capture0)na}] + + + + , [{TypeRep}] + + + , [{ExprRep}] + + + , [{DeclRep}] + + + [{(clang::DeclSpec::SCS)StorageClassSpec,en}], [{(clang::TypeSpecifierType)TypeSpecType,en}]{this,view(extra)na} + + (clang::DeclSpec::SCS)StorageClassSpec + (clang::TypeSpecifierType)TypeSpecType + + TypeRep + + + ExprRep + + + DeclRep + + + + + + {Name,s} + + + {RealPathName,s} + + + {Name,s} + + + + (clang::StorageClass)SClass + (clang::ThreadStorageClassSpecifier)TSCSpec + (clang::VarDecl::InitializationStyle)InitStyle + + + + {DeclType,view(left)} {Name,view(cpp)}{DeclType,view(right)} + + Name + DeclType + + + + {(DeclaratorDecl*)this,nand} + + (DeclaratorDecl*)this,nd + Init + VarDeclBits + + + + {*(VarDecl*)this,nd} + + ParmVarDeclBits + *(VarDecl*)this,nd + + + + {"explicit ",sb} + + explicit({ExplicitSpec,view(ptr)na}) + {ExplicitSpec,view(int)en} + {ExplicitSpec,view(int)en} : {ExplicitSpec,view(ptr)na} + + + {ExplicitSpec,view(cpp)}{Name,view(cpp)nd}({(FunctionDecl*)this,view(parm0)nand}) -> {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)(((uintptr_t)DeclType.Value.Value) & ~15))->BaseType)->ResultType,view(cpp)} + + ExplicitSpec + (bool)FunctionDeclBits.IsCopyDeductionCandidate + (FunctionDecl*)this,nd + + + + {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType,view(cpp)} + + {ParamInfo[0],na}{*this,view(parm1)nd} + + , {ParamInfo[1],na}{*this,view(parm2)nd} + + , {ParamInfo[2],na}{*this,view(parm3)nd} + + , {ParamInfo[3],na}{*this,view(parm4)nd} + + , {ParamInfo[4],na}{*this,view(parm5)nd} + + , /* expand for more params */ + + auto {Name,view(cpp)nd}({*this,view(parm0)nd}) -> {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType,view(cpp)} + + {this,view(retType)nand} {Name,view(cpp)nd}({*this,view(parm0)nd}) + + (clang::DeclaratorDecl *)this,nd + 
((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType + + {*this,view(parm0)nd} + + + ((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->FunctionTypeBits.NumParams + ParamInfo + + + + TemplateOrSpecialization + + + + {*($T1*)&Ptr} + + ($T1*)&Ptr + + + + {($T1 *)Ptr} + + ($T1 *)Ptr + + + + + {*((NamedDecl **)(this+1))[0],view(cpp)}{*this,view(parm1)} + + , {*((NamedDecl **)(this+1))[1],view(cpp)}{*this,view(parm2)} + + , {*((NamedDecl **)(this+1))[2],view(cpp)}{*this,view(parm3)} + + , {*((NamedDecl **)(this+1))[3],view(cpp)}{*this,view(parm4)} + + , {*((NamedDecl **)(this+1))[4],view(cpp)}{*this,view(parm5)} + + , /* Expand for more params */ + <{*this,view(parm0)}> + + + NumParams + (NamedDecl **)(this+1) + + + + + {(clang::Stmt::StmtClass)StmtBits.sClass,en} + + (clang::Stmt::StmtClass)StmtBits.sClass,en + + + + {*(clang::StringLiteral *)this} + Expression of class {(clang::Stmt::StmtClass)StmtBits.sClass,en} and type {TR,view(cpp)} + + + + *(unsigned *)(((clang::StringLiteral *)this)+1) + (const char *)(((clang::StringLiteral *)this)+1)+4+4,[*(unsigned *)(((clang::StringLiteral *)this)+1)]s8 + + + + public + protected + private + + {*(clang::NamedDecl *)(Ptr&~Mask)} + {*this,view(access)} {*this,view(decl)} + + (clang::AccessSpecifier)(Ptr&Mask),en + *(clang::NamedDecl *)(Ptr&~Mask) + + + + [IK_Identifier] {*Identifier} + [IK_OperatorFunctionId] {OperatorFunctionId} + [IK_ConversionFunctionId] {ConversionFunctionId} + [IK_ConstructorName] {ConstructorName} + [IK_DestructorName] {DestructorName} + [IK_DeductionGuideName] {TemplateName} + [IK_TemplateId] {TemplateId} + [IK_ConstructorTemplateId] {TemplateId} + Kind + + Identifier + OperatorFunctionId + ConversionFunctionId + ConstructorName + DestructorName + TemplateName + TemplateId + TemplateId + + + + NumDecls={NumDecls} + + + NumDecls + (Decl **)(this+1) 
+ + + + + {*D} + {*(DeclGroup *)((uintptr_t)D&~1)} + + D + (DeclGroup *)((uintptr_t)D&~1) + + + + {DS} {Name} + + + {Decls} + + Decls + + + + {Ambiguity,en}: {Decls} + {ResultKind,en}: {Decls} + + + Invalid + Unset + {Val} + + + Invalid + Unset + {($T1)(Value&~1)} + + (bool)(Value&1) + ($T1)(Value&~1) + + + diff --git a/flang/test/Driver/msvc-dependent-lib-flags.f90 b/flang/test/Driver/msvc-dependent-lib-flags.f90 index 765917f07d8e72..1b7ecb604ad67d 100644 --- a/flang/test/Driver/msvc-dependent-lib-flags.f90 +++ b/flang/test/Driver/msvc-dependent-lib-flags.f90 @@ -1,36 +1,36 @@ -! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC -! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=static_dbg %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DEBUG -! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=dll %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DLL -! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=dll_dbg %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DLL-DEBUG - -! MSVC: -fc1 -! MSVC-SAME: --dependent-lib=clang_rt.builtins.lib -! MSVC-SAME: -D_MT -! MSVC-SAME: --dependent-lib=libcmt -! MSVC-SAME: --dependent-lib=FortranRuntime.static.lib -! MSVC-SAME: --dependent-lib=FortranDecimal.static.lib - -! MSVC-DEBUG: -fc1 -! MSVC-DEBUG-SAME: --dependent-lib=clang_rt.builtins.lib -! MSVC-DEBUG-SAME: -D_MT -! MSVC-DEBUG-SAME: -D_DEBUG -! MSVC-DEBUG-SAME: --dependent-lib=libcmtd -! MSVC-DEBUG-SAME: --dependent-lib=FortranRuntime.static_dbg.lib -! MSVC-DEBUG-SAME: --dependent-lib=FortranDecimal.static_dbg.lib - -! MSVC-DLL: -fc1 -! MSVC-DLL-SAME: --dependent-lib=clang_rt.builtins.lib -! MSVC-DLL-SAME: -D_MT -! MSVC-DLL-SAME: -D_DLL -! 
MSVC-DLL-SAME: --dependent-lib=msvcrt -! MSVC-DLL-SAME: --dependent-lib=FortranRuntime.dynamic.lib -! MSVC-DLL-SAME: --dependent-lib=FortranDecimal.dynamic.lib - -! MSVC-DLL-DEBUG: -fc1 -! MSVC-DLL-DEBUG-SAME: --dependent-lib=clang_rt.builtins.lib -! MSVC-DLL-DEBUG-SAME: -D_MT -! MSVC-DLL-DEBUG-SAME: -D_DEBUG -! MSVC-DLL-DEBUG-SAME: -D_DLL -! MSVC-DLL-DEBUG-SAME: --dependent-lib=msvcrtd -! MSVC-DLL-DEBUG-SAME: --dependent-lib=FortranRuntime.dynamic_dbg.lib -! MSVC-DLL-DEBUG-SAME: --dependent-lib=FortranDecimal.dynamic_dbg.lib +! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC +! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=static_dbg %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DEBUG +! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=dll %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DLL +! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=dll_dbg %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DLL-DEBUG + +! MSVC: -fc1 +! MSVC-SAME: --dependent-lib=clang_rt.builtins.lib +! MSVC-SAME: -D_MT +! MSVC-SAME: --dependent-lib=libcmt +! MSVC-SAME: --dependent-lib=FortranRuntime.static.lib +! MSVC-SAME: --dependent-lib=FortranDecimal.static.lib + +! MSVC-DEBUG: -fc1 +! MSVC-DEBUG-SAME: --dependent-lib=clang_rt.builtins.lib +! MSVC-DEBUG-SAME: -D_MT +! MSVC-DEBUG-SAME: -D_DEBUG +! MSVC-DEBUG-SAME: --dependent-lib=libcmtd +! MSVC-DEBUG-SAME: --dependent-lib=FortranRuntime.static_dbg.lib +! MSVC-DEBUG-SAME: --dependent-lib=FortranDecimal.static_dbg.lib + +! MSVC-DLL: -fc1 +! MSVC-DLL-SAME: --dependent-lib=clang_rt.builtins.lib +! MSVC-DLL-SAME: -D_MT +! MSVC-DLL-SAME: -D_DLL +! MSVC-DLL-SAME: --dependent-lib=msvcrt +! 
MSVC-DLL-SAME: --dependent-lib=FortranRuntime.dynamic.lib +! MSVC-DLL-SAME: --dependent-lib=FortranDecimal.dynamic.lib + +! MSVC-DLL-DEBUG: -fc1 +! MSVC-DLL-DEBUG-SAME: --dependent-lib=clang_rt.builtins.lib +! MSVC-DLL-DEBUG-SAME: -D_MT +! MSVC-DLL-DEBUG-SAME: -D_DEBUG +! MSVC-DLL-DEBUG-SAME: -D_DLL +! MSVC-DLL-DEBUG-SAME: --dependent-lib=msvcrtd +! MSVC-DLL-DEBUG-SAME: --dependent-lib=FortranRuntime.dynamic_dbg.lib +! MSVC-DLL-DEBUG-SAME: --dependent-lib=FortranDecimal.dynamic_dbg.lib diff --git a/lldb/test/API/commands/expression/ir-interpreter-phi-nodes/Makefile b/lldb/test/API/commands/expression/ir-interpreter-phi-nodes/Makefile index a1f689e07c77ff..d420a34c03e785 100644 --- a/lldb/test/API/commands/expression/ir-interpreter-phi-nodes/Makefile +++ b/lldb/test/API/commands/expression/ir-interpreter-phi-nodes/Makefile @@ -1,4 +1,4 @@ - -CXX_SOURCES := main.cpp - -include Makefile.rules + +CXX_SOURCES := main.cpp + +include Makefile.rules diff --git a/lldb/test/API/functionalities/postmortem/minidump/fizzbuzz.syms b/lldb/test/API/functionalities/postmortem/minidump/fizzbuzz.syms index cab06c1c9d50b1..e817a491af5750 100644 --- a/lldb/test/API/functionalities/postmortem/minidump/fizzbuzz.syms +++ b/lldb/test/API/functionalities/postmortem/minidump/fizzbuzz.syms @@ -1,2 +1,2 @@ -MODULE windows x86 0F45B7919A9646F9BF8F2D6076EA421A11 fizzbuzz.pdb -PUBLIC 1000 0 main +MODULE windows x86 0F45B7919A9646F9BF8F2D6076EA421A11 fizzbuzz.pdb +PUBLIC 1000 0 main diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/Makefile b/lldb/test/API/functionalities/target-new-solib-notifications/Makefile index e3b48697fd7837..745f6cc9d65ae3 100644 --- a/lldb/test/API/functionalities/target-new-solib-notifications/Makefile +++ b/lldb/test/API/functionalities/target-new-solib-notifications/Makefile @@ -1,23 +1,23 @@ -CXX_SOURCES := main.cpp -LD_EXTRAS := -L. 
-l_d -l_c -l_a -l_b - -a.out: lib_b lib_a lib_c lib_d - -include Makefile.rules - -lib_a: lib_b - "$(MAKE)" -f $(MAKEFILE_RULES) \ - DYLIB_ONLY=YES DYLIB_CXX_SOURCES=a.cpp DYLIB_NAME=_a \ - LD_EXTRAS="-L. -l_b" - -lib_b: - "$(MAKE)" -f $(MAKEFILE_RULES) \ - DYLIB_ONLY=YES DYLIB_CXX_SOURCES=b.cpp DYLIB_NAME=_b - -lib_c: - "$(MAKE)" -f $(MAKEFILE_RULES) \ - DYLIB_ONLY=YES DYLIB_CXX_SOURCES=c.cpp DYLIB_NAME=_c - -lib_d: - "$(MAKE)" -f $(MAKEFILE_RULES) \ - DYLIB_ONLY=YES DYLIB_CXX_SOURCES=d.cpp DYLIB_NAME=_d +CXX_SOURCES := main.cpp +LD_EXTRAS := -L. -l_d -l_c -l_a -l_b + +a.out: lib_b lib_a lib_c lib_d + +include Makefile.rules + +lib_a: lib_b + "$(MAKE)" -f $(MAKEFILE_RULES) \ + DYLIB_ONLY=YES DYLIB_CXX_SOURCES=a.cpp DYLIB_NAME=_a \ + LD_EXTRAS="-L. -l_b" + +lib_b: + "$(MAKE)" -f $(MAKEFILE_RULES) \ + DYLIB_ONLY=YES DYLIB_CXX_SOURCES=b.cpp DYLIB_NAME=_b + +lib_c: + "$(MAKE)" -f $(MAKEFILE_RULES) \ + DYLIB_ONLY=YES DYLIB_CXX_SOURCES=c.cpp DYLIB_NAME=_c + +lib_d: + "$(MAKE)" -f $(MAKEFILE_RULES) \ + DYLIB_ONLY=YES DYLIB_CXX_SOURCES=d.cpp DYLIB_NAME=_d diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/a.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/a.cpp index 778b46ed5cef1a..66633b70ee1e50 100644 --- a/lldb/test/API/functionalities/target-new-solib-notifications/a.cpp +++ b/lldb/test/API/functionalities/target-new-solib-notifications/a.cpp @@ -1,3 +1,3 @@ -extern "C" int b_function(); - -extern "C" int a_function() { return b_function(); } +extern "C" int b_function(); + +extern "C" int a_function() { return b_function(); } diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/b.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/b.cpp index 4f1a4032ee0eed..8b16fbdb5728cd 100644 --- a/lldb/test/API/functionalities/target-new-solib-notifications/b.cpp +++ b/lldb/test/API/functionalities/target-new-solib-notifications/b.cpp @@ -1 +1 @@ -extern "C" int b_function() { return 500; } +extern "C" 
int b_function() { return 500; } diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/c.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/c.cpp index 8abd1b155a7590..120c88f2bb609a 100644 --- a/lldb/test/API/functionalities/target-new-solib-notifications/c.cpp +++ b/lldb/test/API/functionalities/target-new-solib-notifications/c.cpp @@ -1 +1 @@ -extern "C" int c_function() { return 600; } +extern "C" int c_function() { return 600; } diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/d.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/d.cpp index 58888a29ba323a..d37ad2621ae4e9 100644 --- a/lldb/test/API/functionalities/target-new-solib-notifications/d.cpp +++ b/lldb/test/API/functionalities/target-new-solib-notifications/d.cpp @@ -1 +1 @@ -extern "C" int d_function() { return 700; } +extern "C" int d_function() { return 700; } diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/main.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/main.cpp index 77b38c5ccdc698..bd2c79cdab9daa 100644 --- a/lldb/test/API/functionalities/target-new-solib-notifications/main.cpp +++ b/lldb/test/API/functionalities/target-new-solib-notifications/main.cpp @@ -1,16 +1,16 @@ -#include - -extern "C" int a_function(); -extern "C" int c_function(); -extern "C" int b_function(); -extern "C" int d_function(); - -int main() { - a_function(); - b_function(); - c_function(); - d_function(); - - puts("running"); // breakpoint here - return 0; -} +#include + +extern "C" int a_function(); +extern "C" int c_function(); +extern "C" int b_function(); +extern "C" int d_function(); + +int main() { + a_function(); + b_function(); + c_function(); + d_function(); + + puts("running"); // breakpoint here + return 0; +} diff --git a/lldb/test/API/functionalities/unwind/zeroth_frame/Makefile b/lldb/test/API/functionalities/unwind/zeroth_frame/Makefile index 15a931850e17e5..10495940055b63 100644 --- 
a/lldb/test/API/functionalities/unwind/zeroth_frame/Makefile +++ b/lldb/test/API/functionalities/unwind/zeroth_frame/Makefile @@ -1,3 +1,3 @@ -C_SOURCES := main.c - -include Makefile.rules +C_SOURCES := main.c + +include Makefile.rules diff --git a/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py b/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py index d660844405e137..70f72c72c8340e 100644 --- a/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py +++ b/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py @@ -1,88 +1,88 @@ -""" -Test that line information is recalculated properly for a frame when it moves -from the middle of the backtrace to a zero index. - -This is a regression test for a StackFrame bug, where whether frame is zero or -not depends on an internal field. When LLDB was updating its frame list value -of the field wasn't copied into existing StackFrame instances, so those -StackFrame instances, would use an incorrect line entry evaluation logic in -situations if it was in the middle of the stack frame list (not zeroth), and -then moved to the top position. The difference in logic is that for zeroth -frames line entry is returned for program counter, while for other frame -(except for those that "behave like zeroth") it is for the instruction -preceding PC, as PC points to the next instruction after function call. When -the bug is present, when execution stops at the second breakpoint -SBFrame.GetLineEntry() returns line entry for the previous line, rather than -the one with a breakpoint. Note that this is specific to -SBFrame.GetLineEntry(), SBFrame.GetPCAddress().GetLineEntry() would return -correct entry. - -This bug doesn't reproduce through an LLDB interpretator, however it happens -when using API directly, for example in LLDB-MI. 
-""" - -import lldb -from lldbsuite.test.decorators import * -from lldbsuite.test.lldbtest import * -from lldbsuite.test import lldbutil - - -class ZerothFrame(TestBase): - def test(self): - """ - Test that line information is recalculated properly for a frame when it moves - from the middle of the backtrace to a zero index. - """ - self.build() - self.setTearDownCleanup() - - exe = self.getBuildArtifact("a.out") - target = self.dbg.CreateTarget(exe) - self.assertTrue(target, VALID_TARGET) - - main_dot_c = lldb.SBFileSpec("main.c") - bp1 = target.BreakpointCreateBySourceRegex( - "// Set breakpoint 1 here", main_dot_c - ) - bp2 = target.BreakpointCreateBySourceRegex( - "// Set breakpoint 2 here", main_dot_c - ) - - process = target.LaunchSimple(None, None, self.get_process_working_directory()) - self.assertTrue(process, VALID_PROCESS) - - thread = self.thread() - - if self.TraceOn(): - print("Backtrace at the first breakpoint:") - for f in thread.frames: - print(f) - - # Check that we have stopped at correct breakpoint. - self.assertEqual( - thread.frame[0].GetLineEntry().GetLine(), - bp1.GetLocationAtIndex(0).GetAddress().GetLineEntry().GetLine(), - "LLDB reported incorrect line number.", - ) - - # Important to use SBProcess::Continue() instead of - # self.runCmd('continue'), because the problem doesn't reproduce with - # 'continue' command. 
- process.Continue() - - if self.TraceOn(): - print("Backtrace at the second breakpoint:") - for f in thread.frames: - print(f) - # Check that we have stopped at the breakpoint - self.assertEqual( - thread.frame[0].GetLineEntry().GetLine(), - bp2.GetLocationAtIndex(0).GetAddress().GetLineEntry().GetLine(), - "LLDB reported incorrect line number.", - ) - # Double-check with GetPCAddress() - self.assertEqual( - thread.frame[0].GetLineEntry().GetLine(), - thread.frame[0].GetPCAddress().GetLineEntry().GetLine(), - "LLDB reported incorrect line number.", - ) +""" +Test that line information is recalculated properly for a frame when it moves +from the middle of the backtrace to a zero index. + +This is a regression test for a StackFrame bug, where whether frame is zero or +not depends on an internal field. When LLDB was updating its frame list value +of the field wasn't copied into existing StackFrame instances, so those +StackFrame instances, would use an incorrect line entry evaluation logic in +situations if it was in the middle of the stack frame list (not zeroth), and +then moved to the top position. The difference in logic is that for zeroth +frames line entry is returned for program counter, while for other frame +(except for those that "behave like zeroth") it is for the instruction +preceding PC, as PC points to the next instruction after function call. When +the bug is present, when execution stops at the second breakpoint +SBFrame.GetLineEntry() returns line entry for the previous line, rather than +the one with a breakpoint. Note that this is specific to +SBFrame.GetLineEntry(), SBFrame.GetPCAddress().GetLineEntry() would return +correct entry. + +This bug doesn't reproduce through an LLDB interpretator, however it happens +when using API directly, for example in LLDB-MI. 
+""" + +import lldb +from lldbsuite.test.decorators import * +from lldbsuite.test.lldbtest import * +from lldbsuite.test import lldbutil + + +class ZerothFrame(TestBase): + def test(self): + """ + Test that line information is recalculated properly for a frame when it moves + from the middle of the backtrace to a zero index. + """ + self.build() + self.setTearDownCleanup() + + exe = self.getBuildArtifact("a.out") + target = self.dbg.CreateTarget(exe) + self.assertTrue(target, VALID_TARGET) + + main_dot_c = lldb.SBFileSpec("main.c") + bp1 = target.BreakpointCreateBySourceRegex( + "// Set breakpoint 1 here", main_dot_c + ) + bp2 = target.BreakpointCreateBySourceRegex( + "// Set breakpoint 2 here", main_dot_c + ) + + process = target.LaunchSimple(None, None, self.get_process_working_directory()) + self.assertTrue(process, VALID_PROCESS) + + thread = self.thread() + + if self.TraceOn(): + print("Backtrace at the first breakpoint:") + for f in thread.frames: + print(f) + + # Check that we have stopped at correct breakpoint. + self.assertEqual( + thread.frame[0].GetLineEntry().GetLine(), + bp1.GetLocationAtIndex(0).GetAddress().GetLineEntry().GetLine(), + "LLDB reported incorrect line number.", + ) + + # Important to use SBProcess::Continue() instead of + # self.runCmd('continue'), because the problem doesn't reproduce with + # 'continue' command. 
+ process.Continue() + + if self.TraceOn(): + print("Backtrace at the second breakpoint:") + for f in thread.frames: + print(f) + # Check that we have stopped at the breakpoint + self.assertEqual( + thread.frame[0].GetLineEntry().GetLine(), + bp2.GetLocationAtIndex(0).GetAddress().GetLineEntry().GetLine(), + "LLDB reported incorrect line number.", + ) + # Double-check with GetPCAddress() + self.assertEqual( + thread.frame[0].GetLineEntry().GetLine(), + thread.frame[0].GetPCAddress().GetLineEntry().GetLine(), + "LLDB reported incorrect line number.", + ) diff --git a/lldb/test/API/python_api/debugger/Makefile b/lldb/test/API/python_api/debugger/Makefile index bfad5f33e86753..99998b20bcb050 100644 --- a/lldb/test/API/python_api/debugger/Makefile +++ b/lldb/test/API/python_api/debugger/Makefile @@ -1,3 +1,3 @@ -CXX_SOURCES := main.cpp - -include Makefile.rules +CXX_SOURCES := main.cpp + +include Makefile.rules diff --git a/lldb/test/Shell/BuildScript/modes.test b/lldb/test/Shell/BuildScript/modes.test index 02311f712d770f..1ce50104855f46 100644 --- a/lldb/test/Shell/BuildScript/modes.test +++ b/lldb/test/Shell/BuildScript/modes.test @@ -1,35 +1,35 @@ -RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any -o %t/foo.out foobar.c \ -RUN: | FileCheck --check-prefix=COMPILE %s - -RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any --outdir %t foo.c bar.c \ -RUN: | FileCheck --check-prefix=COMPILE-MULTI %s - -RUN: %build -n --verbose --arch=32 --mode=link --compiler=any -o %t/foo.exe foobar.obj \ -RUN: | FileCheck --check-prefix=LINK %s - -RUN: %build -n --verbose --arch=32 --mode=link --compiler=any -o %t/foobar.exe foo.obj bar.obj \ -RUN: | FileCheck --check-prefix=LINK-MULTI %s - -RUN: %build -n --verbose --arch=32 --mode=compile-and-link --compiler=any -o %t/foobar.exe foobar.c \ -RUN: | FileCheck --check-prefix=BOTH %s - -RUN: %build -n --verbose --arch=32 --mode=compile-and-link --compiler=any -o %t/foobar.exe foo.c bar.c \ -RUN: | FileCheck 
--check-prefix=BOTH-MULTI %s - - -COMPILE: compiling foobar.c -> foo.out - -COMPILE-MULTI: compiling foo.c -> foo.o{{(bj)?}} -COMPILE-MULTI: compiling bar.c -> bar.o{{(bj)?}} - - -LINK: linking foobar.obj -> foo.exe - -LINK-MULTI: linking foo.obj+bar.obj -> foobar.exe - -BOTH: compiling foobar.c -> [[OBJFOO:foobar.exe-foobar.o(bj)?]] -BOTH: linking [[OBJFOO]] -> foobar.exe - -BOTH-MULTI: compiling foo.c -> [[OBJFOO:foobar.exe-foo.o(bj)?]] -BOTH-MULTI: compiling bar.c -> [[OBJBAR:foobar.exe-bar.o(bj)?]] -BOTH-MULTI: linking [[OBJFOO]]+[[OBJBAR]] -> foobar.exe +RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any -o %t/foo.out foobar.c \ +RUN: | FileCheck --check-prefix=COMPILE %s + +RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any --outdir %t foo.c bar.c \ +RUN: | FileCheck --check-prefix=COMPILE-MULTI %s + +RUN: %build -n --verbose --arch=32 --mode=link --compiler=any -o %t/foo.exe foobar.obj \ +RUN: | FileCheck --check-prefix=LINK %s + +RUN: %build -n --verbose --arch=32 --mode=link --compiler=any -o %t/foobar.exe foo.obj bar.obj \ +RUN: | FileCheck --check-prefix=LINK-MULTI %s + +RUN: %build -n --verbose --arch=32 --mode=compile-and-link --compiler=any -o %t/foobar.exe foobar.c \ +RUN: | FileCheck --check-prefix=BOTH %s + +RUN: %build -n --verbose --arch=32 --mode=compile-and-link --compiler=any -o %t/foobar.exe foo.c bar.c \ +RUN: | FileCheck --check-prefix=BOTH-MULTI %s + + +COMPILE: compiling foobar.c -> foo.out + +COMPILE-MULTI: compiling foo.c -> foo.o{{(bj)?}} +COMPILE-MULTI: compiling bar.c -> bar.o{{(bj)?}} + + +LINK: linking foobar.obj -> foo.exe + +LINK-MULTI: linking foo.obj+bar.obj -> foobar.exe + +BOTH: compiling foobar.c -> [[OBJFOO:foobar.exe-foobar.o(bj)?]] +BOTH: linking [[OBJFOO]] -> foobar.exe + +BOTH-MULTI: compiling foo.c -> [[OBJFOO:foobar.exe-foo.o(bj)?]] +BOTH-MULTI: compiling bar.c -> [[OBJBAR:foobar.exe-bar.o(bj)?]] +BOTH-MULTI: linking [[OBJFOO]]+[[OBJBAR]] -> foobar.exe diff --git 
a/lldb/test/Shell/BuildScript/script-args.test b/lldb/test/Shell/BuildScript/script-args.test index 13e8a516094267..647a48e4442b12 100644 --- a/lldb/test/Shell/BuildScript/script-args.test +++ b/lldb/test/Shell/BuildScript/script-args.test @@ -1,32 +1,32 @@ -RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any -o %t/foo.out foobar.c \ -RUN: | FileCheck %s -RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any --outdir %t foo.c bar.c \ -RUN: | FileCheck --check-prefix=MULTI-INPUT %s - - -CHECK: Script Arguments: -CHECK-NEXT: Arch: 32 -CHECK: Compiler: any -CHECK: Outdir: {{.*}}script-args.test.tmp -CHECK: Output: {{.*}}script-args.test.tmp{{.}}foo.out -CHECK: Nodefaultlib: False -CHECK: Opt: none -CHECK: Mode: compile -CHECK: Clean: True -CHECK: Verbose: True -CHECK: Dryrun: True -CHECK: Inputs: foobar.c - -MULTI-INPUT: Script Arguments: -MULTI-INPUT-NEXT: Arch: 32 -MULTI-INPUT-NEXT: Compiler: any -MULTI-INPUT-NEXT: Outdir: {{.*}}script-args.test.tmp -MULTI-INPUT-NEXT: Output: -MULTI-INPUT-NEXT: Nodefaultlib: False -MULTI-INPUT-NEXT: Opt: none -MULTI-INPUT-NEXT: Mode: compile -MULTI-INPUT-NEXT: Clean: True -MULTI-INPUT-NEXT: Verbose: True -MULTI-INPUT-NEXT: Dryrun: True -MULTI-INPUT-NEXT: Inputs: foo.c -MULTI-INPUT-NEXT: bar.c +RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any -o %t/foo.out foobar.c \ +RUN: | FileCheck %s +RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any --outdir %t foo.c bar.c \ +RUN: | FileCheck --check-prefix=MULTI-INPUT %s + + +CHECK: Script Arguments: +CHECK-NEXT: Arch: 32 +CHECK: Compiler: any +CHECK: Outdir: {{.*}}script-args.test.tmp +CHECK: Output: {{.*}}script-args.test.tmp{{.}}foo.out +CHECK: Nodefaultlib: False +CHECK: Opt: none +CHECK: Mode: compile +CHECK: Clean: True +CHECK: Verbose: True +CHECK: Dryrun: True +CHECK: Inputs: foobar.c + +MULTI-INPUT: Script Arguments: +MULTI-INPUT-NEXT: Arch: 32 +MULTI-INPUT-NEXT: Compiler: any +MULTI-INPUT-NEXT: Outdir: {{.*}}script-args.test.tmp 
+MULTI-INPUT-NEXT: Output: +MULTI-INPUT-NEXT: Nodefaultlib: False +MULTI-INPUT-NEXT: Opt: none +MULTI-INPUT-NEXT: Mode: compile +MULTI-INPUT-NEXT: Clean: True +MULTI-INPUT-NEXT: Verbose: True +MULTI-INPUT-NEXT: Dryrun: True +MULTI-INPUT-NEXT: Inputs: foo.c +MULTI-INPUT-NEXT: bar.c diff --git a/lldb/test/Shell/BuildScript/toolchain-clang-cl.test b/lldb/test/Shell/BuildScript/toolchain-clang-cl.test index 8c9ea9fddb8a50..4f64859a02b607 100644 --- a/lldb/test/Shell/BuildScript/toolchain-clang-cl.test +++ b/lldb/test/Shell/BuildScript/toolchain-clang-cl.test @@ -1,49 +1,49 @@ -REQUIRES: lld, system-windows - -RUN: %build -n --verbose --arch=32 --compiler=clang-cl --mode=compile-and-link -o %t/foo.exe foobar.c \ -RUN: | FileCheck --check-prefix=CHECK-32 %s - -RUN: %build -n --verbose --arch=64 --compiler=clang-cl --mode=compile-and-link -o %t/foo.exe foobar.c \ -RUN: | FileCheck --check-prefix=CHECK-64 %s - -CHECK-32: Script Arguments: -CHECK-32: Arch: 32 -CHECK-32: Compiler: clang-cl -CHECK-32: Outdir: {{.*}} -CHECK-32: Output: {{.*}}toolchain-clang-cl.test.tmp\foo.exe -CHECK-32: Nodefaultlib: False -CHECK-32: Opt: none -CHECK-32: Mode: compile -CHECK-32: Clean: True -CHECK-32: Verbose: True -CHECK-32: Dryrun: True -CHECK-32: Inputs: foobar.c -CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foobar.ilk -CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe-foobar.obj -CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.pdb -CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe -CHECK-32: compiling foobar.c -> foo.exe-foobar.obj -CHECK-32: {{.*}}clang-cl{{(\.EXE)?}} -m32 -CHECK-32: linking foo.exe-foobar.obj -> foo.exe -CHECK-32: {{.*}}lld-link{{(\.EXE)?}} - -CHECK-64: Script Arguments: -CHECK-64: Arch: 64 -CHECK-64: Compiler: clang-cl -CHECK-64: Outdir: {{.*}} -CHECK-64: Output: {{.*}}toolchain-clang-cl.test.tmp\foo.exe -CHECK-64: Nodefaultlib: False -CHECK-64: Opt: none -CHECK-64: Mode: compile -CHECK-64: Clean: True -CHECK-64: 
Verbose: True -CHECK-64: Dryrun: True -CHECK-64: Inputs: foobar.c -CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foobar.ilk -CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe-foobar.obj -CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.pdb -CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe -CHECK-64: compiling foobar.c -> foo.exe-foobar.obj -CHECK-64: {{.*}}clang-cl{{(\.EXE)?}} -m64 -CHECK-64: linking foo.exe-foobar.obj -> foo.exe -CHECK-64: {{.*}}lld-link{{(\.EXE)?}} +REQUIRES: lld, system-windows + +RUN: %build -n --verbose --arch=32 --compiler=clang-cl --mode=compile-and-link -o %t/foo.exe foobar.c \ +RUN: | FileCheck --check-prefix=CHECK-32 %s + +RUN: %build -n --verbose --arch=64 --compiler=clang-cl --mode=compile-and-link -o %t/foo.exe foobar.c \ +RUN: | FileCheck --check-prefix=CHECK-64 %s + +CHECK-32: Script Arguments: +CHECK-32: Arch: 32 +CHECK-32: Compiler: clang-cl +CHECK-32: Outdir: {{.*}} +CHECK-32: Output: {{.*}}toolchain-clang-cl.test.tmp\foo.exe +CHECK-32: Nodefaultlib: False +CHECK-32: Opt: none +CHECK-32: Mode: compile +CHECK-32: Clean: True +CHECK-32: Verbose: True +CHECK-32: Dryrun: True +CHECK-32: Inputs: foobar.c +CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foobar.ilk +CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe-foobar.obj +CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.pdb +CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe +CHECK-32: compiling foobar.c -> foo.exe-foobar.obj +CHECK-32: {{.*}}clang-cl{{(\.EXE)?}} -m32 +CHECK-32: linking foo.exe-foobar.obj -> foo.exe +CHECK-32: {{.*}}lld-link{{(\.EXE)?}} + +CHECK-64: Script Arguments: +CHECK-64: Arch: 64 +CHECK-64: Compiler: clang-cl +CHECK-64: Outdir: {{.*}} +CHECK-64: Output: {{.*}}toolchain-clang-cl.test.tmp\foo.exe +CHECK-64: Nodefaultlib: False +CHECK-64: Opt: none +CHECK-64: Mode: compile +CHECK-64: Clean: True +CHECK-64: Verbose: True +CHECK-64: Dryrun: True +CHECK-64: 
Inputs: foobar.c +CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foobar.ilk +CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe-foobar.obj +CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.pdb +CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe +CHECK-64: compiling foobar.c -> foo.exe-foobar.obj +CHECK-64: {{.*}}clang-cl{{(\.EXE)?}} -m64 +CHECK-64: linking foo.exe-foobar.obj -> foo.exe +CHECK-64: {{.*}}lld-link{{(\.EXE)?}} diff --git a/lldb/test/Shell/Minidump/Windows/Sigsegv/Inputs/sigsegv.cpp b/lldb/test/Shell/Minidump/Windows/Sigsegv/Inputs/sigsegv.cpp index 6bf78b5dc43b29..d5b96472eb117f 100644 --- a/lldb/test/Shell/Minidump/Windows/Sigsegv/Inputs/sigsegv.cpp +++ b/lldb/test/Shell/Minidump/Windows/Sigsegv/Inputs/sigsegv.cpp @@ -1,40 +1,40 @@ - -// nodefaultlib build: cl -Zi sigsegv.cpp /link /nodefaultlib - -#ifdef USE_CRT -#include -#else -int main(); -extern "C" -{ - int _fltused; - void mainCRTStartup() { main(); } - void printf(const char*, ...) {} -} -#endif - -void crash(bool crash_self) -{ - printf("Before...\n"); - if(crash_self) - { - printf("Crashing in 3, 2, 1 ...\n"); - *(volatile int*)nullptr = 0; - } - printf("After...\n"); -} - -int foo(int x, float y, const char* msg) -{ - bool flag = x > y; - if(flag) - printf("x = %d, y = %f, msg = %s\n", x, y, msg); - crash(flag); - return x << 1; -} - -int main() -{ - foo(10, 3.14, "testing"); -} - + +// nodefaultlib build: cl -Zi sigsegv.cpp /link /nodefaultlib + +#ifdef USE_CRT +#include +#else +int main(); +extern "C" +{ + int _fltused; + void mainCRTStartup() { main(); } + void printf(const char*, ...) 
{} +} +#endif + +void crash(bool crash_self) +{ + printf("Before...\n"); + if(crash_self) + { + printf("Crashing in 3, 2, 1 ...\n"); + *(volatile int*)nullptr = 0; + } + printf("After...\n"); +} + +int foo(int x, float y, const char* msg) +{ + bool flag = x > y; + if(flag) + printf("x = %d, y = %f, msg = %s\n", x, y, msg); + crash(flag); + return x << 1; +} + +int main() +{ + foo(10, 3.14, "testing"); +} + diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites.s b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites.s index aac8f4c1698038..a9d248758bfcec 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites.s +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites.s @@ -1,622 +1,622 @@ -# Compiled from the following files, but replaced the call to abort with nop. -# clang-cl -fuse-ld=lld-link /Z7 /O1 /Faa.asm /winsysroot~/win_toolchain a.cpp -# a.cpp: -# #include "a.h" -# int main(int argc, char** argv) { -# volatile int main_local = Namespace1::foo(2); -# return 0; -# } -# a.h: -# #include -# #include "b.h" -# namespace Namespace1 { -# inline int foo(int x) { -# volatile int foo_local = x + 1; -# ++foo_local; -# if (!foo_local) -# abort(); -# return Class1::bar(foo_local); -# } -# } // namespace Namespace1 -# b.h: -# #include "c.h" -# class Class1 { -# public: -# inline static int bar(int x) { -# volatile int bar_local = x + 1; -# ++bar_local; -# return Namespace2::Class2::func(bar_local); -# } -# }; -# c.h: -# namespace Namespace2 { -# class Class2 { -# public: -# inline static int func(int x) { -# volatile int func_local = x + 1; -# func_local += x; -# return func_local; -# } -# }; -# } // namespace Namespace2 - - .text - .def @feat.00; - .scl 3; - .type 0; - .endef - .globl @feat.00 -.set @feat.00, 0 - .intel_syntax noprefix - .file "a.cpp" - .def main; - .scl 2; - .type 32; - .endef - .section .text,"xr",one_only,main - .globl main # -- Begin function main -main: # @main -.Lfunc_begin0: - .cv_func_id 0 - .cv_file 1 
"/tmp/a.cpp" "4FFB96E5DF1A95CE7DB9732CFFE001D7" 1 - .cv_loc 0 1 2 0 # a.cpp:2:0 -.seh_proc main -# %bb.0: - #DEBUG_VALUE: main:argv <- $rdx - #DEBUG_VALUE: main:argc <- $ecx - #DEBUG_VALUE: foo:x <- 2 - sub rsp, 56 - .seh_stackalloc 56 - .seh_endprologue -.Ltmp0: - .cv_file 2 "/tmp/./a.h" "BBFED90EF093E9C1D032CC9B05B5D167" 1 - .cv_inline_site_id 1 within 0 inlined_at 1 3 0 - .cv_loc 1 2 5 0 # ./a.h:5:0 - mov dword ptr [rsp + 44], 3 - .cv_loc 1 2 6 0 # ./a.h:6:0 - inc dword ptr [rsp + 44] - .cv_loc 1 2 7 0 # ./a.h:7:0 - mov eax, dword ptr [rsp + 44] - test eax, eax - je .LBB0_2 -.Ltmp1: -# %bb.1: - #DEBUG_VALUE: main:argv <- $rdx - #DEBUG_VALUE: main:argc <- $ecx - #DEBUG_VALUE: foo:x <- 2 - .cv_loc 1 2 9 0 # ./a.h:9:0 - mov eax, dword ptr [rsp + 44] -.Ltmp2: - #DEBUG_VALUE: bar:x <- $eax - .cv_file 3 "/tmp/./b.h" "A26CC743A260115F33AF91AB11F95877" 1 - .cv_inline_site_id 2 within 1 inlined_at 2 9 0 - .cv_loc 2 3 5 0 # ./b.h:5:0 - inc eax -.Ltmp3: - mov dword ptr [rsp + 52], eax - .cv_loc 2 3 6 0 # ./b.h:6:0 - inc dword ptr [rsp + 52] - .cv_loc 2 3 7 0 # ./b.h:7:0 - mov eax, dword ptr [rsp + 52] -.Ltmp4: - #DEBUG_VALUE: func:x <- $eax - .cv_file 4 "/tmp/./c.h" "8AF4613F78624BBE96D1C408ABA39B2D" 1 - .cv_inline_site_id 3 within 2 inlined_at 3 7 0 - .cv_loc 3 4 5 0 # ./c.h:5:0 - lea ecx, [rax + 1] -.Ltmp5: - #DEBUG_VALUE: main:argc <- [DW_OP_LLVM_entry_value 1] $ecx - mov dword ptr [rsp + 48], ecx - .cv_loc 3 4 6 0 # ./c.h:6:0 - add dword ptr [rsp + 48], eax - .cv_loc 3 4 7 0 # ./c.h:7:0 - mov eax, dword ptr [rsp + 48] -.Ltmp6: - .cv_loc 0 1 3 0 # a.cpp:3:0 - mov dword ptr [rsp + 48], eax - .cv_loc 0 1 4 0 # a.cpp:4:0 - xor eax, eax - # Use fake debug info to tests inline info. 
- .cv_loc 1 2 20 0 - add rsp, 56 - ret -.Ltmp7: -.LBB0_2: - #DEBUG_VALUE: main:argv <- $rdx - #DEBUG_VALUE: main:argc <- $ecx - #DEBUG_VALUE: foo:x <- 2 - .cv_loc 1 2 8 0 # ./a.h:8:0 - nop -.Ltmp8: - int3 -.Ltmp9: - #DEBUG_VALUE: main:argc <- [DW_OP_LLVM_entry_value 1] $ecx - #DEBUG_VALUE: main:argv <- [DW_OP_LLVM_entry_value 1] $rdx -.Lfunc_end0: - .seh_endproc - # -- End function - .section .drectve,"yn" - .ascii " /DEFAULTLIB:libcmt.lib" - .ascii " /DEFAULTLIB:oldnames.lib" - .section .debug$S,"dr" - .p2align 2 - .long 4 # Debug section magic - .long 241 - .long .Ltmp11-.Ltmp10 # Subsection size -.Ltmp10: - .short .Ltmp13-.Ltmp12 # Record length -.Ltmp12: - .short 4353 # Record kind: S_OBJNAME - .long 0 # Signature - .asciz "/tmp/a-2b2ba0.obj" # Object name - .p2align 2 -.Ltmp13: - .short .Ltmp15-.Ltmp14 # Record length -.Ltmp14: - .short 4412 # Record kind: S_COMPILE3 - .long 1 # Flags and language - .short 208 # CPUType - .short 15 # Frontend version - .short 0 - .short 0 - .short 0 - .short 15000 # Backend version - .short 0 - .short 0 - .short 0 - .asciz "clang version 15.0.0" # Null-terminated compiler version string - .p2align 2 -.Ltmp15: -.Ltmp11: - .p2align 2 - .long 246 # Inlinee lines subsection - .long .Ltmp17-.Ltmp16 # Subsection size -.Ltmp16: - .long 0 # Inlinee lines signature - - # Inlined function foo starts at ./a.h:4 - .long 4099 # Type index of inlined function - .cv_filechecksumoffset 2 # Offset into filechecksum table - .long 4 # Starting line number - - # Inlined function bar starts at ./b.h:4 - .long 4106 # Type index of inlined function - .cv_filechecksumoffset 3 # Offset into filechecksum table - .long 4 # Starting line number - - # Inlined function func starts at ./c.h:4 - .long 4113 # Type index of inlined function - .cv_filechecksumoffset 4 # Offset into filechecksum table - .long 4 # Starting line number -.Ltmp17: - .p2align 2 - .section .debug$S,"dr",associative,main - .p2align 2 - .long 4 # Debug section magic - .long 241 # Symbol 
subsection for main - .long .Ltmp19-.Ltmp18 # Subsection size -.Ltmp18: - .short .Ltmp21-.Ltmp20 # Record length -.Ltmp20: - .short 4423 # Record kind: S_GPROC32_ID - .long 0 # PtrParent - .long 0 # PtrEnd - .long 0 # PtrNext - .long .Lfunc_end0-main # Code size - .long 0 # Offset after prologue - .long 0 # Offset before epilogue - .long 4117 # Function type index - .secrel32 main # Function section relative address - .secidx main # Function section index - .byte 0 # Flags - .asciz "main" # Function name - .p2align 2 -.Ltmp21: - .short .Ltmp23-.Ltmp22 # Record length -.Ltmp22: - .short 4114 # Record kind: S_FRAMEPROC - .long 56 # FrameSize - .long 0 # Padding - .long 0 # Offset of padding - .long 0 # Bytes of callee saved registers - .long 0 # Exception handler offset - .short 0 # Exception handler section - .long 81920 # Flags (defines frame register) - .p2align 2 -.Ltmp23: - .short .Ltmp25-.Ltmp24 # Record length -.Ltmp24: - .short 4414 # Record kind: S_LOCAL - .long 116 # TypeIndex - .short 1 # Flags - .asciz "argc" - .p2align 2 -.Ltmp25: - .cv_def_range .Lfunc_begin0 .Ltmp5 .Ltmp7 .Ltmp8, reg, 18 - .short .Ltmp27-.Ltmp26 # Record length -.Ltmp26: - .short 4414 # Record kind: S_LOCAL - .long 4114 # TypeIndex - .short 1 # Flags - .asciz "argv" - .p2align 2 -.Ltmp27: - .cv_def_range .Lfunc_begin0 .Ltmp8, reg, 331 - .short .Ltmp29-.Ltmp28 # Record length -.Ltmp28: - .short 4414 # Record kind: S_LOCAL - .long 4118 # TypeIndex - .short 0 # Flags - .asciz "main_local" - .p2align 2 -.Ltmp29: - .cv_def_range .Ltmp0 .Ltmp9, frame_ptr_rel, 48 - .short .Ltmp31-.Ltmp30 # Record length -.Ltmp30: - .short 4429 # Record kind: S_INLINESITE - .long 0 # PtrParent - .long 0 # PtrEnd - .long 4099 # Inlinee type index - .cv_inline_linetable 1 2 4 .Lfunc_begin0 .Lfunc_end0 - .p2align 2 -.Ltmp31: - .short .Ltmp33-.Ltmp32 # Record length -.Ltmp32: - .short 4414 # Record kind: S_LOCAL - .long 116 # TypeIndex - .short 257 # Flags - .asciz "x" - .p2align 2 -.Ltmp33: - .short 
.Ltmp35-.Ltmp34 # Record length -.Ltmp34: - .short 4414 # Record kind: S_LOCAL - .long 4118 # TypeIndex - .short 0 # Flags - .asciz "foo_local" - .p2align 2 -.Ltmp35: - .cv_def_range .Ltmp0 .Ltmp6 .Ltmp7 .Ltmp9, frame_ptr_rel, 44 - .short .Ltmp37-.Ltmp36 # Record length -.Ltmp36: - .short 4429 # Record kind: S_INLINESITE - .long 0 # PtrParent - .long 0 # PtrEnd - .long 4106 # Inlinee type index - .cv_inline_linetable 2 3 4 .Lfunc_begin0 .Lfunc_end0 - .p2align 2 -.Ltmp37: - .short .Ltmp39-.Ltmp38 # Record length -.Ltmp38: - .short 4414 # Record kind: S_LOCAL - .long 116 # TypeIndex - .short 1 # Flags - .asciz "x" - .p2align 2 -.Ltmp39: - .cv_def_range .Ltmp2 .Ltmp3, reg, 17 - .short .Ltmp41-.Ltmp40 # Record length -.Ltmp40: - .short 4414 # Record kind: S_LOCAL - .long 4118 # TypeIndex - .short 0 # Flags - .asciz "bar_local" - .p2align 2 -.Ltmp41: - .cv_def_range .Ltmp2 .Ltmp6, frame_ptr_rel, 52 - .short .Ltmp43-.Ltmp42 # Record length -.Ltmp42: - .short 4429 # Record kind: S_INLINESITE - .long 0 # PtrParent - .long 0 # PtrEnd - .long 4113 # Inlinee type index - .cv_inline_linetable 3 4 4 .Lfunc_begin0 .Lfunc_end0 - .p2align 2 -.Ltmp43: - .short .Ltmp45-.Ltmp44 # Record length -.Ltmp44: - .short 4414 # Record kind: S_LOCAL - .long 116 # TypeIndex - .short 1 # Flags - .asciz "x" - .p2align 2 -.Ltmp45: - .cv_def_range .Ltmp4 .Ltmp6, reg, 17 - .short .Ltmp47-.Ltmp46 # Record length -.Ltmp46: - .short 4414 # Record kind: S_LOCAL - .long 4118 # TypeIndex - .short 0 # Flags - .asciz "func_local" - .p2align 2 -.Ltmp47: - .cv_def_range .Ltmp4 .Ltmp6, frame_ptr_rel, 48 - .short 2 # Record length - .short 4430 # Record kind: S_INLINESITE_END - .short 2 # Record length - .short 4430 # Record kind: S_INLINESITE_END - .short 2 # Record length - .short 4430 # Record kind: S_INLINESITE_END - .short 2 # Record length - .short 4431 # Record kind: S_PROC_ID_END -.Ltmp19: - .p2align 2 - .cv_linetable 0, main, .Lfunc_end0 - .section .debug$S,"dr" - .long 241 - .long .Ltmp49-.Ltmp48 # 
Subsection size -.Ltmp48: - .short .Ltmp51-.Ltmp50 # Record length -.Ltmp50: - .short 4360 # Record kind: S_UDT - .long 4103 # Type - .asciz "Class1" - .p2align 2 -.Ltmp51: - .short .Ltmp53-.Ltmp52 # Record length -.Ltmp52: - .short 4360 # Record kind: S_UDT - .long 4110 # Type - .asciz "Namespace2::Class2" - .p2align 2 -.Ltmp53: -.Ltmp49: - .p2align 2 - .cv_filechecksums # File index to string table offset subsection - .cv_stringtable # String table - .long 241 - .long .Ltmp55-.Ltmp54 # Subsection size -.Ltmp54: - .short .Ltmp57-.Ltmp56 # Record length -.Ltmp56: - .short 4428 # Record kind: S_BUILDINFO - .long 4124 # LF_BUILDINFO index - .p2align 2 -.Ltmp57: -.Ltmp55: - .p2align 2 - .section .debug$T,"dr" - .p2align 2 - .long 4 # Debug section magic - # StringId (0x1000) - .short 0x12 # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "Namespace1" # StringData - .byte 241 - # ArgList (0x1001) - .short 0xa # Record length - .short 0x1201 # Record kind: LF_ARGLIST - .long 0x1 # NumArgs - .long 0x74 # Argument: int - # Procedure (0x1002) - .short 0xe # Record length - .short 0x1008 # Record kind: LF_PROCEDURE - .long 0x74 # ReturnType: int - .byte 0x0 # CallingConvention: NearC - .byte 0x0 # FunctionOptions - .short 0x1 # NumParameters - .long 0x1001 # ArgListType: (int) - # FuncId (0x1003) - .short 0xe # Record length - .short 0x1601 # Record kind: LF_FUNC_ID - .long 0x1000 # ParentScope: Namespace1 - .long 0x1002 # FunctionType: int (int) - .asciz "foo" # Name - # Class (0x1004) - .short 0x2a # Record length - .short 0x1504 # Record kind: LF_CLASS - .short 0x0 # MemberCount - .short 0x280 # Properties ( ForwardReference (0x80) | HasUniqueName (0x200) ) - .long 0x0 # FieldList - .long 0x0 # DerivedFrom - .long 0x0 # VShape - .short 0x0 # SizeOf - .asciz "Class1" # Name - .asciz ".?AVClass1@@" # LinkageName - .byte 242 - .byte 241 - # MemberFunction (0x1005) - .short 0x1a # Record length - .short 0x1009 # Record kind: LF_MFUNCTION - 
.long 0x74 # ReturnType: int - .long 0x1004 # ClassType: Class1 - .long 0x0 # ThisType - .byte 0x0 # CallingConvention: NearC - .byte 0x0 # FunctionOptions - .short 0x1 # NumParameters - .long 0x1001 # ArgListType: (int) - .long 0x0 # ThisAdjustment - # FieldList (0x1006) - .short 0xe # Record length - .short 0x1203 # Record kind: LF_FIELDLIST - .short 0x1511 # Member kind: OneMethod ( LF_ONEMETHOD ) - .short 0xb # Attrs: Public, Static - .long 0x1005 # Type: int Class1::(int) - .asciz "bar" # Name - # Class (0x1007) - .short 0x2a # Record length - .short 0x1504 # Record kind: LF_CLASS - .short 0x1 # MemberCount - .short 0x200 # Properties ( HasUniqueName (0x200) ) - .long 0x1006 # FieldList: - .long 0x0 # DerivedFrom - .long 0x0 # VShape - .short 0x1 # SizeOf - .asciz "Class1" # Name - .asciz ".?AVClass1@@" # LinkageName - .byte 242 - .byte 241 - # StringId (0x1008) - .short 0x12 # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "/tmp/./b.h" # StringData - .byte 241 - # UdtSourceLine (0x1009) - .short 0xe # Record length - .short 0x1606 # Record kind: LF_UDT_SRC_LINE - .long 0x1007 # UDT: Class1 - .long 0x1008 # SourceFile: /tmp/./b.h - .long 0x2 # LineNumber - # MemberFuncId (0x100A) - .short 0xe # Record length - .short 0x1602 # Record kind: LF_MFUNC_ID - .long 0x1004 # ClassType: Class1 - .long 0x1005 # FunctionType: int Class1::(int) - .asciz "bar" # Name - # Class (0x100B) - .short 0x42 # Record length - .short 0x1504 # Record kind: LF_CLASS - .short 0x0 # MemberCount - .short 0x280 # Properties ( ForwardReference (0x80) | HasUniqueName (0x200) ) - .long 0x0 # FieldList - .long 0x0 # DerivedFrom - .long 0x0 # VShape - .short 0x0 # SizeOf - .asciz "Namespace2::Class2" # Name - .asciz ".?AVClass2@Namespace2@@" # LinkageName - .byte 243 - .byte 242 - .byte 241 - # MemberFunction (0x100C) - .short 0x1a # Record length - .short 0x1009 # Record kind: LF_MFUNCTION - .long 0x74 # ReturnType: int - .long 0x100b # ClassType: 
Namespace2::Class2 - .long 0x0 # ThisType - .byte 0x0 # CallingConvention: NearC - .byte 0x0 # FunctionOptions - .short 0x1 # NumParameters - .long 0x1001 # ArgListType: (int) - .long 0x0 # ThisAdjustment - # FieldList (0x100D) - .short 0x12 # Record length - .short 0x1203 # Record kind: LF_FIELDLIST - .short 0x1511 # Member kind: OneMethod ( LF_ONEMETHOD ) - .short 0xb # Attrs: Public, Static - .long 0x100c # Type: int Namespace2::Class2::(int) - .asciz "func" # Name - .byte 243 - .byte 242 - .byte 241 - # Class (0x100E) - .short 0x42 # Record length - .short 0x1504 # Record kind: LF_CLASS - .short 0x1 # MemberCount - .short 0x200 # Properties ( HasUniqueName (0x200) ) - .long 0x100d # FieldList: - .long 0x0 # DerivedFrom - .long 0x0 # VShape - .short 0x1 # SizeOf - .asciz "Namespace2::Class2" # Name - .asciz ".?AVClass2@Namespace2@@" # LinkageName - .byte 243 - .byte 242 - .byte 241 - # StringId (0x100F) - .short 0x12 # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "/tmp/./c.h" # StringData - .byte 241 - # UdtSourceLine (0x1010) - .short 0xe # Record length - .short 0x1606 # Record kind: LF_UDT_SRC_LINE - .long 0x100e # UDT: Namespace2::Class2 - .long 0x100f # SourceFile: /tmp/./c.h - .long 0x2 # LineNumber - # MemberFuncId (0x1011) - .short 0x12 # Record length - .short 0x1602 # Record kind: LF_MFUNC_ID - .long 0x100b # ClassType: Namespace2::Class2 - .long 0x100c # FunctionType: int Namespace2::Class2::(int) - .asciz "func" # Name - .byte 243 - .byte 242 - .byte 241 - # Pointer (0x1012) - .short 0xa # Record length - .short 0x1002 # Record kind: LF_POINTER - .long 0x670 # PointeeType: char* - .long 0x1000c # Attrs: [ Type: Near64, Mode: Pointer, SizeOf: 8 ] - # ArgList (0x1013) - .short 0xe # Record length - .short 0x1201 # Record kind: LF_ARGLIST - .long 0x2 # NumArgs - .long 0x74 # Argument: int - .long 0x1012 # Argument: char** - # Procedure (0x1014) - .short 0xe # Record length - .short 0x1008 # Record kind: 
LF_PROCEDURE - .long 0x74 # ReturnType: int - .byte 0x0 # CallingConvention: NearC - .byte 0x0 # FunctionOptions - .short 0x2 # NumParameters - .long 0x1013 # ArgListType: (int, char**) - # FuncId (0x1015) - .short 0x12 # Record length - .short 0x1601 # Record kind: LF_FUNC_ID - .long 0x0 # ParentScope - .long 0x1014 # FunctionType: int (int, char**) - .asciz "main" # Name - .byte 243 - .byte 242 - .byte 241 - # Modifier (0x1016) - .short 0xa # Record length - .short 0x1001 # Record kind: LF_MODIFIER - .long 0x74 # ModifiedType: int - .short 0x2 # Modifiers ( Volatile (0x2) ) - .byte 242 - .byte 241 - # StringId (0x1017) - .short 0xe # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "/tmp" # StringData - .byte 243 - .byte 242 - .byte 241 - # StringId (0x1018) - .short 0xe # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "a.cpp" # StringData - .byte 242 - .byte 241 - # StringId (0x1019) - .short 0xa # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .byte 0 # StringData - .byte 243 - .byte 242 - .byte 241 - # StringId (0x101A) - .short 0x4e # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "/usr/local/google/home/zequanwu/llvm-project/build/release/bin/clang" # StringData - .byte 243 - .byte 242 - .byte 241 - # StringId (0x101B) - .short 0x9f6 # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "\"-cc1\" \"-triple\" \"x86_64-pc-windows-msvc19.20.0\" \"-S\" \"-disable-free\" \"-clear-ast-before-backend\" \"-disable-llvm-verifier\" \"-discard-value-names\" \"-mrelocation-model\" \"pic\" \"-pic-level\" \"2\" \"-mframe-pointer=none\" \"-relaxed-aliasing\" \"-fmath-errno\" \"-ffp-contract=on\" \"-fno-rounding-math\" \"-mconstructor-aliases\" \"-funwind-tables=2\" \"-target-cpu\" \"x86-64\" \"-mllvm\" \"-x86-asm-syntax=intel\" \"-tune-cpu\" \"generic\" \"-mllvm\" 
\"-treat-scalable-fixed-error-as-warning\" \"-D_MT\" \"-flto-visibility-public-std\" \"--dependent-lib=libcmt\" \"--dependent-lib=oldnames\" \"-stack-protector\" \"2\" \"-fms-volatile\" \"-fdiagnostics-format\" \"msvc\" \"-gno-column-info\" \"-gcodeview\" \"-debug-info-kind=constructor\" \"-ffunction-sections\" \"-fcoverage-compilation-dir=/tmp\" \"-resource-dir\" \"/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/DIA SDK/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/atlmfc/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/ucrt\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/shared\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/um\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/winrt\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/cppwinrt\" \"-Os\" \"-fdeprecated-macro\" \"-fdebug-compilation-dir=/tmp\" \"-ferror-limit\" \"19\" 
\"-fno-use-cxa-atexit\" \"-fms-extensions\" \"-fms-compatibility\" \"-fms-compatibility-version=19.20\" \"-std=c++14\" \"-fdelayed-template-parsing\" \"-fcolor-diagnostics\" \"-vectorize-loops\" \"-vectorize-slp\" \"-faddrsig\" \"-x\" \"c++\"" # StringData - .byte 242 - .byte 241 - # BuildInfo (0x101C) - .short 0x1a # Record length - .short 0x1603 # Record kind: LF_BUILDINFO - .short 0x5 # NumArgs - .long 0x1017 # Argument: /tmp - .long 0x101a # Argument: /usr/local/google/home/zequanwu/llvm-project/build/release/bin/clang - .long 0x1018 # Argument: a.cpp - .long 0x1019 # Argument - .long 0x101b # Argument: "-cc1" "-triple" "x86_64-pc-windows-msvc19.20.0" "-S" "-disable-free" "-clear-ast-before-backend" "-disable-llvm-verifier" "-discard-value-names" "-mrelocation-model" "pic" "-pic-level" "2" "-mframe-pointer=none" "-relaxed-aliasing" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-mllvm" "-x86-asm-syntax=intel" "-tune-cpu" "generic" "-mllvm" "-treat-scalable-fixed-error-as-warning" "-D_MT" "-flto-visibility-public-std" "--dependent-lib=libcmt" "--dependent-lib=oldnames" "-stack-protector" "2" "-fms-volatile" "-fdiagnostics-format" "msvc" "-gno-column-info" "-gcodeview" "-debug-info-kind=constructor" "-ffunction-sections" "-fcoverage-compilation-dir=/tmp" "-resource-dir" "/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0" "-internal-isystem" "/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/DIA SDK/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/include" "-internal-isystem" 
"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/atlmfc/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/ucrt" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/shared" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/um" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/winrt" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/cppwinrt" "-Os" "-fdeprecated-macro" "-fdebug-compilation-dir=/tmp" "-ferror-limit" "19" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.20" "-std=c++14" "-fdelayed-template-parsing" "-fcolor-diagnostics" "-vectorize-loops" "-vectorize-slp" "-faddrsig" "-x" "c++" - .byte 242 - .byte 241 - .addrsig +# Compiled from the following files, but replaced the call to abort with nop. 
+# clang-cl -fuse-ld=lld-link /Z7 /O1 /Faa.asm /winsysroot~/win_toolchain a.cpp +# a.cpp: +# #include "a.h" +# int main(int argc, char** argv) { +# volatile int main_local = Namespace1::foo(2); +# return 0; +# } +# a.h: +# #include +# #include "b.h" +# namespace Namespace1 { +# inline int foo(int x) { +# volatile int foo_local = x + 1; +# ++foo_local; +# if (!foo_local) +# abort(); +# return Class1::bar(foo_local); +# } +# } // namespace Namespace1 +# b.h: +# #include "c.h" +# class Class1 { +# public: +# inline static int bar(int x) { +# volatile int bar_local = x + 1; +# ++bar_local; +# return Namespace2::Class2::func(bar_local); +# } +# }; +# c.h: +# namespace Namespace2 { +# class Class2 { +# public: +# inline static int func(int x) { +# volatile int func_local = x + 1; +# func_local += x; +# return func_local; +# } +# }; +# } // namespace Namespace2 + + .text + .def @feat.00; + .scl 3; + .type 0; + .endef + .globl @feat.00 +.set @feat.00, 0 + .intel_syntax noprefix + .file "a.cpp" + .def main; + .scl 2; + .type 32; + .endef + .section .text,"xr",one_only,main + .globl main # -- Begin function main +main: # @main +.Lfunc_begin0: + .cv_func_id 0 + .cv_file 1 "/tmp/a.cpp" "4FFB96E5DF1A95CE7DB9732CFFE001D7" 1 + .cv_loc 0 1 2 0 # a.cpp:2:0 +.seh_proc main +# %bb.0: + #DEBUG_VALUE: main:argv <- $rdx + #DEBUG_VALUE: main:argc <- $ecx + #DEBUG_VALUE: foo:x <- 2 + sub rsp, 56 + .seh_stackalloc 56 + .seh_endprologue +.Ltmp0: + .cv_file 2 "/tmp/./a.h" "BBFED90EF093E9C1D032CC9B05B5D167" 1 + .cv_inline_site_id 1 within 0 inlined_at 1 3 0 + .cv_loc 1 2 5 0 # ./a.h:5:0 + mov dword ptr [rsp + 44], 3 + .cv_loc 1 2 6 0 # ./a.h:6:0 + inc dword ptr [rsp + 44] + .cv_loc 1 2 7 0 # ./a.h:7:0 + mov eax, dword ptr [rsp + 44] + test eax, eax + je .LBB0_2 +.Ltmp1: +# %bb.1: + #DEBUG_VALUE: main:argv <- $rdx + #DEBUG_VALUE: main:argc <- $ecx + #DEBUG_VALUE: foo:x <- 2 + .cv_loc 1 2 9 0 # ./a.h:9:0 + mov eax, dword ptr [rsp + 44] +.Ltmp2: + #DEBUG_VALUE: bar:x <- $eax + .cv_file 3 
"/tmp/./b.h" "A26CC743A260115F33AF91AB11F95877" 1 + .cv_inline_site_id 2 within 1 inlined_at 2 9 0 + .cv_loc 2 3 5 0 # ./b.h:5:0 + inc eax +.Ltmp3: + mov dword ptr [rsp + 52], eax + .cv_loc 2 3 6 0 # ./b.h:6:0 + inc dword ptr [rsp + 52] + .cv_loc 2 3 7 0 # ./b.h:7:0 + mov eax, dword ptr [rsp + 52] +.Ltmp4: + #DEBUG_VALUE: func:x <- $eax + .cv_file 4 "/tmp/./c.h" "8AF4613F78624BBE96D1C408ABA39B2D" 1 + .cv_inline_site_id 3 within 2 inlined_at 3 7 0 + .cv_loc 3 4 5 0 # ./c.h:5:0 + lea ecx, [rax + 1] +.Ltmp5: + #DEBUG_VALUE: main:argc <- [DW_OP_LLVM_entry_value 1] $ecx + mov dword ptr [rsp + 48], ecx + .cv_loc 3 4 6 0 # ./c.h:6:0 + add dword ptr [rsp + 48], eax + .cv_loc 3 4 7 0 # ./c.h:7:0 + mov eax, dword ptr [rsp + 48] +.Ltmp6: + .cv_loc 0 1 3 0 # a.cpp:3:0 + mov dword ptr [rsp + 48], eax + .cv_loc 0 1 4 0 # a.cpp:4:0 + xor eax, eax + # Use fake debug info to tests inline info. + .cv_loc 1 2 20 0 + add rsp, 56 + ret +.Ltmp7: +.LBB0_2: + #DEBUG_VALUE: main:argv <- $rdx + #DEBUG_VALUE: main:argc <- $ecx + #DEBUG_VALUE: foo:x <- 2 + .cv_loc 1 2 8 0 # ./a.h:8:0 + nop +.Ltmp8: + int3 +.Ltmp9: + #DEBUG_VALUE: main:argc <- [DW_OP_LLVM_entry_value 1] $ecx + #DEBUG_VALUE: main:argv <- [DW_OP_LLVM_entry_value 1] $rdx +.Lfunc_end0: + .seh_endproc + # -- End function + .section .drectve,"yn" + .ascii " /DEFAULTLIB:libcmt.lib" + .ascii " /DEFAULTLIB:oldnames.lib" + .section .debug$S,"dr" + .p2align 2 + .long 4 # Debug section magic + .long 241 + .long .Ltmp11-.Ltmp10 # Subsection size +.Ltmp10: + .short .Ltmp13-.Ltmp12 # Record length +.Ltmp12: + .short 4353 # Record kind: S_OBJNAME + .long 0 # Signature + .asciz "/tmp/a-2b2ba0.obj" # Object name + .p2align 2 +.Ltmp13: + .short .Ltmp15-.Ltmp14 # Record length +.Ltmp14: + .short 4412 # Record kind: S_COMPILE3 + .long 1 # Flags and language + .short 208 # CPUType + .short 15 # Frontend version + .short 0 + .short 0 + .short 0 + .short 15000 # Backend version + .short 0 + .short 0 + .short 0 + .asciz "clang version 15.0.0" # 
Null-terminated compiler version string + .p2align 2 +.Ltmp15: +.Ltmp11: + .p2align 2 + .long 246 # Inlinee lines subsection + .long .Ltmp17-.Ltmp16 # Subsection size +.Ltmp16: + .long 0 # Inlinee lines signature + + # Inlined function foo starts at ./a.h:4 + .long 4099 # Type index of inlined function + .cv_filechecksumoffset 2 # Offset into filechecksum table + .long 4 # Starting line number + + # Inlined function bar starts at ./b.h:4 + .long 4106 # Type index of inlined function + .cv_filechecksumoffset 3 # Offset into filechecksum table + .long 4 # Starting line number + + # Inlined function func starts at ./c.h:4 + .long 4113 # Type index of inlined function + .cv_filechecksumoffset 4 # Offset into filechecksum table + .long 4 # Starting line number +.Ltmp17: + .p2align 2 + .section .debug$S,"dr",associative,main + .p2align 2 + .long 4 # Debug section magic + .long 241 # Symbol subsection for main + .long .Ltmp19-.Ltmp18 # Subsection size +.Ltmp18: + .short .Ltmp21-.Ltmp20 # Record length +.Ltmp20: + .short 4423 # Record kind: S_GPROC32_ID + .long 0 # PtrParent + .long 0 # PtrEnd + .long 0 # PtrNext + .long .Lfunc_end0-main # Code size + .long 0 # Offset after prologue + .long 0 # Offset before epilogue + .long 4117 # Function type index + .secrel32 main # Function section relative address + .secidx main # Function section index + .byte 0 # Flags + .asciz "main" # Function name + .p2align 2 +.Ltmp21: + .short .Ltmp23-.Ltmp22 # Record length +.Ltmp22: + .short 4114 # Record kind: S_FRAMEPROC + .long 56 # FrameSize + .long 0 # Padding + .long 0 # Offset of padding + .long 0 # Bytes of callee saved registers + .long 0 # Exception handler offset + .short 0 # Exception handler section + .long 81920 # Flags (defines frame register) + .p2align 2 +.Ltmp23: + .short .Ltmp25-.Ltmp24 # Record length +.Ltmp24: + .short 4414 # Record kind: S_LOCAL + .long 116 # TypeIndex + .short 1 # Flags + .asciz "argc" + .p2align 2 +.Ltmp25: + .cv_def_range .Lfunc_begin0 .Ltmp5 .Ltmp7 
.Ltmp8, reg, 18 + .short .Ltmp27-.Ltmp26 # Record length +.Ltmp26: + .short 4414 # Record kind: S_LOCAL + .long 4114 # TypeIndex + .short 1 # Flags + .asciz "argv" + .p2align 2 +.Ltmp27: + .cv_def_range .Lfunc_begin0 .Ltmp8, reg, 331 + .short .Ltmp29-.Ltmp28 # Record length +.Ltmp28: + .short 4414 # Record kind: S_LOCAL + .long 4118 # TypeIndex + .short 0 # Flags + .asciz "main_local" + .p2align 2 +.Ltmp29: + .cv_def_range .Ltmp0 .Ltmp9, frame_ptr_rel, 48 + .short .Ltmp31-.Ltmp30 # Record length +.Ltmp30: + .short 4429 # Record kind: S_INLINESITE + .long 0 # PtrParent + .long 0 # PtrEnd + .long 4099 # Inlinee type index + .cv_inline_linetable 1 2 4 .Lfunc_begin0 .Lfunc_end0 + .p2align 2 +.Ltmp31: + .short .Ltmp33-.Ltmp32 # Record length +.Ltmp32: + .short 4414 # Record kind: S_LOCAL + .long 116 # TypeIndex + .short 257 # Flags + .asciz "x" + .p2align 2 +.Ltmp33: + .short .Ltmp35-.Ltmp34 # Record length +.Ltmp34: + .short 4414 # Record kind: S_LOCAL + .long 4118 # TypeIndex + .short 0 # Flags + .asciz "foo_local" + .p2align 2 +.Ltmp35: + .cv_def_range .Ltmp0 .Ltmp6 .Ltmp7 .Ltmp9, frame_ptr_rel, 44 + .short .Ltmp37-.Ltmp36 # Record length +.Ltmp36: + .short 4429 # Record kind: S_INLINESITE + .long 0 # PtrParent + .long 0 # PtrEnd + .long 4106 # Inlinee type index + .cv_inline_linetable 2 3 4 .Lfunc_begin0 .Lfunc_end0 + .p2align 2 +.Ltmp37: + .short .Ltmp39-.Ltmp38 # Record length +.Ltmp38: + .short 4414 # Record kind: S_LOCAL + .long 116 # TypeIndex + .short 1 # Flags + .asciz "x" + .p2align 2 +.Ltmp39: + .cv_def_range .Ltmp2 .Ltmp3, reg, 17 + .short .Ltmp41-.Ltmp40 # Record length +.Ltmp40: + .short 4414 # Record kind: S_LOCAL + .long 4118 # TypeIndex + .short 0 # Flags + .asciz "bar_local" + .p2align 2 +.Ltmp41: + .cv_def_range .Ltmp2 .Ltmp6, frame_ptr_rel, 52 + .short .Ltmp43-.Ltmp42 # Record length +.Ltmp42: + .short 4429 # Record kind: S_INLINESITE + .long 0 # PtrParent + .long 0 # PtrEnd + .long 4113 # Inlinee type index + .cv_inline_linetable 3 4 4 
.Lfunc_begin0 .Lfunc_end0 + .p2align 2 +.Ltmp43: + .short .Ltmp45-.Ltmp44 # Record length +.Ltmp44: + .short 4414 # Record kind: S_LOCAL + .long 116 # TypeIndex + .short 1 # Flags + .asciz "x" + .p2align 2 +.Ltmp45: + .cv_def_range .Ltmp4 .Ltmp6, reg, 17 + .short .Ltmp47-.Ltmp46 # Record length +.Ltmp46: + .short 4414 # Record kind: S_LOCAL + .long 4118 # TypeIndex + .short 0 # Flags + .asciz "func_local" + .p2align 2 +.Ltmp47: + .cv_def_range .Ltmp4 .Ltmp6, frame_ptr_rel, 48 + .short 2 # Record length + .short 4430 # Record kind: S_INLINESITE_END + .short 2 # Record length + .short 4430 # Record kind: S_INLINESITE_END + .short 2 # Record length + .short 4430 # Record kind: S_INLINESITE_END + .short 2 # Record length + .short 4431 # Record kind: S_PROC_ID_END +.Ltmp19: + .p2align 2 + .cv_linetable 0, main, .Lfunc_end0 + .section .debug$S,"dr" + .long 241 + .long .Ltmp49-.Ltmp48 # Subsection size +.Ltmp48: + .short .Ltmp51-.Ltmp50 # Record length +.Ltmp50: + .short 4360 # Record kind: S_UDT + .long 4103 # Type + .asciz "Class1" + .p2align 2 +.Ltmp51: + .short .Ltmp53-.Ltmp52 # Record length +.Ltmp52: + .short 4360 # Record kind: S_UDT + .long 4110 # Type + .asciz "Namespace2::Class2" + .p2align 2 +.Ltmp53: +.Ltmp49: + .p2align 2 + .cv_filechecksums # File index to string table offset subsection + .cv_stringtable # String table + .long 241 + .long .Ltmp55-.Ltmp54 # Subsection size +.Ltmp54: + .short .Ltmp57-.Ltmp56 # Record length +.Ltmp56: + .short 4428 # Record kind: S_BUILDINFO + .long 4124 # LF_BUILDINFO index + .p2align 2 +.Ltmp57: +.Ltmp55: + .p2align 2 + .section .debug$T,"dr" + .p2align 2 + .long 4 # Debug section magic + # StringId (0x1000) + .short 0x12 # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "Namespace1" # StringData + .byte 241 + # ArgList (0x1001) + .short 0xa # Record length + .short 0x1201 # Record kind: LF_ARGLIST + .long 0x1 # NumArgs + .long 0x74 # Argument: int + # Procedure (0x1002) + .short 0xe # 
Record length + .short 0x1008 # Record kind: LF_PROCEDURE + .long 0x74 # ReturnType: int + .byte 0x0 # CallingConvention: NearC + .byte 0x0 # FunctionOptions + .short 0x1 # NumParameters + .long 0x1001 # ArgListType: (int) + # FuncId (0x1003) + .short 0xe # Record length + .short 0x1601 # Record kind: LF_FUNC_ID + .long 0x1000 # ParentScope: Namespace1 + .long 0x1002 # FunctionType: int (int) + .asciz "foo" # Name + # Class (0x1004) + .short 0x2a # Record length + .short 0x1504 # Record kind: LF_CLASS + .short 0x0 # MemberCount + .short 0x280 # Properties ( ForwardReference (0x80) | HasUniqueName (0x200) ) + .long 0x0 # FieldList + .long 0x0 # DerivedFrom + .long 0x0 # VShape + .short 0x0 # SizeOf + .asciz "Class1" # Name + .asciz ".?AVClass1@@" # LinkageName + .byte 242 + .byte 241 + # MemberFunction (0x1005) + .short 0x1a # Record length + .short 0x1009 # Record kind: LF_MFUNCTION + .long 0x74 # ReturnType: int + .long 0x1004 # ClassType: Class1 + .long 0x0 # ThisType + .byte 0x0 # CallingConvention: NearC + .byte 0x0 # FunctionOptions + .short 0x1 # NumParameters + .long 0x1001 # ArgListType: (int) + .long 0x0 # ThisAdjustment + # FieldList (0x1006) + .short 0xe # Record length + .short 0x1203 # Record kind: LF_FIELDLIST + .short 0x1511 # Member kind: OneMethod ( LF_ONEMETHOD ) + .short 0xb # Attrs: Public, Static + .long 0x1005 # Type: int Class1::(int) + .asciz "bar" # Name + # Class (0x1007) + .short 0x2a # Record length + .short 0x1504 # Record kind: LF_CLASS + .short 0x1 # MemberCount + .short 0x200 # Properties ( HasUniqueName (0x200) ) + .long 0x1006 # FieldList: + .long 0x0 # DerivedFrom + .long 0x0 # VShape + .short 0x1 # SizeOf + .asciz "Class1" # Name + .asciz ".?AVClass1@@" # LinkageName + .byte 242 + .byte 241 + # StringId (0x1008) + .short 0x12 # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "/tmp/./b.h" # StringData + .byte 241 + # UdtSourceLine (0x1009) + .short 0xe # Record length + .short 0x1606 # Record 
kind: LF_UDT_SRC_LINE + .long 0x1007 # UDT: Class1 + .long 0x1008 # SourceFile: /tmp/./b.h + .long 0x2 # LineNumber + # MemberFuncId (0x100A) + .short 0xe # Record length + .short 0x1602 # Record kind: LF_MFUNC_ID + .long 0x1004 # ClassType: Class1 + .long 0x1005 # FunctionType: int Class1::(int) + .asciz "bar" # Name + # Class (0x100B) + .short 0x42 # Record length + .short 0x1504 # Record kind: LF_CLASS + .short 0x0 # MemberCount + .short 0x280 # Properties ( ForwardReference (0x80) | HasUniqueName (0x200) ) + .long 0x0 # FieldList + .long 0x0 # DerivedFrom + .long 0x0 # VShape + .short 0x0 # SizeOf + .asciz "Namespace2::Class2" # Name + .asciz ".?AVClass2@Namespace2@@" # LinkageName + .byte 243 + .byte 242 + .byte 241 + # MemberFunction (0x100C) + .short 0x1a # Record length + .short 0x1009 # Record kind: LF_MFUNCTION + .long 0x74 # ReturnType: int + .long 0x100b # ClassType: Namespace2::Class2 + .long 0x0 # ThisType + .byte 0x0 # CallingConvention: NearC + .byte 0x0 # FunctionOptions + .short 0x1 # NumParameters + .long 0x1001 # ArgListType: (int) + .long 0x0 # ThisAdjustment + # FieldList (0x100D) + .short 0x12 # Record length + .short 0x1203 # Record kind: LF_FIELDLIST + .short 0x1511 # Member kind: OneMethod ( LF_ONEMETHOD ) + .short 0xb # Attrs: Public, Static + .long 0x100c # Type: int Namespace2::Class2::(int) + .asciz "func" # Name + .byte 243 + .byte 242 + .byte 241 + # Class (0x100E) + .short 0x42 # Record length + .short 0x1504 # Record kind: LF_CLASS + .short 0x1 # MemberCount + .short 0x200 # Properties ( HasUniqueName (0x200) ) + .long 0x100d # FieldList: + .long 0x0 # DerivedFrom + .long 0x0 # VShape + .short 0x1 # SizeOf + .asciz "Namespace2::Class2" # Name + .asciz ".?AVClass2@Namespace2@@" # LinkageName + .byte 243 + .byte 242 + .byte 241 + # StringId (0x100F) + .short 0x12 # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "/tmp/./c.h" # StringData + .byte 241 + # UdtSourceLine (0x1010) + .short 0xe # 
Record length + .short 0x1606 # Record kind: LF_UDT_SRC_LINE + .long 0x100e # UDT: Namespace2::Class2 + .long 0x100f # SourceFile: /tmp/./c.h + .long 0x2 # LineNumber + # MemberFuncId (0x1011) + .short 0x12 # Record length + .short 0x1602 # Record kind: LF_MFUNC_ID + .long 0x100b # ClassType: Namespace2::Class2 + .long 0x100c # FunctionType: int Namespace2::Class2::(int) + .asciz "func" # Name + .byte 243 + .byte 242 + .byte 241 + # Pointer (0x1012) + .short 0xa # Record length + .short 0x1002 # Record kind: LF_POINTER + .long 0x670 # PointeeType: char* + .long 0x1000c # Attrs: [ Type: Near64, Mode: Pointer, SizeOf: 8 ] + # ArgList (0x1013) + .short 0xe # Record length + .short 0x1201 # Record kind: LF_ARGLIST + .long 0x2 # NumArgs + .long 0x74 # Argument: int + .long 0x1012 # Argument: char** + # Procedure (0x1014) + .short 0xe # Record length + .short 0x1008 # Record kind: LF_PROCEDURE + .long 0x74 # ReturnType: int + .byte 0x0 # CallingConvention: NearC + .byte 0x0 # FunctionOptions + .short 0x2 # NumParameters + .long 0x1013 # ArgListType: (int, char**) + # FuncId (0x1015) + .short 0x12 # Record length + .short 0x1601 # Record kind: LF_FUNC_ID + .long 0x0 # ParentScope + .long 0x1014 # FunctionType: int (int, char**) + .asciz "main" # Name + .byte 243 + .byte 242 + .byte 241 + # Modifier (0x1016) + .short 0xa # Record length + .short 0x1001 # Record kind: LF_MODIFIER + .long 0x74 # ModifiedType: int + .short 0x2 # Modifiers ( Volatile (0x2) ) + .byte 242 + .byte 241 + # StringId (0x1017) + .short 0xe # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "/tmp" # StringData + .byte 243 + .byte 242 + .byte 241 + # StringId (0x1018) + .short 0xe # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "a.cpp" # StringData + .byte 242 + .byte 241 + # StringId (0x1019) + .short 0xa # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .byte 0 # StringData + .byte 243 + .byte 242 + 
.byte 241 + # StringId (0x101A) + .short 0x4e # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "/usr/local/google/home/zequanwu/llvm-project/build/release/bin/clang" # StringData + .byte 243 + .byte 242 + .byte 241 + # StringId (0x101B) + .short 0x9f6 # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "\"-cc1\" \"-triple\" \"x86_64-pc-windows-msvc19.20.0\" \"-S\" \"-disable-free\" \"-clear-ast-before-backend\" \"-disable-llvm-verifier\" \"-discard-value-names\" \"-mrelocation-model\" \"pic\" \"-pic-level\" \"2\" \"-mframe-pointer=none\" \"-relaxed-aliasing\" \"-fmath-errno\" \"-ffp-contract=on\" \"-fno-rounding-math\" \"-mconstructor-aliases\" \"-funwind-tables=2\" \"-target-cpu\" \"x86-64\" \"-mllvm\" \"-x86-asm-syntax=intel\" \"-tune-cpu\" \"generic\" \"-mllvm\" \"-treat-scalable-fixed-error-as-warning\" \"-D_MT\" \"-flto-visibility-public-std\" \"--dependent-lib=libcmt\" \"--dependent-lib=oldnames\" \"-stack-protector\" \"2\" \"-fms-volatile\" \"-fdiagnostics-format\" \"msvc\" \"-gno-column-info\" \"-gcodeview\" \"-debug-info-kind=constructor\" \"-ffunction-sections\" \"-fcoverage-compilation-dir=/tmp\" \"-resource-dir\" \"/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/DIA SDK/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/atlmfc/include\" \"-internal-isystem\" 
\"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/ucrt\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/shared\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/um\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/winrt\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/cppwinrt\" \"-Os\" \"-fdeprecated-macro\" \"-fdebug-compilation-dir=/tmp\" \"-ferror-limit\" \"19\" \"-fno-use-cxa-atexit\" \"-fms-extensions\" \"-fms-compatibility\" \"-fms-compatibility-version=19.20\" \"-std=c++14\" \"-fdelayed-template-parsing\" \"-fcolor-diagnostics\" \"-vectorize-loops\" \"-vectorize-slp\" \"-faddrsig\" \"-x\" \"c++\"" # StringData + .byte 242 + .byte 241 + # BuildInfo (0x101C) + .short 0x1a # Record length + .short 0x1603 # Record kind: LF_BUILDINFO + .short 0x5 # NumArgs + .long 0x1017 # Argument: /tmp + .long 0x101a # Argument: /usr/local/google/home/zequanwu/llvm-project/build/release/bin/clang + .long 0x1018 # Argument: a.cpp + .long 0x1019 # Argument + .long 0x101b # Argument: "-cc1" "-triple" "x86_64-pc-windows-msvc19.20.0" "-S" "-disable-free" "-clear-ast-before-backend" "-disable-llvm-verifier" "-discard-value-names" "-mrelocation-model" "pic" "-pic-level" "2" "-mframe-pointer=none" "-relaxed-aliasing" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-mllvm" "-x86-asm-syntax=intel" "-tune-cpu" "generic" "-mllvm" 
"-treat-scalable-fixed-error-as-warning" "-D_MT" "-flto-visibility-public-std" "--dependent-lib=libcmt" "--dependent-lib=oldnames" "-stack-protector" "2" "-fms-volatile" "-fdiagnostics-format" "msvc" "-gno-column-info" "-gcodeview" "-debug-info-kind=constructor" "-ffunction-sections" "-fcoverage-compilation-dir=/tmp" "-resource-dir" "/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0" "-internal-isystem" "/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/DIA SDK/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/atlmfc/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/ucrt" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/shared" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/um" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/winrt" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/cppwinrt" "-Os" "-fdeprecated-macro" "-fdebug-compilation-dir=/tmp" "-ferror-limit" "19" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.20" 
"-std=c++14" "-fdelayed-template-parsing" "-fcolor-diagnostics" "-vectorize-loops" "-vectorize-slp" "-faddrsig" "-x" "c++" + .byte 242 + .byte 241 + .addrsig diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites_live.lldbinit b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites_live.lldbinit index 2291c7c4527175..eab5061dafbdcd 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites_live.lldbinit +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites_live.lldbinit @@ -1,7 +1,7 @@ -br set -p BP_bar -f inline_sites_live.cpp -br set -p BP_foo -f inline_sites_live.cpp -run -expression param -continue -expression param -expression local +br set -p BP_bar -f inline_sites_live.cpp +br set -p BP_foo -f inline_sites_live.cpp +run +expression param +continue +expression param +expression local diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/local-variables-registers.lldbinit b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/local-variables-registers.lldbinit index ad080da24dab71..feda7485675792 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/local-variables-registers.lldbinit +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/local-variables-registers.lldbinit @@ -1,35 +1,35 @@ -image lookup -a 0x140001000 -v -image lookup -a 0x140001003 -v -image lookup -a 0x140001006 -v - -image lookup -a 0x140001011 -v -image lookup -a 0x140001017 -v -image lookup -a 0x140001019 -v -image lookup -a 0x14000101e -v -image lookup -a 0x14000102c -v - -image lookup -a 0x140001031 -v -image lookup -a 0x140001032 -v -image lookup -a 0x140001033 -v -image lookup -a 0x140001034 -v -image lookup -a 0x140001035 -v -image lookup -a 0x140001036 -v -image lookup -a 0x140001037 -v -image lookup -a 0x14000103b -v -image lookup -a 0x14000103d -v -image lookup -a 0x14000103f -v -image lookup -a 0x140001041 -v -image lookup -a 0x140001043 -v -image lookup -a 0x140001045 -v -image lookup -a 0x140001046 -v -image lookup -a 0x140001047 -v -image lookup 
-a 0x140001048 -v -image lookup -a 0x140001049 -v -image lookup -a 0x14000104a -v -image lookup -a 0x14000104b -v -image lookup -a 0x14000104c -v -image lookup -a 0x14000104e -v -image lookup -a 0x14000104f -v -image lookup -a 0x140001050 -v -image lookup -a 0x140001051 -v -exit +image lookup -a 0x140001000 -v +image lookup -a 0x140001003 -v +image lookup -a 0x140001006 -v + +image lookup -a 0x140001011 -v +image lookup -a 0x140001017 -v +image lookup -a 0x140001019 -v +image lookup -a 0x14000101e -v +image lookup -a 0x14000102c -v + +image lookup -a 0x140001031 -v +image lookup -a 0x140001032 -v +image lookup -a 0x140001033 -v +image lookup -a 0x140001034 -v +image lookup -a 0x140001035 -v +image lookup -a 0x140001036 -v +image lookup -a 0x140001037 -v +image lookup -a 0x14000103b -v +image lookup -a 0x14000103d -v +image lookup -a 0x14000103f -v +image lookup -a 0x140001041 -v +image lookup -a 0x140001043 -v +image lookup -a 0x140001045 -v +image lookup -a 0x140001046 -v +image lookup -a 0x140001047 -v +image lookup -a 0x140001048 -v +image lookup -a 0x140001049 -v +image lookup -a 0x14000104a -v +image lookup -a 0x14000104b -v +image lookup -a 0x14000104c -v +image lookup -a 0x14000104e -v +image lookup -a 0x14000104f -v +image lookup -a 0x140001050 -v +image lookup -a 0x140001051 -v +exit diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/lookup-by-types.lldbinit b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/lookup-by-types.lldbinit index afe3f2c8b943e3..3f639eb2e539bc 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/lookup-by-types.lldbinit +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/lookup-by-types.lldbinit @@ -1,4 +1,4 @@ -image lookup -type A -image lookup -type B - +image lookup -type A +image lookup -type B + quit \ No newline at end of file diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/subfield_register_simple_type.lldbinit b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/subfield_register_simple_type.lldbinit index 
3dc33fd789dac0..32758f1fbc51f3 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/subfield_register_simple_type.lldbinit +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/subfield_register_simple_type.lldbinit @@ -1,2 +1,2 @@ -image lookup -a 0x40102f -v -quit +image lookup -a 0x40102f -v +quit diff --git a/lldb/test/Shell/SymbolFile/NativePDB/function-types-classes.cpp b/lldb/test/Shell/SymbolFile/NativePDB/function-types-classes.cpp index ca2a84de7698a4..f0fac90e5065a1 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/function-types-classes.cpp +++ b/lldb/test/Shell/SymbolFile/NativePDB/function-types-classes.cpp @@ -113,9 +113,9 @@ auto incomplete = &three; // CHECK: |-CXXRecordDecl {{.*}} union U // CHECK: |-EnumDecl {{.*}} E // CHECK: |-CXXRecordDecl {{.*}} struct S -// CHECK: |-VarDecl {{.*}} a 'S (*)(C *, U &, E &&)' -// CHECK: |-VarDecl {{.*}} b 'E (*)(const S *, const C &, const U &&)' -// CHECK: |-VarDecl {{.*}} c 'U (*)(volatile E *, volatile S &, volatile C &&)' +// CHECK: |-VarDecl {{.*}} a 'S (*)(C *, U &, E &&)' +// CHECK: |-VarDecl {{.*}} b 'E (*)(const S *, const C &, const U &&)' +// CHECK: |-VarDecl {{.*}} c 'U (*)(volatile E *, volatile S &, volatile C &&)' // CHECK: |-VarDecl {{.*}} d 'C (*)(const volatile U *, const volatile E &, const volatile S &&)' // CHECK: |-CXXRecordDecl {{.*}} struct B // CHECK: | `-CXXRecordDecl {{.*}} struct A @@ -125,14 +125,14 @@ auto incomplete = &three; // CHECK: | | `-CXXRecordDecl {{.*}} struct S // CHECK: | `-NamespaceDecl {{.*}} B // CHECK: | `-CXXRecordDecl {{.*}} struct S -// CHECK: |-VarDecl {{.*}} e 'A::B::S *(*)(B::A::S *, A::C::S &)' -// CHECK: |-VarDecl {{.*}} f 'A::C::S &(*)(A::B::S *, B::A::S *)' +// CHECK: |-VarDecl {{.*}} e 'A::B::S *(*)(B::A::S *, A::C::S &)' +// CHECK: |-VarDecl {{.*}} f 'A::C::S &(*)(A::B::S *, B::A::S *)' // CHECK: |-VarDecl {{.*}} g 'B::A::S *(*)(A::C::S &, A::B::S *)' // CHECK: |-CXXRecordDecl {{.*}} struct TC // CHECK: |-CXXRecordDecl {{.*}} struct TC> // CHECK: 
|-CXXRecordDecl {{.*}} struct TC // CHECK: |-CXXRecordDecl {{.*}} struct TC -// CHECK: |-VarDecl {{.*}} h 'TC (*)(TC, TC>, TC)' +// CHECK: |-VarDecl {{.*}} h 'TC (*)(TC, TC>, TC)' // CHECK: |-VarDecl {{.*}} i 'A::B::S (*)()' // CHECK: |-CXXRecordDecl {{.*}} struct Incomplete // CHECK: `-VarDecl {{.*}} incomplete 'Incomplete *(*)(Incomplete **, const Incomplete *)' diff --git a/lldb/test/Shell/SymbolFile/NativePDB/inline_sites_live.cpp b/lldb/test/Shell/SymbolFile/NativePDB/inline_sites_live.cpp index 767149ea18c468..40298272696580 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/inline_sites_live.cpp +++ b/lldb/test/Shell/SymbolFile/NativePDB/inline_sites_live.cpp @@ -1,34 +1,34 @@ -// clang-format off -// REQUIRES: system-windows - -// RUN: %build -o %t.exe -- %s -// RUN: env LLDB_USE_NATIVE_PDB_READER=1 %lldb -f %t.exe -s \ -// RUN: %p/Inputs/inline_sites_live.lldbinit 2>&1 | FileCheck %s - -void use(int) {} - -void __attribute__((always_inline)) bar(int param) { - use(param); // BP_bar -} - -void __attribute__((always_inline)) foo(int param) { - int local = param+1; - bar(local); - use(param); - use(local); // BP_foo -} - -int main(int argc, char** argv) { - foo(argc); -} - -// CHECK: * thread #1, stop reason = breakpoint 1 -// CHECK-NEXT: frame #0: {{.*}}`main [inlined] bar(param=2) -// CHECK: (lldb) expression param -// CHECK-NEXT: (int) $0 = 2 -// CHECK: * thread #1, stop reason = breakpoint 2 -// CHECK-NEXT: frame #0: {{.*}}`main [inlined] foo(param=1) -// CHECK: (lldb) expression param -// CHECK-NEXT: (int) $1 = 1 -// CHECK-NEXT: (lldb) expression local -// CHECK-NEXT: (int) $2 = 2 +// clang-format off +// REQUIRES: system-windows + +// RUN: %build -o %t.exe -- %s +// RUN: env LLDB_USE_NATIVE_PDB_READER=1 %lldb -f %t.exe -s \ +// RUN: %p/Inputs/inline_sites_live.lldbinit 2>&1 | FileCheck %s + +void use(int) {} + +void __attribute__((always_inline)) bar(int param) { + use(param); // BP_bar +} + +void __attribute__((always_inline)) foo(int param) { + int 
local = param+1; + bar(local); + use(param); + use(local); // BP_foo +} + +int main(int argc, char** argv) { + foo(argc); +} + +// CHECK: * thread #1, stop reason = breakpoint 1 +// CHECK-NEXT: frame #0: {{.*}}`main [inlined] bar(param=2) +// CHECK: (lldb) expression param +// CHECK-NEXT: (int) $0 = 2 +// CHECK: * thread #1, stop reason = breakpoint 2 +// CHECK-NEXT: frame #0: {{.*}}`main [inlined] foo(param=1) +// CHECK: (lldb) expression param +// CHECK-NEXT: (int) $1 = 1 +// CHECK-NEXT: (lldb) expression local +// CHECK-NEXT: (int) $2 = 2 diff --git a/lldb/test/Shell/SymbolFile/NativePDB/lookup-by-types.cpp b/lldb/test/Shell/SymbolFile/NativePDB/lookup-by-types.cpp index f3aea8115f3858..cd5bbfc30fa0e1 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/lookup-by-types.cpp +++ b/lldb/test/Shell/SymbolFile/NativePDB/lookup-by-types.cpp @@ -1,46 +1,46 @@ -// clang-format off - -// RUN: %build -o %t.exe -- %s -// RUN: env LLDB_USE_NATIVE_PDB_READER=1 %lldb -f %t.exe -s \ -// RUN: %p/Inputs/lookup-by-types.lldbinit 2>&1 | FileCheck %s - -class B; -class A { -public: - static const A constA; - static A a; - static B b; - int val = 1; -}; -class B { -public: - static A a; - int val = 2; -}; -A varA; -B varB; -const A A::constA = varA; -A A::a = varA; -B A::b = varB; -A B::a = varA; - -int main(int argc, char **argv) { - return varA.val + varB.val; -} - -// CHECK: image lookup -type A -// CHECK-NEXT: 1 match found in {{.*}}.exe -// CHECK-NEXT: compiler_type = "class A { -// CHECK-NEXT: static const A constA; -// CHECK-NEXT: static A a; -// CHECK-NEXT: static B b; -// CHECK-NEXT: public: -// CHECK-NEXT: int val; -// CHECK-NEXT: }" -// CHECK: image lookup -type B -// CHECK-NEXT: 1 match found in {{.*}}.exe -// CHECK-NEXT: compiler_type = "class B { -// CHECK-NEXT: static A a; -// CHECK-NEXT: public: -// CHECK-NEXT: int val; -// CHECK-NEXT: }" +// clang-format off + +// RUN: %build -o %t.exe -- %s +// RUN: env LLDB_USE_NATIVE_PDB_READER=1 %lldb -f %t.exe -s \ +// RUN: 
%p/Inputs/lookup-by-types.lldbinit 2>&1 | FileCheck %s + +class B; +class A { +public: + static const A constA; + static A a; + static B b; + int val = 1; +}; +class B { +public: + static A a; + int val = 2; +}; +A varA; +B varB; +const A A::constA = varA; +A A::a = varA; +B A::b = varB; +A B::a = varA; + +int main(int argc, char **argv) { + return varA.val + varB.val; +} + +// CHECK: image lookup -type A +// CHECK-NEXT: 1 match found in {{.*}}.exe +// CHECK-NEXT: compiler_type = "class A { +// CHECK-NEXT: static const A constA; +// CHECK-NEXT: static A a; +// CHECK-NEXT: static B b; +// CHECK-NEXT: public: +// CHECK-NEXT: int val; +// CHECK-NEXT: }" +// CHECK: image lookup -type B +// CHECK-NEXT: 1 match found in {{.*}}.exe +// CHECK-NEXT: compiler_type = "class B { +// CHECK-NEXT: static A a; +// CHECK-NEXT: public: +// CHECK-NEXT: int val; +// CHECK-NEXT: }" diff --git a/lldb/unittests/Breakpoint/CMakeLists.txt b/lldb/unittests/Breakpoint/CMakeLists.txt index 757c2da1a4d9de..db985bc82dc5e2 100644 --- a/lldb/unittests/Breakpoint/CMakeLists.txt +++ b/lldb/unittests/Breakpoint/CMakeLists.txt @@ -1,10 +1,10 @@ -add_lldb_unittest(LLDBBreakpointTests - BreakpointIDTest.cpp - WatchpointAlgorithmsTests.cpp - - LINK_LIBS - lldbBreakpoint - lldbCore - LINK_COMPONENTS - Support - ) +add_lldb_unittest(LLDBBreakpointTests + BreakpointIDTest.cpp + WatchpointAlgorithmsTests.cpp + + LINK_LIBS + lldbBreakpoint + lldbCore + LINK_COMPONENTS + Support + ) diff --git a/llvm/benchmarks/FormatVariadicBM.cpp b/llvm/benchmarks/FormatVariadicBM.cpp index c03ead400d0d5c..e351db338730e9 100644 --- a/llvm/benchmarks/FormatVariadicBM.cpp +++ b/llvm/benchmarks/FormatVariadicBM.cpp @@ -1,63 +1,63 @@ -//===- FormatVariadicBM.cpp - formatv() benchmark ---------- --------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. 
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -#include "benchmark/benchmark.h" -#include "llvm/Support/FormatVariadic.h" -#include <algorithm> -#include <string> -#include <vector> - -using namespace llvm; -using namespace std; - -// Generate a list of format strings that have `NumReplacements` replacements -// by permuting the replacements and some literal text. -static vector<string> getFormatStrings(int NumReplacements) { - vector<string> Components; - for (int I = 0; I < NumReplacements; I++) - Components.push_back("{" + to_string(I) + "}"); - // Intersperse these with some other literal text (_). - const string_view Literal = "____"; - for (char C : Literal) - Components.push_back(string(1, C)); - - vector<string> Formats; - do { - string Concat; - for (const string &C : Components) - Concat += C; - Formats.emplace_back(Concat); - } while (next_permutation(Components.begin(), Components.end())); - return Formats; -} - -// Generate the set of formats to exercise outside the benchmark code. -static const vector<vector<string>> Formats = { - getFormatStrings(1), getFormatStrings(2), getFormatStrings(3), - getFormatStrings(4), getFormatStrings(5), -}; - -// Benchmark formatv() for a variety of format strings and 1-5 replacements. -static void BM_FormatVariadic(benchmark::State &state) { - for (auto _ : state) { - for (const string &Fmt : Formats[0]) - formatv(Fmt.c_str(), 1).str(); - for (const string &Fmt : Formats[1]) - formatv(Fmt.c_str(), 1, 2).str(); - for (const string &Fmt : Formats[2]) - formatv(Fmt.c_str(), 1, 2, 3).str(); - for (const string &Fmt : Formats[3]) - formatv(Fmt.c_str(), 1, 2, 3, 4).str(); - for (const string &Fmt : Formats[4]) - formatv(Fmt.c_str(), 1, 2, 3, 4, 5).str(); - } -} - -BENCHMARK(BM_FormatVariadic); - -BENCHMARK_MAIN(); +//===- FormatVariadicBM.cpp - formatv() benchmark ---------- --------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "benchmark/benchmark.h" +#include "llvm/Support/FormatVariadic.h" +#include <algorithm> +#include <string> +#include <vector> + +using namespace llvm; +using namespace std; + +// Generate a list of format strings that have `NumReplacements` replacements +// by permuting the replacements and some literal text. +static vector<string> getFormatStrings(int NumReplacements) { + vector<string> Components; + for (int I = 0; I < NumReplacements; I++) + Components.push_back("{" + to_string(I) + "}"); + // Intersperse these with some other literal text (_). + const string_view Literal = "____"; + for (char C : Literal) + Components.push_back(string(1, C)); + + vector<string> Formats; + do { + string Concat; + for (const string &C : Components) + Concat += C; + Formats.emplace_back(Concat); + } while (next_permutation(Components.begin(), Components.end())); + return Formats; +} + +// Generate the set of formats to exercise outside the benchmark code. +static const vector<vector<string>> Formats = { + getFormatStrings(1), getFormatStrings(2), getFormatStrings(3), + getFormatStrings(4), getFormatStrings(5), +}; + +// Benchmark formatv() for a variety of format strings and 1-5 replacements.
+static void BM_FormatVariadic(benchmark::State &state) { + for (auto _ : state) { + for (const string &Fmt : Formats[0]) + formatv(Fmt.c_str(), 1).str(); + for (const string &Fmt : Formats[1]) + formatv(Fmt.c_str(), 1, 2).str(); + for (const string &Fmt : Formats[2]) + formatv(Fmt.c_str(), 1, 2, 3).str(); + for (const string &Fmt : Formats[3]) + formatv(Fmt.c_str(), 1, 2, 3, 4).str(); + for (const string &Fmt : Formats[4]) + formatv(Fmt.c_str(), 1, 2, 3, 4, 5).str(); + } +} + +BENCHMARK(BM_FormatVariadic); + +BENCHMARK_MAIN(); diff --git a/llvm/benchmarks/GetIntrinsicForClangBuiltin.cpp b/llvm/benchmarks/GetIntrinsicForClangBuiltin.cpp index fa9c528424c95f..953d9125e11ee2 100644 --- a/llvm/benchmarks/GetIntrinsicForClangBuiltin.cpp +++ b/llvm/benchmarks/GetIntrinsicForClangBuiltin.cpp @@ -1,50 +1,50 @@ -#include "benchmark/benchmark.h" -#include "llvm/IR/Intrinsics.h" - -using namespace llvm; -using namespace Intrinsic; - -// Benchmark intrinsic lookup from a variety of targets. -static void BM_GetIntrinsicForClangBuiltin(benchmark::State &state) { - static const char *Builtins[] = { - "__builtin_adjust_trampoline", - "__builtin_trap", - "__builtin_arm_ttest", - "__builtin_amdgcn_cubetc", - "__builtin_amdgcn_udot2", - "__builtin_arm_stc", - "__builtin_bpf_compare", - "__builtin_HEXAGON_A2_max", - "__builtin_lasx_xvabsd_b", - "__builtin_mips_dlsa", - "__nvvm_floor_f", - "__builtin_altivec_vslb", - "__builtin_r600_read_tgid_x", - "__builtin_riscv_aes64im", - "__builtin_s390_vcksm", - "__builtin_ve_vl_pvfmksge_Mvl", - "__builtin_ia32_axor64", - "__builtin_bitrev", - }; - static const char *Targets[] = {"", "aarch64", "amdgcn", "mips", - "nvvm", "r600", "riscv"}; - - for (auto _ : state) { - for (auto Builtin : Builtins) - for (auto Target : Targets) - getIntrinsicForClangBuiltin(Target, Builtin); - } -} - -static void -BM_GetIntrinsicForClangBuiltinHexagonFirst(benchmark::State &state) { - // Exercise the worst case by looking for the first builtin for a target - // 
that has a lot of builtins. - for (auto _ : state) - getIntrinsicForClangBuiltin("hexagon", "__builtin_HEXAGON_A2_abs"); -} - -BENCHMARK(BM_GetIntrinsicForClangBuiltin); -BENCHMARK(BM_GetIntrinsicForClangBuiltinHexagonFirst); - -BENCHMARK_MAIN(); +#include "benchmark/benchmark.h" +#include "llvm/IR/Intrinsics.h" + +using namespace llvm; +using namespace Intrinsic; + +// Benchmark intrinsic lookup from a variety of targets. +static void BM_GetIntrinsicForClangBuiltin(benchmark::State &state) { + static const char *Builtins[] = { + "__builtin_adjust_trampoline", + "__builtin_trap", + "__builtin_arm_ttest", + "__builtin_amdgcn_cubetc", + "__builtin_amdgcn_udot2", + "__builtin_arm_stc", + "__builtin_bpf_compare", + "__builtin_HEXAGON_A2_max", + "__builtin_lasx_xvabsd_b", + "__builtin_mips_dlsa", + "__nvvm_floor_f", + "__builtin_altivec_vslb", + "__builtin_r600_read_tgid_x", + "__builtin_riscv_aes64im", + "__builtin_s390_vcksm", + "__builtin_ve_vl_pvfmksge_Mvl", + "__builtin_ia32_axor64", + "__builtin_bitrev", + }; + static const char *Targets[] = {"", "aarch64", "amdgcn", "mips", + "nvvm", "r600", "riscv"}; + + for (auto _ : state) { + for (auto Builtin : Builtins) + for (auto Target : Targets) + getIntrinsicForClangBuiltin(Target, Builtin); + } +} + +static void +BM_GetIntrinsicForClangBuiltinHexagonFirst(benchmark::State &state) { + // Exercise the worst case by looking for the first builtin for a target + // that has a lot of builtins. 
+ for (auto _ : state) + getIntrinsicForClangBuiltin("hexagon", "__builtin_HEXAGON_A2_abs"); +} + +BENCHMARK(BM_GetIntrinsicForClangBuiltin); +BENCHMARK(BM_GetIntrinsicForClangBuiltinHexagonFirst); + +BENCHMARK_MAIN(); diff --git a/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp b/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp index 7f3bd3bc9eb6b3..758291274675d6 100644 --- a/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp +++ b/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp @@ -1,30 +1,30 @@ -//===- GetIntrinsicInfoTableEntries.cpp - IIT signature benchmark ---------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -#include "benchmark/benchmark.h" -#include "llvm/ADT/SmallVector.h" -#include "llvm/IR/Intrinsics.h" - -using namespace llvm; -using namespace Intrinsic; - -static void BM_GetIntrinsicInfoTableEntries(benchmark::State &state) { - SmallVector<IITDescriptor> Table; - for (auto _ : state) { - for (ID ID = 1; ID < num_intrinsics; ++ID) { - // This makes sure the vector does not keep growing, as well as after the - // first iteration does not result in additional allocations. - Table.clear(); - getIntrinsicInfoTableEntries(ID, Table); - } - } -} - -BENCHMARK(BM_GetIntrinsicInfoTableEntries); - -BENCHMARK_MAIN(); +//===- GetIntrinsicInfoTableEntries.cpp - IIT signature benchmark ---------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "benchmark/benchmark.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/IR/Intrinsics.h" + +using namespace llvm; +using namespace Intrinsic; + +static void BM_GetIntrinsicInfoTableEntries(benchmark::State &state) { + SmallVector<IITDescriptor> Table; + for (auto _ : state) { + for (ID ID = 1; ID < num_intrinsics; ++ID) { + // This makes sure the vector does not keep growing, as well as after the + // first iteration does not result in additional allocations. + Table.clear(); + getIntrinsicInfoTableEntries(ID, Table); + } + } +} + +BENCHMARK(BM_GetIntrinsicInfoTableEntries); + +BENCHMARK_MAIN(); diff --git a/llvm/docs/_static/LoopOptWG_invite.ics b/llvm/docs/_static/LoopOptWG_invite.ics index 65597d90a9c852..7c92e4048cc3d1 100644 --- a/llvm/docs/_static/LoopOptWG_invite.ics +++ b/llvm/docs/_static/LoopOptWG_invite.ics @@ -1,80 +1,80 @@ -BEGIN:VCALENDAR -PRODID:-//Google Inc//Google Calendar 70.9054//EN -VERSION:2.0 -CALSCALE:GREGORIAN -METHOD:PUBLISH -X-WR-CALNAME:LLVM Loop Optimization Discussion -X-WR-TIMEZONE:Europe/Berlin -BEGIN:VTIMEZONE -TZID:America/New_York -X-LIC-LOCATION:America/New_York -BEGIN:DAYLIGHT -TZOFFSETFROM:-0500 -TZOFFSETTO:-0400 -TZNAME:EDT -DTSTART:19700308T020000 -RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU -END:DAYLIGHT -BEGIN:STANDARD -TZOFFSETFROM:-0400 -TZOFFSETTO:-0500 -TZNAME:EST -DTSTART:19701101T020000 -RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU -END:STANDARD -END:VTIMEZONE -BEGIN:VEVENT -DTSTART;TZID=America/New_York:20240904T110000 -DTEND;TZID=America/New_York:20240904T120000 -RRULE:FREQ=MONTHLY;BYDAY=1WE -DTSTAMP:20240821T160951Z -UID:58h3f0kd3aooohmeii0johh23c at google.com -X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg -CREATED:20240821T151507Z -DESCRIPTION:LLVM Loop Optimization Discussion
Video call link: https://meet.google.c - om/fmz-gspu-odg
Agenda/Minutes/Discussion: https://docs.google.com/document/d/1sdzoyB - 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing\n\n-::~:~::~:~:~ - :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\ - nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE) - +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm - z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp - ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n - -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~ - :~:~::~:~::- -LAST-MODIFIED:20240821T160941Z -SEQUENCE:0 -STATUS:CONFIRMED -SUMMARY:LLVM Loop Optimization Discussion -TRANSP:OPAQUE -END:VEVENT -BEGIN:VEVENT -DTSTART;TZID=America/New_York:20240904T110000 -DTEND;TZID=America/New_York:20240904T120000 -DTSTAMP:20240821T160951Z -UID:58h3f0kd3aooohmeii0johh23c at google.com -X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg -RECURRENCE-ID;TZID=America/New_York:20240904T110000 -CREATED:20240821T151507Z -DESCRIPTION:LLVM Loop Optimization Discussion
Video call link: https://meet.google.c - om/fmz-gspu-odg
Agenda/Minutes/Discussion: https://docs.google.com/document/d/1sdzoyB - 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing\n\n-::~:~::~:~:~ - :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\ - nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE) - +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm - z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp - ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n - -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~ - :~:~::~:~::- -LAST-MODIFIED:20240821T160941Z -SEQUENCE:0 -STATUS:CONFIRMED -SUMMARY:LLVM Loop Optimization Discussion -TRANSP:OPAQUE -END:VEVENT -END:VCALENDAR +BEGIN:VCALENDAR +PRODID:-//Google Inc//Google Calendar 70.9054//EN +VERSION:2.0 +CALSCALE:GREGORIAN +METHOD:PUBLISH +X-WR-CALNAME:LLVM Loop Optimization Discussion +X-WR-TIMEZONE:Europe/Berlin +BEGIN:VTIMEZONE +TZID:America/New_York +X-LIC-LOCATION:America/New_York +BEGIN:DAYLIGHT +TZOFFSETFROM:-0500 +TZOFFSETTO:-0400 +TZNAME:EDT +DTSTART:19700308T020000 +RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU +END:DAYLIGHT +BEGIN:STANDARD +TZOFFSETFROM:-0400 +TZOFFSETTO:-0500 +TZNAME:EST +DTSTART:19701101T020000 +RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU +END:STANDARD +END:VTIMEZONE +BEGIN:VEVENT +DTSTART;TZID=America/New_York:20240904T110000 +DTEND;TZID=America/New_York:20240904T120000 +RRULE:FREQ=MONTHLY;BYDAY=1WE +DTSTAMP:20240821T160951Z +UID:58h3f0kd3aooohmeii0johh23c at google.com +X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg +CREATED:20240821T151507Z +DESCRIPTION:LLVM Loop Optimization Discussion
Video call link: https://meet.google.c + om/fmz-gspu-odg
Agenda/Minutes/Discussion: https://docs.google.com/document/d/1sdzoyB + 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing\n\n-::~:~::~:~:~ + :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\ + nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE) + +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm + z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp + ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n + -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~ + :~:~::~:~::- +LAST-MODIFIED:20240821T160941Z +SEQUENCE:0 +STATUS:CONFIRMED +SUMMARY:LLVM Loop Optimization Discussion +TRANSP:OPAQUE +END:VEVENT +BEGIN:VEVENT +DTSTART;TZID=America/New_York:20240904T110000 +DTEND;TZID=America/New_York:20240904T120000 +DTSTAMP:20240821T160951Z +UID:58h3f0kd3aooohmeii0johh23c at google.com +X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg +RECURRENCE-ID;TZID=America/New_York:20240904T110000 +CREATED:20240821T151507Z +DESCRIPTION:LLVM Loop Optimization Discussion
Video call link: https://meet.google.c + om/fmz-gspu-odg
Agenda/Minutes/Discussion: https://docs.google.com/document/d/1sdzoyB + 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing\n\n-::~:~::~:~:~ + :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\ + nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE) + +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm + z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp + ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n + -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~ + :~:~::~:~::- +LAST-MODIFIED:20240821T160941Z +SEQUENCE:0 +STATUS:CONFIRMED +SUMMARY:LLVM Loop Optimization Discussion +TRANSP:OPAQUE +END:VEVENT +END:VCALENDAR diff --git a/llvm/lib/Support/rpmalloc/CACHE.md b/llvm/lib/Support/rpmalloc/CACHE.md index 052320baf53275..645093026debf1 100644 --- a/llvm/lib/Support/rpmalloc/CACHE.md +++ b/llvm/lib/Support/rpmalloc/CACHE.md @@ -1,19 +1,19 @@ -# Thread caches -rpmalloc has a thread cache of free memory blocks which can be used in allocations without interfering with other threads or going to system to map more memory, as well as a global cache shared by all threads to let spans of memory pages flow between threads. Configuring the size of these caches can be crucial to obtaining good performance while minimizing memory overhead blowup. Below is a simple case study using the benchmark tool to compare different thread cache configurations for rpmalloc. - -The rpmalloc thread cache is configured to be unlimited, performance oriented as meaning default values, size oriented where both thread cache and global cache is reduced significantly, or disabled where both thread and global caches are disabled and completely free pages are directly unmapped. - -The benchmark is configured to run threads allocating 150000 blocks distributed in the `[16, 16000]` bytes range with a linear falloff probability. 
It runs 1000 loops, and every iteration 75000 blocks (50%) are freed and allocated in a scattered pattern. There are no cross thread allocations/deallocations. Parameters: `benchmark n 0 0 0 1000 150000 75000 16 16000`. The benchmarks are run on an Ubuntu 16.10 machine with 8 cores (4 physical, HT) and 12GiB RAM. - -The benchmark also includes results for the standard library malloc implementation as a reference for comparison with the nocache setting. - -![Ubuntu 16.10 random [16, 16000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=387883204&format=image) -![Ubuntu 16.10 random [16, 16000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=1644710241&format=image) - -For the single threaded case the unlimited cache and performance oriented cache settings have identical performance and memory overhead, indicating that the memory pages fit in the combined thread and global cache. As the number of threads increases to 2-4, the performance setting has slightly higher performance, which can seem odd at first, but can be explained by low contention on the global cache where some memory pages can flow between threads without stalling, reducing the overall number of calls to map new memory pages (also indicated by the slightly lower memory overhead). - -As threads increase even more to 5-10 threads, the increased contention and eventual limit of the global cache cause the unlimited setting to gain a slight advantage in performance. As expected the memory overhead remains constant for unlimited caches, while going down for the performance setting when the number of threads increases. - -The size oriented setting maintains good performance compared to the standard library while reducing the memory overhead by a decent amount compared to the performance setting.
- -The nocache setting still outperforms the reference standard library allocator for workloads up to 6 threads while maintaining a near zero memory overhead, which is even slightly lower than the standard library. For use case scenarios where number of allocation of each size class is lower the overhead in rpmalloc from the 64KiB span size will of course increase. +# Thread caches +rpmalloc has a thread cache of free memory blocks which can be used in allocations without interfering with other threads or going to system to map more memory, as well as a global cache shared by all threads to let spans of memory pages flow between threads. Configuring the size of these caches can be crucial to obtaining good performance while minimizing memory overhead blowup. Below is a simple case study using the benchmark tool to compare different thread cache configurations for rpmalloc. + +The rpmalloc thread cache is configured to be unlimited, performance oriented as meaning default values, size oriented where both thread cache and global cache is reduced significantly, or disabled where both thread and global caches are disabled and completely free pages are directly unmapped. + +The benchmark is configured to run threads allocating 150000 blocks distributed in the `[16, 16000]` bytes range with a linear falloff probability. It runs 1000 loops, and every iteration 75000 blocks (50%) are freed and allocated in a scattered pattern. There are no cross thread allocations/deallocations. Parameters: `benchmark n 0 0 0 1000 150000 75000 16 16000`. The benchmarks are run on an Ubuntu 16.10 machine with 8 cores (4 physical, HT) and 12GiB RAM. + +The benchmark also includes results for the standard library malloc implementation as a reference for comparison with the nocache setting. 
+ +![Ubuntu 16.10 random [16, 16000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=387883204&format=image) +![Ubuntu 16.10 random [16, 16000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=1644710241&format=image) + +For the single threaded case the unlimited cache and performance oriented cache settings have identical performance and memory overhead, indicating that the memory pages fit in the combined thread and global cache. As the number of threads increases to 2-4, the performance setting has slightly higher performance, which can seem odd at first, but can be explained by low contention on the global cache where some memory pages can flow between threads without stalling, reducing the overall number of calls to map new memory pages (also indicated by the slightly lower memory overhead). + +As threads increase even more to 5-10 threads, the increased contention and eventual limit of the global cache cause the unlimited setting to gain a slight advantage in performance. As expected the memory overhead remains constant for unlimited caches, while going down for the performance setting when the number of threads increases. + +The size oriented setting maintains good performance compared to the standard library while reducing the memory overhead by a decent amount compared to the performance setting. + +The nocache setting still outperforms the reference standard library allocator for workloads up to 6 threads while maintaining a near zero memory overhead, which is even slightly lower than the standard library. For use case scenarios where the number of allocations of each size class is lower, the overhead in rpmalloc from the 64KiB span size will of course increase.
diff --git a/llvm/lib/Support/rpmalloc/README.md b/llvm/lib/Support/rpmalloc/README.md index 916bca0118d868..2233df9da42d52 100644 --- a/llvm/lib/Support/rpmalloc/README.md +++ b/llvm/lib/Support/rpmalloc/README.md @@ -1,220 +1,220 @@ -# rpmalloc - General Purpose Memory Allocator -This library provides a cross platform lock free thread caching 16-byte aligned memory allocator implemented in C. -This is a fork of rpmalloc 1.4.5. - -Platforms currently supported: - -- Windows -- MacOS -- iOS -- Linux -- Android -- Haiku - -The code should be easily portable to any platform with atomic operations and an mmap-style virtual memory management API. The API used to map/unmap memory pages can be configured in runtime to a custom implementation and mapping granularity/size. - -This library is put in the public domain; you can redistribute it and/or modify it without any restrictions. Or, if you choose, you can use it under the MIT license. - -# Performance -We believe rpmalloc is faster than most popular memory allocators like tcmalloc, hoard, ptmalloc3 and others without causing extra allocated memory overhead in the thread caches compared to these allocators. We also believe the implementation to be easier to read and modify compared to these allocators, as it is a single source file of ~3000 lines of C code. All allocations have a natural 16-byte alignment. - -Contained in a parallel repository is a benchmark utility that performs interleaved unaligned allocations and deallocations (both in-thread and cross-thread) in multiple threads. It measures number of memory operations performed per CPU second, as well as memory overhead by comparing the virtual memory mapped with the number of bytes requested in allocation calls. The setup of number of thread, cross-thread deallocation rate and allocation size limits is configured by command line arguments. 
- -https://github.com/mjansson/rpmalloc-benchmark - -Below is an example performance comparison chart of rpmalloc and other popular allocator implementations, with default configurations used. - -![Ubuntu 16.10, random [16, 8000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=301017877&format=image) - -The benchmark producing these numbers were run on an Ubuntu 16.10 machine with 8 logical cores (4 physical, HT). The actual numbers are not to be interpreted as absolute performance figures, but rather as relative comparisons between the different allocators. For additional benchmark results, see the [BENCHMARKS](BENCHMARKS.md) file. - -Configuration of the thread and global caches can be important depending on your use pattern. See [CACHE](CACHE.md) for a case study and some comments/guidelines. - -# Required functions - -Before calling any other function in the API, you __MUST__ call the initialization function, either __rpmalloc_initialize__ or __rpmalloc_initialize_config__, or you will get undefined behaviour when calling other rpmalloc entry point. - -Before terminating your use of the allocator, you __SHOULD__ call __rpmalloc_finalize__ in order to release caches and unmap virtual memory, as well as prepare the allocator for global scope cleanup at process exit or dynamic library unload depending on your use case. - -# Using -The easiest way to use the library is simply adding __rpmalloc.[h|c]__ to your project and compile them along with your sources. This contains only the rpmalloc specific entry points and does not provide internal hooks to process and/or thread creation at the moment. 
You are required to call these functions from your own code in order to initialize and finalize the allocator in your process and threads: - -__rpmalloc_initialize__ : Call at process start to initialize the allocator - -__rpmalloc_initialize_config__ : Optional entry point to call at process start to initialize the allocator with a custom memory mapping backend, memory page size and mapping granularity. - -__rpmalloc_finalize__: Call at process exit to finalize the allocator - -__rpmalloc_thread_initialize__: Call at each thread start to initialize the thread local data for the allocator - -__rpmalloc_thread_finalize__: Call at each thread exit to finalize and release thread cache back to global cache - -__rpmalloc_config__: Get the current runtime configuration of the allocator - -Then simply use the __rpmalloc__/__rpfree__ and the other malloc style replacement functions. Remember all allocations are 16-byte aligned, so no need to call the explicit rpmemalign/rpaligned_alloc/rpposix_memalign functions unless you need greater alignment, they are simply wrappers to make it easier to replace in existing code. - -If you wish to override the standard library malloc family of functions and have automatic initialization/finalization of process and threads, define __ENABLE_OVERRIDE__ to non-zero which will include the `malloc.c` file in compilation of __rpmalloc.c__, and then rebuild the library or your project where you added the rpmalloc source. If you compile rpmalloc as a separate library you must make the linker use the override symbols from the library by referencing at least one symbol. The easiest way is to simply include `rpmalloc.h` in at least one source file and call `rpmalloc_linker_reference` somewhere - it's a dummy empty function. On Windows platforms and C++ overrides you have to `#include ` in at least one source file and also manually handle the initialize/finalize of the process and all threads. 
The list of libc entry points replaced may not be complete, use libc/stdc++ replacement only as a convenience for testing the library on an existing code base, not a final solution. - -For explicit first class heaps, see the __rpmalloc_heap_*__ API under the [first class heaps](#first-class-heaps) section, requiring __RPMALLOC_FIRST_CLASS_HEAPS__ to be defined to 1. - -# Building -To compile as a static library run the configure python script which generates a Ninja build script, then build using ninja. The ninja build produces two static libraries, one named `rpmalloc` and one named `rpmallocwrap`, where the latter includes the libc entry point overrides. - -The configure + ninja build also produces two shared object/dynamic libraries. The `rpmallocwrap` shared library can be used with LD_PRELOAD/DYLD_INSERT_LIBRARIES to inject in a preexisting binary, replacing any malloc/free family of function calls. This is only implemented for Linux and macOS targets. The list of libc entry points replaced may not be complete, use preloading as a convenience for testing the library on an existing binary, not a final solution. The dynamic library also provides automatic init/fini of process and threads for all platforms. - -The latest stable release is available in the master branch. For the latest development code, use the develop branch. - -# Cache configuration options -Free memory pages are cached both per thread and in a global cache for all threads. The size of the thread caches is determined by an adaptive scheme where each cache is limited by a percentage of the maximum allocation count of the corresponding size class. The size of the global caches is determined by a multiple of the maximum of all thread caches. The factors controlling the cache sizes can be set by editing the individual defines in the `rpmalloc.c` source file for fine-tuned control.
- -__ENABLE_UNLIMITED_CACHE__: By default defined to 0, set to 1 to make all caches infinite, i.e never release spans to global cache unless thread finishes and never unmap memory pages back to the OS. Highest performance but largest memory overhead. - -__ENABLE_UNLIMITED_GLOBAL_CACHE__: By default defined to 0, set to 1 to make global caches infinite, i.e never unmap memory pages back to the OS. - -__ENABLE_UNLIMITED_THREAD_CACHE__: By default defined to 0, set to 1 to make thread caches infinite, i.e never release spans to global cache unless thread finishes. - -__ENABLE_GLOBAL_CACHE__: By default defined to 1, enables the global cache shared between all threads. Set to 0 to disable the global cache and directly unmap pages evicted from the thread cache. - -__ENABLE_THREAD_CACHE__: By default defined to 1, enables the per-thread cache. Set to 0 to disable the thread cache and directly unmap pages no longer in use (also disables the global cache). - -__ENABLE_ADAPTIVE_THREAD_CACHE__: Introduces a simple heuristics in the thread cache size, keeping 25% of the high water mark for each span count class. - -# Other configuration options -Detailed statistics are available if __ENABLE_STATISTICS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. This will cause a slight overhead in runtime to collect statistics for each memory operation, and will also add 4 bytes overhead per allocation to track sizes. - -Integer safety checks on all calls are enabled if __ENABLE_VALIDATE_ARGS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. If enabled, size arguments to the global entry points are verified not to cause integer overflows in calculations. - -Asserts are enabled if __ENABLE_ASSERTS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. 
- -To include __malloc.c__ in compilation and provide overrides of standard library malloc entry points define __ENABLE_OVERRIDE__ to 1. To enable automatic initialization and finalization of process and threads in order to preload the library into executables using standard library malloc, define __ENABLE_PRELOAD__ to 1. - -To enable the runtime configurable memory page and span sizes, define __RPMALLOC_CONFIGURABLE__ to 1. By default, memory page size is determined by system APIs and memory span size is set to 64KiB. - -To enable support for first class heaps, define __RPMALLOC_FIRST_CLASS_HEAPS__ to 1. By default, the first class heap API is disabled. - -# Huge pages -The allocator has support for huge/large pages on Windows, Linux and MacOS. To enable it, pass a non-zero value in the config value `enable_huge_pages` when initializing the allocator with `rpmalloc_initialize_config`. If the system does not support huge pages it will be automatically disabled. You can query the status by looking at `enable_huge_pages` in the config returned from a call to `rpmalloc_config` after initialization is done. - -# Quick overview -The allocator is similar in spirit to tcmalloc from the [Google Performance Toolkit](https://github.com/gperftools/gperftools). It uses separate heaps for each thread and partitions memory blocks according to a preconfigured set of size classes, up to 2MiB. Larger blocks are mapped and unmapped directly. Allocations for different size classes will be served from different sets of memory pages; each "span" of pages is dedicated to one size class. Spans of pages can flow between threads when the thread cache overflows and are released to a global cache, or when the thread ends. Unlike tcmalloc, single blocks do not flow between threads, only entire spans of pages.
- -# Implementation details -The allocator is based on a fixed but configurable page alignment (defaults to 64KiB) and 16 byte block alignment, where all runs of memory pages (spans) are mapped to this alignment boundary. On Windows this is automatically guaranteed up to 64KiB by the VirtualAlloc granularity, and on mmap systems it is achieved by oversizing the mapping and aligning the returned virtual memory address to the required boundaries. By aligning to a fixed size the free operation can locate the header of the memory span without having to do a table lookup (as tcmalloc does) by simply masking out the low bits of the address (for 64KiB this would be the low 16 bits). - -Memory blocks are divided into three categories. For 64KiB span size/alignment the small blocks are [16, 1024] bytes, medium blocks (1024, 32256] bytes, and large blocks (32256, 2097120] bytes. The three categories are further divided in size classes. If the span size is changed, the small block classes remain but medium blocks go from (1024, span size] bytes. - -Small blocks have a size class granularity of 16 bytes each in 64 buckets. Medium blocks have a granularity of 512 bytes, 61 buckets (default). Large blocks have the same granularity as the configured span size (default 64KiB). All allocations are fitted to these size class boundaries (an allocation of 36 bytes will allocate a block of 48 bytes). Each small and medium size class has an associated span (meaning a contiguous set of memory pages) configuration describing how many pages the size class will allocate each time the cache is empty and a new allocation is requested. - -Spans for small and medium blocks are cached in four levels to avoid calls to map/unmap memory pages. The first level is a per thread single active span for each size class. The second level is a per thread list of partially free spans for each size class. The third level is a per thread list of free spans. The fourth level is a global list of free spans. 
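
The size class rounding and the span header lookup described above can be illustrated with a little arithmetic. The following is only a sketch under the default values quoted in the text (64KiB spans, 16-byte small granularity, 512-byte medium granularity); the names `block_size` and `span_base` are hypothetical and are not part of the rpmalloc API, and Python is used purely to show the integer math, not how the C implementation is written.

```python
# Default values taken from the text above; all names here are illustrative only.
SPAN_SIZE = 64 * 1024        # span size and alignment (64KiB)
SMALL_GRANULARITY = 16       # small classes step by 16 bytes
SMALL_SIZE_LIMIT = 1024      # small blocks cover [16, 1024] bytes
MEDIUM_GRANULARITY = 512     # medium classes step by 512 bytes
MEDIUM_SIZE_LIMIT = 32256    # medium blocks cover (1024, 32256] bytes


def round_up(value, step):
    """Round value up to the next multiple of step."""
    return (value + step - 1) // step * step


def block_size(request):
    """Return the block size a small or medium request would be fitted to."""
    if request <= 0:
        raise ValueError("request must be positive")
    if request <= SMALL_SIZE_LIMIT:
        return round_up(request, SMALL_GRANULARITY)
    if request <= MEDIUM_SIZE_LIMIT:
        # Medium classes start right above the small limit.
        return SMALL_SIZE_LIMIT + round_up(request - SMALL_SIZE_LIMIT,
                                           MEDIUM_GRANULARITY)
    raise ValueError("large blocks are sized in whole spans; not modeled here")


def span_base(address):
    """Locate the start of the owning span by masking out the low bits."""
    return address & ~(SPAN_SIZE - 1)
```

With these assumptions, `block_size(36)` gives 48, matching the example in the text, and `span_base` clears the low 16 bits of an address, which is why no table lookup is needed to find the span header on free.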
- -Each span for a small and medium size class keeps track of how many blocks are allocated/free, as well as a list of which blocks that are free for allocation. To avoid locks, each span is completely owned by the allocating thread, and all cross-thread deallocations will be deferred to the owner thread through a separate free list per span. - -Large blocks, or super spans, are cached in two levels. The first level is a per thread list of free super spans. The second level is a global list of free super spans. - -# Memory mapping -By default the allocator uses OS APIs to map virtual memory pages as needed, either `VirtualAlloc` on Windows or `mmap` on POSIX systems. If you want to use your own custom memory mapping provider you can use __rpmalloc_initialize_config__ and pass function pointers to map and unmap virtual memory. These function should reserve and free the requested number of bytes. - -The returned memory address from the memory map function MUST be aligned to the memory page size and the memory span size (which ever is larger), both of which is configurable. Either provide the page and span sizes during initialization using __rpmalloc_initialize_config__, or use __rpmalloc_config__ to find the required alignment which is equal to the maximum of page and span size. The span size MUST be a power of two in [4096, 262144] range, and be a multiple or divisor of the memory page size. - -Memory mapping requests are always done in multiples of the memory page size. You can specify a custom page size when initializing rpmalloc with __rpmalloc_initialize_config__, or pass 0 to let rpmalloc determine the system memory page size using OS APIs. The page size MUST be a power of two. - -To reduce system call overhead, memory spans are mapped in batches controlled by the `span_map_count` configuration variable (which defaults to the `DEFAULT_SPAN_MAP_COUNT` value if 0, which in turn is sized according to the cache configuration define, defaulting to 64). 
If the memory page size is larger than the span size, the number of spans to map in a single call will be adjusted to guarantee a multiple of the page size, and the spans will be kept mapped until the entire span range can be unmapped in one call (to avoid trying to unmap partial pages). - -On macOS and iOS mmap requests are tagged with tag 240 for easy identification with the vmmap tool. - -# Span breaking -Super spans (spans a multiple > 1 of the span size) can be subdivided into smaller spans to fulfill a need to map a new span of memory. By default the allocator will greedily grab and break any larger span from the available caches before mapping new virtual memory. However, spans can currently not be glued together to form larger super spans again. Subspans can traverse the cache and be used by different threads individually. - -A span that is a subspan of a larger super span can be individually decommitted to reduce physical memory pressure when the span is evicted from caches and scheduled to be unmapped. The entire original super span will keep track of the subspans it is broken up into, and when the entire range is decommitted the super span will be unmapped. This allows platforms like Windows that require the entire virtual memory range that was mapped in a call to VirtualAlloc to be unmapped in one call to VirtualFree, while still decommitting individual pages in subspans (if the page size is smaller than the span size). - -If you use a custom memory map/unmap function you need to take this into account by looking at the `release` parameter given to the `memory_unmap` function. It is set to 0 for decommitting individual pages and the total super span byte size for finally releasing the entire super span memory range. - -# Memory fragmentation -There is no memory fragmentation by the allocator in the sense that it will not leave unallocated and unusable "holes" in the memory pages by calls to allocate and free blocks of different sizes. 
This is due to the fact that the memory pages allocated for each size class are split up into perfectly aligned blocks which are not reused for a request of a different size. The block freed by a call to `rpfree` will always be immediately available for an allocation request within the same size class. - -However, there is memory fragmentation in the sense that a request for x bytes followed by a request of y bytes where x and y are at least one size class different in size will return blocks that are at least one memory page apart in virtual address space. Only blocks of the same size will potentially be within the same memory page span. - -rpmalloc keeps an "active span" and free list for each size class. As a result, back-to-back allocations will most likely be served from within the same span of memory pages (unless the span runs out of free blocks). The rpmalloc implementation will also use any "holes" in memory pages in semi-filled spans before using a completely free span. - -# First class heaps -rpmalloc provides a first class heap type with an explicit heap control API. Heaps are maintained with calls to __rpmalloc_heap_acquire__ and __rpmalloc_heap_release__ and allocations/frees are done with __rpmalloc_heap_alloc__ and __rpmalloc_heap_free__. See the `rpmalloc.h` documentation for the full list of functions in the heap API. The main use case of explicit heap control is to scope allocations in a heap and release everything with a single call to __rpmalloc_heap_free_all__ without having to maintain ownership of memory blocks. Note that the heap API is not thread-safe; the caller must make sure that each heap is only used in a single thread at any given time. - -# Producer-consumer scenario -Compared to some other allocators, rpmalloc does not suffer as much from a producer-consumer thread scenario where one thread allocates memory blocks and another thread frees the blocks.
In some allocators the free blocks need to traverse both the thread cache of the thread doing the free operations as well as the global cache before being reused in the allocating thread. In rpmalloc the freed blocks will be reused as soon as the allocating thread needs to get new spans from the thread cache. This enables faster release of completely freed memory pages as blocks in a memory page will not be aliased between different owning threads. - -# Best case scenarios -Threads that keep ownership of allocated memory blocks within the thread and free the blocks from the same thread will have optimal performance. - -Threads that have allocation patterns where the difference in memory usage high and low water marks fit within the thread cache thresholds in the allocator will never touch the global cache except during thread init/fini and have optimal performance. Tweaking the cache limits can be done on a per-size-class basis. - -# Worst case scenarios -Since each thread cache maps spans of memory pages per size class, a thread that allocates just a few blocks of each size class (16, 32, ...) for many size classes will never fill each bucket, and thus map a lot of memory pages while only using a small fraction of the mapped memory. However, the wasted memory will always be less than 4KiB (or the configured memory page size) per size class as each span is initialized one memory page at a time. The cache for free spans will be reused by all size classes. - -Threads that perform a lot of allocations and deallocations in a pattern that have a large difference in high and low water marks, and that difference is larger than the thread cache size, will put a lot of contention on the global cache. What will happen is the thread cache will overflow on each low water mark causing pages to be released to the global cache, then underflow on high water mark causing pages to be re-acquired from the global cache. 
This can be mitigated by changing the __MAX_SPAN_CACHE_DIVISOR__ define in the source code (at the cost of higher average memory overhead). - -# Caveats -VirtualAlloc has an internal granularity of 64KiB. However, mmap lacks this granularity control, and the implementation instead oversizes the memory mapping with configured span size to be able to always return a memory area with the required alignment. Since the extra memory pages are never touched this will not result in extra committed physical memory pages, but rather only increase virtual memory address space. - -All entry points assume the passed values are valid, for example passing an invalid pointer to free would most likely result in a segmentation fault. __The library does not try to guard against errors!__. - -To support global scope data doing dynamic allocation/deallocation such as C++ objects with custom constructors and destructors, the call to __rpmalloc_finalize__ will not completely terminate the allocator but rather empty all caches and put the allocator in finalization mode. Once this call has been made, the allocator is no longer thread safe and expects all remaining calls to originate from global data destruction on main thread. Any spans or heaps becoming free during this phase will be immediately unmapped to allow correct teardown of the process or dynamic library without any leaks. - -# Other languages - -[Johan Andersson](https://github.com/repi) at Embark has created a Rust wrapper available at [rpmalloc-rs](https://github.com/EmbarkStudios/rpmalloc-rs) - -[Stas Denisov](https://github.com/nxrighthere) has created a C# wrapper available at [Rpmalloc-CSharp](https://github.com/nxrighthere/Rpmalloc-CSharp) - -# License - -This is free and unencumbered software released into the public domain. 
- -Anyone is free to copy, modify, publish, use, compile, sell, or -distribute this software, either in source code form or as a compiled -binary, for any purpose, commercial or non-commercial, and by any -means. - -In jurisdictions that recognize copyright laws, the author or authors -of this software dedicate any and all copyright interest in the -software to the public domain. We make this dedication for the benefit -of the public at large and to the detriment of our heirs and -successors. We intend this dedication to be an overt act of -relinquishment in perpetuity of all present and future rights to this -software under copyright law. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, -EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF -MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. -IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR -OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, -ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR -OTHER DEALINGS IN THE SOFTWARE. - -For more information, please refer to - - -You can also use this software under the MIT license if public domain is -not recognized in your country - - -The MIT License (MIT) - -Copyright (c) 2017 Mattias Jansson - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in -all copies or substantial portions of the Software. 
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.
+# rpmalloc - General Purpose Memory Allocator
+This library provides a cross platform lock free thread caching 16-byte aligned memory allocator implemented in C.
+This is a fork of rpmalloc 1.4.5.
+
+Platforms currently supported:
+
+- Windows
+- MacOS
+- iOS
+- Linux
+- Android
+- Haiku
+
+The code should be easily portable to any platform with atomic operations and an mmap-style virtual memory management API. The API used to map/unmap memory pages can be configured in runtime to a custom implementation and mapping granularity/size.
+
+This library is put in the public domain; you can redistribute it and/or modify it without any restrictions. Or, if you choose, you can use it under the MIT license.
+
+# Performance
+We believe rpmalloc is faster than most popular memory allocators like tcmalloc, hoard, ptmalloc3 and others without causing extra allocated memory overhead in the thread caches compared to these allocators. We also believe the implementation to be easier to read and modify compared to these allocators, as it is a single source file of ~3000 lines of C code. All allocations have a natural 16-byte alignment.
+
+Contained in a parallel repository is a benchmark utility that performs interleaved unaligned allocations and deallocations (both in-thread and cross-thread) in multiple threads. It measures number of memory operations performed per CPU second, as well as memory overhead by comparing the virtual memory mapped with the number of bytes requested in allocation calls.
The setup of number of thread, cross-thread deallocation rate and allocation size limits is configured by command line arguments.
+
+https://github.com/mjansson/rpmalloc-benchmark
+
+Below is an example performance comparison chart of rpmalloc and other popular allocator implementations, with default configurations used.
+
+![Ubuntu 16.10, random [16, 8000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=301017877&format=image)
+
+The benchmark producing these numbers were run on an Ubuntu 16.10 machine with 8 logical cores (4 physical, HT). The actual numbers are not to be interpreted as absolute performance figures, but rather as relative comparisons between the different allocators. For additional benchmark results, see the [BENCHMARKS](BENCHMARKS.md) file.
+
+Configuration of the thread and global caches can be important depending on your use pattern. See [CACHE](CACHE.md) for a case study and some comments/guidelines.
+
+# Required functions
+
+Before calling any other function in the API, you __MUST__ call the initialization function, either __rpmalloc_initialize__ or __rpmalloc_initialize_config__, or you will get undefined behaviour when calling other rpmalloc entry point.
+
+Before terminating your use of the allocator, you __SHOULD__ call __rpmalloc_finalize__ in order to release caches and unmap virtual memory, as well as prepare the allocator for global scope cleanup at process exit or dynamic library unload depending on your use case.
+
+# Using
+The easiest way to use the library is simply adding __rpmalloc.[h|c]__ to your project and compile them along with your sources. This contains only the rpmalloc specific entry points and does not provide internal hooks to process and/or thread creation at the moment.
You are required to call these functions from your own code in order to initialize and finalize the allocator in your process and threads:
+
+__rpmalloc_initialize__ : Call at process start to initialize the allocator
+
+__rpmalloc_initialize_config__ : Optional entry point to call at process start to initialize the allocator with a custom memory mapping backend, memory page size and mapping granularity.
+
+__rpmalloc_finalize__: Call at process exit to finalize the allocator
+
+__rpmalloc_thread_initialize__: Call at each thread start to initialize the thread local data for the allocator
+
+__rpmalloc_thread_finalize__: Call at each thread exit to finalize and release thread cache back to global cache
+
+__rpmalloc_config__: Get the current runtime configuration of the allocator
+
+Then simply use the __rpmalloc__/__rpfree__ and the other malloc style replacement functions. Remember all allocations are 16-byte aligned, so no need to call the explicit rpmemalign/rpaligned_alloc/rpposix_memalign functions unless you need greater alignment, they are simply wrappers to make it easier to replace in existing code.
+
+If you wish to override the standard library malloc family of functions and have automatic initialization/finalization of process and threads, define __ENABLE_OVERRIDE__ to non-zero which will include the `malloc.c` file in compilation of __rpmalloc.c__, and then rebuild the library or your project where you added the rpmalloc source. If you compile rpmalloc as a separate library you must make the linker use the override symbols from the library by referencing at least one symbol. The easiest way is to simply include `rpmalloc.h` in at least one source file and call `rpmalloc_linker_reference` somewhere - it's a dummy empty function. On Windows platforms and C++ overrides you have to `#include ` in at least one source file and also manually handle the initialize/finalize of the process and all threads.
The list of libc entry points replaced may not be complete, use libc/stdc++ replacement only as a convenience for testing the library on an existing code base, not a final solution.
+
+For explicit first class heaps, see the __rpmalloc_heap_*__ API under [first class heaps](#first-class-heaps) section, requiring __RPMALLOC_FIRST_CLASS_HEAPS__ tp be defined to 1.
+
+# Building
+To compile as a static library run the configure python script which generates a Ninja build script, then build using ninja. The ninja build produces two static libraries, one named `rpmalloc` and one named `rpmallocwrap`, where the latter includes the libc entry point overrides.
+
+The configure + ninja build also produces two shared object/dynamic libraries. The `rpmallocwrap` shared library can be used with LD_PRELOAD/DYLD_INSERT_LIBRARIES to inject in a preexisting binary, replacing any malloc/free family of function calls. This is only implemented for Linux and macOS targets. The list of libc entry points replaced may not be complete, use preloading as a convenience for testing the library on an existing binary, not a final solution. The dynamic library also provides automatic init/fini of process and threads for all platforms.
+
+The latest stable release is available in the master branch. For latest development code, use the develop branch.
+
+# Cache configuration options
+Free memory pages are cached both per thread and in a global cache for all threads. The size of the thread caches is determined by an adaptive scheme where each cache is limited by a percentage of the maximum allocation count of the corresponding size class. The size of the global caches is determined by a multiple of the maximum of all thread caches. The factors controlling the cache sizes can be set by editing the individual defines in the `rpmalloc.c` source file for fine tuned control.
+
+__ENABLE_UNLIMITED_CACHE__: By default defined to 0, set to 1 to make all caches infinite, i.e never release spans to global cache unless thread finishes and never unmap memory pages back to the OS. Highest performance but largest memory overhead.
+
+__ENABLE_UNLIMITED_GLOBAL_CACHE__: By default defined to 0, set to 1 to make global caches infinite, i.e never unmap memory pages back to the OS.
+
+__ENABLE_UNLIMITED_THREAD_CACHE__: By default defined to 0, set to 1 to make thread caches infinite, i.e never release spans to global cache unless thread finishes.
+
+__ENABLE_GLOBAL_CACHE__: By default defined to 1, enables the global cache shared between all threads. Set to 0 to disable the global cache and directly unmap pages evicted from the thread cache.
+
+__ENABLE_THREAD_CACHE__: By default defined to 1, enables the per-thread cache. Set to 0 to disable the thread cache and directly unmap pages no longer in use (also disables the global cache).
+
+__ENABLE_ADAPTIVE_THREAD_CACHE__: Introduces a simple heuristics in the thread cache size, keeping 25% of the high water mark for each span count class.
+
+# Other configuration options
+Detailed statistics are available if __ENABLE_STATISTICS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. This will cause a slight overhead in runtime to collect statistics for each memory operation, and will also add 4 bytes overhead per allocation to track sizes.
+
+Integer safety checks on all calls are enabled if __ENABLE_VALIDATE_ARGS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. If enabled, size arguments to the global entry points are verified not to cause integer overflows in calculations.
+
+Asserts are enabled if __ENABLE_ASSERTS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`.
+
+To include __malloc.c__ in compilation and provide overrides of standard library malloc entry points define __ENABLE_OVERRIDE__ to 1. To enable automatic initialization of finalization of process and threads in order to preload the library into executables using standard library malloc, define __ENABLE_PRELOAD__ to 1.
+
+To enable the runtime configurable memory page and span sizes, define __RPMALLOC_CONFIGURABLE__ to 1. By default, memory page size is determined by system APIs and memory span size is set to 64KiB.
+
+To enable support for first class heaps, define __RPMALLOC_FIRST_CLASS_HEAPS__ to 1. By default, the first class heap API is disabled.
+
+# Huge pages
+The allocator has support for huge/large pages on Windows, Linux and MacOS. To enable it, pass a non-zero value in the config value `enable_huge_pages` when initializing the allocator with `rpmalloc_initialize_config`. If the system does not support huge pages it will be automatically disabled. You can query the status by looking at `enable_huge_pages` in the config returned from a call to `rpmalloc_config` after initialization is done.
+
+# Quick overview
+The allocator is similar in spirit to tcmalloc from the [Google Performance Toolkit](https://github.com/gperftools/gperftools). It uses separate heaps for each thread and partitions memory blocks according to a preconfigured set of size classes, up to 2MiB. Larger blocks are mapped and unmapped directly. Allocations for different size classes will be served from different set of memory pages, each "span" of pages is dedicated to one size class. Spans of pages can flow between threads when the thread cache overflows and are released to a global cache, or when the thread ends. Unlike tcmalloc, single blocks do not flow between threads, only entire spans of pages.
+
+# Implementation details
+The allocator is based on a fixed but configurable page alignment (defaults to 64KiB) and 16 byte block alignment, where all runs of memory pages (spans) are mapped to this alignment boundary. On Windows this is automatically guaranteed up to 64KiB by the VirtualAlloc granularity, and on mmap systems it is achieved by oversizing the mapping and aligning the returned virtual memory address to the required boundaries. By aligning to a fixed size the free operation can locate the header of the memory span without having to do a table lookup (as tcmalloc does) by simply masking out the low bits of the address (for 64KiB this would be the low 16 bits).
+
+Memory blocks are divided into three categories. For 64KiB span size/alignment the small blocks are [16, 1024] bytes, medium blocks (1024, 32256] bytes, and large blocks (32256, 2097120] bytes. The three categories are further divided in size classes. If the span size is changed, the small block classes remain but medium blocks go from (1024, span size] bytes.
+
+Small blocks have a size class granularity of 16 bytes each in 64 buckets. Medium blocks have a granularity of 512 bytes, 61 buckets (default). Large blocks have the same granularity as the configured span size (default 64KiB). All allocations are fitted to these size class boundaries (an allocation of 36 bytes will allocate a block of 48 bytes). Each small and medium size class has an associated span (meaning a contiguous set of memory pages) configuration describing how many pages the size class will allocate each time the cache is empty and a new allocation is requested.
+
+Spans for small and medium blocks are cached in four levels to avoid calls to map/unmap memory pages. The first level is a per thread single active span for each size class. The second level is a per thread list of partially free spans for each size class. The third level is a per thread list of free spans. The fourth level is a global list of free spans.
+
+Each span for a small and medium size class keeps track of how many blocks are allocated/free, as well as a list of which blocks that are free for allocation. To avoid locks, each span is completely owned by the allocating thread, and all cross-thread deallocations will be deferred to the owner thread through a separate free list per span.
+
+Large blocks, or super spans, are cached in two levels. The first level is a per thread list of free super spans. The second level is a global list of free super spans.
+
+# Memory mapping
+By default the allocator uses OS APIs to map virtual memory pages as needed, either `VirtualAlloc` on Windows or `mmap` on POSIX systems. If you want to use your own custom memory mapping provider you can use __rpmalloc_initialize_config__ and pass function pointers to map and unmap virtual memory. These function should reserve and free the requested number of bytes.
+
+The returned memory address from the memory map function MUST be aligned to the memory page size and the memory span size (which ever is larger), both of which is configurable. Either provide the page and span sizes during initialization using __rpmalloc_initialize_config__, or use __rpmalloc_config__ to find the required alignment which is equal to the maximum of page and span size. The span size MUST be a power of two in [4096, 262144] range, and be a multiple or divisor of the memory page size.
+
+Memory mapping requests are always done in multiples of the memory page size. You can specify a custom page size when initializing rpmalloc with __rpmalloc_initialize_config__, or pass 0 to let rpmalloc determine the system memory page size using OS APIs. The page size MUST be a power of two.
+
+To reduce system call overhead, memory spans are mapped in batches controlled by the `span_map_count` configuration variable (which defaults to the `DEFAULT_SPAN_MAP_COUNT` value if 0, which in turn is sized according to the cache configuration define, defaulting to 64).
If the memory page size is larger than the span size, the number of spans to map in a single call will be adjusted to guarantee a multiple of the page size, and the spans will be kept mapped until the entire span range can be unmapped in one call (to avoid trying to unmap partial pages).
+
+On macOS and iOS mmap requests are tagged with tag 240 for easy identification with the vmmap tool.
+
+# Span breaking
+Super spans (spans a multiple > 1 of the span size) can be subdivided into smaller spans to fulfill a need to map a new span of memory. By default the allocator will greedily grab and break any larger span from the available caches before mapping new virtual memory. However, spans can currently not be glued together to form larger super spans again. Subspans can traverse the cache and be used by different threads individually.
+
+A span that is a subspan of a larger super span can be individually decommitted to reduce physical memory pressure when the span is evicted from caches and scheduled to be unmapped. The entire original super span will keep track of the subspans it is broken up into, and when the entire range is decommitted the super span will be unmapped. This allows platforms like Windows that require the entire virtual memory range that was mapped in a call to VirtualAlloc to be unmapped in one call to VirtualFree, while still decommitting individual pages in subspans (if the page size is smaller than the span size).
+
+If you use a custom memory map/unmap function you need to take this into account by looking at the `release` parameter given to the `memory_unmap` function. It is set to 0 for decommitting individual pages and the total super span byte size for finally releasing the entire super span memory range.
+
+# Memory fragmentation
+There is no memory fragmentation by the allocator in the sense that it will not leave unallocated and unusable "holes" in the memory pages by calls to allocate and free blocks of different sizes.
This is due to the fact that the memory pages allocated for each size class is split up in perfectly aligned blocks which are not reused for a request of a different size. The block freed by a call to `rpfree` will always be immediately available for an allocation request within the same size class.
+
+However, there is memory fragmentation in the meaning that a request for x bytes followed by a request of y bytes where x and y are at least one size class different in size will return blocks that are at least one memory page apart in virtual address space. Only blocks of the same size will potentially be within the same memory page span.
+
+rpmalloc keeps an "active span" and free list for each size class. This leads to back-to-back allocations will most likely be served from within the same span of memory pages (unless the span runs out of free blocks). The rpmalloc implementation will also use any "holes" in memory pages in semi-filled spans before using a completely free span.
+
+# First class heaps
+rpmalloc provides a first class heap type with explicit heap control API. Heaps are maintained with calls to __rpmalloc_heap_acquire__ and __rpmalloc_heap_release__ and allocations/frees are done with __rpmalloc_heap_alloc__ and __rpmalloc_heap_free__. See the `rpmalloc.h` documentation for the full list of functions in the heap API. The main use case of explicit heap control is to scope allocations in a heap and release everything with a single call to __rpmalloc_heap_free_all__ without having to maintain ownership of memory blocks. Note that the heap API is not thread-safe, the caller must make sure that each heap is only used in a single thread at any given time.
+
+# Producer-consumer scenario
+Compared to the some other allocators, rpmalloc does not suffer as much from a producer-consumer thread scenario where one thread allocates memory blocks and another thread frees the blocks.
In some allocators the free blocks need to traverse both the thread cache of the thread doing the free operations as well as the global cache before being reused in the allocating thread. In rpmalloc the freed blocks will be reused as soon as the allocating thread needs to get new spans from the thread cache. This enables faster release of completely freed memory pages as blocks in a memory page will not be aliased between different owning threads.
+
+# Best case scenarios
+Threads that keep ownership of allocated memory blocks within the thread and free the blocks from the same thread will have optimal performance.
+
+Threads that have allocation patterns where the difference in memory usage high and low water marks fit within the thread cache thresholds in the allocator will never touch the global cache except during thread init/fini and have optimal performance. Tweaking the cache limits can be done on a per-size-class basis.
+
+# Worst case scenarios
+Since each thread cache maps spans of memory pages per size class, a thread that allocates just a few blocks of each size class (16, 32, ...) for many size classes will never fill each bucket, and thus map a lot of memory pages while only using a small fraction of the mapped memory. However, the wasted memory will always be less than 4KiB (or the configured memory page size) per size class as each span is initialized one memory page at a time. The cache for free spans will be reused by all size classes.
+
+Threads that perform a lot of allocations and deallocations in a pattern that have a large difference in high and low water marks, and that difference is larger than the thread cache size, will put a lot of contention on the global cache. What will happen is the thread cache will overflow on each low water mark causing pages to be released to the global cache, then underflow on high water mark causing pages to be re-acquired from the global cache.
This can be mitigated by changing the __MAX_SPAN_CACHE_DIVISOR__ define in the source code (at the cost of higher average memory overhead).
+
+# Caveats
+VirtualAlloc has an internal granularity of 64KiB. However, mmap lacks this granularity control, and the implementation instead oversizes the memory mapping with configured span size to be able to always return a memory area with the required alignment. Since the extra memory pages are never touched this will not result in extra committed physical memory pages, but rather only increase virtual memory address space.
+
+All entry points assume the passed values are valid, for example passing an invalid pointer to free would most likely result in a segmentation fault. __The library does not try to guard against errors!__.
+
+To support global scope data doing dynamic allocation/deallocation such as C++ objects with custom constructors and destructors, the call to __rpmalloc_finalize__ will not completely terminate the allocator but rather empty all caches and put the allocator in finalization mode. Once this call has been made, the allocator is no longer thread safe and expects all remaining calls to originate from global data destruction on main thread. Any spans or heaps becoming free during this phase will be immediately unmapped to allow correct teardown of the process or dynamic library without any leaks.
+
+# Other languages
+
+[Johan Andersson](https://github.com/repi) at Embark has created a Rust wrapper available at [rpmalloc-rs](https://github.com/EmbarkStudios/rpmalloc-rs)
+
+[Stas Denisov](https://github.com/nxrighthere) has created a C# wrapper available at [Rpmalloc-CSharp](https://github.com/nxrighthere/Rpmalloc-CSharp)
+
+# License
+
+This is free and unencumbered software released into the public domain.
+
+Anyone is free to copy, modify, publish, use, compile, sell, or
+distribute this software, either in source code form or as a compiled
+binary, for any purpose, commercial or non-commercial, and by any
+means.
+
+In jurisdictions that recognize copyright laws, the author or authors
+of this software dedicate any and all copyright interest in the
+software to the public domain. We make this dedication for the benefit
+of the public at large and to the detriment of our heirs and
+successors. We intend this dedication to be an overt act of
+relinquishment in perpetuity of all present and future rights to this
+software under copyright law.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
+
+For more information, please refer to
+
+
+You can also use this software under the MIT license if public domain is
+not recognized in your country
+
+
+The MIT License (MIT)
+
+Copyright (c) 2017 Mattias Jansson
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
diff --git a/llvm/lib/Support/rpmalloc/malloc.c b/llvm/lib/Support/rpmalloc/malloc.c
index 3fcfe848250c6b..59e13aab3ef7ed 100644
--- a/llvm/lib/Support/rpmalloc/malloc.c
+++ b/llvm/lib/Support/rpmalloc/malloc.c
@@ -1,724 +1,724 @@
-//===------------------------ malloc.c ------------------*- C -*-=============//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This library provides a cross-platform lock free thread caching malloc
-// implementation in C11.
-//
-//
-// This file provides overrides for the standard library malloc entry points for
-// C and new/delete operators for C++ It also provides automatic
-// initialization/finalization of process and threads
-//
-//===----------------------------------------------------------------------===//
-
-#if defined(__TINYC__)
-#include
-#endif
-
-#ifndef ARCH_64BIT
-#if defined(__LLP64__) || defined(__LP64__) || defined(_WIN64)
-#define ARCH_64BIT 1
-_Static_assert(sizeof(size_t) == 8, "Data type size mismatch");
-_Static_assert(sizeof(void *) == 8, "Data type size mismatch");
-#else
-#define ARCH_64BIT 0
-_Static_assert(sizeof(size_t) == 4, "Data type size mismatch");
-_Static_assert(sizeof(void *) == 4, "Data type size mismatch");
-#endif
-#endif
-
-#if (defined(__GNUC__) || defined(__clang__))
-#pragma GCC visibility push(default)
-#endif
-
-#define USE_IMPLEMENT 1
-#define USE_INTERPOSE 0
-#define USE_ALIAS 0
-
-#if defined(__APPLE__)
-#undef USE_INTERPOSE
-#define USE_INTERPOSE 1
-
-typedef struct interpose_t {
-  void *new_func;
-  void *orig_func;
-} interpose_t;
-
-#define MAC_INTERPOSE_PAIR(newf, oldf) {(void *)newf, (void *)oldf}
-#define MAC_INTERPOSE_SINGLE(newf, oldf) \
-  __attribute__((used)) static const interpose_t macinterpose##newf##oldf \
-      __attribute__((section("__DATA, __interpose"))) = \
-          MAC_INTERPOSE_PAIR(newf, oldf)
-
-#endif
-
-#if !defined(_WIN32) && !defined(__APPLE__)
-#undef USE_IMPLEMENT
-#undef USE_ALIAS
-#define USE_IMPLEMENT 0
-#define USE_ALIAS 1
-#endif
-
-#ifdef _MSC_VER
-#pragma warning(disable : 4100)
-#undef malloc
-#undef free
-#undef calloc
-#define RPMALLOC_RESTRICT __declspec(restrict)
-#else
-#define RPMALLOC_RESTRICT
-#endif
-
-#if ENABLE_OVERRIDE
-
-typedef struct rp_nothrow_t {
-  int __dummy;
-} rp_nothrow_t;
-
-#if USE_IMPLEMENT
-
-extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL malloc(size_t size) {
-  return rpmalloc(size);
-}
-extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL calloc(size_t count,
-                                                            size_t size)
{
-  return rpcalloc(count, size);
-}
-extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL realloc(void *ptr,
-                                                             size_t size) {
-  return rprealloc(ptr, size);
-}
-extern inline void *RPMALLOC_CDECL reallocf(void *ptr, size_t size) {
-  return rprealloc(ptr, size);
-}
-extern inline void *RPMALLOC_CDECL aligned_alloc(size_t alignment,
-                                                 size_t size) {
-  return rpaligned_alloc(alignment, size);
-}
-extern inline void *RPMALLOC_CDECL memalign(size_t alignment, size_t size) {
-  return rpmemalign(alignment, size);
-}
-extern inline int RPMALLOC_CDECL posix_memalign(void **memptr, size_t alignment,
-                                                size_t size) {
-  return rpposix_memalign(memptr, alignment, size);
-}
-extern inline void RPMALLOC_CDECL free(void *ptr) { rpfree(ptr); }
-extern inline void RPMALLOC_CDECL cfree(void *ptr) { rpfree(ptr); }
-extern inline size_t RPMALLOC_CDECL malloc_usable_size(void *ptr) {
-  return rpmalloc_usable_size(ptr);
-}
-extern inline size_t RPMALLOC_CDECL malloc_size(void *ptr) {
-  return rpmalloc_usable_size(ptr);
-}
-
-#ifdef _WIN32
-extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL _malloc_base(size_t size) {
-  return rpmalloc(size);
-}
-extern inline void RPMALLOC_CDECL _free_base(void *ptr) { rpfree(ptr); }
-extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL _calloc_base(size_t count,
-                                                                  size_t size) {
-  return rpcalloc(count, size);
-}
-extern inline size_t RPMALLOC_CDECL _msize(void *ptr) {
-  return rpmalloc_usable_size(ptr);
-}
-extern inline size_t RPMALLOC_CDECL _msize_base(void *ptr) {
-  return rpmalloc_usable_size(ptr);
-}
-extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL
-_realloc_base(void *ptr, size_t size) {
-  return rprealloc(ptr, size);
-}
-#endif
-
-#ifdef _WIN32
-// For Windows, #include in one source file to get the C++ operator
-// overrides implemented in your module
-#else
-// Overload the C++ operators using the mangled names
-// (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling) operators
-// delete and delete[]
-#define RPDEFVIS
__attribute__((visibility("default")))
-extern void _ZdlPv(void *p);
-void RPDEFVIS _ZdlPv(void *p) { rpfree(p); }
-extern void _ZdaPv(void *p);
-void RPDEFVIS _ZdaPv(void *p) { rpfree(p); }
-#if ARCH_64BIT
-// 64-bit operators new and new[], normal and aligned
-extern void *_Znwm(uint64_t size);
-void *RPDEFVIS _Znwm(uint64_t size) { return rpmalloc(size); }
-extern void *_Znam(uint64_t size);
-void *RPDEFVIS _Znam(uint64_t size) { return rpmalloc(size); }
-extern void *_Znwmm(uint64_t size, uint64_t align);
-void *RPDEFVIS _Znwmm(uint64_t size, uint64_t align) {
-  return rpaligned_alloc(align, size);
-}
-extern void *_Znamm(uint64_t size, uint64_t align);
-void *RPDEFVIS _Znamm(uint64_t size, uint64_t align) {
-  return rpaligned_alloc(align, size);
-}
-extern void *_ZnwmSt11align_val_t(uint64_t size, uint64_t align);
-void *RPDEFVIS _ZnwmSt11align_val_t(uint64_t size, uint64_t align) {
-  return rpaligned_alloc(align, size);
-}
-extern void *_ZnamSt11align_val_t(uint64_t size, uint64_t align);
-void *RPDEFVIS _ZnamSt11align_val_t(uint64_t size, uint64_t align) {
-  return rpaligned_alloc(align, size);
-}
-extern void *_ZnwmRKSt9nothrow_t(uint64_t size, rp_nothrow_t t);
-void *RPDEFVIS _ZnwmRKSt9nothrow_t(uint64_t size, rp_nothrow_t t) {
-  (void)sizeof(t);
-  return rpmalloc(size);
-}
-extern void *_ZnamRKSt9nothrow_t(uint64_t size, rp_nothrow_t t);
-void *RPDEFVIS _ZnamRKSt9nothrow_t(uint64_t size, rp_nothrow_t t) {
-  (void)sizeof(t);
-  return rpmalloc(size);
-}
-extern void *_ZnwmSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align,
-                                                rp_nothrow_t t);
-void *RPDEFVIS _ZnwmSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align,
-                                                  rp_nothrow_t t) {
-  (void)sizeof(t);
-  return rpaligned_alloc(align, size);
-}
-extern void *_ZnamSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align,
-                                                rp_nothrow_t t);
-void *RPDEFVIS _ZnamSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align,
-                                                  rp_nothrow_t t) {
-  (void)sizeof(t);
-  return rpaligned_alloc(align,
size); -} -// 64-bit operators sized delete and delete[], normal and aligned -extern void _ZdlPvm(void *p, uint64_t size); -void RPDEFVIS _ZdlPvm(void *p, uint64_t size) { - rpfree(p); - (void)sizeof(size); -} -extern void _ZdaPvm(void *p, uint64_t size); -void RPDEFVIS _ZdaPvm(void *p, uint64_t size) { - rpfree(p); - (void)sizeof(size); -} -extern void _ZdlPvSt11align_val_t(void *p, uint64_t align); -void RPDEFVIS _ZdlPvSt11align_val_t(void *p, uint64_t align) { - rpfree(p); - (void)sizeof(align); -} -extern void _ZdaPvSt11align_val_t(void *p, uint64_t align); -void RPDEFVIS _ZdaPvSt11align_val_t(void *p, uint64_t align) { - rpfree(p); - (void)sizeof(align); -} -extern void _ZdlPvmSt11align_val_t(void *p, uint64_t size, uint64_t align); -void RPDEFVIS _ZdlPvmSt11align_val_t(void *p, uint64_t size, uint64_t align) { - rpfree(p); - (void)sizeof(size); - (void)sizeof(align); -} -extern void _ZdaPvmSt11align_val_t(void *p, uint64_t size, uint64_t align); -void RPDEFVIS _ZdaPvmSt11align_val_t(void *p, uint64_t size, uint64_t align) { - rpfree(p); - (void)sizeof(size); - (void)sizeof(align); -} -#else -// 32-bit operators new and new[], normal and aligned -extern void *_Znwj(uint32_t size); -void *RPDEFVIS _Znwj(uint32_t size) { return rpmalloc(size); } -extern void *_Znaj(uint32_t size); -void *RPDEFVIS _Znaj(uint32_t size) { return rpmalloc(size); } -extern void *_Znwjj(uint32_t size, uint32_t align); -void *RPDEFVIS _Znwjj(uint32_t size, uint32_t align) { - return rpaligned_alloc(align, size); -} -extern void *_Znajj(uint32_t size, uint32_t align); -void *RPDEFVIS _Znajj(uint32_t size, uint32_t align) { - return rpaligned_alloc(align, size); -} -extern void *_ZnwjSt11align_val_t(size_t size, size_t align); -void *RPDEFVIS _ZnwjSt11align_val_t(size_t size, size_t align) { - return rpaligned_alloc(align, size); -} -extern void *_ZnajSt11align_val_t(size_t size, size_t align); -void *RPDEFVIS _ZnajSt11align_val_t(size_t size, size_t align) { - return 
rpaligned_alloc(align, size); -} -extern void *_ZnwjRKSt9nothrow_t(size_t size, rp_nothrow_t t); -void *RPDEFVIS _ZnwjRKSt9nothrow_t(size_t size, rp_nothrow_t t) { - (void)sizeof(t); - return rpmalloc(size); -} -extern void *_ZnajRKSt9nothrow_t(size_t size, rp_nothrow_t t); -void *RPDEFVIS _ZnajRKSt9nothrow_t(size_t size, rp_nothrow_t t) { - (void)sizeof(t); - return rpmalloc(size); -} -extern void *_ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, - rp_nothrow_t t); -void *RPDEFVIS _ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, - rp_nothrow_t t) { - (void)sizeof(t); - return rpaligned_alloc(align, size); -} -extern void *_ZnajSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, - rp_nothrow_t t); -void *RPDEFVIS _ZnajSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, - rp_nothrow_t t) { - (void)sizeof(t); - return rpaligned_alloc(align, size); -} -// 32-bit operators sized delete and delete[], normal and aligned -extern void _ZdlPvj(void *p, uint64_t size); -void RPDEFVIS _ZdlPvj(void *p, uint64_t size) { - rpfree(p); - (void)sizeof(size); -} -extern void _ZdaPvj(void *p, uint64_t size); -void RPDEFVIS _ZdaPvj(void *p, uint64_t size) { - rpfree(p); - (void)sizeof(size); -} -extern void _ZdlPvSt11align_val_t(void *p, uint32_t align); -void RPDEFVIS _ZdlPvSt11align_val_t(void *p, uint64_t a) { - rpfree(p); - (void)sizeof(align); -} -extern void _ZdaPvSt11align_val_t(void *p, uint32_t align); -void RPDEFVIS _ZdaPvSt11align_val_t(void *p, uint64_t a) { - rpfree(p); - (void)sizeof(align); -} -extern void _ZdlPvjSt11align_val_t(void *p, uint32_t size, uint32_t align); -void RPDEFVIS _ZdlPvjSt11align_val_t(void *p, uint64_t size, uint64_t align) { - rpfree(p); - (void)sizeof(size); - (void)sizeof(a); -} -extern void _ZdaPvjSt11align_val_t(void *p, uint32_t size, uint32_t align); -void RPDEFVIS _ZdaPvjSt11align_val_t(void *p, uint64_t size, uint64_t align) { - rpfree(p); - (void)sizeof(size); - (void)sizeof(a); -} -#endif -#endif 
-#endif - -#if USE_INTERPOSE || USE_ALIAS - -static void *rpmalloc_nothrow(size_t size, rp_nothrow_t t) { - (void)sizeof(t); - return rpmalloc(size); -} -static void *rpaligned_alloc_reverse(size_t size, size_t align) { - return rpaligned_alloc(align, size); -} -static void *rpaligned_alloc_reverse_nothrow(size_t size, size_t align, - rp_nothrow_t t) { - (void)sizeof(t); - return rpaligned_alloc(align, size); -} -static void rpfree_size(void *p, size_t size) { - (void)sizeof(size); - rpfree(p); -} -static void rpfree_aligned(void *p, size_t align) { - (void)sizeof(align); - rpfree(p); -} -static void rpfree_size_aligned(void *p, size_t size, size_t align) { - (void)sizeof(size); - (void)sizeof(align); - rpfree(p); -} - -#endif - -#if USE_INTERPOSE - -__attribute__((used)) static const interpose_t macinterpose_malloc[] - __attribute__((section("__DATA, __interpose"))) = { - // new and new[] - MAC_INTERPOSE_PAIR(rpmalloc, _Znwm), - MAC_INTERPOSE_PAIR(rpmalloc, _Znam), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _Znwmm), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _Znamm), - MAC_INTERPOSE_PAIR(rpmalloc_nothrow, _ZnwmRKSt9nothrow_t), - MAC_INTERPOSE_PAIR(rpmalloc_nothrow, _ZnamRKSt9nothrow_t), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _ZnwmSt11align_val_t), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _ZnamSt11align_val_t), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse_nothrow, - _ZnwmSt11align_val_tRKSt9nothrow_t), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse_nothrow, - _ZnamSt11align_val_tRKSt9nothrow_t), - // delete and delete[] - MAC_INTERPOSE_PAIR(rpfree, _ZdlPv), MAC_INTERPOSE_PAIR(rpfree, _ZdaPv), - MAC_INTERPOSE_PAIR(rpfree_size, _ZdlPvm), - MAC_INTERPOSE_PAIR(rpfree_size, _ZdaPvm), - MAC_INTERPOSE_PAIR(rpfree_aligned, _ZdlPvSt11align_val_t), - MAC_INTERPOSE_PAIR(rpfree_aligned, _ZdaPvSt11align_val_t), - MAC_INTERPOSE_PAIR(rpfree_size_aligned, _ZdlPvmSt11align_val_t), - MAC_INTERPOSE_PAIR(rpfree_size_aligned, _ZdaPvmSt11align_val_t), - // libc entry 
points - MAC_INTERPOSE_PAIR(rpmalloc, malloc), - MAC_INTERPOSE_PAIR(rpmalloc, calloc), - MAC_INTERPOSE_PAIR(rprealloc, realloc), - MAC_INTERPOSE_PAIR(rprealloc, reallocf), -#if defined(__MAC_10_15) && __MAC_OS_X_VERSION_MIN_REQUIRED >= __MAC_10_15 - MAC_INTERPOSE_PAIR(rpaligned_alloc, aligned_alloc), -#endif - MAC_INTERPOSE_PAIR(rpmemalign, memalign), - MAC_INTERPOSE_PAIR(rpposix_memalign, posix_memalign), - MAC_INTERPOSE_PAIR(rpfree, free), MAC_INTERPOSE_PAIR(rpfree, cfree), - MAC_INTERPOSE_PAIR(rpmalloc_usable_size, malloc_usable_size), - MAC_INTERPOSE_PAIR(rpmalloc_usable_size, malloc_size)}; - -#endif - -#if USE_ALIAS - -#define RPALIAS(fn) __attribute__((alias(#fn), used, visibility("default"))); - -// Alias the C++ operators using the mangled names -// (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling) - -// operators delete and delete[] -void _ZdlPv(void *p) RPALIAS(rpfree) void _ZdaPv(void *p) RPALIAS(rpfree) - -#if ARCH_64BIT - // 64-bit operators new and new[], normal and aligned - void *_Znwm(uint64_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) - RPALIAS(rpmalloc) void *_Znam(uint64_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) RPALIAS(rpmalloc) void *_Znwmm(uint64_t size, - uint64_t align) - RPALIAS(rpaligned_alloc_reverse) void *_Znamm(uint64_t size, - uint64_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnwmSt11align_val_t( - size_t size, size_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnamSt11align_val_t( - size_t size, size_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnwmRKSt9nothrow_t( - size_t size, rp_nothrow_t t) - RPALIAS(rpmalloc_nothrow) void *_ZnamRKSt9nothrow_t( - size_t size, - rp_nothrow_t t) RPALIAS(rpmalloc_nothrow) void - *_ZnwmSt11align_val_tRKSt9nothrow_t(size_t size, - size_t align, - rp_nothrow_t t) - RPALIAS(rpaligned_alloc_reverse_nothrow) void - *_ZnamSt11align_val_tRKSt9nothrow_t( - size_t size, size_t align, - rp_nothrow_t t) - 
RPALIAS(rpaligned_alloc_reverse_nothrow) - // 64-bit operators delete and delete[], sized and aligned - void _ZdlPvm(void *p, size_t n) RPALIAS(rpfree_size) void _ZdaPvm(void *p, - size_t n) - RPALIAS(rpfree_size) void _ZdlPvSt11align_val_t(void *p, size_t a) - RPALIAS(rpfree_aligned) void _ZdaPvSt11align_val_t(void *p, - size_t a) - RPALIAS(rpfree_aligned) void _ZdlPvmSt11align_val_t(void *p, - size_t n, - size_t a) - RPALIAS(rpfree_size_aligned) void _ZdaPvmSt11align_val_t( - void *p, size_t n, size_t a) - RPALIAS(rpfree_size_aligned) -#else - // 32-bit operators new and new[], normal and aligned - void *_Znwj(uint32_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) - RPALIAS(rpmalloc) void *_Znaj(uint32_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) RPALIAS(rpmalloc) void *_Znwjj(uint32_t size, - uint32_t align) - RPALIAS(rpaligned_alloc_reverse) void *_Znajj(uint32_t size, - uint32_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnwjSt11align_val_t( - size_t size, size_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnajSt11align_val_t( - size_t size, size_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnwjRKSt9nothrow_t( - size_t size, rp_nothrow_t t) - RPALIAS(rpmalloc_nothrow) void *_ZnajRKSt9nothrow_t( - size_t size, - rp_nothrow_t t) RPALIAS(rpmalloc_nothrow) void - *_ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, - size_t align, - rp_nothrow_t t) - RPALIAS(rpaligned_alloc_reverse_nothrow) void - *_ZnajSt11align_val_tRKSt9nothrow_t( - size_t size, size_t align, - rp_nothrow_t t) - RPALIAS(rpaligned_alloc_reverse_nothrow) - // 32-bit operators delete and delete[], sized and aligned - void _ZdlPvj(void *p, size_t n) RPALIAS(rpfree_size) void _ZdaPvj(void *p, - size_t n) - RPALIAS(rpfree_size) void _ZdlPvSt11align_val_t(void *p, size_t a) - RPALIAS(rpfree_aligned) void _ZdaPvSt11align_val_t(void *p, - size_t a) - RPALIAS(rpfree_aligned) void _ZdlPvjSt11align_val_t(void *p, - size_t n, - size_t a) - 
RPALIAS(rpfree_size_aligned) void _ZdaPvjSt11align_val_t( - void *p, size_t n, size_t a) - RPALIAS(rpfree_size_aligned) -#endif - - void *malloc(size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) - RPALIAS(rpmalloc) void *calloc(size_t count, size_t size) - RPALIAS(rpcalloc) void *realloc(void *ptr, size_t size) - RPALIAS(rprealloc) void *reallocf(void *ptr, size_t size) - RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2) - RPALIAS(rprealloc) void *aligned_alloc(size_t alignment, size_t size) - RPALIAS(rpaligned_alloc) void *memalign( - size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2) - RPALIAS(rpmemalign) int posix_memalign(void **memptr, size_t alignment, - size_t size) - RPALIAS(rpposix_memalign) void free(void *ptr) - RPALIAS(rpfree) void cfree(void *ptr) RPALIAS(rpfree) -#if defined(__ANDROID__) || defined(__FreeBSD__) - size_t - malloc_usable_size(const void *ptr) RPALIAS(rpmalloc_usable_size) -#else - size_t - malloc_usable_size(void *ptr) RPALIAS(rpmalloc_usable_size) -#endif - size_t malloc_size(void *ptr) RPALIAS(rpmalloc_usable_size) - -#endif - - static inline size_t _rpmalloc_page_size(void) { - return _memory_page_size; -} - -extern void *RPMALLOC_CDECL reallocarray(void *ptr, size_t count, size_t size); - -extern void *RPMALLOC_CDECL reallocarray(void *ptr, size_t count, size_t size) { - size_t total; -#if ENABLE_VALIDATE_ARGS -#ifdef _MSC_VER - int err = SizeTMult(count, size, &total); - if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#else - int err = __builtin_umull_overflow(count, size, &total); - if (err || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#endif -#else - total = count * size; -#endif - return realloc(ptr, total); -} - -extern inline void *RPMALLOC_CDECL valloc(size_t size) { - get_thread_heap(); - return rpaligned_alloc(_rpmalloc_page_size(), size); -} - -extern inline void *RPMALLOC_CDECL pvalloc(size_t size) { - 
get_thread_heap(); - const size_t page_size = _rpmalloc_page_size(); - const size_t aligned_size = ((size + page_size - 1) / page_size) * page_size; -#if ENABLE_VALIDATE_ARGS - if (aligned_size < size) { - errno = EINVAL; - return 0; - } -#endif - return rpaligned_alloc(_rpmalloc_page_size(), aligned_size); -} - -#endif // ENABLE_OVERRIDE - -#if ENABLE_PRELOAD - -#ifdef _WIN32 - -#if defined(BUILD_DYNAMIC_LINK) && BUILD_DYNAMIC_LINK - -extern __declspec(dllexport) BOOL WINAPI DllMain(HINSTANCE instance, - DWORD reason, LPVOID reserved); - -extern __declspec(dllexport) BOOL WINAPI DllMain(HINSTANCE instance, - DWORD reason, - LPVOID reserved) { - (void)sizeof(reserved); - (void)sizeof(instance); - if (reason == DLL_PROCESS_ATTACH) - rpmalloc_initialize(); - else if (reason == DLL_PROCESS_DETACH) - rpmalloc_finalize(); - else if (reason == DLL_THREAD_ATTACH) - rpmalloc_thread_initialize(); - else if (reason == DLL_THREAD_DETACH) - rpmalloc_thread_finalize(1); - return TRUE; -} - -// end BUILD_DYNAMIC_LINK -#else - -extern void _global_rpmalloc_init(void) { - rpmalloc_set_main_thread(); - rpmalloc_initialize(); -} - -#if defined(__clang__) || defined(__GNUC__) - -static void __attribute__((constructor)) initializer(void) { - _global_rpmalloc_init(); -} - -#elif defined(_MSC_VER) - -static int _global_rpmalloc_xib(void) { - _global_rpmalloc_init(); - return 0; -} - -#pragma section(".CRT$XIB", read) -__declspec(allocate(".CRT$XIB")) void (*_rpmalloc_module_init)(void) = - _global_rpmalloc_xib; -#if defined(_M_IX86) || defined(__i386__) -#pragma comment(linker, "/include:" \ - "__rpmalloc_module_init") -#else -#pragma comment(linker, "/include:" \ - "_rpmalloc_module_init") -#endif - -#endif - -// end !BUILD_DYNAMIC_LINK -#endif - -#else - -#include -#include -#include -#include - -extern void rpmalloc_set_main_thread(void); - -static pthread_key_t destructor_key; - -static void thread_destructor(void *); - -static void __attribute__((constructor)) initializer(void) { - 
rpmalloc_set_main_thread(); - rpmalloc_initialize(); - pthread_key_create(&destructor_key, thread_destructor); -} - -static void __attribute__((destructor)) finalizer(void) { rpmalloc_finalize(); } - -typedef struct { - void *(*real_start)(void *); - void *real_arg; -} thread_starter_arg; - -static void *thread_starter(void *argptr) { - thread_starter_arg *arg = argptr; - void *(*real_start)(void *) = arg->real_start; - void *real_arg = arg->real_arg; - rpmalloc_thread_initialize(); - rpfree(argptr); - pthread_setspecific(destructor_key, (void *)1); - return (*real_start)(real_arg); -} - -static void thread_destructor(void *value) { - (void)sizeof(value); - rpmalloc_thread_finalize(1); -} - -#ifdef __APPLE__ - -static int pthread_create_proxy(pthread_t *thread, const pthread_attr_t *attr, - void *(*start_routine)(void *), void *arg) { - rpmalloc_initialize(); - thread_starter_arg *starter_arg = rpmalloc(sizeof(thread_starter_arg)); - starter_arg->real_start = start_routine; - starter_arg->real_arg = arg; - return pthread_create(thread, attr, thread_starter, starter_arg); -} - -MAC_INTERPOSE_SINGLE(pthread_create_proxy, pthread_create); - -#else - -#include - -int pthread_create(pthread_t *thread, const pthread_attr_t *attr, - void *(*start_routine)(void *), void *arg) { -#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__) || \ - defined(__NetBSD__) || defined(__DragonFly__) || defined(__APPLE__) || \ - defined(__HAIKU__) - char fname[] = "pthread_create"; -#else - char fname[] = "_pthread_create"; -#endif - void *real_pthread_create = dlsym(RTLD_NEXT, fname); - rpmalloc_thread_initialize(); - thread_starter_arg *starter_arg = rpmalloc(sizeof(thread_starter_arg)); - starter_arg->real_start = start_routine; - starter_arg->real_arg = arg; - return (*(int (*)(pthread_t *, const pthread_attr_t *, void *(*)(void *), - void *))real_pthread_create)(thread, attr, thread_starter, - starter_arg); -} - -#endif - -#endif - -#endif - -#if ENABLE_OVERRIDE - 
-#if defined(__GLIBC__) && defined(__linux__) - -void *__libc_malloc(size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) - RPALIAS(rpmalloc) void *__libc_calloc(size_t count, size_t size) - RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE2(1, 2) - RPALIAS(rpcalloc) void *__libc_realloc(void *p, size_t size) - RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2) RPALIAS(rprealloc) void __libc_free(void *p) - RPALIAS(rpfree) void __libc_cfree(void *p) - RPALIAS(rpfree) void *__libc_memalign(size_t align, size_t size) - RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2) - RPALIAS(rpmemalign) int __posix_memalign(void **p, size_t align, - size_t size) - RPALIAS(rpposix_memalign) - - extern void *__libc_valloc(size_t size); -extern void *__libc_pvalloc(size_t size); - -void *__libc_valloc(size_t size) { return valloc(size); } - -void *__libc_pvalloc(size_t size) { return pvalloc(size); } - -#endif - -#endif - -#if (defined(__GNUC__) || defined(__clang__)) -#pragma GCC visibility pop -#endif +//===------------------------ malloc.c ------------------*- C -*-=============// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This library provides a cross-platform lock free thread caching malloc +// implementation in C11. 
+// +// +// This file provides overrides for the standard library malloc entry points for +// C and new/delete operators for C++ It also provides automatic +// initialization/finalization of process and threads +// +//===----------------------------------------------------------------------===// + +#if defined(__TINYC__) +#include +#endif + +#ifndef ARCH_64BIT +#if defined(__LLP64__) || defined(__LP64__) || defined(_WIN64) +#define ARCH_64BIT 1 +_Static_assert(sizeof(size_t) == 8, "Data type size mismatch"); +_Static_assert(sizeof(void *) == 8, "Data type size mismatch"); +#else +#define ARCH_64BIT 0 +_Static_assert(sizeof(size_t) == 4, "Data type size mismatch"); +_Static_assert(sizeof(void *) == 4, "Data type size mismatch"); +#endif +#endif + +#if (defined(__GNUC__) || defined(__clang__)) +#pragma GCC visibility push(default) +#endif + +#define USE_IMPLEMENT 1 +#define USE_INTERPOSE 0 +#define USE_ALIAS 0 + +#if defined(__APPLE__) +#undef USE_INTERPOSE +#define USE_INTERPOSE 1 + +typedef struct interpose_t { + void *new_func; + void *orig_func; +} interpose_t; + +#define MAC_INTERPOSE_PAIR(newf, oldf) {(void *)newf, (void *)oldf} +#define MAC_INTERPOSE_SINGLE(newf, oldf) \ + __attribute__((used)) static const interpose_t macinterpose##newf##oldf \ + __attribute__((section("__DATA, __interpose"))) = \ + MAC_INTERPOSE_PAIR(newf, oldf) + +#endif + +#if !defined(_WIN32) && !defined(__APPLE__) +#undef USE_IMPLEMENT +#undef USE_ALIAS +#define USE_IMPLEMENT 0 +#define USE_ALIAS 1 +#endif + +#ifdef _MSC_VER +#pragma warning(disable : 4100) +#undef malloc +#undef free +#undef calloc +#define RPMALLOC_RESTRICT __declspec(restrict) +#else +#define RPMALLOC_RESTRICT +#endif + +#if ENABLE_OVERRIDE + +typedef struct rp_nothrow_t { + int __dummy; +} rp_nothrow_t; + +#if USE_IMPLEMENT + +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL malloc(size_t size) { + return rpmalloc(size); +} +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL calloc(size_t count, + size_t size) 
{ + return rpcalloc(count, size); +} +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL realloc(void *ptr, + size_t size) { + return rprealloc(ptr, size); +} +extern inline void *RPMALLOC_CDECL reallocf(void *ptr, size_t size) { + return rprealloc(ptr, size); +} +extern inline void *RPMALLOC_CDECL aligned_alloc(size_t alignment, + size_t size) { + return rpaligned_alloc(alignment, size); +} +extern inline void *RPMALLOC_CDECL memalign(size_t alignment, size_t size) { + return rpmemalign(alignment, size); +} +extern inline int RPMALLOC_CDECL posix_memalign(void **memptr, size_t alignment, + size_t size) { + return rpposix_memalign(memptr, alignment, size); +} +extern inline void RPMALLOC_CDECL free(void *ptr) { rpfree(ptr); } +extern inline void RPMALLOC_CDECL cfree(void *ptr) { rpfree(ptr); } +extern inline size_t RPMALLOC_CDECL malloc_usable_size(void *ptr) { + return rpmalloc_usable_size(ptr); +} +extern inline size_t RPMALLOC_CDECL malloc_size(void *ptr) { + return rpmalloc_usable_size(ptr); +} + +#ifdef _WIN32 +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL _malloc_base(size_t size) { + return rpmalloc(size); +} +extern inline void RPMALLOC_CDECL _free_base(void *ptr) { rpfree(ptr); } +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL _calloc_base(size_t count, + size_t size) { + return rpcalloc(count, size); +} +extern inline size_t RPMALLOC_CDECL _msize(void *ptr) { + return rpmalloc_usable_size(ptr); +} +extern inline size_t RPMALLOC_CDECL _msize_base(void *ptr) { + return rpmalloc_usable_size(ptr); +} +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL +_realloc_base(void *ptr, size_t size) { + return rprealloc(ptr, size); +} +#endif + +#ifdef _WIN32 +// For Windows, #include in one source file to get the C++ operator +// overrides implemented in your module +#else +// Overload the C++ operators using the mangled names +// (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling) operators +// delete and delete[] +#define RPDEFVIS 
__attribute__((visibility("default"))) +extern void _ZdlPv(void *p); +void RPDEFVIS _ZdlPv(void *p) { rpfree(p); } +extern void _ZdaPv(void *p); +void RPDEFVIS _ZdaPv(void *p) { rpfree(p); } +#if ARCH_64BIT +// 64-bit operators new and new[], normal and aligned +extern void *_Znwm(uint64_t size); +void *RPDEFVIS _Znwm(uint64_t size) { return rpmalloc(size); } +extern void *_Znam(uint64_t size); +void *RPDEFVIS _Znam(uint64_t size) { return rpmalloc(size); } +extern void *_Znwmm(uint64_t size, uint64_t align); +void *RPDEFVIS _Znwmm(uint64_t size, uint64_t align) { + return rpaligned_alloc(align, size); +} +extern void *_Znamm(uint64_t size, uint64_t align); +void *RPDEFVIS _Znamm(uint64_t size, uint64_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnwmSt11align_val_t(uint64_t size, uint64_t align); +void *RPDEFVIS _ZnwmSt11align_val_t(uint64_t size, uint64_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnamSt11align_val_t(uint64_t size, uint64_t align); +void *RPDEFVIS _ZnamSt11align_val_t(uint64_t size, uint64_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnwmRKSt9nothrow_t(uint64_t size, rp_nothrow_t t); +void *RPDEFVIS _ZnwmRKSt9nothrow_t(uint64_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +extern void *_ZnamRKSt9nothrow_t(uint64_t size, rp_nothrow_t t); +void *RPDEFVIS _ZnamRKSt9nothrow_t(uint64_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +extern void *_ZnwmSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, + rp_nothrow_t t); +void *RPDEFVIS _ZnwmSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, size); +} +extern void *_ZnamSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, + rp_nothrow_t t); +void *RPDEFVIS _ZnamSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, 
size); +} +// 64-bit operators sized delete and delete[], normal and aligned +extern void _ZdlPvm(void *p, uint64_t size); +void RPDEFVIS _ZdlPvm(void *p, uint64_t size) { + rpfree(p); + (void)sizeof(size); +} +extern void _ZdaPvm(void *p, uint64_t size); +void RPDEFVIS _ZdaPvm(void *p, uint64_t size) { + rpfree(p); + (void)sizeof(size); +} +extern void _ZdlPvSt11align_val_t(void *p, uint64_t align); +void RPDEFVIS _ZdlPvSt11align_val_t(void *p, uint64_t align) { + rpfree(p); + (void)sizeof(align); +} +extern void _ZdaPvSt11align_val_t(void *p, uint64_t align); +void RPDEFVIS _ZdaPvSt11align_val_t(void *p, uint64_t align) { + rpfree(p); + (void)sizeof(align); +} +extern void _ZdlPvmSt11align_val_t(void *p, uint64_t size, uint64_t align); +void RPDEFVIS _ZdlPvmSt11align_val_t(void *p, uint64_t size, uint64_t align) { + rpfree(p); + (void)sizeof(size); + (void)sizeof(align); +} +extern void _ZdaPvmSt11align_val_t(void *p, uint64_t size, uint64_t align); +void RPDEFVIS _ZdaPvmSt11align_val_t(void *p, uint64_t size, uint64_t align) { + rpfree(p); + (void)sizeof(size); + (void)sizeof(align); +} +#else +// 32-bit operators new and new[], normal and aligned +extern void *_Znwj(uint32_t size); +void *RPDEFVIS _Znwj(uint32_t size) { return rpmalloc(size); } +extern void *_Znaj(uint32_t size); +void *RPDEFVIS _Znaj(uint32_t size) { return rpmalloc(size); } +extern void *_Znwjj(uint32_t size, uint32_t align); +void *RPDEFVIS _Znwjj(uint32_t size, uint32_t align) { + return rpaligned_alloc(align, size); +} +extern void *_Znajj(uint32_t size, uint32_t align); +void *RPDEFVIS _Znajj(uint32_t size, uint32_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnwjSt11align_val_t(size_t size, size_t align); +void *RPDEFVIS _ZnwjSt11align_val_t(size_t size, size_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnajSt11align_val_t(size_t size, size_t align); +void *RPDEFVIS _ZnajSt11align_val_t(size_t size, size_t align) { + return 
rpaligned_alloc(align, size); +} +extern void *_ZnwjRKSt9nothrow_t(size_t size, rp_nothrow_t t); +void *RPDEFVIS _ZnwjRKSt9nothrow_t(size_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +extern void *_ZnajRKSt9nothrow_t(size_t size, rp_nothrow_t t); +void *RPDEFVIS _ZnajRKSt9nothrow_t(size_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +extern void *_ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, + rp_nothrow_t t); +void *RPDEFVIS _ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, size); +} +extern void *_ZnajSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, + rp_nothrow_t t); +void *RPDEFVIS _ZnajSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, size); +} +// 32-bit operators sized delete and delete[], normal and aligned +extern void _ZdlPvj(void *p, uint64_t size); +void RPDEFVIS _ZdlPvj(void *p, uint64_t size) { + rpfree(p); + (void)sizeof(size); +} +extern void _ZdaPvj(void *p, uint64_t size); +void RPDEFVIS _ZdaPvj(void *p, uint64_t size) { + rpfree(p); + (void)sizeof(size); +} +extern void _ZdlPvSt11align_val_t(void *p, uint32_t align); +void RPDEFVIS _ZdlPvSt11align_val_t(void *p, uint64_t a) { + rpfree(p); + (void)sizeof(align); +} +extern void _ZdaPvSt11align_val_t(void *p, uint32_t align); +void RPDEFVIS _ZdaPvSt11align_val_t(void *p, uint64_t a) { + rpfree(p); + (void)sizeof(align); +} +extern void _ZdlPvjSt11align_val_t(void *p, uint32_t size, uint32_t align); +void RPDEFVIS _ZdlPvjSt11align_val_t(void *p, uint64_t size, uint64_t align) { + rpfree(p); + (void)sizeof(size); + (void)sizeof(a); +} +extern void _ZdaPvjSt11align_val_t(void *p, uint32_t size, uint32_t align); +void RPDEFVIS _ZdaPvjSt11align_val_t(void *p, uint64_t size, uint64_t align) { + rpfree(p); + (void)sizeof(size); + (void)sizeof(a); +} +#endif +#endif 
+#endif + +#if USE_INTERPOSE || USE_ALIAS + +static void *rpmalloc_nothrow(size_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +static void *rpaligned_alloc_reverse(size_t size, size_t align) { + return rpaligned_alloc(align, size); +} +static void *rpaligned_alloc_reverse_nothrow(size_t size, size_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, size); +} +static void rpfree_size(void *p, size_t size) { + (void)sizeof(size); + rpfree(p); +} +static void rpfree_aligned(void *p, size_t align) { + (void)sizeof(align); + rpfree(p); +} +static void rpfree_size_aligned(void *p, size_t size, size_t align) { + (void)sizeof(size); + (void)sizeof(align); + rpfree(p); +} + +#endif + +#if USE_INTERPOSE + +__attribute__((used)) static const interpose_t macinterpose_malloc[] + __attribute__((section("__DATA, __interpose"))) = { + // new and new[] + MAC_INTERPOSE_PAIR(rpmalloc, _Znwm), + MAC_INTERPOSE_PAIR(rpmalloc, _Znam), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _Znwmm), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _Znamm), + MAC_INTERPOSE_PAIR(rpmalloc_nothrow, _ZnwmRKSt9nothrow_t), + MAC_INTERPOSE_PAIR(rpmalloc_nothrow, _ZnamRKSt9nothrow_t), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _ZnwmSt11align_val_t), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _ZnamSt11align_val_t), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse_nothrow, + _ZnwmSt11align_val_tRKSt9nothrow_t), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse_nothrow, + _ZnamSt11align_val_tRKSt9nothrow_t), + // delete and delete[] + MAC_INTERPOSE_PAIR(rpfree, _ZdlPv), MAC_INTERPOSE_PAIR(rpfree, _ZdaPv), + MAC_INTERPOSE_PAIR(rpfree_size, _ZdlPvm), + MAC_INTERPOSE_PAIR(rpfree_size, _ZdaPvm), + MAC_INTERPOSE_PAIR(rpfree_aligned, _ZdlPvSt11align_val_t), + MAC_INTERPOSE_PAIR(rpfree_aligned, _ZdaPvSt11align_val_t), + MAC_INTERPOSE_PAIR(rpfree_size_aligned, _ZdlPvmSt11align_val_t), + MAC_INTERPOSE_PAIR(rpfree_size_aligned, _ZdaPvmSt11align_val_t), + // libc entry 
points + MAC_INTERPOSE_PAIR(rpmalloc, malloc), + MAC_INTERPOSE_PAIR(rpmalloc, calloc), + MAC_INTERPOSE_PAIR(rprealloc, realloc), + MAC_INTERPOSE_PAIR(rprealloc, reallocf), +#if defined(__MAC_10_15) && __MAC_OS_X_VERSION_MIN_REQUIRED >= __MAC_10_15 + MAC_INTERPOSE_PAIR(rpaligned_alloc, aligned_alloc), +#endif + MAC_INTERPOSE_PAIR(rpmemalign, memalign), + MAC_INTERPOSE_PAIR(rpposix_memalign, posix_memalign), + MAC_INTERPOSE_PAIR(rpfree, free), MAC_INTERPOSE_PAIR(rpfree, cfree), + MAC_INTERPOSE_PAIR(rpmalloc_usable_size, malloc_usable_size), + MAC_INTERPOSE_PAIR(rpmalloc_usable_size, malloc_size)}; + +#endif + +#if USE_ALIAS + +#define RPALIAS(fn) __attribute__((alias(#fn), used, visibility("default"))); + +// Alias the C++ operators using the mangled names +// (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling) + +// operators delete and delete[] +void _ZdlPv(void *p) RPALIAS(rpfree) void _ZdaPv(void *p) RPALIAS(rpfree) + +#if ARCH_64BIT + // 64-bit operators new and new[], normal and aligned + void *_Znwm(uint64_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) + RPALIAS(rpmalloc) void *_Znam(uint64_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) RPALIAS(rpmalloc) void *_Znwmm(uint64_t size, + uint64_t align) + RPALIAS(rpaligned_alloc_reverse) void *_Znamm(uint64_t size, + uint64_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnwmSt11align_val_t( + size_t size, size_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnamSt11align_val_t( + size_t size, size_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnwmRKSt9nothrow_t( + size_t size, rp_nothrow_t t) + RPALIAS(rpmalloc_nothrow) void *_ZnamRKSt9nothrow_t( + size_t size, + rp_nothrow_t t) RPALIAS(rpmalloc_nothrow) void + *_ZnwmSt11align_val_tRKSt9nothrow_t(size_t size, + size_t align, + rp_nothrow_t t) + RPALIAS(rpaligned_alloc_reverse_nothrow) void + *_ZnamSt11align_val_tRKSt9nothrow_t( + size_t size, size_t align, + rp_nothrow_t t) + 
RPALIAS(rpaligned_alloc_reverse_nothrow) + // 64-bit operators delete and delete[], sized and aligned + void _ZdlPvm(void *p, size_t n) RPALIAS(rpfree_size) void _ZdaPvm(void *p, + size_t n) + RPALIAS(rpfree_size) void _ZdlPvSt11align_val_t(void *p, size_t a) + RPALIAS(rpfree_aligned) void _ZdaPvSt11align_val_t(void *p, + size_t a) + RPALIAS(rpfree_aligned) void _ZdlPvmSt11align_val_t(void *p, + size_t n, + size_t a) + RPALIAS(rpfree_size_aligned) void _ZdaPvmSt11align_val_t( + void *p, size_t n, size_t a) + RPALIAS(rpfree_size_aligned) +#else + // 32-bit operators new and new[], normal and aligned + void *_Znwj(uint32_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) + RPALIAS(rpmalloc) void *_Znaj(uint32_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) RPALIAS(rpmalloc) void *_Znwjj(uint32_t size, + uint32_t align) + RPALIAS(rpaligned_alloc_reverse) void *_Znajj(uint32_t size, + uint32_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnwjSt11align_val_t( + size_t size, size_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnajSt11align_val_t( + size_t size, size_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnwjRKSt9nothrow_t( + size_t size, rp_nothrow_t t) + RPALIAS(rpmalloc_nothrow) void *_ZnajRKSt9nothrow_t( + size_t size, + rp_nothrow_t t) RPALIAS(rpmalloc_nothrow) void + *_ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, + size_t align, + rp_nothrow_t t) + RPALIAS(rpaligned_alloc_reverse_nothrow) void + *_ZnajSt11align_val_tRKSt9nothrow_t( + size_t size, size_t align, + rp_nothrow_t t) + RPALIAS(rpaligned_alloc_reverse_nothrow) + // 32-bit operators delete and delete[], sized and aligned + void _ZdlPvj(void *p, size_t n) RPALIAS(rpfree_size) void _ZdaPvj(void *p, + size_t n) + RPALIAS(rpfree_size) void _ZdlPvSt11align_val_t(void *p, size_t a) + RPALIAS(rpfree_aligned) void _ZdaPvSt11align_val_t(void *p, + size_t a) + RPALIAS(rpfree_aligned) void _ZdlPvjSt11align_val_t(void *p, + size_t n, + size_t a) + 
RPALIAS(rpfree_size_aligned) void _ZdaPvjSt11align_val_t( + void *p, size_t n, size_t a) + RPALIAS(rpfree_size_aligned) +#endif + + void *malloc(size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) + RPALIAS(rpmalloc) void *calloc(size_t count, size_t size) + RPALIAS(rpcalloc) void *realloc(void *ptr, size_t size) + RPALIAS(rprealloc) void *reallocf(void *ptr, size_t size) + RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2) + RPALIAS(rprealloc) void *aligned_alloc(size_t alignment, size_t size) + RPALIAS(rpaligned_alloc) void *memalign( + size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2) + RPALIAS(rpmemalign) int posix_memalign(void **memptr, size_t alignment, + size_t size) + RPALIAS(rpposix_memalign) void free(void *ptr) + RPALIAS(rpfree) void cfree(void *ptr) RPALIAS(rpfree) +#if defined(__ANDROID__) || defined(__FreeBSD__) + size_t + malloc_usable_size(const void *ptr) RPALIAS(rpmalloc_usable_size) +#else + size_t + malloc_usable_size(void *ptr) RPALIAS(rpmalloc_usable_size) +#endif + size_t malloc_size(void *ptr) RPALIAS(rpmalloc_usable_size) + +#endif + + static inline size_t _rpmalloc_page_size(void) { + return _memory_page_size; +} + +extern void *RPMALLOC_CDECL reallocarray(void *ptr, size_t count, size_t size); + +extern void *RPMALLOC_CDECL reallocarray(void *ptr, size_t count, size_t size) { + size_t total; +#if ENABLE_VALIDATE_ARGS +#ifdef _MSC_VER + int err = SizeTMult(count, size, &total); + if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#else + int err = __builtin_umull_overflow(count, size, &total); + if (err || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#endif +#else + total = count * size; +#endif + return realloc(ptr, total); +} + +extern inline void *RPMALLOC_CDECL valloc(size_t size) { + get_thread_heap(); + return rpaligned_alloc(_rpmalloc_page_size(), size); +} + +extern inline void *RPMALLOC_CDECL pvalloc(size_t size) { + 
get_thread_heap(); + const size_t page_size = _rpmalloc_page_size(); + const size_t aligned_size = ((size + page_size - 1) / page_size) * page_size; +#if ENABLE_VALIDATE_ARGS + if (aligned_size < size) { + errno = EINVAL; + return 0; + } +#endif + return rpaligned_alloc(_rpmalloc_page_size(), aligned_size); +} + +#endif // ENABLE_OVERRIDE + +#if ENABLE_PRELOAD + +#ifdef _WIN32 + +#if defined(BUILD_DYNAMIC_LINK) && BUILD_DYNAMIC_LINK + +extern __declspec(dllexport) BOOL WINAPI DllMain(HINSTANCE instance, + DWORD reason, LPVOID reserved); + +extern __declspec(dllexport) BOOL WINAPI DllMain(HINSTANCE instance, + DWORD reason, + LPVOID reserved) { + (void)sizeof(reserved); + (void)sizeof(instance); + if (reason == DLL_PROCESS_ATTACH) + rpmalloc_initialize(); + else if (reason == DLL_PROCESS_DETACH) + rpmalloc_finalize(); + else if (reason == DLL_THREAD_ATTACH) + rpmalloc_thread_initialize(); + else if (reason == DLL_THREAD_DETACH) + rpmalloc_thread_finalize(1); + return TRUE; +} + +// end BUILD_DYNAMIC_LINK +#else + +extern void _global_rpmalloc_init(void) { + rpmalloc_set_main_thread(); + rpmalloc_initialize(); +} + +#if defined(__clang__) || defined(__GNUC__) + +static void __attribute__((constructor)) initializer(void) { + _global_rpmalloc_init(); +} + +#elif defined(_MSC_VER) + +static int _global_rpmalloc_xib(void) { + _global_rpmalloc_init(); + return 0; +} + +#pragma section(".CRT$XIB", read) +__declspec(allocate(".CRT$XIB")) void (*_rpmalloc_module_init)(void) = + _global_rpmalloc_xib; +#if defined(_M_IX86) || defined(__i386__) +#pragma comment(linker, "/include:" \ + "__rpmalloc_module_init") +#else +#pragma comment(linker, "/include:" \ + "_rpmalloc_module_init") +#endif + +#endif + +// end !BUILD_DYNAMIC_LINK +#endif + +#else + +#include +#include +#include +#include + +extern void rpmalloc_set_main_thread(void); + +static pthread_key_t destructor_key; + +static void thread_destructor(void *); + +static void __attribute__((constructor)) initializer(void) { + 
rpmalloc_set_main_thread(); + rpmalloc_initialize(); + pthread_key_create(&destructor_key, thread_destructor); +} + +static void __attribute__((destructor)) finalizer(void) { rpmalloc_finalize(); } + +typedef struct { + void *(*real_start)(void *); + void *real_arg; +} thread_starter_arg; + +static void *thread_starter(void *argptr) { + thread_starter_arg *arg = argptr; + void *(*real_start)(void *) = arg->real_start; + void *real_arg = arg->real_arg; + rpmalloc_thread_initialize(); + rpfree(argptr); + pthread_setspecific(destructor_key, (void *)1); + return (*real_start)(real_arg); +} + +static void thread_destructor(void *value) { + (void)sizeof(value); + rpmalloc_thread_finalize(1); +} + +#ifdef __APPLE__ + +static int pthread_create_proxy(pthread_t *thread, const pthread_attr_t *attr, + void *(*start_routine)(void *), void *arg) { + rpmalloc_initialize(); + thread_starter_arg *starter_arg = rpmalloc(sizeof(thread_starter_arg)); + starter_arg->real_start = start_routine; + starter_arg->real_arg = arg; + return pthread_create(thread, attr, thread_starter, starter_arg); +} + +MAC_INTERPOSE_SINGLE(pthread_create_proxy, pthread_create); + +#else + +#include + +int pthread_create(pthread_t *thread, const pthread_attr_t *attr, + void *(*start_routine)(void *), void *arg) { +#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__) || \ + defined(__NetBSD__) || defined(__DragonFly__) || defined(__APPLE__) || \ + defined(__HAIKU__) + char fname[] = "pthread_create"; +#else + char fname[] = "_pthread_create"; +#endif + void *real_pthread_create = dlsym(RTLD_NEXT, fname); + rpmalloc_thread_initialize(); + thread_starter_arg *starter_arg = rpmalloc(sizeof(thread_starter_arg)); + starter_arg->real_start = start_routine; + starter_arg->real_arg = arg; + return (*(int (*)(pthread_t *, const pthread_attr_t *, void *(*)(void *), + void *))real_pthread_create)(thread, attr, thread_starter, + starter_arg); +} + +#endif + +#endif + +#endif + +#if ENABLE_OVERRIDE + 
+#if defined(__GLIBC__) && defined(__linux__) + +void *__libc_malloc(size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) + RPALIAS(rpmalloc) void *__libc_calloc(size_t count, size_t size) + RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE2(1, 2) + RPALIAS(rpcalloc) void *__libc_realloc(void *p, size_t size) + RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2) RPALIAS(rprealloc) void __libc_free(void *p) + RPALIAS(rpfree) void __libc_cfree(void *p) + RPALIAS(rpfree) void *__libc_memalign(size_t align, size_t size) + RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2) + RPALIAS(rpmemalign) int __posix_memalign(void **p, size_t align, + size_t size) + RPALIAS(rpposix_memalign) + + extern void *__libc_valloc(size_t size); +extern void *__libc_pvalloc(size_t size); + +void *__libc_valloc(size_t size) { return valloc(size); } + +void *__libc_pvalloc(size_t size) { return pvalloc(size); } + +#endif + +#endif + +#if (defined(__GNUC__) || defined(__clang__)) +#pragma GCC visibility pop +#endif diff --git a/llvm/lib/Support/rpmalloc/rpmalloc.c b/llvm/lib/Support/rpmalloc/rpmalloc.c index a06d3cdb5b52ef..0976ec8ae6af4e 100644 --- a/llvm/lib/Support/rpmalloc/rpmalloc.c +++ b/llvm/lib/Support/rpmalloc/rpmalloc.c @@ -1,3992 +1,3992 @@ -//===---------------------- rpmalloc.c ------------------*- C -*-=============// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -// -// This library provides a cross-platform lock free thread caching malloc -// implementation in C11. 
-// -//===----------------------------------------------------------------------===// - -#include "rpmalloc.h" - -//////////// -/// -/// Build time configurable limits -/// -////// - -#if defined(__clang__) -#pragma clang diagnostic ignored "-Wunused-macros" -#pragma clang diagnostic ignored "-Wunused-function" -#if __has_warning("-Wreserved-identifier") -#pragma clang diagnostic ignored "-Wreserved-identifier" -#endif -#if __has_warning("-Wstatic-in-inline") -#pragma clang diagnostic ignored "-Wstatic-in-inline" -#endif -#elif defined(__GNUC__) -#pragma GCC diagnostic ignored "-Wunused-macros" -#pragma GCC diagnostic ignored "-Wunused-function" -#endif - -#if !defined(__has_builtin) -#define __has_builtin(b) 0 -#endif - -#if defined(__GNUC__) || defined(__clang__) - -#if __has_builtin(__builtin_memcpy_inline) -#define _rpmalloc_memcpy_const(x, y, s) __builtin_memcpy_inline(x, y, s) -#else -#define _rpmalloc_memcpy_const(x, y, s) \ - do { \ - _Static_assert(__builtin_choose_expr(__builtin_constant_p(s), 1, 0), \ - "len must be a constant integer"); \ - memcpy(x, y, s); \ - } while (0) -#endif - -#if __has_builtin(__builtin_memset_inline) -#define _rpmalloc_memset_const(x, y, s) __builtin_memset_inline(x, y, s) -#else -#define _rpmalloc_memset_const(x, y, s) \ - do { \ - _Static_assert(__builtin_choose_expr(__builtin_constant_p(s), 1, 0), \ - "len must be a constant integer"); \ - memset(x, y, s); \ - } while (0) -#endif -#else -#define _rpmalloc_memcpy_const(x, y, s) memcpy(x, y, s) -#define _rpmalloc_memset_const(x, y, s) memset(x, y, s) -#endif - -#if __has_builtin(__builtin_assume) -#define rpmalloc_assume(cond) __builtin_assume(cond) -#elif defined(__GNUC__) -#define rpmalloc_assume(cond) \ - do { \ - if (!__builtin_expect(cond, 0)) \ - __builtin_unreachable(); \ - } while (0) -#elif defined(_MSC_VER) -#define rpmalloc_assume(cond) __assume(cond) -#else -#define rpmalloc_assume(cond) 0 -#endif - -#ifndef HEAP_ARRAY_SIZE -//! 
Size of heap hashmap -#define HEAP_ARRAY_SIZE 47 -#endif -#ifndef ENABLE_THREAD_CACHE -//! Enable per-thread cache -#define ENABLE_THREAD_CACHE 1 -#endif -#ifndef ENABLE_GLOBAL_CACHE -//! Enable global cache shared between all threads, requires thread cache -#define ENABLE_GLOBAL_CACHE 1 -#endif -#ifndef ENABLE_VALIDATE_ARGS -//! Enable validation of args to public entry points -#define ENABLE_VALIDATE_ARGS 0 -#endif -#ifndef ENABLE_STATISTICS -//! Enable statistics collection -#define ENABLE_STATISTICS 0 -#endif -#ifndef ENABLE_ASSERTS -//! Enable asserts -#define ENABLE_ASSERTS 0 -#endif -#ifndef ENABLE_OVERRIDE -//! Override standard library malloc/free and new/delete entry points -#define ENABLE_OVERRIDE 0 -#endif -#ifndef ENABLE_PRELOAD -//! Support preloading -#define ENABLE_PRELOAD 0 -#endif -#ifndef DISABLE_UNMAP -//! Disable unmapping memory pages (also enables unlimited cache) -#define DISABLE_UNMAP 0 -#endif -#ifndef ENABLE_UNLIMITED_CACHE -//! Enable unlimited global cache (no unmapping until finalization) -#define ENABLE_UNLIMITED_CACHE 0 -#endif -#ifndef ENABLE_ADAPTIVE_THREAD_CACHE -//! Enable adaptive thread cache size based on use heuristics -#define ENABLE_ADAPTIVE_THREAD_CACHE 0 -#endif -#ifndef DEFAULT_SPAN_MAP_COUNT -//! Default number of spans to map in call to map more virtual memory (default -//! values yield 4MiB here) -#define DEFAULT_SPAN_MAP_COUNT 64 -#endif -#ifndef GLOBAL_CACHE_MULTIPLIER -//! 
Multiplier for global cache -#define GLOBAL_CACHE_MULTIPLIER 8 -#endif - -#if DISABLE_UNMAP && !ENABLE_GLOBAL_CACHE -#error Must use global cache if unmap is disabled -#endif - -#if DISABLE_UNMAP -#undef ENABLE_UNLIMITED_CACHE -#define ENABLE_UNLIMITED_CACHE 1 -#endif - -#if !ENABLE_GLOBAL_CACHE -#undef ENABLE_UNLIMITED_CACHE -#define ENABLE_UNLIMITED_CACHE 0 -#endif - -#if !ENABLE_THREAD_CACHE -#undef ENABLE_ADAPTIVE_THREAD_CACHE -#define ENABLE_ADAPTIVE_THREAD_CACHE 0 -#endif - -#if defined(_WIN32) || defined(__WIN32__) || defined(_WIN64) -#define PLATFORM_WINDOWS 1 -#define PLATFORM_POSIX 0 -#else -#define PLATFORM_WINDOWS 0 -#define PLATFORM_POSIX 1 -#endif - -/// Platform and arch specifics -#if defined(_MSC_VER) && !defined(__clang__) -#pragma warning(disable : 5105) -#ifndef FORCEINLINE -#define FORCEINLINE inline __forceinline -#endif -#define _Static_assert static_assert -#else -#ifndef FORCEINLINE -#define FORCEINLINE inline __attribute__((__always_inline__)) -#endif -#endif -#if PLATFORM_WINDOWS -#ifndef WIN32_LEAN_AND_MEAN -#define WIN32_LEAN_AND_MEAN -#endif -#include -#if ENABLE_VALIDATE_ARGS -#include -#endif -#else -#include -#include -#include -#include -#if defined(__linux__) || defined(__ANDROID__) -#include -#if !defined(PR_SET_VMA) -#define PR_SET_VMA 0x53564d41 -#define PR_SET_VMA_ANON_NAME 0 -#endif -#endif -#if defined(__APPLE__) -#include -#if !TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR -#include -#include -#endif -#include -#endif -#if defined(__HAIKU__) || defined(__TINYC__) -#include -#endif -#endif - -#include -#include -#include - -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) -#include -static DWORD fls_key; -#endif - -#if PLATFORM_POSIX -#include -#include -#ifdef __FreeBSD__ -#include -#define MAP_HUGETLB MAP_ALIGNED_SUPER -#ifndef PROT_MAX -#define PROT_MAX(f) 0 -#endif -#else -#define PROT_MAX(f) 0 -#endif -#ifdef __sun -extern int madvise(caddr_t, size_t, int); -#endif -#ifndef MAP_UNINITIALIZED 
-#define MAP_UNINITIALIZED 0 -#endif -#endif -#include - -#if ENABLE_ASSERTS -#undef NDEBUG -#if defined(_MSC_VER) && !defined(_DEBUG) -#define _DEBUG -#endif -#include -#define RPMALLOC_TOSTRING_M(x) #x -#define RPMALLOC_TOSTRING(x) RPMALLOC_TOSTRING_M(x) -#define rpmalloc_assert(truth, message) \ - do { \ - if (!(truth)) { \ - if (_memory_config.error_callback) { \ - _memory_config.error_callback(message " (" RPMALLOC_TOSTRING( \ - truth) ") at " __FILE__ ":" RPMALLOC_TOSTRING(__LINE__)); \ - } else { \ - assert((truth) && message); \ - } \ - } \ - } while (0) -#else -#define rpmalloc_assert(truth, message) \ - do { \ - } while (0) -#endif -#if ENABLE_STATISTICS -#include -#endif - -////// -/// -/// Atomic access abstraction (since MSVC does not do C11 yet) -/// -////// - -#if defined(_MSC_VER) && !defined(__clang__) - -typedef volatile long atomic32_t; -typedef volatile long long atomic64_t; -typedef volatile void *atomicptr_t; - -static FORCEINLINE int32_t atomic_load32(atomic32_t *src) { return *src; } -static FORCEINLINE void atomic_store32(atomic32_t *dst, int32_t val) { - *dst = val; -} -static FORCEINLINE int32_t atomic_incr32(atomic32_t *val) { - return (int32_t)InterlockedIncrement(val); -} -static FORCEINLINE int32_t atomic_decr32(atomic32_t *val) { - return (int32_t)InterlockedDecrement(val); -} -static FORCEINLINE int32_t atomic_add32(atomic32_t *val, int32_t add) { - return (int32_t)InterlockedExchangeAdd(val, add) + add; -} -static FORCEINLINE int atomic_cas32_acquire(atomic32_t *dst, int32_t val, - int32_t ref) { - return (InterlockedCompareExchange(dst, val, ref) == ref) ? 
1 : 0; -} -static FORCEINLINE void atomic_store32_release(atomic32_t *dst, int32_t val) { - *dst = val; -} -static FORCEINLINE int64_t atomic_load64(atomic64_t *src) { return *src; } -static FORCEINLINE int64_t atomic_add64(atomic64_t *val, int64_t add) { - return (int64_t)InterlockedExchangeAdd64(val, add) + add; -} -static FORCEINLINE void *atomic_load_ptr(atomicptr_t *src) { - return (void *)*src; -} -static FORCEINLINE void atomic_store_ptr(atomicptr_t *dst, void *val) { - *dst = val; -} -static FORCEINLINE void atomic_store_ptr_release(atomicptr_t *dst, void *val) { - *dst = val; -} -static FORCEINLINE void *atomic_exchange_ptr_acquire(atomicptr_t *dst, - void *val) { - return (void *)InterlockedExchangePointer((void *volatile *)dst, val); -} -static FORCEINLINE int atomic_cas_ptr(atomicptr_t *dst, void *val, void *ref) { - return (InterlockedCompareExchangePointer((void *volatile *)dst, val, ref) == - ref) - ? 1 - : 0; -} - -#define EXPECTED(x) (x) -#define UNEXPECTED(x) (x) - -#else - -#include - -typedef volatile _Atomic(int32_t) atomic32_t; -typedef volatile _Atomic(int64_t) atomic64_t; -typedef volatile _Atomic(void *) atomicptr_t; - -static FORCEINLINE int32_t atomic_load32(atomic32_t *src) { - return atomic_load_explicit(src, memory_order_relaxed); -} -static FORCEINLINE void atomic_store32(atomic32_t *dst, int32_t val) { - atomic_store_explicit(dst, val, memory_order_relaxed); -} -static FORCEINLINE int32_t atomic_incr32(atomic32_t *val) { - return atomic_fetch_add_explicit(val, 1, memory_order_relaxed) + 1; -} -static FORCEINLINE int32_t atomic_decr32(atomic32_t *val) { - return atomic_fetch_add_explicit(val, -1, memory_order_relaxed) - 1; -} -static FORCEINLINE int32_t atomic_add32(atomic32_t *val, int32_t add) { - return atomic_fetch_add_explicit(val, add, memory_order_relaxed) + add; -} -static FORCEINLINE int atomic_cas32_acquire(atomic32_t *dst, int32_t val, - int32_t ref) { - return atomic_compare_exchange_weak_explicit( - dst, &ref, val, 
memory_order_acquire, memory_order_relaxed); -} -static FORCEINLINE void atomic_store32_release(atomic32_t *dst, int32_t val) { - atomic_store_explicit(dst, val, memory_order_release); -} -static FORCEINLINE int64_t atomic_load64(atomic64_t *val) { - return atomic_load_explicit(val, memory_order_relaxed); -} -static FORCEINLINE int64_t atomic_add64(atomic64_t *val, int64_t add) { - return atomic_fetch_add_explicit(val, add, memory_order_relaxed) + add; -} -static FORCEINLINE void *atomic_load_ptr(atomicptr_t *src) { - return atomic_load_explicit(src, memory_order_relaxed); -} -static FORCEINLINE void atomic_store_ptr(atomicptr_t *dst, void *val) { - atomic_store_explicit(dst, val, memory_order_relaxed); -} -static FORCEINLINE void atomic_store_ptr_release(atomicptr_t *dst, void *val) { - atomic_store_explicit(dst, val, memory_order_release); -} -static FORCEINLINE void *atomic_exchange_ptr_acquire(atomicptr_t *dst, - void *val) { - return atomic_exchange_explicit(dst, val, memory_order_acquire); -} -static FORCEINLINE int atomic_cas_ptr(atomicptr_t *dst, void *val, void *ref) { - return atomic_compare_exchange_weak_explicit( - dst, &ref, val, memory_order_relaxed, memory_order_relaxed); -} - -#define EXPECTED(x) __builtin_expect((x), 1) -#define UNEXPECTED(x) __builtin_expect((x), 0) - -#endif - -//////////// -/// -/// Statistics related functions (evaluate to nothing when statistics not -/// enabled) -/// -////// - -#if ENABLE_STATISTICS -#define _rpmalloc_stat_inc(counter) atomic_incr32(counter) -#define _rpmalloc_stat_dec(counter) atomic_decr32(counter) -#define _rpmalloc_stat_add(counter, value) \ - atomic_add32(counter, (int32_t)(value)) -#define _rpmalloc_stat_add64(counter, value) \ - atomic_add64(counter, (int64_t)(value)) -#define _rpmalloc_stat_add_peak(counter, value, peak) \ - do { \ - int32_t _cur_count = atomic_add32(counter, (int32_t)(value)); \ - if (_cur_count > (peak)) \ - peak = _cur_count; \ - } while (0) -#define _rpmalloc_stat_sub(counter, 
value) \ - atomic_add32(counter, -(int32_t)(value)) -#define _rpmalloc_stat_inc_alloc(heap, class_idx) \ - do { \ - int32_t alloc_current = \ - atomic_incr32(&heap->size_class_use[class_idx].alloc_current); \ - if (alloc_current > heap->size_class_use[class_idx].alloc_peak) \ - heap->size_class_use[class_idx].alloc_peak = alloc_current; \ - atomic_incr32(&heap->size_class_use[class_idx].alloc_total); \ - } while (0) -#define _rpmalloc_stat_inc_free(heap, class_idx) \ - do { \ - atomic_decr32(&heap->size_class_use[class_idx].alloc_current); \ - atomic_incr32(&heap->size_class_use[class_idx].free_total); \ - } while (0) -#else -#define _rpmalloc_stat_inc(counter) \ - do { \ - } while (0) -#define _rpmalloc_stat_dec(counter) \ - do { \ - } while (0) -#define _rpmalloc_stat_add(counter, value) \ - do { \ - } while (0) -#define _rpmalloc_stat_add64(counter, value) \ - do { \ - } while (0) -#define _rpmalloc_stat_add_peak(counter, value, peak) \ - do { \ - } while (0) -#define _rpmalloc_stat_sub(counter, value) \ - do { \ - } while (0) -#define _rpmalloc_stat_inc_alloc(heap, class_idx) \ - do { \ - } while (0) -#define _rpmalloc_stat_inc_free(heap, class_idx) \ - do { \ - } while (0) -#endif - -/// -/// Preconfigured limits and sizes -/// - -//! Granularity of a small allocation block (must be power of two) -#define SMALL_GRANULARITY 16 -//! Small granularity shift count -#define SMALL_GRANULARITY_SHIFT 4 -//! Number of small block size classes -#define SMALL_CLASS_COUNT 65 -//! Maximum size of a small block -#define SMALL_SIZE_LIMIT (SMALL_GRANULARITY * (SMALL_CLASS_COUNT - 1)) -//! Granularity of a medium allocation block -#define MEDIUM_GRANULARITY 512 -//! Medium granularity shift count -#define MEDIUM_GRANULARITY_SHIFT 9 -//! Number of medium block size classes -#define MEDIUM_CLASS_COUNT 61 -//! Total number of small + medium size classes -#define SIZE_CLASS_COUNT (SMALL_CLASS_COUNT + MEDIUM_CLASS_COUNT) -//! 
Number of large block size classes -#define LARGE_CLASS_COUNT 63 -//! Maximum size of a medium block -#define MEDIUM_SIZE_LIMIT \ - (SMALL_SIZE_LIMIT + (MEDIUM_GRANULARITY * MEDIUM_CLASS_COUNT)) -//! Maximum size of a large block -#define LARGE_SIZE_LIMIT \ - ((LARGE_CLASS_COUNT * _memory_span_size) - SPAN_HEADER_SIZE) -//! Size of a span header (must be a multiple of SMALL_GRANULARITY and a power -//! of two) -#define SPAN_HEADER_SIZE 128 -//! Number of spans in thread cache -#define MAX_THREAD_SPAN_CACHE 400 -//! Number of spans to transfer between thread and global cache -#define THREAD_SPAN_CACHE_TRANSFER 64 -//! Number of spans in thread cache for large spans (must be greater than -//! LARGE_CLASS_COUNT / 2) -#define MAX_THREAD_SPAN_LARGE_CACHE 100 -//! Number of spans to transfer between thread and global cache for large spans -#define THREAD_SPAN_LARGE_CACHE_TRANSFER 6 - -_Static_assert((SMALL_GRANULARITY & (SMALL_GRANULARITY - 1)) == 0, - "Small granularity must be power of two"); -_Static_assert((SPAN_HEADER_SIZE & (SPAN_HEADER_SIZE - 1)) == 0, - "Span header size must be power of two"); - -#if ENABLE_VALIDATE_ARGS -//! Maximum allocation size to avoid integer overflow -#undef MAX_ALLOC_SIZE -#define MAX_ALLOC_SIZE (((size_t) - 1) - _memory_span_size) -#endif - -#define pointer_offset(ptr, ofs) (void *)((char *)(ptr) + (ptrdiff_t)(ofs)) -#define pointer_diff(first, second) \ - (ptrdiff_t)((const char *)(first) - (const char *)(second)) - -#define INVALID_POINTER ((void *)((uintptr_t) - 1)) - -#define SIZE_CLASS_LARGE SIZE_CLASS_COUNT -#define SIZE_CLASS_HUGE ((uint32_t) - 1) - -//////////// -/// -/// Data types -/// -////// - -//! A memory heap, per thread -typedef struct heap_t heap_t; -//! Span of memory pages -typedef struct span_t span_t; -//! Span list -typedef struct span_list_t span_list_t; -//! Span active data -typedef struct span_active_t span_active_t; -//! Size class definition -typedef struct size_class_t size_class_t; -//! 
Global cache -typedef struct global_cache_t global_cache_t; - -//! Flag indicating span is the first (master) span of a split superspan -#define SPAN_FLAG_MASTER 1U -//! Flag indicating span is a secondary (sub) span of a split superspan -#define SPAN_FLAG_SUBSPAN 2U -//! Flag indicating span has blocks with increased alignment -#define SPAN_FLAG_ALIGNED_BLOCKS 4U -//! Flag indicating an unmapped master span -#define SPAN_FLAG_UNMAPPED_MASTER 8U - -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS -struct span_use_t { - //! Current number of spans used (actually used, not in cache) - atomic32_t current; - //! High water mark of spans used - atomic32_t high; -#if ENABLE_STATISTICS - //! Number of spans in deferred list - atomic32_t spans_deferred; - //! Number of spans transitioned to global cache - atomic32_t spans_to_global; - //! Number of spans transitioned from global cache - atomic32_t spans_from_global; - //! Number of spans transitioned to thread cache - atomic32_t spans_to_cache; - //! Number of spans transitioned from thread cache - atomic32_t spans_from_cache; - //! Number of spans transitioned to reserved state - atomic32_t spans_to_reserved; - //! Number of spans transitioned from reserved state - atomic32_t spans_from_reserved; - //! Number of raw memory map calls - atomic32_t spans_map_calls; -#endif -}; -typedef struct span_use_t span_use_t; -#endif - -#if ENABLE_STATISTICS -struct size_class_use_t { - //! Current number of allocations - atomic32_t alloc_current; - //! Peak number of allocations - int32_t alloc_peak; - //! Total number of allocations - atomic32_t alloc_total; - //! Total number of frees - atomic32_t free_total; - //! Number of spans in use - atomic32_t spans_current; - //! Number of spans transitioned to cache - int32_t spans_peak; - //! Number of spans transitioned to cache - atomic32_t spans_to_cache; - //! Number of spans transitioned from cache - atomic32_t spans_from_cache; - //! 
Number of spans transitioned from reserved state - atomic32_t spans_from_reserved; - //! Number of spans mapped - atomic32_t spans_map_calls; - int32_t unused; -}; -typedef struct size_class_use_t size_class_use_t; -#endif - -// A span can either represent a single span of memory pages with size declared -// by span_map_count configuration variable, or a set of spans in a continuous -// region, a super span. Any reference to the term "span" usually refers to both -// a single span or a super span. A super span can further be divided into -// multiple spans (or this, super spans), where the first (super)span is the -// master and subsequent (super)spans are subspans. The master span keeps track -// of how many subspans that are still alive and mapped in virtual memory, and -// once all subspans and master have been unmapped the entire superspan region -// is released and unmapped (on Windows for example, the entire superspan range -// has to be released in the same call to release the virtual memory range, but -// individual subranges can be decommitted individually to reduce physical -// memory use). -struct span_t { - //! Free list - void *free_list; - //! Total block count of size class - uint32_t block_count; - //! Size class - uint32_t size_class; - //! Index of last block initialized in free list - uint32_t free_list_limit; - //! Number of used blocks remaining when in partial state - uint32_t used_count; - //! Deferred free list - atomicptr_t free_list_deferred; - //! Size of deferred free list, or list of spans when part of a cache list - uint32_t list_size; - //! Size of a block - uint32_t block_size; - //! Flags and counters - uint32_t flags; - //! Number of spans - uint32_t span_count; - //! Total span counter for master spans - uint32_t total_spans; - //! Offset from master span for subspans - uint32_t offset_from_master; - //! Remaining span counter, for master spans - atomic32_t remaining_spans; - //! Alignment offset - uint32_t align_offset; - //! 
Owning heap - heap_t *heap; - //! Next span - span_t *next; - //! Previous span - span_t *prev; -}; -_Static_assert(sizeof(span_t) <= SPAN_HEADER_SIZE, "span size mismatch"); - -struct span_cache_t { - size_t count; - span_t *span[MAX_THREAD_SPAN_CACHE]; -}; -typedef struct span_cache_t span_cache_t; - -struct span_large_cache_t { - size_t count; - span_t *span[MAX_THREAD_SPAN_LARGE_CACHE]; -}; -typedef struct span_large_cache_t span_large_cache_t; - -struct heap_size_class_t { - //! Free list of active span - void *free_list; - //! Double linked list of partially used spans with free blocks. - // Previous span pointer in head points to tail span of list. - span_t *partial_span; - //! Early level cache of fully free spans - span_t *cache; -}; -typedef struct heap_size_class_t heap_size_class_t; - -// Control structure for a heap, either a thread heap or a first class heap if -// enabled -struct heap_t { - //! Owning thread ID - uintptr_t owner_thread; - //! Free lists for each size class - heap_size_class_t size_class[SIZE_CLASS_COUNT]; -#if ENABLE_THREAD_CACHE - //! Arrays of fully freed spans, single span - span_cache_t span_cache; -#endif - //! List of deferred free spans (single linked list) - atomicptr_t span_free_deferred; - //! Number of full spans - size_t full_span_count; - //! Mapped but unused spans - span_t *span_reserve; - //! Master span for mapped but unused spans - span_t *span_reserve_master; - //! Number of mapped but unused spans - uint32_t spans_reserved; - //! Child count - atomic32_t child_count; - //! Next heap in id list - heap_t *next_heap; - //! Next heap in orphan list - heap_t *next_orphan; - //! Heap ID - int32_t id; - //! Finalization state flag - int finalize; - //! Master heap owning the memory pages - heap_t *master_heap; -#if ENABLE_THREAD_CACHE - //! Arrays of fully freed spans, large spans with > 1 span count - span_large_cache_t span_large_cache[LARGE_CLASS_COUNT - 1]; -#endif -#if RPMALLOC_FIRST_CLASS_HEAPS - //! 
Double linked list of fully utilized spans with free blocks for each size - //! class. - // Previous span pointer in head points to tail span of list. - span_t *full_span[SIZE_CLASS_COUNT]; - //! Double linked list of large and huge spans allocated by this heap - span_t *large_huge_span; -#endif -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS - //! Current and high water mark of spans used per span count - span_use_t span_use[LARGE_CLASS_COUNT]; -#endif -#if ENABLE_STATISTICS - //! Allocation stats per size class - size_class_use_t size_class_use[SIZE_CLASS_COUNT + 1]; - //! Number of bytes transitioned thread -> global - atomic64_t thread_to_global; - //! Number of bytes transitioned global -> thread - atomic64_t global_to_thread; -#endif -}; - -// Size class for defining a block size bucket -struct size_class_t { - //! Size of blocks in this class - uint32_t block_size; - //! Number of blocks in each chunk - uint16_t block_count; - //! Class index this class is merged with - uint16_t class_idx; -}; -_Static_assert(sizeof(size_class_t) == 8, "Size class size mismatch"); - -struct global_cache_t { - //! Cache lock - atomic32_t lock; - //! Cache count - uint32_t count; -#if ENABLE_STATISTICS - //! Insert count - size_t insert_count; - //! Extract count - size_t extract_count; -#endif - //! Cached spans - span_t *span[GLOBAL_CACHE_MULTIPLIER * MAX_THREAD_SPAN_CACHE]; - //! Unlimited cache overflow - span_t *overflow; -}; - -//////////// -/// -/// Global data -/// -////// - -//! Default span size (64KiB) -#define _memory_default_span_size (64 * 1024) -#define _memory_default_span_size_shift 16 -#define _memory_default_span_mask (~((uintptr_t)(_memory_span_size - 1))) - -//! Initialized flag -static int _rpmalloc_initialized; -//! Main thread ID -static uintptr_t _rpmalloc_main_thread_id; -//! Configuration -static rpmalloc_config_t _memory_config; -//! Memory page size -static size_t _memory_page_size; -//! 
Shift to divide by page size -static size_t _memory_page_size_shift; -//! Granularity at which memory pages are mapped by OS -static size_t _memory_map_granularity; -#if RPMALLOC_CONFIGURABLE -//! Size of a span of memory pages -static size_t _memory_span_size; -//! Shift to divide by span size -static size_t _memory_span_size_shift; -//! Mask to get to start of a memory span -static uintptr_t _memory_span_mask; -#else -//! Hardwired span size -#define _memory_span_size _memory_default_span_size -#define _memory_span_size_shift _memory_default_span_size_shift -#define _memory_span_mask _memory_default_span_mask -#endif -//! Number of spans to map in each map call -static size_t _memory_span_map_count; -//! Number of spans to keep reserved in each heap -static size_t _memory_heap_reserve_count; -//! Global size classes -static size_class_t _memory_size_class[SIZE_CLASS_COUNT]; -//! Run-time size limit of medium blocks -static size_t _memory_medium_size_limit; -//! Heap ID counter -static atomic32_t _memory_heap_id; -//! Huge page support -static int _memory_huge_pages; -#if ENABLE_GLOBAL_CACHE -//! Global span cache -static global_cache_t _memory_span_cache[LARGE_CLASS_COUNT]; -#endif -//! Global reserved spans -static span_t *_memory_global_reserve; -//! Global reserved count -static size_t _memory_global_reserve_count; -//! Global reserved master -static span_t *_memory_global_reserve_master; -//! All heaps -static heap_t *_memory_heaps[HEAP_ARRAY_SIZE]; -//! Used to restrict access to mapping memory for huge pages -static atomic32_t _memory_global_lock; -//! Orphaned heaps -static heap_t *_memory_orphan_heaps; -#if RPMALLOC_FIRST_CLASS_HEAPS -//! Orphaned heaps (first class heaps) -static heap_t *_memory_first_class_orphan_heaps; -#endif -#if ENABLE_STATISTICS -//! Allocations counter -static atomic64_t _allocation_counter; -//! Deallocations counter -static atomic64_t _deallocation_counter; -//! Active heap count -static atomic32_t _memory_active_heaps; -//! 
Number of currently mapped memory pages -static atomic32_t _mapped_pages; -//! Peak number of concurrently mapped memory pages -static int32_t _mapped_pages_peak; -//! Number of mapped master spans -static atomic32_t _master_spans; -//! Number of unmapped dangling master spans -static atomic32_t _unmapped_master_spans; -//! Running counter of total number of mapped memory pages since start -static atomic32_t _mapped_total; -//! Running counter of total number of unmapped memory pages since start -static atomic32_t _unmapped_total; -//! Number of currently mapped memory pages in OS calls -static atomic32_t _mapped_pages_os; -//! Number of currently allocated pages in huge allocations -static atomic32_t _huge_pages_current; -//! Peak number of currently allocated pages in huge allocations -static int32_t _huge_pages_peak; -#endif - -//////////// -/// -/// Thread local heap and ID -/// -////// - -//! Current thread heap -#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ - defined(__TINYC__) -static pthread_key_t _memory_thread_heap; -#else -#ifdef _MSC_VER -#define _Thread_local __declspec(thread) -#define TLS_MODEL -#else -#ifndef __HAIKU__ -#define TLS_MODEL __attribute__((tls_model("initial-exec"))) -#else -#define TLS_MODEL -#endif -#if !defined(__clang__) && defined(__GNUC__) -#define _Thread_local __thread -#endif -#endif -static _Thread_local heap_t *_memory_thread_heap TLS_MODEL; -#endif - -static inline heap_t *get_thread_heap_raw(void) { -#if (defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD - return pthread_getspecific(_memory_thread_heap); -#else - return _memory_thread_heap; -#endif -} - -//! Get the current thread heap -static inline heap_t *get_thread_heap(void) { - heap_t *heap = get_thread_heap_raw(); -#if ENABLE_PRELOAD - if (EXPECTED(heap != 0)) - return heap; - rpmalloc_initialize(); - return get_thread_heap_raw(); -#else - return heap; -#endif -} - -//! 
Fast thread ID -static inline uintptr_t get_thread_id(void) { -#if defined(_WIN32) - return (uintptr_t)((void *)NtCurrentTeb()); -#elif (defined(__GNUC__) || defined(__clang__)) && !defined(__CYGWIN__) - uintptr_t tid; -#if defined(__i386__) - __asm__("movl %%gs:0, %0" : "=r"(tid) : :); -#elif defined(__x86_64__) -#if defined(__MACH__) - __asm__("movq %%gs:0, %0" : "=r"(tid) : :); -#else - __asm__("movq %%fs:0, %0" : "=r"(tid) : :); -#endif -#elif defined(__arm__) - __asm__ volatile("mrc p15, 0, %0, c13, c0, 3" : "=r"(tid)); -#elif defined(__aarch64__) -#if defined(__MACH__) - // tpidr_el0 likely unused, always return 0 on iOS - __asm__ volatile("mrs %0, tpidrro_el0" : "=r"(tid)); -#else - __asm__ volatile("mrs %0, tpidr_el0" : "=r"(tid)); -#endif -#else -#error This platform needs implementation of get_thread_id() -#endif - return tid; -#else -#error This platform needs implementation of get_thread_id() -#endif -} - -//! Set the current thread heap -static void set_thread_heap(heap_t *heap) { -#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ - defined(__TINYC__) - pthread_setspecific(_memory_thread_heap, heap); -#else - _memory_thread_heap = heap; -#endif - if (heap) - heap->owner_thread = get_thread_id(); -} - -//! Set main thread ID -extern void rpmalloc_set_main_thread(void); - -void rpmalloc_set_main_thread(void) { - _rpmalloc_main_thread_id = get_thread_id(); -} - -static void _rpmalloc_spin(void) { -#if defined(_MSC_VER) -#if defined(_M_ARM64) - __yield(); -#else - _mm_pause(); -#endif -#elif defined(__x86_64__) || defined(__i386__) - __asm__ volatile("pause" ::: "memory"); -#elif defined(__aarch64__) || (defined(__arm__) && __ARM_ARCH >= 7) - __asm__ volatile("yield" ::: "memory"); -#elif defined(__powerpc__) || defined(__powerpc64__) - // No idea if ever been compiled in such archs but ... 
as precaution - __asm__ volatile("or 27,27,27"); -#elif defined(__sparc__) - __asm__ volatile("rd %ccr, %g0 \n\trd %ccr, %g0 \n\trd %ccr, %g0"); -#else - struct timespec ts = {0}; - nanosleep(&ts, 0); -#endif -} - -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) -static void NTAPI _rpmalloc_thread_destructor(void *value) { -#if ENABLE_OVERRIDE - // If this is called on main thread it means rpmalloc_finalize - // has not been called and shutdown is forced (through _exit) or unclean - if (get_thread_id() == _rpmalloc_main_thread_id) - return; -#endif - if (value) - rpmalloc_thread_finalize(1); -} -#endif - -//////////// -/// -/// Low level memory map/unmap -/// -////// - -static void _rpmalloc_set_name(void *address, size_t size) { -#if defined(__linux__) || defined(__ANDROID__) - const char *name = _memory_huge_pages ? _memory_config.huge_page_name - : _memory_config.page_name; - if (address == MAP_FAILED || !name) - return; - // If the kernel does not support CONFIG_ANON_VMA_NAME or if the call fails - // (e.g. invalid name) it is a no-op basically. - (void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (uintptr_t)address, size, - (uintptr_t)name); -#else - (void)sizeof(size); - (void)sizeof(address); -#endif -} - -//! Map more virtual memory -// size is number of bytes to map -// offset receives the offset in bytes from start of mapped region -// returns address to start of mapped region to use -static void *_rpmalloc_mmap(size_t size, size_t *offset) { - rpmalloc_assert(!(size % _memory_page_size), "Invalid mmap size"); - rpmalloc_assert(size >= _memory_page_size, "Invalid mmap size"); - void *address = _memory_config.memory_map(size, offset); - if (EXPECTED(address != 0)) { - _rpmalloc_stat_add_peak(&_mapped_pages, (size >> _memory_page_size_shift), - _mapped_pages_peak); - _rpmalloc_stat_add(&_mapped_total, (size >> _memory_page_size_shift)); - } - return address; -} - -//! 
Unmap virtual memory -// address is the memory address to unmap, as returned from _memory_map -// size is the number of bytes to unmap, which might be less than full region -// for a partial unmap offset is the offset in bytes to the actual mapped -// region, as set by _memory_map release is set to 0 for partial unmap, or size -// of entire range for a full unmap -static void _rpmalloc_unmap(void *address, size_t size, size_t offset, - size_t release) { - rpmalloc_assert(!release || (release >= size), "Invalid unmap size"); - rpmalloc_assert(!release || (release >= _memory_page_size), - "Invalid unmap size"); - if (release) { - rpmalloc_assert(!(release % _memory_page_size), "Invalid unmap size"); - _rpmalloc_stat_sub(&_mapped_pages, (release >> _memory_page_size_shift)); - _rpmalloc_stat_add(&_unmapped_total, (release >> _memory_page_size_shift)); - } - _memory_config.memory_unmap(address, size, offset, release); -} - -//! Default implementation to map new pages to virtual memory -static void *_rpmalloc_mmap_os(size_t size, size_t *offset) { - // Either size is a heap (a single page) or a (multiple) span - we only need - // to align spans, and only if larger than map granularity - size_t padding = ((size >= _memory_span_size) && - (_memory_span_size > _memory_map_granularity)) - ? _memory_span_size - : 0; - rpmalloc_assert(size >= _memory_page_size, "Invalid mmap size"); -#if PLATFORM_WINDOWS - // Ok to MEM_COMMIT - according to MSDN, "actual physical pages are not - // allocated unless/until the virtual addresses are actually accessed" - void *ptr = VirtualAlloc(0, size + padding, - (_memory_huge_pages ? 
MEM_LARGE_PAGES : 0) | - MEM_RESERVE | MEM_COMMIT, - PAGE_READWRITE); - if (!ptr) { - if (_memory_config.map_fail_callback) { - if (_memory_config.map_fail_callback(size + padding)) - return _rpmalloc_mmap_os(size, offset); - } else { - rpmalloc_assert(ptr, "Failed to map virtual memory block"); - } - return 0; - } -#else - int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_UNINITIALIZED; -#if defined(__APPLE__) && !TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR - int fd = (int)VM_MAKE_TAG(240U); - if (_memory_huge_pages) - fd |= VM_FLAGS_SUPERPAGE_SIZE_2MB; - void *ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, fd, 0); -#elif defined(MAP_HUGETLB) - void *ptr = mmap(0, size + padding, - PROT_READ | PROT_WRITE | PROT_MAX(PROT_READ | PROT_WRITE), - (_memory_huge_pages ? MAP_HUGETLB : 0) | flags, -1, 0); -#if defined(MADV_HUGEPAGE) - // In some configurations, huge pages allocations might fail thus - // we fallback to normal allocations and promote the region as transparent - // huge page - if ((ptr == MAP_FAILED || !ptr) && _memory_huge_pages) { - ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, -1, 0); - if (ptr && ptr != MAP_FAILED) { - int prm = madvise(ptr, size + padding, MADV_HUGEPAGE); - (void)prm; - rpmalloc_assert((prm == 0), "Failed to promote the page to THP"); - } - } -#endif - _rpmalloc_set_name(ptr, size + padding); -#elif defined(MAP_ALIGNED) - const size_t align = - (sizeof(size_t) * 8) - (size_t)(__builtin_clzl(size - 1)); - void *ptr = - mmap(0, size + padding, PROT_READ | PROT_WRITE, - (_memory_huge_pages ? MAP_ALIGNED(align) : 0) | flags, -1, 0); -#elif defined(MAP_ALIGN) - caddr_t base = (_memory_huge_pages ? (caddr_t)(4 << 20) : 0); - void *ptr = mmap(base, size + padding, PROT_READ | PROT_WRITE, - (_memory_huge_pages ? 
MAP_ALIGN : 0) | flags, -1, 0); -#else - void *ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, -1, 0); -#endif - if ((ptr == MAP_FAILED) || !ptr) { - if (_memory_config.map_fail_callback) { - if (_memory_config.map_fail_callback(size + padding)) - return _rpmalloc_mmap_os(size, offset); - } else if (errno != ENOMEM) { - rpmalloc_assert((ptr != MAP_FAILED) && ptr, - "Failed to map virtual memory block"); - } - return 0; - } -#endif - _rpmalloc_stat_add(&_mapped_pages_os, - (int32_t)((size + padding) >> _memory_page_size_shift)); - if (padding) { - size_t final_padding = padding - ((uintptr_t)ptr & ~_memory_span_mask); - rpmalloc_assert(final_padding <= _memory_span_size, - "Internal failure in padding"); - rpmalloc_assert(final_padding <= padding, "Internal failure in padding"); - rpmalloc_assert(!(final_padding % 8), "Internal failure in padding"); - ptr = pointer_offset(ptr, final_padding); - *offset = final_padding >> 3; - } - rpmalloc_assert((size < _memory_span_size) || - !((uintptr_t)ptr & ~_memory_span_mask), - "Internal failure in padding"); - return ptr; -} - -//! Default implementation to unmap pages from virtual memory -static void _rpmalloc_unmap_os(void *address, size_t size, size_t offset, - size_t release) { - rpmalloc_assert(release || (offset == 0), "Invalid unmap size"); - rpmalloc_assert(!release || (release >= _memory_page_size), - "Invalid unmap size"); - rpmalloc_assert(size >= _memory_page_size, "Invalid unmap size"); - if (release && offset) { - offset <<= 3; - address = pointer_offset(address, -(int32_t)offset); - if ((release >= _memory_span_size) && - (_memory_span_size > _memory_map_granularity)) { - // Padding is always one span size - release += _memory_span_size; - } - } -#if !DISABLE_UNMAP -#if PLATFORM_WINDOWS - if (!VirtualFree(address, release ? 0 : size, - release ? 
MEM_RELEASE : MEM_DECOMMIT)) { - rpmalloc_assert(0, "Failed to unmap virtual memory block"); - } -#else - if (release) { - if (munmap(address, release)) { - rpmalloc_assert(0, "Failed to unmap virtual memory block"); - } - } else { -#if defined(MADV_FREE_REUSABLE) - int ret; - while ((ret = madvise(address, size, MADV_FREE_REUSABLE)) == -1 && - (errno == EAGAIN)) - errno = 0; - if ((ret == -1) && (errno != 0)) { -#elif defined(MADV_DONTNEED) - if (madvise(address, size, MADV_DONTNEED)) { -#elif defined(MADV_PAGEOUT) - if (madvise(address, size, MADV_PAGEOUT)) { -#elif defined(MADV_FREE) - if (madvise(address, size, MADV_FREE)) { -#else - if (posix_madvise(address, size, POSIX_MADV_DONTNEED)) { -#endif - rpmalloc_assert(0, "Failed to madvise virtual memory block as free"); - } - } -#endif -#endif - if (release) - _rpmalloc_stat_sub(&_mapped_pages_os, release >> _memory_page_size_shift); -} - -static void _rpmalloc_span_mark_as_subspan_unless_master(span_t *master, - span_t *subspan, - size_t span_count); - -//! Use global reserved spans to fulfill a memory map request (reserve size must -//! be checked by caller) -static span_t *_rpmalloc_global_get_reserved_spans(size_t span_count) { - span_t *span = _memory_global_reserve; - _rpmalloc_span_mark_as_subspan_unless_master(_memory_global_reserve_master, - span, span_count); - _memory_global_reserve_count -= span_count; - if (_memory_global_reserve_count) - _memory_global_reserve = - (span_t *)pointer_offset(span, span_count << _memory_span_size_shift); - else - _memory_global_reserve = 0; - return span; -} - -//! Store the given spans as global reserve (must only be called from within new -//! 
heap allocation, not thread safe) -static void _rpmalloc_global_set_reserved_spans(span_t *master, span_t *reserve, - size_t reserve_span_count) { - _memory_global_reserve_master = master; - _memory_global_reserve_count = reserve_span_count; - _memory_global_reserve = reserve; -} - -//////////// -/// -/// Span linked list management -/// -////// - -//! Add a span to double linked list at the head -static void _rpmalloc_span_double_link_list_add(span_t **head, span_t *span) { - if (*head) - (*head)->prev = span; - span->next = *head; - *head = span; -} - -//! Pop head span from double linked list -static void _rpmalloc_span_double_link_list_pop_head(span_t **head, - span_t *span) { - rpmalloc_assert(*head == span, "Linked list corrupted"); - span = *head; - *head = span->next; -} - -//! Remove a span from double linked list -static void _rpmalloc_span_double_link_list_remove(span_t **head, - span_t *span) { - rpmalloc_assert(*head, "Linked list corrupted"); - if (*head == span) { - *head = span->next; - } else { - span_t *next_span = span->next; - span_t *prev_span = span->prev; - prev_span->next = next_span; - if (EXPECTED(next_span != 0)) - next_span->prev = prev_span; - } -} - -//////////// -/// -/// Span control -/// -////// - -static void _rpmalloc_heap_cache_insert(heap_t *heap, span_t *span); - -static void _rpmalloc_heap_finalize(heap_t *heap); - -static void _rpmalloc_heap_set_reserved_spans(heap_t *heap, span_t *master, - span_t *reserve, - size_t reserve_span_count); - -//! Declare the span to be a subspan and store distance from master span and -//! 
span count -static void _rpmalloc_span_mark_as_subspan_unless_master(span_t *master, - span_t *subspan, - size_t span_count) { - rpmalloc_assert((subspan != master) || (subspan->flags & SPAN_FLAG_MASTER), - "Span master pointer and/or flag mismatch"); - if (subspan != master) { - subspan->flags = SPAN_FLAG_SUBSPAN; - subspan->offset_from_master = - (uint32_t)((uintptr_t)pointer_diff(subspan, master) >> - _memory_span_size_shift); - subspan->align_offset = 0; - } - subspan->span_count = (uint32_t)span_count; -} - -//! Use reserved spans to fulfill a memory map request (reserve size must be -//! checked by caller) -static span_t *_rpmalloc_span_map_from_reserve(heap_t *heap, - size_t span_count) { - // Update the heap span reserve - span_t *span = heap->span_reserve; - heap->span_reserve = - (span_t *)pointer_offset(span, span_count * _memory_span_size); - heap->spans_reserved -= (uint32_t)span_count; - - _rpmalloc_span_mark_as_subspan_unless_master(heap->span_reserve_master, span, - span_count); - if (span_count <= LARGE_CLASS_COUNT) - _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_from_reserved); - - return span; -} - -//! Get the aligned number of spans to map in based on wanted count, configured -//! mapping granularity and the page size -static size_t _rpmalloc_span_align_count(size_t span_count) { - size_t request_count = (span_count > _memory_span_map_count) - ? span_count - : _memory_span_map_count; - if ((_memory_page_size > _memory_span_size) && - ((request_count * _memory_span_size) % _memory_page_size)) - request_count += - _memory_span_map_count - (request_count % _memory_span_map_count); - return request_count; -} - -//! 
Setup a newly mapped span -static void _rpmalloc_span_initialize(span_t *span, size_t total_span_count, - size_t span_count, size_t align_offset) { - span->total_spans = (uint32_t)total_span_count; - span->span_count = (uint32_t)span_count; - span->align_offset = (uint32_t)align_offset; - span->flags = SPAN_FLAG_MASTER; - atomic_store32(&span->remaining_spans, (int32_t)total_span_count); -} - -static void _rpmalloc_span_unmap(span_t *span); - -//! Map an aligned set of spans, taking configured mapping granularity and the -//! page size into account -static span_t *_rpmalloc_span_map_aligned_count(heap_t *heap, - size_t span_count) { - // If we already have some, but not enough, reserved spans, release those to - // heap cache and map a new full set of spans. Otherwise we would waste memory - // if page size > span size (huge pages) - size_t aligned_span_count = _rpmalloc_span_align_count(span_count); - size_t align_offset = 0; - span_t *span = (span_t *)_rpmalloc_mmap( - aligned_span_count * _memory_span_size, &align_offset); - if (!span) - return 0; - _rpmalloc_span_initialize(span, aligned_span_count, span_count, align_offset); - _rpmalloc_stat_inc(&_master_spans); - if (span_count <= LARGE_CLASS_COUNT) - _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_map_calls); - if (aligned_span_count > span_count) { - span_t *reserved_spans = - (span_t *)pointer_offset(span, span_count * _memory_span_size); - size_t reserved_count = aligned_span_count - span_count; - if (heap->spans_reserved) { - _rpmalloc_span_mark_as_subspan_unless_master( - heap->span_reserve_master, heap->span_reserve, heap->spans_reserved); - _rpmalloc_heap_cache_insert(heap, heap->span_reserve); - } - if (reserved_count > _memory_heap_reserve_count) { - // If huge pages or eager spam map count, the global reserve spin lock is - // held by caller, _rpmalloc_span_map - rpmalloc_assert(atomic_load32(&_memory_global_lock) == 1, - "Global spin lock not held as expected"); - size_t remain_count = 
reserved_count - _memory_heap_reserve_count; - reserved_count = _memory_heap_reserve_count; - span_t *remain_span = (span_t *)pointer_offset( - reserved_spans, reserved_count * _memory_span_size); - if (_memory_global_reserve) { - _rpmalloc_span_mark_as_subspan_unless_master( - _memory_global_reserve_master, _memory_global_reserve, - _memory_global_reserve_count); - _rpmalloc_span_unmap(_memory_global_reserve); - } - _rpmalloc_global_set_reserved_spans(span, remain_span, remain_count); - } - _rpmalloc_heap_set_reserved_spans(heap, span, reserved_spans, - reserved_count); - } - return span; -} - -//! Map in memory pages for the given number of spans (or use previously -//! reserved pages) -static span_t *_rpmalloc_span_map(heap_t *heap, size_t span_count) { - if (span_count <= heap->spans_reserved) - return _rpmalloc_span_map_from_reserve(heap, span_count); - span_t *span = 0; - int use_global_reserve = - (_memory_page_size > _memory_span_size) || - (_memory_span_map_count > _memory_heap_reserve_count); - if (use_global_reserve) { - // If huge pages, make sure only one thread maps more memory to avoid bloat - while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) - _rpmalloc_spin(); - if (_memory_global_reserve_count >= span_count) { - size_t reserve_count = - (!heap->spans_reserved ? 
_memory_heap_reserve_count : span_count); - if (_memory_global_reserve_count < reserve_count) - reserve_count = _memory_global_reserve_count; - span = _rpmalloc_global_get_reserved_spans(reserve_count); - if (span) { - if (reserve_count > span_count) { - span_t *reserved_span = (span_t *)pointer_offset( - span, span_count << _memory_span_size_shift); - _rpmalloc_heap_set_reserved_spans(heap, _memory_global_reserve_master, - reserved_span, - reserve_count - span_count); - } - // Already marked as subspan in _rpmalloc_global_get_reserved_spans - span->span_count = (uint32_t)span_count; - } - } - } - if (!span) - span = _rpmalloc_span_map_aligned_count(heap, span_count); - if (use_global_reserve) - atomic_store32_release(&_memory_global_lock, 0); - return span; -} - -//! Unmap memory pages for the given number of spans (or mark as unused if no -//! partial unmappings) -static void _rpmalloc_span_unmap(span_t *span) { - rpmalloc_assert((span->flags & SPAN_FLAG_MASTER) || - (span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - rpmalloc_assert(!(span->flags & SPAN_FLAG_MASTER) || - !(span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - - int is_master = !!(span->flags & SPAN_FLAG_MASTER); - span_t *master = - is_master ? 
span - : ((span_t *)pointer_offset( - span, -(intptr_t)((uintptr_t)span->offset_from_master * - _memory_span_size))); - rpmalloc_assert(is_master || (span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - rpmalloc_assert(master->flags & SPAN_FLAG_MASTER, "Span flag corrupted"); - - size_t span_count = span->span_count; - if (!is_master) { - // Directly unmap subspans (unless huge pages, in which case we defer and - // unmap entire page range with master) - rpmalloc_assert(span->align_offset == 0, "Span align offset corrupted"); - if (_memory_span_size >= _memory_page_size) - _rpmalloc_unmap(span, span_count * _memory_span_size, 0, 0); - } else { - // Special double flag to denote an unmapped master - // It must be kept in memory since span header must be used - span->flags |= - SPAN_FLAG_MASTER | SPAN_FLAG_SUBSPAN | SPAN_FLAG_UNMAPPED_MASTER; - _rpmalloc_stat_add(&_unmapped_master_spans, 1); - } - - if (atomic_add32(&master->remaining_spans, -(int32_t)span_count) <= 0) { - // Everything unmapped, unmap the master span with release flag to unmap the - // entire range of the super span - rpmalloc_assert(!!(master->flags & SPAN_FLAG_MASTER) && - !!(master->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - size_t unmap_count = master->span_count; - if (_memory_span_size < _memory_page_size) - unmap_count = master->total_spans; - _rpmalloc_stat_sub(&_master_spans, 1); - _rpmalloc_stat_sub(&_unmapped_master_spans, 1); - _rpmalloc_unmap(master, unmap_count * _memory_span_size, - master->align_offset, - (size_t)master->total_spans * _memory_span_size); - } -} - -//! Move the span (used for small or medium allocations) to the heap thread -//! 
cache -static void _rpmalloc_span_release_to_cache(heap_t *heap, span_t *span) { - rpmalloc_assert(heap == span->heap, "Span heap pointer corrupted"); - rpmalloc_assert(span->size_class < SIZE_CLASS_COUNT, - "Invalid span size class"); - rpmalloc_assert(span->span_count == 1, "Invalid span count"); -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS - atomic_decr32(&heap->span_use[0].current); -#endif - _rpmalloc_stat_dec(&heap->size_class_use[span->size_class].spans_current); - if (!heap->finalize) { - _rpmalloc_stat_inc(&heap->span_use[0].spans_to_cache); - _rpmalloc_stat_inc(&heap->size_class_use[span->size_class].spans_to_cache); - if (heap->size_class[span->size_class].cache) - _rpmalloc_heap_cache_insert(heap, - heap->size_class[span->size_class].cache); - heap->size_class[span->size_class].cache = span; - } else { - _rpmalloc_span_unmap(span); - } -} - -//! Initialize a (partial) free list up to next system memory page, while -//! reserving the first block as allocated, returning number of blocks in list -static uint32_t free_list_partial_init(void **list, void **first_block, - void *page_start, void *block_start, - uint32_t block_count, - uint32_t block_size) { - rpmalloc_assert(block_count, "Internal failure"); - *first_block = block_start; - if (block_count > 1) { - void *free_block = pointer_offset(block_start, block_size); - void *block_end = - pointer_offset(block_start, (size_t)block_size * block_count); - // If block size is less than half a memory page, bound init to next memory - // page boundary - if (block_size < (_memory_page_size >> 1)) { - void *page_end = pointer_offset(page_start, _memory_page_size); - if (page_end < block_end) - block_end = page_end; - } - *list = free_block; - block_count = 2; - void *next_block = pointer_offset(free_block, block_size); - while (next_block < block_end) { - *((void **)free_block) = next_block; - free_block = next_block; - ++block_count; - next_block = pointer_offset(next_block, block_size); - } - *((void 
**)free_block) = 0; - } else { - *list = 0; - } - return block_count; -} - -//! Initialize an unused span (from cache or mapped) to be new active span, -//! putting the initial free list in heap class free list -static void *_rpmalloc_span_initialize_new(heap_t *heap, - heap_size_class_t *heap_size_class, - span_t *span, uint32_t class_idx) { - rpmalloc_assert(span->span_count == 1, "Internal failure"); - size_class_t *size_class = _memory_size_class + class_idx; - span->size_class = class_idx; - span->heap = heap; - span->flags &= ~SPAN_FLAG_ALIGNED_BLOCKS; - span->block_size = size_class->block_size; - span->block_count = size_class->block_count; - span->free_list = 0; - span->list_size = 0; - atomic_store_ptr_release(&span->free_list_deferred, 0); - - // Setup free list. Only initialize one system page worth of free blocks in - // list - void *block; - span->free_list_limit = - free_list_partial_init(&heap_size_class->free_list, &block, span, - pointer_offset(span, SPAN_HEADER_SIZE), - size_class->block_count, size_class->block_size); - // Link span as partial if there remains blocks to be initialized as free - // list, or full if fully initialized - if (span->free_list_limit < span->block_count) { - _rpmalloc_span_double_link_list_add(&heap_size_class->partial_span, span); - span->used_count = span->free_list_limit; - } else { -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_add(&heap->full_span[class_idx], span); -#endif - ++heap->full_span_count; - span->used_count = span->block_count; - } - return block; -} - -static void _rpmalloc_span_extract_free_list_deferred(span_t *span) { - // We need acquire semantics on the CAS operation since we are interested in - // the list size Refer to _rpmalloc_deallocate_defer_small_or_medium for - // further comments on this dependency - do { - span->free_list = - atomic_exchange_ptr_acquire(&span->free_list_deferred, INVALID_POINTER); - } while (span->free_list == INVALID_POINTER); - span->used_count -= 
span->list_size; - span->list_size = 0; - atomic_store_ptr_release(&span->free_list_deferred, 0); -} - -static int _rpmalloc_span_is_fully_utilized(span_t *span) { - rpmalloc_assert(span->free_list_limit <= span->block_count, - "Span free list corrupted"); - return !span->free_list && (span->free_list_limit >= span->block_count); -} - -static int _rpmalloc_span_finalize(heap_t *heap, size_t iclass, span_t *span, - span_t **list_head) { - void *free_list = heap->size_class[iclass].free_list; - span_t *class_span = (span_t *)((uintptr_t)free_list & _memory_span_mask); - if (span == class_span) { - // Adopt the heap class free list back into the span free list - void *block = span->free_list; - void *last_block = 0; - while (block) { - last_block = block; - block = *((void **)block); - } - uint32_t free_count = 0; - block = free_list; - while (block) { - ++free_count; - block = *((void **)block); - } - if (last_block) { - *((void **)last_block) = free_list; - } else { - span->free_list = free_list; - } - heap->size_class[iclass].free_list = 0; - span->used_count -= free_count; - } - // If this assert triggers you have memory leaks - rpmalloc_assert(span->list_size == span->used_count, "Memory leak detected"); - if (span->list_size == span->used_count) { - _rpmalloc_stat_dec(&heap->span_use[0].current); - _rpmalloc_stat_dec(&heap->size_class_use[iclass].spans_current); - // This function only used for spans in double linked lists - if (list_head) - _rpmalloc_span_double_link_list_remove(list_head, span); - _rpmalloc_span_unmap(span); - return 1; - } - return 0; -} - -//////////// -/// -/// Global cache -/// -////// - -#if ENABLE_GLOBAL_CACHE - -//! 
Finalize a global cache -static void _rpmalloc_global_cache_finalize(global_cache_t *cache) { - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - - for (size_t ispan = 0; ispan < cache->count; ++ispan) - _rpmalloc_span_unmap(cache->span[ispan]); - cache->count = 0; - - while (cache->overflow) { - span_t *span = cache->overflow; - cache->overflow = span->next; - _rpmalloc_span_unmap(span); - } - - atomic_store32_release(&cache->lock, 0); -} - -static void _rpmalloc_global_cache_insert_spans(span_t **span, - size_t span_count, - size_t count) { - const size_t cache_limit = - (span_count == 1) ? GLOBAL_CACHE_MULTIPLIER * MAX_THREAD_SPAN_CACHE - : GLOBAL_CACHE_MULTIPLIER * - (MAX_THREAD_SPAN_LARGE_CACHE - (span_count >> 1)); - - global_cache_t *cache = &_memory_span_cache[span_count - 1]; - - size_t insert_count = count; - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - -#if ENABLE_STATISTICS - cache->insert_count += count; -#endif - if ((cache->count + insert_count) > cache_limit) - insert_count = cache_limit - cache->count; - - memcpy(cache->span + cache->count, span, sizeof(span_t *) * insert_count); - cache->count += (uint32_t)insert_count; - -#if ENABLE_UNLIMITED_CACHE - while (insert_count < count) { -#else - // Enable unlimited cache if huge pages, or we will leak since it is unlikely - // that an entire huge page will be unmapped, and we're unable to partially - // decommit a huge page - while ((_memory_page_size > _memory_span_size) && (insert_count < count)) { -#endif - span_t *current_span = span[insert_count++]; - current_span->next = cache->overflow; - cache->overflow = current_span; - } - atomic_store32_release(&cache->lock, 0); - - span_t *keep = 0; - for (size_t ispan = insert_count; ispan < count; ++ispan) { - span_t *current_span = span[ispan]; - // Keep master spans that has remaining subspans to avoid dangling them - if ((current_span->flags & SPAN_FLAG_MASTER) && - 
(atomic_load32(&current_span->remaining_spans) > - (int32_t)current_span->span_count)) { - current_span->next = keep; - keep = current_span; - } else { - _rpmalloc_span_unmap(current_span); - } - } - - if (keep) { - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - - size_t islot = 0; - while (keep) { - for (; islot < cache->count; ++islot) { - span_t *current_span = cache->span[islot]; - if (!(current_span->flags & SPAN_FLAG_MASTER) || - ((current_span->flags & SPAN_FLAG_MASTER) && - (atomic_load32(&current_span->remaining_spans) <= - (int32_t)current_span->span_count))) { - _rpmalloc_span_unmap(current_span); - cache->span[islot] = keep; - break; - } - } - if (islot == cache->count) - break; - keep = keep->next; - } - - if (keep) { - span_t *tail = keep; - while (tail->next) - tail = tail->next; - tail->next = cache->overflow; - cache->overflow = keep; - } - - atomic_store32_release(&cache->lock, 0); - } -} - -static size_t _rpmalloc_global_cache_extract_spans(span_t **span, - size_t span_count, - size_t count) { - global_cache_t *cache = &_memory_span_cache[span_count - 1]; - - size_t extract_count = 0; - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - -#if ENABLE_STATISTICS - cache->extract_count += count; -#endif - size_t want = count - extract_count; - if (want > cache->count) - want = cache->count; - - memcpy(span + extract_count, cache->span + (cache->count - want), - sizeof(span_t *) * want); - cache->count -= (uint32_t)want; - extract_count += want; - - while ((extract_count < count) && cache->overflow) { - span_t *current_span = cache->overflow; - span[extract_count++] = current_span; - cache->overflow = current_span->next; - } - -#if ENABLE_ASSERTS - for (size_t ispan = 0; ispan < extract_count; ++ispan) { - rpmalloc_assert(span[ispan]->span_count == span_count, - "Global cache span count mismatch"); - } -#endif - - atomic_store32_release(&cache->lock, 0); - - return extract_count; -} - -#endif - -//////////// -/// - 
Heap control -/// -////// - -static void _rpmalloc_deallocate_huge(span_t *); - -//! Store the given spans as reserve in the given heap -static void _rpmalloc_heap_set_reserved_spans(heap_t *heap, span_t *master, - span_t *reserve, - size_t reserve_span_count) { - heap->span_reserve_master = master; - heap->span_reserve = reserve; - heap->spans_reserved = (uint32_t)reserve_span_count; -} - -//! Adopt the deferred span cache list, optionally extracting the first single -//! span for immediate re-use -static void _rpmalloc_heap_cache_adopt_deferred(heap_t *heap, - span_t **single_span) { - span_t *span = (span_t *)((void *)atomic_exchange_ptr_acquire( - &heap->span_free_deferred, 0)); - while (span) { - span_t *next_span = (span_t *)span->free_list; - rpmalloc_assert(span->heap == heap, "Span heap pointer corrupted"); - if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) { - rpmalloc_assert(heap->full_span_count, "Heap span counter corrupted"); - --heap->full_span_count; - _rpmalloc_stat_dec(&heap->span_use[0].spans_deferred); -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&heap->full_span[span->size_class], - span); -#endif - _rpmalloc_stat_dec(&heap->span_use[0].current); - _rpmalloc_stat_dec(&heap->size_class_use[span->size_class].spans_current); - if (single_span && !*single_span) - *single_span = span; - else - _rpmalloc_heap_cache_insert(heap, span); - } else { - if (span->size_class == SIZE_CLASS_HUGE) { - _rpmalloc_deallocate_huge(span); - } else { - rpmalloc_assert(span->size_class == SIZE_CLASS_LARGE, - "Span size class invalid"); - rpmalloc_assert(heap->full_span_count, "Heap span counter corrupted"); - --heap->full_span_count; -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&heap->large_huge_span, span); -#endif - uint32_t idx = span->span_count - 1; - _rpmalloc_stat_dec(&heap->span_use[idx].spans_deferred); - _rpmalloc_stat_dec(&heap->span_use[idx].current); - if (!idx && single_span && !*single_span) - 
*single_span = span; - else - _rpmalloc_heap_cache_insert(heap, span); - } - } - span = next_span; - } -} - -static void _rpmalloc_heap_unmap(heap_t *heap) { - if (!heap->master_heap) { - if ((heap->finalize > 1) && !atomic_load32(&heap->child_count)) { - span_t *span = (span_t *)((uintptr_t)heap & _memory_span_mask); - _rpmalloc_span_unmap(span); - } - } else { - if (atomic_decr32(&heap->master_heap->child_count) == 0) { - _rpmalloc_heap_unmap(heap->master_heap); - } - } -} - -static void _rpmalloc_heap_global_finalize(heap_t *heap) { - if (heap->finalize++ > 1) { - --heap->finalize; - return; - } - - _rpmalloc_heap_finalize(heap); - -#if ENABLE_THREAD_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - for (size_t ispan = 0; ispan < span_cache->count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[ispan]); - span_cache->count = 0; - } -#endif - - if (heap->full_span_count) { - --heap->finalize; - return; - } - - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - if (heap->size_class[iclass].free_list || - heap->size_class[iclass].partial_span) { - --heap->finalize; - return; - } - } - // Heap is now completely free, unmap and remove from heap list - size_t list_idx = (size_t)heap->id % HEAP_ARRAY_SIZE; - heap_t *list_heap = _memory_heaps[list_idx]; - if (list_heap == heap) { - _memory_heaps[list_idx] = heap->next_heap; - } else { - while (list_heap->next_heap != heap) - list_heap = list_heap->next_heap; - list_heap->next_heap = heap->next_heap; - } - - _rpmalloc_heap_unmap(heap); -} - -//! Insert a single span into thread heap cache, releasing to global cache if -//! 
overflow -static void _rpmalloc_heap_cache_insert(heap_t *heap, span_t *span) { - if (UNEXPECTED(heap->finalize != 0)) { - _rpmalloc_span_unmap(span); - _rpmalloc_heap_global_finalize(heap); - return; - } -#if ENABLE_THREAD_CACHE - size_t span_count = span->span_count; - _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_to_cache); - if (span_count == 1) { - span_cache_t *span_cache = &heap->span_cache; - span_cache->span[span_cache->count++] = span; - if (span_cache->count == MAX_THREAD_SPAN_CACHE) { - const size_t remain_count = - MAX_THREAD_SPAN_CACHE - THREAD_SPAN_CACHE_TRANSFER; -#if ENABLE_GLOBAL_CACHE - _rpmalloc_stat_add64(&heap->thread_to_global, - THREAD_SPAN_CACHE_TRANSFER * _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_to_global, - THREAD_SPAN_CACHE_TRANSFER); - _rpmalloc_global_cache_insert_spans(span_cache->span + remain_count, - span_count, - THREAD_SPAN_CACHE_TRANSFER); -#else - for (size_t ispan = 0; ispan < THREAD_SPAN_CACHE_TRANSFER; ++ispan) - _rpmalloc_span_unmap(span_cache->span[remain_count + ispan]); -#endif - span_cache->count = remain_count; - } - } else { - size_t cache_idx = span_count - 2; - span_large_cache_t *span_cache = heap->span_large_cache + cache_idx; - span_cache->span[span_cache->count++] = span; - const size_t cache_limit = - (MAX_THREAD_SPAN_LARGE_CACHE - (span_count >> 1)); - if (span_cache->count == cache_limit) { - const size_t transfer_limit = 2 + (cache_limit >> 2); - const size_t transfer_count = - (THREAD_SPAN_LARGE_CACHE_TRANSFER <= transfer_limit - ? 
THREAD_SPAN_LARGE_CACHE_TRANSFER - : transfer_limit); - const size_t remain_count = cache_limit - transfer_count; -#if ENABLE_GLOBAL_CACHE - _rpmalloc_stat_add64(&heap->thread_to_global, - transfer_count * span_count * _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_to_global, - transfer_count); - _rpmalloc_global_cache_insert_spans(span_cache->span + remain_count, - span_count, transfer_count); -#else - for (size_t ispan = 0; ispan < transfer_count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[remain_count + ispan]); -#endif - span_cache->count = remain_count; - } - } -#else - (void)sizeof(heap); - _rpmalloc_span_unmap(span); -#endif -} - -//! Extract the given number of spans from the different cache levels -static span_t *_rpmalloc_heap_thread_cache_extract(heap_t *heap, - size_t span_count) { - span_t *span = 0; -#if ENABLE_THREAD_CACHE - span_cache_t *span_cache; - if (span_count == 1) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (span_count - 2)); - if (span_cache->count) { - _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_from_cache); - return span_cache->span[--span_cache->count]; - } -#endif - return span; -} - -static span_t *_rpmalloc_heap_thread_cache_deferred_extract(heap_t *heap, - size_t span_count) { - span_t *span = 0; - if (span_count == 1) { - _rpmalloc_heap_cache_adopt_deferred(heap, &span); - } else { - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - span = _rpmalloc_heap_thread_cache_extract(heap, span_count); - } - return span; -} - -static span_t *_rpmalloc_heap_reserved_extract(heap_t *heap, - size_t span_count) { - if (heap->spans_reserved >= span_count) - return _rpmalloc_span_map(heap, span_count); - return 0; -} - -//! 
Extract a span from the global cache -static span_t *_rpmalloc_heap_global_cache_extract(heap_t *heap, - size_t span_count) { -#if ENABLE_GLOBAL_CACHE -#if ENABLE_THREAD_CACHE - span_cache_t *span_cache; - size_t wanted_count; - if (span_count == 1) { - span_cache = &heap->span_cache; - wanted_count = THREAD_SPAN_CACHE_TRANSFER; - } else { - span_cache = (span_cache_t *)(heap->span_large_cache + (span_count - 2)); - wanted_count = THREAD_SPAN_LARGE_CACHE_TRANSFER; - } - span_cache->count = _rpmalloc_global_cache_extract_spans( - span_cache->span, span_count, wanted_count); - if (span_cache->count) { - _rpmalloc_stat_add64(&heap->global_to_thread, - span_count * span_cache->count * _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_from_global, - span_cache->count); - return span_cache->span[--span_cache->count]; - } -#else - span_t *span = 0; - size_t count = _rpmalloc_global_cache_extract_spans(&span, span_count, 1); - if (count) { - _rpmalloc_stat_add64(&heap->global_to_thread, - span_count * count * _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_from_global, - count); - return span; - } -#endif -#endif - (void)sizeof(heap); - (void)sizeof(span_count); - return 0; -} - -static void _rpmalloc_inc_span_statistics(heap_t *heap, size_t span_count, - uint32_t class_idx) { - (void)sizeof(heap); - (void)sizeof(span_count); - (void)sizeof(class_idx); -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS - uint32_t idx = (uint32_t)span_count - 1; - uint32_t current_count = - (uint32_t)atomic_incr32(&heap->span_use[idx].current); - if (current_count > (uint32_t)atomic_load32(&heap->span_use[idx].high)) - atomic_store32(&heap->span_use[idx].high, (int32_t)current_count); - _rpmalloc_stat_add_peak(&heap->size_class_use[class_idx].spans_current, 1, - heap->size_class_use[class_idx].spans_peak); -#endif -} - -//! Get a span from one of the cache levels (thread cache, reserved, global -//! 
cache) or fallback to mapping more memory -static span_t * -_rpmalloc_heap_extract_new_span(heap_t *heap, - heap_size_class_t *heap_size_class, - size_t span_count, uint32_t class_idx) { - span_t *span; -#if ENABLE_THREAD_CACHE - if (heap_size_class && heap_size_class->cache) { - span = heap_size_class->cache; - heap_size_class->cache = - (heap->span_cache.count - ? heap->span_cache.span[--heap->span_cache.count] - : 0); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } -#endif - (void)sizeof(class_idx); - // Allow 50% overhead to increase cache hits - size_t base_span_count = span_count; - size_t limit_span_count = - (span_count > 2) ? (span_count + (span_count >> 1)) : span_count; - if (limit_span_count > LARGE_CLASS_COUNT) - limit_span_count = LARGE_CLASS_COUNT; - do { - span = _rpmalloc_heap_thread_cache_extract(heap, span_count); - if (EXPECTED(span != 0)) { - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } - span = _rpmalloc_heap_thread_cache_deferred_extract(heap, span_count); - if (EXPECTED(span != 0)) { - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } - span = _rpmalloc_heap_global_cache_extract(heap, span_count); - if (EXPECTED(span != 0)) { - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } - span = _rpmalloc_heap_reserved_extract(heap, span_count); - if (EXPECTED(span != 0)) { - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_reserved); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } - ++span_count; - } while (span_count <= limit_span_count); - // Final fallback, map in more virtual memory - span = _rpmalloc_span_map(heap, base_span_count); - 
_rpmalloc_inc_span_statistics(heap, base_span_count, class_idx); - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_map_calls); - return span; -} - -static void _rpmalloc_heap_initialize(heap_t *heap) { - _rpmalloc_memset_const(heap, 0, sizeof(heap_t)); - // Get a new heap ID - heap->id = 1 + atomic_incr32(&_memory_heap_id); - - // Link in heap in heap ID map - size_t list_idx = (size_t)heap->id % HEAP_ARRAY_SIZE; - heap->next_heap = _memory_heaps[list_idx]; - _memory_heaps[list_idx] = heap; -} - -static void _rpmalloc_heap_orphan(heap_t *heap, int first_class) { - heap->owner_thread = (uintptr_t)-1; -#if RPMALLOC_FIRST_CLASS_HEAPS - heap_t **heap_list = - (first_class ? &_memory_first_class_orphan_heaps : &_memory_orphan_heaps); -#else - (void)sizeof(first_class); - heap_t **heap_list = &_memory_orphan_heaps; -#endif - heap->next_orphan = *heap_list; - *heap_list = heap; -} - -//! Allocate a new heap from newly mapped memory pages -static heap_t *_rpmalloc_heap_allocate_new(void) { - // Map in pages for a 16 heaps. If page size is greater than required size for - // this, map a page and use first part for heaps and remaining part for spans - // for allocations. 
Adds a lot of complexity, but saves a lot of memory on - // systems where page size > 64 spans (4MiB) - size_t heap_size = sizeof(heap_t); - size_t aligned_heap_size = 16 * ((heap_size + 15) / 16); - size_t request_heap_count = 16; - size_t heap_span_count = ((aligned_heap_size * request_heap_count) + - sizeof(span_t) + _memory_span_size - 1) / - _memory_span_size; - size_t block_size = _memory_span_size * heap_span_count; - size_t span_count = heap_span_count; - span_t *span = 0; - // If there are global reserved spans, use these first - if (_memory_global_reserve_count >= heap_span_count) { - span = _rpmalloc_global_get_reserved_spans(heap_span_count); - } - if (!span) { - if (_memory_page_size > block_size) { - span_count = _memory_page_size / _memory_span_size; - block_size = _memory_page_size; - // If using huge pages, make sure to grab enough heaps to avoid - // reallocating a huge page just to serve new heaps - size_t possible_heap_count = - (block_size - sizeof(span_t)) / aligned_heap_size; - if (possible_heap_count >= (request_heap_count * 16)) - request_heap_count *= 16; - else if (possible_heap_count < request_heap_count) - request_heap_count = possible_heap_count; - heap_span_count = ((aligned_heap_size * request_heap_count) + - sizeof(span_t) + _memory_span_size - 1) / - _memory_span_size; - } - - size_t align_offset = 0; - span = (span_t *)_rpmalloc_mmap(block_size, &align_offset); - if (!span) - return 0; - - // Master span will contain the heaps - _rpmalloc_stat_inc(&_master_spans); - _rpmalloc_span_initialize(span, span_count, heap_span_count, align_offset); - } - - size_t remain_size = _memory_span_size - sizeof(span_t); - heap_t *heap = (heap_t *)pointer_offset(span, sizeof(span_t)); - _rpmalloc_heap_initialize(heap); - - // Put extra heaps as orphans - size_t num_heaps = remain_size / aligned_heap_size; - if (num_heaps < request_heap_count) - num_heaps = request_heap_count; - atomic_store32(&heap->child_count, (int32_t)num_heaps - 1); - heap_t 
*extra_heap = (heap_t *)pointer_offset(heap, aligned_heap_size); - while (num_heaps > 1) { - _rpmalloc_heap_initialize(extra_heap); - extra_heap->master_heap = heap; - _rpmalloc_heap_orphan(extra_heap, 1); - extra_heap = (heap_t *)pointer_offset(extra_heap, aligned_heap_size); - --num_heaps; - } - - if (span_count > heap_span_count) { - // Cap reserved spans - size_t remain_count = span_count - heap_span_count; - size_t reserve_count = - (remain_count > _memory_heap_reserve_count ? _memory_heap_reserve_count - : remain_count); - span_t *remain_span = - (span_t *)pointer_offset(span, heap_span_count * _memory_span_size); - _rpmalloc_heap_set_reserved_spans(heap, span, remain_span, reserve_count); - - if (remain_count > reserve_count) { - // Set to global reserved spans - remain_span = (span_t *)pointer_offset(remain_span, - reserve_count * _memory_span_size); - reserve_count = remain_count - reserve_count; - _rpmalloc_global_set_reserved_spans(span, remain_span, reserve_count); - } - } - - return heap; -} - -static heap_t *_rpmalloc_heap_extract_orphan(heap_t **heap_list) { - heap_t *heap = *heap_list; - *heap_list = (heap ? heap->next_orphan : 0); - return heap; -} - -//! 
Allocate a new heap, potentially reusing a previously orphaned heap -static heap_t *_rpmalloc_heap_allocate(int first_class) { - heap_t *heap = 0; - while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) - _rpmalloc_spin(); - if (first_class == 0) - heap = _rpmalloc_heap_extract_orphan(&_memory_orphan_heaps); -#if RPMALLOC_FIRST_CLASS_HEAPS - if (!heap) - heap = _rpmalloc_heap_extract_orphan(&_memory_first_class_orphan_heaps); -#endif - if (!heap) - heap = _rpmalloc_heap_allocate_new(); - atomic_store32_release(&_memory_global_lock, 0); - if (heap) - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - return heap; -} - -static void _rpmalloc_heap_release(void *heapptr, int first_class, - int release_cache) { - heap_t *heap = (heap_t *)heapptr; - if (!heap) - return; - // Release thread cache spans back to global cache - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - if (release_cache || heap->finalize) { -#if ENABLE_THREAD_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - if (!span_cache->count) - continue; -#if ENABLE_GLOBAL_CACHE - if (heap->finalize) { - for (size_t ispan = 0; ispan < span_cache->count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[ispan]); - } else { - _rpmalloc_stat_add64(&heap->thread_to_global, span_cache->count * - (iclass + 1) * - _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[iclass].spans_to_global, - span_cache->count); - _rpmalloc_global_cache_insert_spans(span_cache->span, iclass + 1, - span_cache->count); - } -#else - for (size_t ispan = 0; ispan < span_cache->count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[ispan]); -#endif - span_cache->count = 0; - } -#endif - } - - if (get_thread_heap_raw() == heap) - set_thread_heap(0); - -#if ENABLE_STATISTICS - atomic_decr32(&_memory_active_heaps); - 
rpmalloc_assert(atomic_load32(&_memory_active_heaps) >= 0, - "Still active heaps during finalization"); -#endif - - // If we are forcibly terminating with _exit the state of the - // lock atomic is unknown and it's best to just go ahead and exit - if (get_thread_id() != _rpmalloc_main_thread_id) { - while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) - _rpmalloc_spin(); - } - _rpmalloc_heap_orphan(heap, first_class); - atomic_store32_release(&_memory_global_lock, 0); -} - -static void _rpmalloc_heap_release_raw(void *heapptr, int release_cache) { - _rpmalloc_heap_release(heapptr, 0, release_cache); -} - -static void _rpmalloc_heap_release_raw_fc(void *heapptr) { - _rpmalloc_heap_release_raw(heapptr, 1); -} - -static void _rpmalloc_heap_finalize(heap_t *heap) { - if (heap->spans_reserved) { - span_t *span = _rpmalloc_span_map(heap, heap->spans_reserved); - _rpmalloc_span_unmap(span); - heap->spans_reserved = 0; - } - - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - if (heap->size_class[iclass].cache) - _rpmalloc_span_unmap(heap->size_class[iclass].cache); - heap->size_class[iclass].cache = 0; - span_t *span = heap->size_class[iclass].partial_span; - while (span) { - span_t *next = span->next; - _rpmalloc_span_finalize(heap, iclass, span, - &heap->size_class[iclass].partial_span); - span = next; - } - // If class still has a free list it must be a full span - if (heap->size_class[iclass].free_list) { - span_t *class_span = - (span_t *)((uintptr_t)heap->size_class[iclass].free_list & - _memory_span_mask); - span_t **list = 0; -#if RPMALLOC_FIRST_CLASS_HEAPS - list = &heap->full_span[iclass]; -#endif - --heap->full_span_count; - if (!_rpmalloc_span_finalize(heap, iclass, class_span, list)) { - if (list) - _rpmalloc_span_double_link_list_remove(list, class_span); - _rpmalloc_span_double_link_list_add( - &heap->size_class[iclass].partial_span, class_span); - } - } - } - -#if ENABLE_THREAD_CACHE - 
for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - for (size_t ispan = 0; ispan < span_cache->count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[ispan]); - span_cache->count = 0; - } -#endif - rpmalloc_assert(!atomic_load_ptr(&heap->span_free_deferred), - "Heaps still active during finalization"); -} - -//////////// -/// -/// Allocation entry points -/// -////// - -//! Pop first block from a free list -static void *free_list_pop(void **list) { - void *block = *list; - *list = *((void **)block); - return block; -} - -//! Allocate a small/medium sized memory block from the given heap -static void *_rpmalloc_allocate_from_heap_fallback( - heap_t *heap, heap_size_class_t *heap_size_class, uint32_t class_idx) { - span_t *span = heap_size_class->partial_span; - rpmalloc_assume(heap != 0); - if (EXPECTED(span != 0)) { - rpmalloc_assert(span->block_count == - _memory_size_class[span->size_class].block_count, - "Span block count corrupted"); - rpmalloc_assert(!_rpmalloc_span_is_fully_utilized(span), - "Internal failure"); - void *block; - if (span->free_list) { - // Span local free list is not empty, swap to size class free list - block = free_list_pop(&span->free_list); - heap_size_class->free_list = span->free_list; - span->free_list = 0; - } else { - // If the span did not fully initialize free list, link up another page - // worth of blocks - void *block_start = pointer_offset( - span, SPAN_HEADER_SIZE + - ((size_t)span->free_list_limit * span->block_size)); - span->free_list_limit += free_list_partial_init( - &heap_size_class->free_list, &block, - (void *)((uintptr_t)block_start & ~(_memory_page_size - 1)), - block_start, span->block_count - span->free_list_limit, - span->block_size); - } - rpmalloc_assert(span->free_list_limit <= span->block_count, - "Span block count corrupted"); - 
span->used_count = span->free_list_limit; - - // Swap in deferred free list if present - if (atomic_load_ptr(&span->free_list_deferred)) - _rpmalloc_span_extract_free_list_deferred(span); - - // If span is still not fully utilized keep it in partial list and early - // return block - if (!_rpmalloc_span_is_fully_utilized(span)) - return block; - - // The span is fully utilized, unlink from partial list and add to fully - // utilized list - _rpmalloc_span_double_link_list_pop_head(&heap_size_class->partial_span, - span); -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_add(&heap->full_span[class_idx], span); -#endif - ++heap->full_span_count; - return block; - } - - // Find a span in one of the cache levels - span = _rpmalloc_heap_extract_new_span(heap, heap_size_class, 1, class_idx); - if (EXPECTED(span != 0)) { - // Mark span as owned by this heap and set base data, return first block - return _rpmalloc_span_initialize_new(heap, heap_size_class, span, - class_idx); - } - - return 0; -} - -//! Allocate a small sized memory block from the given heap -static void *_rpmalloc_allocate_small(heap_t *heap, size_t size) { - rpmalloc_assert(heap, "No thread heap"); - // Small sizes have unique size classes - const uint32_t class_idx = - (uint32_t)((size + (SMALL_GRANULARITY - 1)) >> SMALL_GRANULARITY_SHIFT); - heap_size_class_t *heap_size_class = heap->size_class + class_idx; - _rpmalloc_stat_inc_alloc(heap, class_idx); - if (EXPECTED(heap_size_class->free_list != 0)) - return free_list_pop(&heap_size_class->free_list); - return _rpmalloc_allocate_from_heap_fallback(heap, heap_size_class, - class_idx); -} - -//! 
Allocate a medium sized memory block from the given heap -static void *_rpmalloc_allocate_medium(heap_t *heap, size_t size) { - rpmalloc_assert(heap, "No thread heap"); - // Calculate the size class index and do a dependent lookup of the final class - // index (in case of merged classes) - const uint32_t base_idx = - (uint32_t)(SMALL_CLASS_COUNT + - ((size - (SMALL_SIZE_LIMIT + 1)) >> MEDIUM_GRANULARITY_SHIFT)); - const uint32_t class_idx = _memory_size_class[base_idx].class_idx; - heap_size_class_t *heap_size_class = heap->size_class + class_idx; - _rpmalloc_stat_inc_alloc(heap, class_idx); - if (EXPECTED(heap_size_class->free_list != 0)) - return free_list_pop(&heap_size_class->free_list); - return _rpmalloc_allocate_from_heap_fallback(heap, heap_size_class, - class_idx); -} - -//! Allocate a large sized memory block from the given heap -static void *_rpmalloc_allocate_large(heap_t *heap, size_t size) { - rpmalloc_assert(heap, "No thread heap"); - // Calculate number of needed max sized spans (including header) - // Since this function is never called if size > LARGE_SIZE_LIMIT - // the span_count is guaranteed to be <= LARGE_CLASS_COUNT - size += SPAN_HEADER_SIZE; - size_t span_count = size >> _memory_span_size_shift; - if (size & (_memory_span_size - 1)) - ++span_count; - - // Find a span in one of the cache levels - span_t *span = - _rpmalloc_heap_extract_new_span(heap, 0, span_count, SIZE_CLASS_LARGE); - if (!span) - return span; - - // Mark span as owned by this heap and set base data - rpmalloc_assert(span->span_count >= span_count, "Internal failure"); - span->size_class = SIZE_CLASS_LARGE; - span->heap = heap; - -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); -#endif - ++heap->full_span_count; - - return pointer_offset(span, SPAN_HEADER_SIZE); -} - -//! 
Allocate a huge block by mapping memory pages directly -static void *_rpmalloc_allocate_huge(heap_t *heap, size_t size) { - rpmalloc_assert(heap, "No thread heap"); - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - size += SPAN_HEADER_SIZE; - size_t num_pages = size >> _memory_page_size_shift; - if (size & (_memory_page_size - 1)) - ++num_pages; - size_t align_offset = 0; - span_t *span = - (span_t *)_rpmalloc_mmap(num_pages * _memory_page_size, &align_offset); - if (!span) - return span; - - // Store page count in span_count - span->size_class = SIZE_CLASS_HUGE; - span->span_count = (uint32_t)num_pages; - span->align_offset = (uint32_t)align_offset; - span->heap = heap; - _rpmalloc_stat_add_peak(&_huge_pages_current, num_pages, _huge_pages_peak); - -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); -#endif - ++heap->full_span_count; - - return pointer_offset(span, SPAN_HEADER_SIZE); -} - -//! Allocate a block of the given size -static void *_rpmalloc_allocate(heap_t *heap, size_t size) { - _rpmalloc_stat_add64(&_allocation_counter, 1); - if (EXPECTED(size <= SMALL_SIZE_LIMIT)) - return _rpmalloc_allocate_small(heap, size); - else if (size <= _memory_medium_size_limit) - return _rpmalloc_allocate_medium(heap, size); - else if (size <= LARGE_SIZE_LIMIT) - return _rpmalloc_allocate_large(heap, size); - return _rpmalloc_allocate_huge(heap, size); -} - -static void *_rpmalloc_aligned_allocate(heap_t *heap, size_t alignment, - size_t size) { - if (alignment <= SMALL_GRANULARITY) - return _rpmalloc_allocate(heap, size); - -#if ENABLE_VALIDATE_ARGS - if ((size + alignment) < size) { - errno = EINVAL; - return 0; - } - if (alignment & (alignment - 1)) { - errno = EINVAL; - return 0; - } -#endif - - if ((alignment <= SPAN_HEADER_SIZE) && - ((size + SPAN_HEADER_SIZE) < _memory_medium_size_limit)) { - // If alignment is less or equal to span header size (which is power of - // two), and size aligned to span header size 
multiples is less than size + - // alignment, then use natural alignment of blocks to provide alignment - size_t multiple_size = size ? (size + (SPAN_HEADER_SIZE - 1)) & - ~(uintptr_t)(SPAN_HEADER_SIZE - 1) - : SPAN_HEADER_SIZE; - rpmalloc_assert(!(multiple_size % SPAN_HEADER_SIZE), - "Failed alignment calculation"); - if (multiple_size <= (size + alignment)) - return _rpmalloc_allocate(heap, multiple_size); - } - - void *ptr = 0; - size_t align_mask = alignment - 1; - if (alignment <= _memory_page_size) { - ptr = _rpmalloc_allocate(heap, size + alignment); - if ((uintptr_t)ptr & align_mask) { - ptr = (void *)(((uintptr_t)ptr & ~(uintptr_t)align_mask) + alignment); - // Mark as having aligned blocks - span_t *span = (span_t *)((uintptr_t)ptr & _memory_span_mask); - span->flags |= SPAN_FLAG_ALIGNED_BLOCKS; - } - return ptr; - } - - // Fallback to mapping new pages for this request. Since pointers passed - // to rpfree must be able to reach the start of the span by bitmasking of - // the address with the span size, the returned aligned pointer from this - // function must be with a span size of the start of the mapped area. - // In worst case this requires us to loop and map pages until we get a - // suitable memory address. 
It also means we can never align to span size - // or greater, since the span header will push alignment more than one - // span size away from span start (thus causing pointer mask to give us - // an invalid span start on free) - if (alignment & align_mask) { - errno = EINVAL; - return 0; - } - if (alignment >= _memory_span_size) { - errno = EINVAL; - return 0; - } - - size_t extra_pages = alignment / _memory_page_size; - - // Since each span has a header, we will at least need one extra memory page - size_t num_pages = 1 + (size / _memory_page_size); - if (size & (_memory_page_size - 1)) - ++num_pages; - - if (extra_pages > num_pages) - num_pages = 1 + extra_pages; - - size_t original_pages = num_pages; - size_t limit_pages = (_memory_span_size / _memory_page_size) * 2; - if (limit_pages < (original_pages * 2)) - limit_pages = original_pages * 2; - - size_t mapped_size, align_offset; - span_t *span; - -retry: - align_offset = 0; - mapped_size = num_pages * _memory_page_size; - - span = (span_t *)_rpmalloc_mmap(mapped_size, &align_offset); - if (!span) { - errno = ENOMEM; - return 0; - } - ptr = pointer_offset(span, SPAN_HEADER_SIZE); - - if ((uintptr_t)ptr & align_mask) - ptr = (void *)(((uintptr_t)ptr & ~(uintptr_t)align_mask) + alignment); - - if (((size_t)pointer_diff(ptr, span) >= _memory_span_size) || - (pointer_offset(ptr, size) > pointer_offset(span, mapped_size)) || - (((uintptr_t)ptr & _memory_span_mask) != (uintptr_t)span)) { - _rpmalloc_unmap(span, mapped_size, align_offset, mapped_size); - ++num_pages; - if (num_pages > limit_pages) { - errno = EINVAL; - return 0; - } - goto retry; - } - - // Store page count in span_count - span->size_class = SIZE_CLASS_HUGE; - span->span_count = (uint32_t)num_pages; - span->align_offset = (uint32_t)align_offset; - span->heap = heap; - _rpmalloc_stat_add_peak(&_huge_pages_current, num_pages, _huge_pages_peak); - -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); 
-#endif - ++heap->full_span_count; - - _rpmalloc_stat_add64(&_allocation_counter, 1); - - return ptr; -} - -//////////// -/// -/// Deallocation entry points -/// -////// - -//! Deallocate the given small/medium memory block in the current thread local -//! heap -static void _rpmalloc_deallocate_direct_small_or_medium(span_t *span, - void *block) { - heap_t *heap = span->heap; - rpmalloc_assert(heap->owner_thread == get_thread_id() || - !heap->owner_thread || heap->finalize, - "Internal failure"); - // Add block to free list - if (UNEXPECTED(_rpmalloc_span_is_fully_utilized(span))) { - span->used_count = span->block_count; -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&heap->full_span[span->size_class], - span); -#endif - _rpmalloc_span_double_link_list_add( - &heap->size_class[span->size_class].partial_span, span); - --heap->full_span_count; - } - *((void **)block) = span->free_list; - --span->used_count; - span->free_list = block; - if (UNEXPECTED(span->used_count == span->list_size)) { - // If there are no used blocks it is guaranteed that no other external - // thread is accessing the span - if (span->used_count) { - // Make sure we have synchronized the deferred list and list size by using - // acquire semantics and guarantee that no external thread is accessing - // span concurrently - void *free_list; - do { - free_list = atomic_exchange_ptr_acquire(&span->free_list_deferred, - INVALID_POINTER); - } while (free_list == INVALID_POINTER); - atomic_store_ptr_release(&span->free_list_deferred, free_list); - } - _rpmalloc_span_double_link_list_remove( - &heap->size_class[span->size_class].partial_span, span); - _rpmalloc_span_release_to_cache(heap, span); - } -} - -static void _rpmalloc_deallocate_defer_free_span(heap_t *heap, span_t *span) { - if (span->size_class != SIZE_CLASS_HUGE) - _rpmalloc_stat_inc(&heap->span_use[span->span_count - 1].spans_deferred); - // This list does not need ABA protection, no mutable side state - do { - 
span->free_list = (void *)atomic_load_ptr(&heap->span_free_deferred); - } while (!atomic_cas_ptr(&heap->span_free_deferred, span, span->free_list)); -} - -//! Put the block in the deferred free list of the owning span -static void _rpmalloc_deallocate_defer_small_or_medium(span_t *span, - void *block) { - // The memory ordering here is a bit tricky, to avoid having to ABA protect - // the deferred free list to avoid desynchronization of list and list size - // we need to have acquire semantics on successful CAS of the pointer to - // guarantee the list_size variable validity + release semantics on pointer - // store - void *free_list; - do { - free_list = - atomic_exchange_ptr_acquire(&span->free_list_deferred, INVALID_POINTER); - } while (free_list == INVALID_POINTER); - *((void **)block) = free_list; - uint32_t free_count = ++span->list_size; - int all_deferred_free = (free_count == span->block_count); - atomic_store_ptr_release(&span->free_list_deferred, block); - if (all_deferred_free) { - // Span was completely freed by this block. Due to the INVALID_POINTER spin - // lock no other thread can reach this state simultaneously on this span. 
- // Safe to move to owner heap deferred cache - _rpmalloc_deallocate_defer_free_span(span->heap, span); - } -} - -static void _rpmalloc_deallocate_small_or_medium(span_t *span, void *p) { - _rpmalloc_stat_inc_free(span->heap, span->size_class); - if (span->flags & SPAN_FLAG_ALIGNED_BLOCKS) { - // Realign pointer to block start - void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); - uint32_t block_offset = (uint32_t)pointer_diff(p, blocks_start); - p = pointer_offset(p, -(int32_t)(block_offset % span->block_size)); - } - // Check if block belongs to this heap or if deallocation should be deferred -#if RPMALLOC_FIRST_CLASS_HEAPS - int defer = - (span->heap->owner_thread && - (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#else - int defer = - ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#endif - if (!defer) - _rpmalloc_deallocate_direct_small_or_medium(span, p); - else - _rpmalloc_deallocate_defer_small_or_medium(span, p); -} - -//! 
Deallocate the given large memory block to the current heap -static void _rpmalloc_deallocate_large(span_t *span) { - rpmalloc_assert(span->size_class == SIZE_CLASS_LARGE, "Bad span size class"); - rpmalloc_assert(!(span->flags & SPAN_FLAG_MASTER) || - !(span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - rpmalloc_assert((span->flags & SPAN_FLAG_MASTER) || - (span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - // We must always defer (unless finalizing) if from another heap since we - // cannot touch the list or counters of another heap -#if RPMALLOC_FIRST_CLASS_HEAPS - int defer = - (span->heap->owner_thread && - (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#else - int defer = - ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#endif - if (defer) { - _rpmalloc_deallocate_defer_free_span(span->heap, span); - return; - } - rpmalloc_assert(span->heap->full_span_count, "Heap span counter corrupted"); - --span->heap->full_span_count; -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&span->heap->large_huge_span, span); -#endif -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS - // Decrease counter - size_t idx = span->span_count - 1; - atomic_decr32(&span->heap->span_use[idx].current); -#endif - heap_t *heap = span->heap; - rpmalloc_assert(heap, "No thread heap"); -#if ENABLE_THREAD_CACHE - const int set_as_reserved = - ((span->span_count > 1) && (heap->span_cache.count == 0) && - !heap->finalize && !heap->spans_reserved); -#else - const int set_as_reserved = - ((span->span_count > 1) && !heap->finalize && !heap->spans_reserved); -#endif - if (set_as_reserved) { - heap->span_reserve = span; - heap->spans_reserved = span->span_count; - if (span->flags & SPAN_FLAG_MASTER) { - heap->span_reserve_master = span; - } else { // SPAN_FLAG_SUBSPAN - span_t *master = (span_t *)pointer_offset( - span, - -(intptr_t)((size_t)span->offset_from_master * _memory_span_size)); - 
heap->span_reserve_master = master;
-      rpmalloc_assert(master->flags & SPAN_FLAG_MASTER, "Span flag corrupted");
-      rpmalloc_assert(atomic_load32(&master->remaining_spans) >=
-                          (int32_t)span->span_count,
-                      "Master span count corrupted");
-    }
-    _rpmalloc_stat_inc(&heap->span_use[idx].spans_to_reserved);
-  } else {
-    // Insert into cache list
-    _rpmalloc_heap_cache_insert(heap, span);
-  }
-}
-
-//! Deallocate the given huge span
-static void _rpmalloc_deallocate_huge(span_t *span) {
-  rpmalloc_assert(span->heap, "No span heap");
-#if RPMALLOC_FIRST_CLASS_HEAPS
-  int defer =
-      (span->heap->owner_thread &&
-       (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize);
-#else
-  int defer =
-      ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize);
-#endif
-  if (defer) {
-    _rpmalloc_deallocate_defer_free_span(span->heap, span);
-    return;
-  }
-  rpmalloc_assert(span->heap->full_span_count, "Heap span counter corrupted");
-  --span->heap->full_span_count;
-#if RPMALLOC_FIRST_CLASS_HEAPS
-  _rpmalloc_span_double_link_list_remove(&span->heap->large_huge_span, span);
-#endif
-
-  // Oversized allocation, page count is stored in span_count
-  size_t num_pages = span->span_count;
-  _rpmalloc_unmap(span, num_pages * _memory_page_size, span->align_offset,
-                  num_pages * _memory_page_size);
-  _rpmalloc_stat_sub(&_huge_pages_current, num_pages);
-}
-
-//!
Deallocate the given block -static void _rpmalloc_deallocate(void *p) { - _rpmalloc_stat_add64(&_deallocation_counter, 1); - // Grab the span (always at start of span, using span alignment) - span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); - if (UNEXPECTED(!span)) - return; - if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) - _rpmalloc_deallocate_small_or_medium(span, p); - else if (span->size_class == SIZE_CLASS_LARGE) - _rpmalloc_deallocate_large(span); - else - _rpmalloc_deallocate_huge(span); -} - -//////////// -/// -/// Reallocation entry points -/// -////// - -static size_t _rpmalloc_usable_size(void *p); - -//! Reallocate the given block to the given size -static void *_rpmalloc_reallocate(heap_t *heap, void *p, size_t size, - size_t oldsize, unsigned int flags) { - if (p) { - // Grab the span using guaranteed span alignment - span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); - if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) { - // Small/medium sized block - rpmalloc_assert(span->span_count == 1, "Span counter corrupted"); - void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); - uint32_t block_offset = (uint32_t)pointer_diff(p, blocks_start); - uint32_t block_idx = block_offset / span->block_size; - void *block = - pointer_offset(blocks_start, (size_t)block_idx * span->block_size); - if (!oldsize) - oldsize = - (size_t)((ptrdiff_t)span->block_size - pointer_diff(p, block)); - if ((size_t)span->block_size >= size) { - // Still fits in block, never mind trying to save memory, but preserve - // data if alignment changed - if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) - memmove(block, p, oldsize); - return block; - } - } else if (span->size_class == SIZE_CLASS_LARGE) { - // Large block - size_t total_size = size + SPAN_HEADER_SIZE; - size_t num_spans = total_size >> _memory_span_size_shift; - if (total_size & (_memory_span_mask - 1)) - ++num_spans; - size_t current_spans = span->span_count; - void *block = 
pointer_offset(span, SPAN_HEADER_SIZE); - if (!oldsize) - oldsize = (current_spans * _memory_span_size) - - (size_t)pointer_diff(p, block) - SPAN_HEADER_SIZE; - if ((current_spans >= num_spans) && (total_size >= (oldsize / 2))) { - // Still fits in block, never mind trying to save memory, but preserve - // data if alignment changed - if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) - memmove(block, p, oldsize); - return block; - } - } else { - // Oversized block - size_t total_size = size + SPAN_HEADER_SIZE; - size_t num_pages = total_size >> _memory_page_size_shift; - if (total_size & (_memory_page_size - 1)) - ++num_pages; - // Page count is stored in span_count - size_t current_pages = span->span_count; - void *block = pointer_offset(span, SPAN_HEADER_SIZE); - if (!oldsize) - oldsize = (current_pages * _memory_page_size) - - (size_t)pointer_diff(p, block) - SPAN_HEADER_SIZE; - if ((current_pages >= num_pages) && (num_pages >= (current_pages / 2))) { - // Still fits in block, never mind trying to save memory, but preserve - // data if alignment changed - if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) - memmove(block, p, oldsize); - return block; - } - } - } else { - oldsize = 0; - } - - if (!!(flags & RPMALLOC_GROW_OR_FAIL)) - return 0; - - // Size is greater than block size, need to allocate a new block and - // deallocate the old Avoid hysteresis by overallocating if increase is small - // (below 37%) - size_t lower_bound = oldsize + (oldsize >> 2) + (oldsize >> 3); - size_t new_size = - (size > lower_bound) ? size : ((size > oldsize) ? lower_bound : size); - void *block = _rpmalloc_allocate(heap, new_size); - if (p && block) { - if (!(flags & RPMALLOC_NO_PRESERVE)) - memcpy(block, p, oldsize < new_size ? 
oldsize : new_size); - _rpmalloc_deallocate(p); - } - - return block; -} - -static void *_rpmalloc_aligned_reallocate(heap_t *heap, void *ptr, - size_t alignment, size_t size, - size_t oldsize, unsigned int flags) { - if (alignment <= SMALL_GRANULARITY) - return _rpmalloc_reallocate(heap, ptr, size, oldsize, flags); - - int no_alloc = !!(flags & RPMALLOC_GROW_OR_FAIL); - size_t usablesize = (ptr ? _rpmalloc_usable_size(ptr) : 0); - if ((usablesize >= size) && !((uintptr_t)ptr & (alignment - 1))) { - if (no_alloc || (size >= (usablesize / 2))) - return ptr; - } - // Aligned alloc marks span as having aligned blocks - void *block = - (!no_alloc ? _rpmalloc_aligned_allocate(heap, alignment, size) : 0); - if (EXPECTED(block != 0)) { - if (!(flags & RPMALLOC_NO_PRESERVE) && ptr) { - if (!oldsize) - oldsize = usablesize; - memcpy(block, ptr, oldsize < size ? oldsize : size); - } - _rpmalloc_deallocate(ptr); - } - return block; -} - -//////////// -/// -/// Initialization, finalization and utility -/// -////// - -//! Get the usable size of the given block -static size_t _rpmalloc_usable_size(void *p) { - // Grab the span using guaranteed span alignment - span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); - if (span->size_class < SIZE_CLASS_COUNT) { - // Small/medium block - void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); - return span->block_size - - ((size_t)pointer_diff(p, blocks_start) % span->block_size); - } - if (span->size_class == SIZE_CLASS_LARGE) { - // Large block - size_t current_spans = span->span_count; - return (current_spans * _memory_span_size) - (size_t)pointer_diff(p, span); - } - // Oversized block, page count is stored in span_count - size_t current_pages = span->span_count; - return (current_pages * _memory_page_size) - (size_t)pointer_diff(p, span); -} - -//! 
Adjust and optimize the size class properties for the given class -static void _rpmalloc_adjust_size_class(size_t iclass) { - size_t block_size = _memory_size_class[iclass].block_size; - size_t block_count = (_memory_span_size - SPAN_HEADER_SIZE) / block_size; - - _memory_size_class[iclass].block_count = (uint16_t)block_count; - _memory_size_class[iclass].class_idx = (uint16_t)iclass; - - // Check if previous size classes can be merged - if (iclass >= SMALL_CLASS_COUNT) { - size_t prevclass = iclass; - while (prevclass > 0) { - --prevclass; - // A class can be merged if number of pages and number of blocks are equal - if (_memory_size_class[prevclass].block_count == - _memory_size_class[iclass].block_count) - _rpmalloc_memcpy_const(_memory_size_class + prevclass, - _memory_size_class + iclass, - sizeof(_memory_size_class[iclass])); - else - break; - } - } -} - -//! Initialize the allocator and setup global data -extern inline int rpmalloc_initialize(void) { - if (_rpmalloc_initialized) { - rpmalloc_thread_initialize(); - return 0; - } - return rpmalloc_initialize_config(0); -} - -int rpmalloc_initialize_config(const rpmalloc_config_t *config) { - if (_rpmalloc_initialized) { - rpmalloc_thread_initialize(); - return 0; - } - _rpmalloc_initialized = 1; - - if (config) - memcpy(&_memory_config, config, sizeof(rpmalloc_config_t)); - else - _rpmalloc_memset_const(&_memory_config, 0, sizeof(rpmalloc_config_t)); - - if (!_memory_config.memory_map || !_memory_config.memory_unmap) { - _memory_config.memory_map = _rpmalloc_mmap_os; - _memory_config.memory_unmap = _rpmalloc_unmap_os; - } - -#if PLATFORM_WINDOWS - SYSTEM_INFO system_info; - memset(&system_info, 0, sizeof(system_info)); - GetSystemInfo(&system_info); - _memory_map_granularity = system_info.dwAllocationGranularity; -#else - _memory_map_granularity = (size_t)sysconf(_SC_PAGESIZE); -#endif - -#if RPMALLOC_CONFIGURABLE - _memory_page_size = _memory_config.page_size; -#else - _memory_page_size = 0; -#endif - 
_memory_huge_pages = 0; - if (!_memory_page_size) { -#if PLATFORM_WINDOWS - _memory_page_size = system_info.dwPageSize; -#else - _memory_page_size = _memory_map_granularity; - if (_memory_config.enable_huge_pages) { -#if defined(__linux__) - size_t huge_page_size = 0; - FILE *meminfo = fopen("/proc/meminfo", "r"); - if (meminfo) { - char line[128]; - while (!huge_page_size && fgets(line, sizeof(line) - 1, meminfo)) { - line[sizeof(line) - 1] = 0; - if (strstr(line, "Hugepagesize:")) - huge_page_size = (size_t)strtol(line + 13, 0, 10) * 1024; - } - fclose(meminfo); - } - if (huge_page_size) { - _memory_huge_pages = 1; - _memory_page_size = huge_page_size; - _memory_map_granularity = huge_page_size; - } -#elif defined(__FreeBSD__) - int rc; - size_t sz = sizeof(rc); - - if (sysctlbyname("vm.pmap.pg_ps_enabled", &rc, &sz, NULL, 0) == 0 && - rc == 1) { - static size_t defsize = 2 * 1024 * 1024; - int nsize = 0; - size_t sizes[4] = {0}; - _memory_huge_pages = 1; - _memory_page_size = defsize; - if ((nsize = getpagesizes(sizes, 4)) >= 2) { - nsize--; - for (size_t csize = sizes[nsize]; nsize >= 0 && csize; - --nsize, csize = sizes[nsize]) { - //! Unlikely, but as a precaution.. 
- rpmalloc_assert(!(csize & (csize - 1)) && !(csize % 1024), - "Invalid page size"); - if (defsize < csize) { - _memory_page_size = csize; - break; - } - } - } - _memory_map_granularity = _memory_page_size; - } -#elif defined(__APPLE__) || defined(__NetBSD__) - _memory_huge_pages = 1; - _memory_page_size = 2 * 1024 * 1024; - _memory_map_granularity = _memory_page_size; -#endif - } -#endif - } else { - if (_memory_config.enable_huge_pages) - _memory_huge_pages = 1; - } - -#if PLATFORM_WINDOWS - if (_memory_config.enable_huge_pages) { - HANDLE token = 0; - size_t large_page_minimum = GetLargePageMinimum(); - if (large_page_minimum) - OpenProcessToken(GetCurrentProcess(), - TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token); - if (token) { - LUID luid; - if (LookupPrivilegeValue(0, SE_LOCK_MEMORY_NAME, &luid)) { - TOKEN_PRIVILEGES token_privileges; - memset(&token_privileges, 0, sizeof(token_privileges)); - token_privileges.PrivilegeCount = 1; - token_privileges.Privileges[0].Luid = luid; - token_privileges.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED; - if (AdjustTokenPrivileges(token, FALSE, &token_privileges, 0, 0, 0)) { - if (GetLastError() == ERROR_SUCCESS) - _memory_huge_pages = 1; - } - } - CloseHandle(token); - } - if (_memory_huge_pages) { - if (large_page_minimum > _memory_page_size) - _memory_page_size = large_page_minimum; - if (large_page_minimum > _memory_map_granularity) - _memory_map_granularity = large_page_minimum; - } - } -#endif - - size_t min_span_size = 256; - size_t max_page_size; -#if UINTPTR_MAX > 0xFFFFFFFF - max_page_size = 4096ULL * 1024ULL * 1024ULL; -#else - max_page_size = 4 * 1024 * 1024; -#endif - if (_memory_page_size < min_span_size) - _memory_page_size = min_span_size; - if (_memory_page_size > max_page_size) - _memory_page_size = max_page_size; - _memory_page_size_shift = 0; - size_t page_size_bit = _memory_page_size; - while (page_size_bit != 1) { - ++_memory_page_size_shift; - page_size_bit >>= 1; - } - _memory_page_size = 
((size_t)1 << _memory_page_size_shift); - -#if RPMALLOC_CONFIGURABLE - if (!_memory_config.span_size) { - _memory_span_size = _memory_default_span_size; - _memory_span_size_shift = _memory_default_span_size_shift; - _memory_span_mask = _memory_default_span_mask; - } else { - size_t span_size = _memory_config.span_size; - if (span_size > (256 * 1024)) - span_size = (256 * 1024); - _memory_span_size = 4096; - _memory_span_size_shift = 12; - while (_memory_span_size < span_size) { - _memory_span_size <<= 1; - ++_memory_span_size_shift; - } - _memory_span_mask = ~(uintptr_t)(_memory_span_size - 1); - } -#endif - - _memory_span_map_count = - (_memory_config.span_map_count ? _memory_config.span_map_count - : DEFAULT_SPAN_MAP_COUNT); - if ((_memory_span_size * _memory_span_map_count) < _memory_page_size) - _memory_span_map_count = (_memory_page_size / _memory_span_size); - if ((_memory_page_size >= _memory_span_size) && - ((_memory_span_map_count * _memory_span_size) % _memory_page_size)) - _memory_span_map_count = (_memory_page_size / _memory_span_size); - _memory_heap_reserve_count = (_memory_span_map_count > DEFAULT_SPAN_MAP_COUNT) - ? 
DEFAULT_SPAN_MAP_COUNT - : _memory_span_map_count; - - _memory_config.page_size = _memory_page_size; - _memory_config.span_size = _memory_span_size; - _memory_config.span_map_count = _memory_span_map_count; - _memory_config.enable_huge_pages = _memory_huge_pages; - -#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ - defined(__TINYC__) - if (pthread_key_create(&_memory_thread_heap, _rpmalloc_heap_release_raw_fc)) - return -1; -#endif -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) - fls_key = FlsAlloc(&_rpmalloc_thread_destructor); -#endif - - // Setup all small and medium size classes - size_t iclass = 0; - _memory_size_class[iclass].block_size = SMALL_GRANULARITY; - _rpmalloc_adjust_size_class(iclass); - for (iclass = 1; iclass < SMALL_CLASS_COUNT; ++iclass) { - size_t size = iclass * SMALL_GRANULARITY; - _memory_size_class[iclass].block_size = (uint32_t)size; - _rpmalloc_adjust_size_class(iclass); - } - // At least two blocks per span, then fall back to large allocations - _memory_medium_size_limit = (_memory_span_size - SPAN_HEADER_SIZE) >> 1; - if (_memory_medium_size_limit > MEDIUM_SIZE_LIMIT) - _memory_medium_size_limit = MEDIUM_SIZE_LIMIT; - for (iclass = 0; iclass < MEDIUM_CLASS_COUNT; ++iclass) { - size_t size = SMALL_SIZE_LIMIT + ((iclass + 1) * MEDIUM_GRANULARITY); - if (size > _memory_medium_size_limit) { - _memory_medium_size_limit = - SMALL_SIZE_LIMIT + (iclass * MEDIUM_GRANULARITY); - break; - } - _memory_size_class[SMALL_CLASS_COUNT + iclass].block_size = (uint32_t)size; - _rpmalloc_adjust_size_class(SMALL_CLASS_COUNT + iclass); - } - - _memory_orphan_heaps = 0; -#if RPMALLOC_FIRST_CLASS_HEAPS - _memory_first_class_orphan_heaps = 0; -#endif -#if ENABLE_STATISTICS - atomic_store32(&_memory_active_heaps, 0); - atomic_store32(&_mapped_pages, 0); - _mapped_pages_peak = 0; - atomic_store32(&_master_spans, 0); - atomic_store32(&_mapped_total, 0); - atomic_store32(&_unmapped_total, 0); - 
atomic_store32(&_mapped_pages_os, 0); - atomic_store32(&_huge_pages_current, 0); - _huge_pages_peak = 0; -#endif - memset(_memory_heaps, 0, sizeof(_memory_heaps)); - atomic_store32_release(&_memory_global_lock, 0); - - rpmalloc_linker_reference(); - - // Initialize this thread - rpmalloc_thread_initialize(); - return 0; -} - -//! Finalize the allocator -void rpmalloc_finalize(void) { - rpmalloc_thread_finalize(1); - // rpmalloc_dump_statistics(stdout); - - if (_memory_global_reserve) { - atomic_add32(&_memory_global_reserve_master->remaining_spans, - -(int32_t)_memory_global_reserve_count); - _memory_global_reserve_master = 0; - _memory_global_reserve_count = 0; - _memory_global_reserve = 0; - } - atomic_store32_release(&_memory_global_lock, 0); - - // Free all thread caches and fully free spans - for (size_t list_idx = 0; list_idx < HEAP_ARRAY_SIZE; ++list_idx) { - heap_t *heap = _memory_heaps[list_idx]; - while (heap) { - heap_t *next_heap = heap->next_heap; - heap->finalize = 1; - _rpmalloc_heap_global_finalize(heap); - heap = next_heap; - } - } - -#if ENABLE_GLOBAL_CACHE - // Free global caches - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) - _rpmalloc_global_cache_finalize(&_memory_span_cache[iclass]); -#endif - -#if (defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD - pthread_key_delete(_memory_thread_heap); -#endif -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) - FlsFree(fls_key); - fls_key = 0; -#endif -#if ENABLE_STATISTICS - // If you hit these asserts you probably have memory leaks (perhaps global - // scope data doing dynamic allocations) or double frees in your code - rpmalloc_assert(atomic_load32(&_mapped_pages) == 0, "Memory leak detected"); - rpmalloc_assert(atomic_load32(&_mapped_pages_os) == 0, - "Memory leak detected"); -#endif - - _rpmalloc_initialized = 0; -} - -//! 
Initialize thread, assign heap
-extern inline void rpmalloc_thread_initialize(void) {
-  if (!get_thread_heap_raw()) {
-    heap_t *heap = _rpmalloc_heap_allocate(0);
-    if (heap) {
-      _rpmalloc_stat_inc(&_memory_active_heaps);
-      set_thread_heap(heap);
-#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK)
-      FlsSetValue(fls_key, heap);
-#endif
-    }
-  }
-}
-
-//! Finalize thread, orphan heap
-void rpmalloc_thread_finalize(int release_caches) {
-  heap_t *heap = get_thread_heap_raw();
-  if (heap)
-    _rpmalloc_heap_release_raw(heap, release_caches);
-  set_thread_heap(0);
-#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK)
-  FlsSetValue(fls_key, 0);
-#endif
-}
-
-int rpmalloc_is_thread_initialized(void) {
-  return (get_thread_heap_raw() != 0) ? 1 : 0;
-}
-
-const rpmalloc_config_t *rpmalloc_config(void) { return &_memory_config; }
-
-// Extern interface
-
-extern inline RPMALLOC_ALLOCATOR void *rpmalloc(size_t size) {
-#if ENABLE_VALIDATE_ARGS
-  if (size >= MAX_ALLOC_SIZE) {
-    errno = EINVAL;
-    return 0;
-  }
-#endif
-  heap_t *heap = get_thread_heap();
-  return _rpmalloc_allocate(heap, size);
-}
-
-extern inline void rpfree(void *ptr) { _rpmalloc_deallocate(ptr); }
-
-extern inline RPMALLOC_ALLOCATOR void *rpcalloc(size_t num, size_t size) {
-  size_t total;
-#if ENABLE_VALIDATE_ARGS
-#if PLATFORM_WINDOWS
-  int err = SizeTMult(num, size, &total);
-  if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) {
-    errno = EINVAL;
-    return 0;
-  }
-#else
-  int err = __builtin_umull_overflow(num, size, &total);
-  if (err || (total >= MAX_ALLOC_SIZE)) {
-    errno = EINVAL;
-    return 0;
-  }
-#endif
-#else
-  total = num * size;
-#endif
-  heap_t *heap = get_thread_heap();
-  void *block = _rpmalloc_allocate(heap, total);
-  if (block)
-    memset(block, 0, total);
-  return block;
-}
-
-extern inline RPMALLOC_ALLOCATOR void *rprealloc(void *ptr, size_t size) {
-#if ENABLE_VALIDATE_ARGS
-  if (size >= MAX_ALLOC_SIZE) {
-    errno = EINVAL;
-    return ptr;
-  }
-#endif
-
heap_t *heap = get_thread_heap(); - return _rpmalloc_reallocate(heap, ptr, size, 0, 0); -} - -extern RPMALLOC_ALLOCATOR void *rpaligned_realloc(void *ptr, size_t alignment, - size_t size, size_t oldsize, - unsigned int flags) { -#if ENABLE_VALIDATE_ARGS - if ((size + alignment < size) || (alignment > _memory_page_size)) { - errno = EINVAL; - return 0; - } -#endif - heap_t *heap = get_thread_heap(); - return _rpmalloc_aligned_reallocate(heap, ptr, alignment, size, oldsize, - flags); -} - -extern RPMALLOC_ALLOCATOR void *rpaligned_alloc(size_t alignment, size_t size) { - heap_t *heap = get_thread_heap(); - return _rpmalloc_aligned_allocate(heap, alignment, size); -} - -extern inline RPMALLOC_ALLOCATOR void * -rpaligned_calloc(size_t alignment, size_t num, size_t size) { - size_t total; -#if ENABLE_VALIDATE_ARGS -#if PLATFORM_WINDOWS - int err = SizeTMult(num, size, &total); - if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#else - int err = __builtin_umull_overflow(num, size, &total); - if (err || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#endif -#else - total = num * size; -#endif - void *block = rpaligned_alloc(alignment, total); - if (block) - memset(block, 0, total); - return block; -} - -extern inline RPMALLOC_ALLOCATOR void *rpmemalign(size_t alignment, - size_t size) { - return rpaligned_alloc(alignment, size); -} - -extern inline int rpposix_memalign(void **memptr, size_t alignment, - size_t size) { - if (memptr) - *memptr = rpaligned_alloc(alignment, size); - else - return EINVAL; - return *memptr ? 0 : ENOMEM; -} - -extern inline size_t rpmalloc_usable_size(void *ptr) { - return (ptr ? 
_rpmalloc_usable_size(ptr) : 0); -} - -extern inline void rpmalloc_thread_collect(void) {} - -void rpmalloc_thread_statistics(rpmalloc_thread_statistics_t *stats) { - memset(stats, 0, sizeof(rpmalloc_thread_statistics_t)); - heap_t *heap = get_thread_heap_raw(); - if (!heap) - return; - - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - size_class_t *size_class = _memory_size_class + iclass; - span_t *span = heap->size_class[iclass].partial_span; - while (span) { - size_t free_count = span->list_size; - size_t block_count = size_class->block_count; - if (span->free_list_limit < block_count) - block_count = span->free_list_limit; - free_count += (block_count - span->used_count); - stats->sizecache += free_count * size_class->block_size; - span = span->next; - } - } - -#if ENABLE_THREAD_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - stats->spancache += span_cache->count * (iclass + 1) * _memory_span_size; - } -#endif - - span_t *deferred = (span_t *)atomic_load_ptr(&heap->span_free_deferred); - while (deferred) { - if (deferred->size_class != SIZE_CLASS_HUGE) - stats->spancache += (size_t)deferred->span_count * _memory_span_size; - deferred = (span_t *)deferred->free_list; - } - -#if ENABLE_STATISTICS - stats->thread_to_global = (size_t)atomic_load64(&heap->thread_to_global); - stats->global_to_thread = (size_t)atomic_load64(&heap->global_to_thread); - - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - stats->span_use[iclass].current = - (size_t)atomic_load32(&heap->span_use[iclass].current); - stats->span_use[iclass].peak = - (size_t)atomic_load32(&heap->span_use[iclass].high); - stats->span_use[iclass].to_global = - (size_t)atomic_load32(&heap->span_use[iclass].spans_to_global); - stats->span_use[iclass].from_global = - 
(size_t)atomic_load32(&heap->span_use[iclass].spans_from_global); - stats->span_use[iclass].to_cache = - (size_t)atomic_load32(&heap->span_use[iclass].spans_to_cache); - stats->span_use[iclass].from_cache = - (size_t)atomic_load32(&heap->span_use[iclass].spans_from_cache); - stats->span_use[iclass].to_reserved = - (size_t)atomic_load32(&heap->span_use[iclass].spans_to_reserved); - stats->span_use[iclass].from_reserved = - (size_t)atomic_load32(&heap->span_use[iclass].spans_from_reserved); - stats->span_use[iclass].map_calls = - (size_t)atomic_load32(&heap->span_use[iclass].spans_map_calls); - } - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - stats->size_use[iclass].alloc_current = - (size_t)atomic_load32(&heap->size_class_use[iclass].alloc_current); - stats->size_use[iclass].alloc_peak = - (size_t)heap->size_class_use[iclass].alloc_peak; - stats->size_use[iclass].alloc_total = - (size_t)atomic_load32(&heap->size_class_use[iclass].alloc_total); - stats->size_use[iclass].free_total = - (size_t)atomic_load32(&heap->size_class_use[iclass].free_total); - stats->size_use[iclass].spans_to_cache = - (size_t)atomic_load32(&heap->size_class_use[iclass].spans_to_cache); - stats->size_use[iclass].spans_from_cache = - (size_t)atomic_load32(&heap->size_class_use[iclass].spans_from_cache); - stats->size_use[iclass].spans_from_reserved = (size_t)atomic_load32( - &heap->size_class_use[iclass].spans_from_reserved); - stats->size_use[iclass].map_calls = - (size_t)atomic_load32(&heap->size_class_use[iclass].spans_map_calls); - } -#endif -} - -void rpmalloc_global_statistics(rpmalloc_global_statistics_t *stats) { - memset(stats, 0, sizeof(rpmalloc_global_statistics_t)); -#if ENABLE_STATISTICS - stats->mapped = (size_t)atomic_load32(&_mapped_pages) * _memory_page_size; - stats->mapped_peak = (size_t)_mapped_pages_peak * _memory_page_size; - stats->mapped_total = - (size_t)atomic_load32(&_mapped_total) * _memory_page_size; - stats->unmapped_total = - 
(size_t)atomic_load32(&_unmapped_total) * _memory_page_size; - stats->huge_alloc = - (size_t)atomic_load32(&_huge_pages_current) * _memory_page_size; - stats->huge_alloc_peak = (size_t)_huge_pages_peak * _memory_page_size; -#endif -#if ENABLE_GLOBAL_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - global_cache_t *cache = &_memory_span_cache[iclass]; - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - uint32_t count = cache->count; -#if ENABLE_UNLIMITED_CACHE - span_t *current_span = cache->overflow; - while (current_span) { - ++count; - current_span = current_span->next; - } -#endif - atomic_store32_release(&cache->lock, 0); - stats->cached += count * (iclass + 1) * _memory_span_size; - } -#endif -} - -#if ENABLE_STATISTICS - -static void _memory_heap_dump_statistics(heap_t *heap, void *file) { - fprintf(file, "Heap %d stats:\n", heap->id); - fprintf(file, "Class CurAlloc PeakAlloc TotAlloc TotFree BlkSize " - "BlkCount SpansCur SpansPeak PeakAllocMiB ToCacheMiB " - "FromCacheMiB FromReserveMiB MmapCalls\n"); - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - if (!atomic_load32(&heap->size_class_use[iclass].alloc_total)) - continue; - fprintf( - file, - "%3u: %10u %10u %10u %10u %8u %8u %8d %9d %13zu %11zu %12zu %14zu " - "%9u\n", - (uint32_t)iclass, - atomic_load32(&heap->size_class_use[iclass].alloc_current), - heap->size_class_use[iclass].alloc_peak, - atomic_load32(&heap->size_class_use[iclass].alloc_total), - atomic_load32(&heap->size_class_use[iclass].free_total), - _memory_size_class[iclass].block_size, - _memory_size_class[iclass].block_count, - atomic_load32(&heap->size_class_use[iclass].spans_current), - heap->size_class_use[iclass].spans_peak, - ((size_t)heap->size_class_use[iclass].alloc_peak * - (size_t)_memory_size_class[iclass].block_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->size_class_use[iclass].spans_to_cache) * - _memory_span_size) / - (size_t)(1024 * 1024), - 
((size_t)atomic_load32(&heap->size_class_use[iclass].spans_from_cache) * - _memory_span_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32( - &heap->size_class_use[iclass].spans_from_reserved) * - _memory_span_size) / - (size_t)(1024 * 1024), - atomic_load32(&heap->size_class_use[iclass].spans_map_calls)); - } - fprintf(file, "Spans Current Peak Deferred PeakMiB Cached ToCacheMiB " - "FromCacheMiB ToReserveMiB FromReserveMiB ToGlobalMiB " - "FromGlobalMiB MmapCalls\n"); - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - if (!atomic_load32(&heap->span_use[iclass].high) && - !atomic_load32(&heap->span_use[iclass].spans_map_calls)) - continue; - fprintf( - file, - "%4u: %8d %8u %8u %8zu %7u %11zu %12zu %12zu %14zu %11zu %13zu %10u\n", - (uint32_t)(iclass + 1), atomic_load32(&heap->span_use[iclass].current), - atomic_load32(&heap->span_use[iclass].high), - atomic_load32(&heap->span_use[iclass].spans_deferred), - ((size_t)atomic_load32(&heap->span_use[iclass].high) * - (size_t)_memory_span_size * (iclass + 1)) / - (size_t)(1024 * 1024), -#if ENABLE_THREAD_CACHE - (unsigned int)(!iclass ? 
heap->span_cache.count - : heap->span_large_cache[iclass - 1].count), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_cache) * - (iclass + 1) * _memory_span_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_cache) * - (iclass + 1) * _memory_span_size) / - (size_t)(1024 * 1024), -#else - 0, (size_t)0, (size_t)0, -#endif - ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_reserved) * - (iclass + 1) * _memory_span_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_reserved) * - (iclass + 1) * _memory_span_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_global) * - (size_t)_memory_span_size * (iclass + 1)) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_global) * - (size_t)_memory_span_size * (iclass + 1)) / - (size_t)(1024 * 1024), - atomic_load32(&heap->span_use[iclass].spans_map_calls)); - } - fprintf(file, "Full spans: %zu\n", heap->full_span_count); - fprintf(file, "ThreadToGlobalMiB GlobalToThreadMiB\n"); - fprintf( - file, "%17zu %17zu\n", - (size_t)atomic_load64(&heap->thread_to_global) / (size_t)(1024 * 1024), - (size_t)atomic_load64(&heap->global_to_thread) / (size_t)(1024 * 1024)); -} - -#endif - -void rpmalloc_dump_statistics(void *file) { -#if ENABLE_STATISTICS - for (size_t list_idx = 0; list_idx < HEAP_ARRAY_SIZE; ++list_idx) { - heap_t *heap = _memory_heaps[list_idx]; - while (heap) { - int need_dump = 0; - for (size_t iclass = 0; !need_dump && (iclass < SIZE_CLASS_COUNT); - ++iclass) { - if (!atomic_load32(&heap->size_class_use[iclass].alloc_total)) { - rpmalloc_assert( - !atomic_load32(&heap->size_class_use[iclass].free_total), - "Heap statistics counter mismatch"); - rpmalloc_assert( - !atomic_load32(&heap->size_class_use[iclass].spans_map_calls), - "Heap statistics counter mismatch"); - continue; - } - need_dump = 1; - } - for (size_t iclass = 0; !need_dump && (iclass < 
LARGE_CLASS_COUNT); - ++iclass) { - if (!atomic_load32(&heap->span_use[iclass].high) && - !atomic_load32(&heap->span_use[iclass].spans_map_calls)) - continue; - need_dump = 1; - } - if (need_dump) - _memory_heap_dump_statistics(heap, file); - heap = heap->next_heap; - } - } - fprintf(file, "Global stats:\n"); - size_t huge_current = - (size_t)atomic_load32(&_huge_pages_current) * _memory_page_size; - size_t huge_peak = (size_t)_huge_pages_peak * _memory_page_size; - fprintf(file, "HugeCurrentMiB HugePeakMiB\n"); - fprintf(file, "%14zu %11zu\n", huge_current / (size_t)(1024 * 1024), - huge_peak / (size_t)(1024 * 1024)); - -#if ENABLE_GLOBAL_CACHE - fprintf(file, "GlobalCacheMiB\n"); - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - global_cache_t *cache = _memory_span_cache + iclass; - size_t global_cache = (size_t)cache->count * iclass * _memory_span_size; - - size_t global_overflow_cache = 0; - span_t *span = cache->overflow; - while (span) { - global_overflow_cache += iclass * _memory_span_size; - span = span->next; - } - if (global_cache || global_overflow_cache || cache->insert_count || - cache->extract_count) - fprintf(file, - "%4zu: %8zuMiB (%8zuMiB overflow) %14zu insert %14zu extract\n", - iclass + 1, global_cache / (size_t)(1024 * 1024), - global_overflow_cache / (size_t)(1024 * 1024), - cache->insert_count, cache->extract_count); - } -#endif - - size_t mapped = (size_t)atomic_load32(&_mapped_pages) * _memory_page_size; - size_t mapped_os = - (size_t)atomic_load32(&_mapped_pages_os) * _memory_page_size; - size_t mapped_peak = (size_t)_mapped_pages_peak * _memory_page_size; - size_t mapped_total = - (size_t)atomic_load32(&_mapped_total) * _memory_page_size; - size_t unmapped_total = - (size_t)atomic_load32(&_unmapped_total) * _memory_page_size; - fprintf( - file, - "MappedMiB MappedOSMiB MappedPeakMiB MappedTotalMiB UnmappedTotalMiB\n"); - fprintf(file, "%9zu %11zu %13zu %14zu %16zu\n", - mapped / (size_t)(1024 * 1024), mapped_os / 
(size_t)(1024 * 1024), - mapped_peak / (size_t)(1024 * 1024), - mapped_total / (size_t)(1024 * 1024), - unmapped_total / (size_t)(1024 * 1024)); - - fprintf(file, "\n"); -#if 0 - int64_t allocated = atomic_load64(&_allocation_counter); - int64_t deallocated = atomic_load64(&_deallocation_counter); - fprintf(file, "Allocation count: %lli\n", allocated); - fprintf(file, "Deallocation count: %lli\n", deallocated); - fprintf(file, "Current allocations: %lli\n", (allocated - deallocated)); - fprintf(file, "Master spans: %d\n", atomic_load32(&_master_spans)); - fprintf(file, "Dangling master spans: %d\n", atomic_load32(&_unmapped_master_spans)); -#endif -#endif - (void)sizeof(file); -} - -#if RPMALLOC_FIRST_CLASS_HEAPS - -extern inline rpmalloc_heap_t *rpmalloc_heap_acquire(void) { - // Must be a pristine heap from newly mapped memory pages, or else memory - // blocks could already be allocated from the heap which would (wrongly) be - // released when heap is cleared with rpmalloc_heap_free_all(). Also heaps - // guaranteed to be pristine from the dedicated orphan list can be used. 
- heap_t *heap = _rpmalloc_heap_allocate(1); - rpmalloc_assume(heap != NULL); - heap->owner_thread = 0; - _rpmalloc_stat_inc(&_memory_active_heaps); - return heap; -} - -extern inline void rpmalloc_heap_release(rpmalloc_heap_t *heap) { - if (heap) - _rpmalloc_heap_release(heap, 1, 1); -} - -extern inline RPMALLOC_ALLOCATOR void * -rpmalloc_heap_alloc(rpmalloc_heap_t *heap, size_t size) { -#if ENABLE_VALIDATE_ARGS - if (size >= MAX_ALLOC_SIZE) { - errno = EINVAL; - return 0; - } -#endif - return _rpmalloc_allocate(heap, size); -} - -extern inline RPMALLOC_ALLOCATOR void * -rpmalloc_heap_aligned_alloc(rpmalloc_heap_t *heap, size_t alignment, - size_t size) { -#if ENABLE_VALIDATE_ARGS - if (size >= MAX_ALLOC_SIZE) { - errno = EINVAL; - return 0; - } -#endif - return _rpmalloc_aligned_allocate(heap, alignment, size); -} - -extern inline RPMALLOC_ALLOCATOR void * -rpmalloc_heap_calloc(rpmalloc_heap_t *heap, size_t num, size_t size) { - return rpmalloc_heap_aligned_calloc(heap, 0, num, size); -} - -extern inline RPMALLOC_ALLOCATOR void * -rpmalloc_heap_aligned_calloc(rpmalloc_heap_t *heap, size_t alignment, - size_t num, size_t size) { - size_t total; -#if ENABLE_VALIDATE_ARGS -#if PLATFORM_WINDOWS - int err = SizeTMult(num, size, &total); - if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#else - int err = __builtin_umull_overflow(num, size, &total); - if (err || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#endif -#else - total = num * size; -#endif - void *block = _rpmalloc_aligned_allocate(heap, alignment, total); - if (block) - memset(block, 0, total); - return block; -} - -extern inline RPMALLOC_ALLOCATOR void * -rpmalloc_heap_realloc(rpmalloc_heap_t *heap, void *ptr, size_t size, - unsigned int flags) { -#if ENABLE_VALIDATE_ARGS - if (size >= MAX_ALLOC_SIZE) { - errno = EINVAL; - return ptr; - } -#endif - return _rpmalloc_reallocate(heap, ptr, size, 0, flags); -} - -extern inline RPMALLOC_ALLOCATOR void * 
-rpmalloc_heap_aligned_realloc(rpmalloc_heap_t *heap, void *ptr, - size_t alignment, size_t size, - unsigned int flags) { -#if ENABLE_VALIDATE_ARGS - if ((size + alignment < size) || (alignment > _memory_page_size)) { - errno = EINVAL; - return 0; - } -#endif - return _rpmalloc_aligned_reallocate(heap, ptr, alignment, size, 0, flags); -} - -extern inline void rpmalloc_heap_free(rpmalloc_heap_t *heap, void *ptr) { - (void)sizeof(heap); - _rpmalloc_deallocate(ptr); -} - -extern inline void rpmalloc_heap_free_all(rpmalloc_heap_t *heap) { - span_t *span; - span_t *next_span; - - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - span = heap->size_class[iclass].partial_span; - while (span) { - next_span = span->next; - _rpmalloc_heap_cache_insert(heap, span); - span = next_span; - } - heap->size_class[iclass].partial_span = 0; - span = heap->full_span[iclass]; - while (span) { - next_span = span->next; - _rpmalloc_heap_cache_insert(heap, span); - span = next_span; - } - - span = heap->size_class[iclass].cache; - if (span) - _rpmalloc_heap_cache_insert(heap, span); - heap->size_class[iclass].cache = 0; - } - memset(heap->size_class, 0, sizeof(heap->size_class)); - memset(heap->full_span, 0, sizeof(heap->full_span)); - - span = heap->large_huge_span; - while (span) { - next_span = span->next; - if (UNEXPECTED(span->size_class == SIZE_CLASS_HUGE)) - _rpmalloc_deallocate_huge(span); - else - _rpmalloc_heap_cache_insert(heap, span); - span = next_span; - } - heap->large_huge_span = 0; - heap->full_span_count = 0; - -#if ENABLE_THREAD_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - if (!span_cache->count) - continue; -#if ENABLE_GLOBAL_CACHE - _rpmalloc_stat_add64(&heap->thread_to_global, - span_cache->count * (iclass + 1) * 
_memory_span_size); - _rpmalloc_stat_add(&heap->span_use[iclass].spans_to_global, - span_cache->count); - _rpmalloc_global_cache_insert_spans(span_cache->span, iclass + 1, - span_cache->count); -#else - for (size_t ispan = 0; ispan < span_cache->count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[ispan]); -#endif - span_cache->count = 0; - } -#endif - -#if ENABLE_STATISTICS - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - atomic_store32(&heap->size_class_use[iclass].alloc_current, 0); - atomic_store32(&heap->size_class_use[iclass].spans_current, 0); - } - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - atomic_store32(&heap->span_use[iclass].current, 0); - } -#endif -} - -extern inline void rpmalloc_heap_thread_set_current(rpmalloc_heap_t *heap) { - heap_t *prev_heap = get_thread_heap_raw(); - if (prev_heap != heap) { - set_thread_heap(heap); - if (prev_heap) - rpmalloc_heap_release(prev_heap); - } -} - -extern inline rpmalloc_heap_t *rpmalloc_get_heap_for_ptr(void *ptr) { - // Grab the span, and then the heap from the span - span_t *span = (span_t *)((uintptr_t)ptr & _memory_span_mask); - if (span) { - return span->heap; - } - return 0; -} - -#endif - -#if ENABLE_PRELOAD || ENABLE_OVERRIDE - -#include "malloc.c" - -#endif - -void rpmalloc_linker_reference(void) { (void)sizeof(_rpmalloc_initialized); } +//===---------------------- rpmalloc.c ------------------*- C -*-=============// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This library provides a cross-platform lock free thread caching malloc +// implementation in C11. 
+// +//===----------------------------------------------------------------------===// + +#include "rpmalloc.h" + +//////////// +/// +/// Build time configurable limits +/// +////// + +#if defined(__clang__) +#pragma clang diagnostic ignored "-Wunused-macros" +#pragma clang diagnostic ignored "-Wunused-function" +#if __has_warning("-Wreserved-identifier") +#pragma clang diagnostic ignored "-Wreserved-identifier" +#endif +#if __has_warning("-Wstatic-in-inline") +#pragma clang diagnostic ignored "-Wstatic-in-inline" +#endif +#elif defined(__GNUC__) +#pragma GCC diagnostic ignored "-Wunused-macros" +#pragma GCC diagnostic ignored "-Wunused-function" +#endif + +#if !defined(__has_builtin) +#define __has_builtin(b) 0 +#endif + +#if defined(__GNUC__) || defined(__clang__) + +#if __has_builtin(__builtin_memcpy_inline) +#define _rpmalloc_memcpy_const(x, y, s) __builtin_memcpy_inline(x, y, s) +#else +#define _rpmalloc_memcpy_const(x, y, s) \ + do { \ + _Static_assert(__builtin_choose_expr(__builtin_constant_p(s), 1, 0), \ + "len must be a constant integer"); \ + memcpy(x, y, s); \ + } while (0) +#endif + +#if __has_builtin(__builtin_memset_inline) +#define _rpmalloc_memset_const(x, y, s) __builtin_memset_inline(x, y, s) +#else +#define _rpmalloc_memset_const(x, y, s) \ + do { \ + _Static_assert(__builtin_choose_expr(__builtin_constant_p(s), 1, 0), \ + "len must be a constant integer"); \ + memset(x, y, s); \ + } while (0) +#endif +#else +#define _rpmalloc_memcpy_const(x, y, s) memcpy(x, y, s) +#define _rpmalloc_memset_const(x, y, s) memset(x, y, s) +#endif + +#if __has_builtin(__builtin_assume) +#define rpmalloc_assume(cond) __builtin_assume(cond) +#elif defined(__GNUC__) +#define rpmalloc_assume(cond) \ + do { \ + if (!__builtin_expect(cond, 0)) \ + __builtin_unreachable(); \ + } while (0) +#elif defined(_MSC_VER) +#define rpmalloc_assume(cond) __assume(cond) +#else +#define rpmalloc_assume(cond) 0 +#endif + +#ifndef HEAP_ARRAY_SIZE +//! 
Size of heap hashmap +#define HEAP_ARRAY_SIZE 47 +#endif +#ifndef ENABLE_THREAD_CACHE +//! Enable per-thread cache +#define ENABLE_THREAD_CACHE 1 +#endif +#ifndef ENABLE_GLOBAL_CACHE +//! Enable global cache shared between all threads, requires thread cache +#define ENABLE_GLOBAL_CACHE 1 +#endif +#ifndef ENABLE_VALIDATE_ARGS +//! Enable validation of args to public entry points +#define ENABLE_VALIDATE_ARGS 0 +#endif +#ifndef ENABLE_STATISTICS +//! Enable statistics collection +#define ENABLE_STATISTICS 0 +#endif +#ifndef ENABLE_ASSERTS +//! Enable asserts +#define ENABLE_ASSERTS 0 +#endif +#ifndef ENABLE_OVERRIDE +//! Override standard library malloc/free and new/delete entry points +#define ENABLE_OVERRIDE 0 +#endif +#ifndef ENABLE_PRELOAD +//! Support preloading +#define ENABLE_PRELOAD 0 +#endif +#ifndef DISABLE_UNMAP +//! Disable unmapping memory pages (also enables unlimited cache) +#define DISABLE_UNMAP 0 +#endif +#ifndef ENABLE_UNLIMITED_CACHE +//! Enable unlimited global cache (no unmapping until finalization) +#define ENABLE_UNLIMITED_CACHE 0 +#endif +#ifndef ENABLE_ADAPTIVE_THREAD_CACHE +//! Enable adaptive thread cache size based on use heuristics +#define ENABLE_ADAPTIVE_THREAD_CACHE 0 +#endif +#ifndef DEFAULT_SPAN_MAP_COUNT +//! Default number of spans to map in call to map more virtual memory (default +//! values yield 4MiB here) +#define DEFAULT_SPAN_MAP_COUNT 64 +#endif +#ifndef GLOBAL_CACHE_MULTIPLIER +//! 
Multiplier for global cache +#define GLOBAL_CACHE_MULTIPLIER 8 +#endif + +#if DISABLE_UNMAP && !ENABLE_GLOBAL_CACHE +#error Must use global cache if unmap is disabled +#endif + +#if DISABLE_UNMAP +#undef ENABLE_UNLIMITED_CACHE +#define ENABLE_UNLIMITED_CACHE 1 +#endif + +#if !ENABLE_GLOBAL_CACHE +#undef ENABLE_UNLIMITED_CACHE +#define ENABLE_UNLIMITED_CACHE 0 +#endif + +#if !ENABLE_THREAD_CACHE +#undef ENABLE_ADAPTIVE_THREAD_CACHE +#define ENABLE_ADAPTIVE_THREAD_CACHE 0 +#endif + +#if defined(_WIN32) || defined(__WIN32__) || defined(_WIN64) +#define PLATFORM_WINDOWS 1 +#define PLATFORM_POSIX 0 +#else +#define PLATFORM_WINDOWS 0 +#define PLATFORM_POSIX 1 +#endif + +/// Platform and arch specifics +#if defined(_MSC_VER) && !defined(__clang__) +#pragma warning(disable : 5105) +#ifndef FORCEINLINE +#define FORCEINLINE inline __forceinline +#endif +#define _Static_assert static_assert +#else +#ifndef FORCEINLINE +#define FORCEINLINE inline __attribute__((__always_inline__)) +#endif +#endif +#if PLATFORM_WINDOWS +#ifndef WIN32_LEAN_AND_MEAN +#define WIN32_LEAN_AND_MEAN +#endif +#include <windows.h> +#if ENABLE_VALIDATE_ARGS +#include <intsafe.h> +#endif +#else +#include +#include +#include +#include +#if defined(__linux__) || defined(__ANDROID__) +#include <sys/prctl.h> +#if !defined(PR_SET_VMA) +#define PR_SET_VMA 0x53564d41 +#define PR_SET_VMA_ANON_NAME 0 +#endif +#endif +#if defined(__APPLE__) +#include <TargetConditionals.h> +#if !TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR +#include +#include +#endif +#include +#endif +#if defined(__HAIKU__) || defined(__TINYC__) +#include <pthread.h> +#endif +#endif + +#include +#include +#include + +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) +#include <fibersapi.h> +static DWORD fls_key; +#endif + +#if PLATFORM_POSIX +#include +#include +#ifdef __FreeBSD__ +#include +#define MAP_HUGETLB MAP_ALIGNED_SUPER +#ifndef PROT_MAX +#define PROT_MAX(f) 0 +#endif +#else +#define PROT_MAX(f) 0 +#endif +#ifdef __sun +extern int madvise(caddr_t, size_t, int); +#endif +#ifndef MAP_UNINITIALIZED 
+#define MAP_UNINITIALIZED 0 +#endif +#endif +#include + +#if ENABLE_ASSERTS +#undef NDEBUG +#if defined(_MSC_VER) && !defined(_DEBUG) +#define _DEBUG +#endif +#include <assert.h> +#define RPMALLOC_TOSTRING_M(x) #x +#define RPMALLOC_TOSTRING(x) RPMALLOC_TOSTRING_M(x) +#define rpmalloc_assert(truth, message) \ + do { \ + if (!(truth)) { \ + if (_memory_config.error_callback) { \ + _memory_config.error_callback(message " (" RPMALLOC_TOSTRING( \ + truth) ") at " __FILE__ ":" RPMALLOC_TOSTRING(__LINE__)); \ + } else { \ + assert((truth) && message); \ + } \ + } \ + } while (0) +#else +#define rpmalloc_assert(truth, message) \ + do { \ + } while (0) +#endif +#if ENABLE_STATISTICS +#include <stdio.h> +#endif + +////// +/// +/// Atomic access abstraction (since MSVC does not do C11 yet) +/// +////// + +#if defined(_MSC_VER) && !defined(__clang__) + +typedef volatile long atomic32_t; +typedef volatile long long atomic64_t; +typedef volatile void *atomicptr_t; + +static FORCEINLINE int32_t atomic_load32(atomic32_t *src) { return *src; } +static FORCEINLINE void atomic_store32(atomic32_t *dst, int32_t val) { + *dst = val; +} +static FORCEINLINE int32_t atomic_incr32(atomic32_t *val) { + return (int32_t)InterlockedIncrement(val); +} +static FORCEINLINE int32_t atomic_decr32(atomic32_t *val) { + return (int32_t)InterlockedDecrement(val); +} +static FORCEINLINE int32_t atomic_add32(atomic32_t *val, int32_t add) { + return (int32_t)InterlockedExchangeAdd(val, add) + add; +} +static FORCEINLINE int atomic_cas32_acquire(atomic32_t *dst, int32_t val, + int32_t ref) { + return (InterlockedCompareExchange(dst, val, ref) == ref) ? 
1 : 0; +} +static FORCEINLINE void atomic_store32_release(atomic32_t *dst, int32_t val) { + *dst = val; +} +static FORCEINLINE int64_t atomic_load64(atomic64_t *src) { return *src; } +static FORCEINLINE int64_t atomic_add64(atomic64_t *val, int64_t add) { + return (int64_t)InterlockedExchangeAdd64(val, add) + add; +} +static FORCEINLINE void *atomic_load_ptr(atomicptr_t *src) { + return (void *)*src; +} +static FORCEINLINE void atomic_store_ptr(atomicptr_t *dst, void *val) { + *dst = val; +} +static FORCEINLINE void atomic_store_ptr_release(atomicptr_t *dst, void *val) { + *dst = val; +} +static FORCEINLINE void *atomic_exchange_ptr_acquire(atomicptr_t *dst, + void *val) { + return (void *)InterlockedExchangePointer((void *volatile *)dst, val); +} +static FORCEINLINE int atomic_cas_ptr(atomicptr_t *dst, void *val, void *ref) { + return (InterlockedCompareExchangePointer((void *volatile *)dst, val, ref) == + ref) + ? 1 + : 0; +} + +#define EXPECTED(x) (x) +#define UNEXPECTED(x) (x) + +#else + +#include <stdatomic.h> + +typedef volatile _Atomic(int32_t) atomic32_t; +typedef volatile _Atomic(int64_t) atomic64_t; +typedef volatile _Atomic(void *) atomicptr_t; + +static FORCEINLINE int32_t atomic_load32(atomic32_t *src) { + return atomic_load_explicit(src, memory_order_relaxed); +} +static FORCEINLINE void atomic_store32(atomic32_t *dst, int32_t val) { + atomic_store_explicit(dst, val, memory_order_relaxed); +} +static FORCEINLINE int32_t atomic_incr32(atomic32_t *val) { + return atomic_fetch_add_explicit(val, 1, memory_order_relaxed) + 1; +} +static FORCEINLINE int32_t atomic_decr32(atomic32_t *val) { + return atomic_fetch_add_explicit(val, -1, memory_order_relaxed) - 1; +} +static FORCEINLINE int32_t atomic_add32(atomic32_t *val, int32_t add) { + return atomic_fetch_add_explicit(val, add, memory_order_relaxed) + add; +} +static FORCEINLINE int atomic_cas32_acquire(atomic32_t *dst, int32_t val, + int32_t ref) { + return atomic_compare_exchange_weak_explicit( + dst, &ref, val, 
memory_order_acquire, memory_order_relaxed); +} +static FORCEINLINE void atomic_store32_release(atomic32_t *dst, int32_t val) { + atomic_store_explicit(dst, val, memory_order_release); +} +static FORCEINLINE int64_t atomic_load64(atomic64_t *val) { + return atomic_load_explicit(val, memory_order_relaxed); +} +static FORCEINLINE int64_t atomic_add64(atomic64_t *val, int64_t add) { + return atomic_fetch_add_explicit(val, add, memory_order_relaxed) + add; +} +static FORCEINLINE void *atomic_load_ptr(atomicptr_t *src) { + return atomic_load_explicit(src, memory_order_relaxed); +} +static FORCEINLINE void atomic_store_ptr(atomicptr_t *dst, void *val) { + atomic_store_explicit(dst, val, memory_order_relaxed); +} +static FORCEINLINE void atomic_store_ptr_release(atomicptr_t *dst, void *val) { + atomic_store_explicit(dst, val, memory_order_release); +} +static FORCEINLINE void *atomic_exchange_ptr_acquire(atomicptr_t *dst, + void *val) { + return atomic_exchange_explicit(dst, val, memory_order_acquire); +} +static FORCEINLINE int atomic_cas_ptr(atomicptr_t *dst, void *val, void *ref) { + return atomic_compare_exchange_weak_explicit( + dst, &ref, val, memory_order_relaxed, memory_order_relaxed); +} + +#define EXPECTED(x) __builtin_expect((x), 1) +#define UNEXPECTED(x) __builtin_expect((x), 0) + +#endif + +//////////// +/// +/// Statistics related functions (evaluate to nothing when statistics not +/// enabled) +/// +////// + +#if ENABLE_STATISTICS +#define _rpmalloc_stat_inc(counter) atomic_incr32(counter) +#define _rpmalloc_stat_dec(counter) atomic_decr32(counter) +#define _rpmalloc_stat_add(counter, value) \ + atomic_add32(counter, (int32_t)(value)) +#define _rpmalloc_stat_add64(counter, value) \ + atomic_add64(counter, (int64_t)(value)) +#define _rpmalloc_stat_add_peak(counter, value, peak) \ + do { \ + int32_t _cur_count = atomic_add32(counter, (int32_t)(value)); \ + if (_cur_count > (peak)) \ + peak = _cur_count; \ + } while (0) +#define _rpmalloc_stat_sub(counter, 
value) \ + atomic_add32(counter, -(int32_t)(value)) +#define _rpmalloc_stat_inc_alloc(heap, class_idx) \ + do { \ + int32_t alloc_current = \ + atomic_incr32(&heap->size_class_use[class_idx].alloc_current); \ + if (alloc_current > heap->size_class_use[class_idx].alloc_peak) \ + heap->size_class_use[class_idx].alloc_peak = alloc_current; \ + atomic_incr32(&heap->size_class_use[class_idx].alloc_total); \ + } while (0) +#define _rpmalloc_stat_inc_free(heap, class_idx) \ + do { \ + atomic_decr32(&heap->size_class_use[class_idx].alloc_current); \ + atomic_incr32(&heap->size_class_use[class_idx].free_total); \ + } while (0) +#else +#define _rpmalloc_stat_inc(counter) \ + do { \ + } while (0) +#define _rpmalloc_stat_dec(counter) \ + do { \ + } while (0) +#define _rpmalloc_stat_add(counter, value) \ + do { \ + } while (0) +#define _rpmalloc_stat_add64(counter, value) \ + do { \ + } while (0) +#define _rpmalloc_stat_add_peak(counter, value, peak) \ + do { \ + } while (0) +#define _rpmalloc_stat_sub(counter, value) \ + do { \ + } while (0) +#define _rpmalloc_stat_inc_alloc(heap, class_idx) \ + do { \ + } while (0) +#define _rpmalloc_stat_inc_free(heap, class_idx) \ + do { \ + } while (0) +#endif + +/// +/// Preconfigured limits and sizes +/// + +//! Granularity of a small allocation block (must be power of two) +#define SMALL_GRANULARITY 16 +//! Small granularity shift count +#define SMALL_GRANULARITY_SHIFT 4 +//! Number of small block size classes +#define SMALL_CLASS_COUNT 65 +//! Maximum size of a small block +#define SMALL_SIZE_LIMIT (SMALL_GRANULARITY * (SMALL_CLASS_COUNT - 1)) +//! Granularity of a medium allocation block +#define MEDIUM_GRANULARITY 512 +//! Medium granularity shift count +#define MEDIUM_GRANULARITY_SHIFT 9 +//! Number of medium block size classes +#define MEDIUM_CLASS_COUNT 61 +//! Total number of small + medium size classes +#define SIZE_CLASS_COUNT (SMALL_CLASS_COUNT + MEDIUM_CLASS_COUNT) +//! 
Number of large block size classes +#define LARGE_CLASS_COUNT 63 +//! Maximum size of a medium block +#define MEDIUM_SIZE_LIMIT \ + (SMALL_SIZE_LIMIT + (MEDIUM_GRANULARITY * MEDIUM_CLASS_COUNT)) +//! Maximum size of a large block +#define LARGE_SIZE_LIMIT \ + ((LARGE_CLASS_COUNT * _memory_span_size) - SPAN_HEADER_SIZE) +//! Size of a span header (must be a multiple of SMALL_GRANULARITY and a power +//! of two) +#define SPAN_HEADER_SIZE 128 +//! Number of spans in thread cache +#define MAX_THREAD_SPAN_CACHE 400 +//! Number of spans to transfer between thread and global cache +#define THREAD_SPAN_CACHE_TRANSFER 64 +//! Number of spans in thread cache for large spans (must be greater than +//! LARGE_CLASS_COUNT / 2) +#define MAX_THREAD_SPAN_LARGE_CACHE 100 +//! Number of spans to transfer between thread and global cache for large spans +#define THREAD_SPAN_LARGE_CACHE_TRANSFER 6 + +_Static_assert((SMALL_GRANULARITY & (SMALL_GRANULARITY - 1)) == 0, + "Small granularity must be power of two"); +_Static_assert((SPAN_HEADER_SIZE & (SPAN_HEADER_SIZE - 1)) == 0, + "Span header size must be power of two"); + +#if ENABLE_VALIDATE_ARGS +//! Maximum allocation size to avoid integer overflow +#undef MAX_ALLOC_SIZE +#define MAX_ALLOC_SIZE (((size_t) - 1) - _memory_span_size) +#endif + +#define pointer_offset(ptr, ofs) (void *)((char *)(ptr) + (ptrdiff_t)(ofs)) +#define pointer_diff(first, second) \ + (ptrdiff_t)((const char *)(first) - (const char *)(second)) + +#define INVALID_POINTER ((void *)((uintptr_t) - 1)) + +#define SIZE_CLASS_LARGE SIZE_CLASS_COUNT +#define SIZE_CLASS_HUGE ((uint32_t) - 1) + +//////////// +/// +/// Data types +/// +////// + +//! A memory heap, per thread +typedef struct heap_t heap_t; +//! Span of memory pages +typedef struct span_t span_t; +//! Span list +typedef struct span_list_t span_list_t; +//! Span active data +typedef struct span_active_t span_active_t; +//! Size class definition +typedef struct size_class_t size_class_t; +//! 
Global cache +typedef struct global_cache_t global_cache_t; + +//! Flag indicating span is the first (master) span of a split superspan +#define SPAN_FLAG_MASTER 1U +//! Flag indicating span is a secondary (sub) span of a split superspan +#define SPAN_FLAG_SUBSPAN 2U +//! Flag indicating span has blocks with increased alignment +#define SPAN_FLAG_ALIGNED_BLOCKS 4U +//! Flag indicating an unmapped master span +#define SPAN_FLAG_UNMAPPED_MASTER 8U + +#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS +struct span_use_t { + //! Current number of spans used (actually used, not in cache) + atomic32_t current; + //! High water mark of spans used + atomic32_t high; +#if ENABLE_STATISTICS + //! Number of spans in deferred list + atomic32_t spans_deferred; + //! Number of spans transitioned to global cache + atomic32_t spans_to_global; + //! Number of spans transitioned from global cache + atomic32_t spans_from_global; + //! Number of spans transitioned to thread cache + atomic32_t spans_to_cache; + //! Number of spans transitioned from thread cache + atomic32_t spans_from_cache; + //! Number of spans transitioned to reserved state + atomic32_t spans_to_reserved; + //! Number of spans transitioned from reserved state + atomic32_t spans_from_reserved; + //! Number of raw memory map calls + atomic32_t spans_map_calls; +#endif +}; +typedef struct span_use_t span_use_t; +#endif + +#if ENABLE_STATISTICS +struct size_class_use_t { + //! Current number of allocations + atomic32_t alloc_current; + //! Peak number of allocations + int32_t alloc_peak; + //! Total number of allocations + atomic32_t alloc_total; + //! Total number of frees + atomic32_t free_total; + //! Number of spans in use + atomic32_t spans_current; + //! Peak number of spans in use + int32_t spans_peak; + //! Number of spans transitioned to cache + atomic32_t spans_to_cache; + //! Number of spans transitioned from cache + atomic32_t spans_from_cache; + //! 
Number of spans transitioned from reserved state + atomic32_t spans_from_reserved; + //! Number of spans mapped + atomic32_t spans_map_calls; + int32_t unused; +}; +typedef struct size_class_use_t size_class_use_t; +#endif + +// A span can either represent a single span of memory pages with size declared +// by span_map_count configuration variable, or a set of spans in a contiguous +// region, a super span. Any reference to the term "span" usually refers to +// either a single span or a super span. A super span can further be divided into +// multiple spans (or, in turn, super spans), where the first (super)span is the +// master and subsequent (super)spans are subspans. The master span keeps track +// of how many subspans are still alive and mapped in virtual memory, and +// once all subspans and master have been unmapped the entire superspan region +// is released and unmapped (on Windows for example, the entire superspan range +// has to be released in the same call to release the virtual memory range, but +// individual subranges can be decommitted individually to reduce physical +// memory use). +struct span_t { + //! Free list + void *free_list; + //! Total block count of size class + uint32_t block_count; + //! Size class + uint32_t size_class; + //! Index of last block initialized in free list + uint32_t free_list_limit; + //! Number of used blocks remaining when in partial state + uint32_t used_count; + //! Deferred free list + atomicptr_t free_list_deferred; + //! Size of deferred free list, or list of spans when part of a cache list + uint32_t list_size; + //! Size of a block + uint32_t block_size; + //! Flags and counters + uint32_t flags; + //! Number of spans + uint32_t span_count; + //! Total span counter for master spans + uint32_t total_spans; + //! Offset from master span for subspans + uint32_t offset_from_master; + //! Remaining span counter, for master spans + atomic32_t remaining_spans; + //! Alignment offset + uint32_t align_offset; + //! 
Owning heap + heap_t *heap; + //! Next span + span_t *next; + //! Previous span + span_t *prev; +}; +_Static_assert(sizeof(span_t) <= SPAN_HEADER_SIZE, "span size mismatch"); + +struct span_cache_t { + size_t count; + span_t *span[MAX_THREAD_SPAN_CACHE]; +}; +typedef struct span_cache_t span_cache_t; + +struct span_large_cache_t { + size_t count; + span_t *span[MAX_THREAD_SPAN_LARGE_CACHE]; +}; +typedef struct span_large_cache_t span_large_cache_t; + +struct heap_size_class_t { + //! Free list of active span + void *free_list; + //! Double linked list of partially used spans with free blocks. + // Previous span pointer in head points to tail span of list. + span_t *partial_span; + //! Early level cache of fully free spans + span_t *cache; +}; +typedef struct heap_size_class_t heap_size_class_t; + +// Control structure for a heap, either a thread heap or a first class heap if +// enabled +struct heap_t { + //! Owning thread ID + uintptr_t owner_thread; + //! Free lists for each size class + heap_size_class_t size_class[SIZE_CLASS_COUNT]; +#if ENABLE_THREAD_CACHE + //! Arrays of fully freed spans, single span + span_cache_t span_cache; +#endif + //! List of deferred free spans (single linked list) + atomicptr_t span_free_deferred; + //! Number of full spans + size_t full_span_count; + //! Mapped but unused spans + span_t *span_reserve; + //! Master span for mapped but unused spans + span_t *span_reserve_master; + //! Number of mapped but unused spans + uint32_t spans_reserved; + //! Child count + atomic32_t child_count; + //! Next heap in id list + heap_t *next_heap; + //! Next heap in orphan list + heap_t *next_orphan; + //! Heap ID + int32_t id; + //! Finalization state flag + int finalize; + //! Master heap owning the memory pages + heap_t *master_heap; +#if ENABLE_THREAD_CACHE + //! Arrays of fully freed spans, large spans with > 1 span count + span_large_cache_t span_large_cache[LARGE_CLASS_COUNT - 1]; +#endif +#if RPMALLOC_FIRST_CLASS_HEAPS + //! 
Double linked list of fully utilized spans with free blocks for each size + //! class. + // Previous span pointer in head points to tail span of list. + span_t *full_span[SIZE_CLASS_COUNT]; + //! Double linked list of large and huge spans allocated by this heap + span_t *large_huge_span; +#endif +#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS + //! Current and high water mark of spans used per span count + span_use_t span_use[LARGE_CLASS_COUNT]; +#endif +#if ENABLE_STATISTICS + //! Allocation stats per size class + size_class_use_t size_class_use[SIZE_CLASS_COUNT + 1]; + //! Number of bytes transitioned thread -> global + atomic64_t thread_to_global; + //! Number of bytes transitioned global -> thread + atomic64_t global_to_thread; +#endif +}; + +// Size class for defining a block size bucket +struct size_class_t { + //! Size of blocks in this class + uint32_t block_size; + //! Number of blocks in each chunk + uint16_t block_count; + //! Class index this class is merged with + uint16_t class_idx; +}; +_Static_assert(sizeof(size_class_t) == 8, "Size class size mismatch"); + +struct global_cache_t { + //! Cache lock + atomic32_t lock; + //! Cache count + uint32_t count; +#if ENABLE_STATISTICS + //! Insert count + size_t insert_count; + //! Extract count + size_t extract_count; +#endif + //! Cached spans + span_t *span[GLOBAL_CACHE_MULTIPLIER * MAX_THREAD_SPAN_CACHE]; + //! Unlimited cache overflow + span_t *overflow; +}; + +//////////// +/// +/// Global data +/// +////// + +//! Default span size (64KiB) +#define _memory_default_span_size (64 * 1024) +#define _memory_default_span_size_shift 16 +#define _memory_default_span_mask (~((uintptr_t)(_memory_span_size - 1))) + +//! Initialized flag +static int _rpmalloc_initialized; +//! Main thread ID +static uintptr_t _rpmalloc_main_thread_id; +//! Configuration +static rpmalloc_config_t _memory_config; +//! Memory page size +static size_t _memory_page_size; +//! 
Shift to divide by page size +static size_t _memory_page_size_shift; +//! Granularity at which memory pages are mapped by OS +static size_t _memory_map_granularity; +#if RPMALLOC_CONFIGURABLE +//! Size of a span of memory pages +static size_t _memory_span_size; +//! Shift to divide by span size +static size_t _memory_span_size_shift; +//! Mask to get to start of a memory span +static uintptr_t _memory_span_mask; +#else +//! Hardwired span size +#define _memory_span_size _memory_default_span_size +#define _memory_span_size_shift _memory_default_span_size_shift +#define _memory_span_mask _memory_default_span_mask +#endif +//! Number of spans to map in each map call +static size_t _memory_span_map_count; +//! Number of spans to keep reserved in each heap +static size_t _memory_heap_reserve_count; +//! Global size classes +static size_class_t _memory_size_class[SIZE_CLASS_COUNT]; +//! Run-time size limit of medium blocks +static size_t _memory_medium_size_limit; +//! Heap ID counter +static atomic32_t _memory_heap_id; +//! Huge page support +static int _memory_huge_pages; +#if ENABLE_GLOBAL_CACHE +//! Global span cache +static global_cache_t _memory_span_cache[LARGE_CLASS_COUNT]; +#endif +//! Global reserved spans +static span_t *_memory_global_reserve; +//! Global reserved count +static size_t _memory_global_reserve_count; +//! Global reserved master +static span_t *_memory_global_reserve_master; +//! All heaps +static heap_t *_memory_heaps[HEAP_ARRAY_SIZE]; +//! Used to restrict access to mapping memory for huge pages +static atomic32_t _memory_global_lock; +//! Orphaned heaps +static heap_t *_memory_orphan_heaps; +#if RPMALLOC_FIRST_CLASS_HEAPS +//! Orphaned heaps (first class heaps) +static heap_t *_memory_first_class_orphan_heaps; +#endif +#if ENABLE_STATISTICS +//! Allocations counter +static atomic64_t _allocation_counter; +//! Deallocations counter +static atomic64_t _deallocation_counter; +//! Active heap count +static atomic32_t _memory_active_heaps; +//! 
Number of currently mapped memory pages +static atomic32_t _mapped_pages; +//! Peak number of concurrently mapped memory pages +static int32_t _mapped_pages_peak; +//! Number of mapped master spans +static atomic32_t _master_spans; +//! Number of unmapped dangling master spans +static atomic32_t _unmapped_master_spans; +//! Running counter of total number of mapped memory pages since start +static atomic32_t _mapped_total; +//! Running counter of total number of unmapped memory pages since start +static atomic32_t _unmapped_total; +//! Number of currently mapped memory pages in OS calls +static atomic32_t _mapped_pages_os; +//! Number of currently allocated pages in huge allocations +static atomic32_t _huge_pages_current; +//! Peak number of currently allocated pages in huge allocations +static int32_t _huge_pages_peak; +#endif + +//////////// +/// +/// Thread local heap and ID +/// +////// + +//! Current thread heap +#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ + defined(__TINYC__) +static pthread_key_t _memory_thread_heap; +#else +#ifdef _MSC_VER +#define _Thread_local __declspec(thread) +#define TLS_MODEL +#else +#ifndef __HAIKU__ +#define TLS_MODEL __attribute__((tls_model("initial-exec"))) +#else +#define TLS_MODEL +#endif +#if !defined(__clang__) && defined(__GNUC__) +#define _Thread_local __thread +#endif +#endif +static _Thread_local heap_t *_memory_thread_heap TLS_MODEL; +#endif + +static inline heap_t *get_thread_heap_raw(void) { +#if (defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD + return pthread_getspecific(_memory_thread_heap); +#else + return _memory_thread_heap; +#endif +} + +//! Get the current thread heap +static inline heap_t *get_thread_heap(void) { + heap_t *heap = get_thread_heap_raw(); +#if ENABLE_PRELOAD + if (EXPECTED(heap != 0)) + return heap; + rpmalloc_initialize(); + return get_thread_heap_raw(); +#else + return heap; +#endif +} + +//! 
Fast thread ID +static inline uintptr_t get_thread_id(void) { +#if defined(_WIN32) + return (uintptr_t)((void *)NtCurrentTeb()); +#elif (defined(__GNUC__) || defined(__clang__)) && !defined(__CYGWIN__) + uintptr_t tid; +#if defined(__i386__) + __asm__("movl %%gs:0, %0" : "=r"(tid) : :); +#elif defined(__x86_64__) +#if defined(__MACH__) + __asm__("movq %%gs:0, %0" : "=r"(tid) : :); +#else + __asm__("movq %%fs:0, %0" : "=r"(tid) : :); +#endif +#elif defined(__arm__) + __asm__ volatile("mrc p15, 0, %0, c13, c0, 3" : "=r"(tid)); +#elif defined(__aarch64__) +#if defined(__MACH__) + // tpidr_el0 likely unused, always return 0 on iOS + __asm__ volatile("mrs %0, tpidrro_el0" : "=r"(tid)); +#else + __asm__ volatile("mrs %0, tpidr_el0" : "=r"(tid)); +#endif +#else +#error This platform needs implementation of get_thread_id() +#endif + return tid; +#else +#error This platform needs implementation of get_thread_id() +#endif +} + +//! Set the current thread heap +static void set_thread_heap(heap_t *heap) { +#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ + defined(__TINYC__) + pthread_setspecific(_memory_thread_heap, heap); +#else + _memory_thread_heap = heap; +#endif + if (heap) + heap->owner_thread = get_thread_id(); +} + +//! Set main thread ID +extern void rpmalloc_set_main_thread(void); + +void rpmalloc_set_main_thread(void) { + _rpmalloc_main_thread_id = get_thread_id(); +} + +static void _rpmalloc_spin(void) { +#if defined(_MSC_VER) +#if defined(_M_ARM64) + __yield(); +#else + _mm_pause(); +#endif +#elif defined(__x86_64__) || defined(__i386__) + __asm__ volatile("pause" ::: "memory"); +#elif defined(__aarch64__) || (defined(__arm__) && __ARM_ARCH >= 7) + __asm__ volatile("yield" ::: "memory"); +#elif defined(__powerpc__) || defined(__powerpc64__) + // No idea if ever been compiled in such archs but ... 
as precaution + __asm__ volatile("or 27,27,27"); +#elif defined(__sparc__) + __asm__ volatile("rd %ccr, %g0 \n\trd %ccr, %g0 \n\trd %ccr, %g0"); +#else + struct timespec ts = {0}; + nanosleep(&ts, 0); +#endif +} + +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) +static void NTAPI _rpmalloc_thread_destructor(void *value) { +#if ENABLE_OVERRIDE + // If this is called on main thread it means rpmalloc_finalize + // has not been called and shutdown is forced (through _exit) or unclean + if (get_thread_id() == _rpmalloc_main_thread_id) + return; +#endif + if (value) + rpmalloc_thread_finalize(1); +} +#endif + +//////////// +/// +/// Low level memory map/unmap +/// +////// + +static void _rpmalloc_set_name(void *address, size_t size) { +#if defined(__linux__) || defined(__ANDROID__) + const char *name = _memory_huge_pages ? _memory_config.huge_page_name + : _memory_config.page_name; + if (address == MAP_FAILED || !name) + return; + // If the kernel does not support CONFIG_ANON_VMA_NAME or if the call fails + // (e.g. invalid name) it is a no-op basically. + (void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (uintptr_t)address, size, + (uintptr_t)name); +#else + (void)sizeof(size); + (void)sizeof(address); +#endif +} + +//! Map more virtual memory +// size is number of bytes to map +// offset receives the offset in bytes from start of mapped region +// returns address to start of mapped region to use +static void *_rpmalloc_mmap(size_t size, size_t *offset) { + rpmalloc_assert(!(size % _memory_page_size), "Invalid mmap size"); + rpmalloc_assert(size >= _memory_page_size, "Invalid mmap size"); + void *address = _memory_config.memory_map(size, offset); + if (EXPECTED(address != 0)) { + _rpmalloc_stat_add_peak(&_mapped_pages, (size >> _memory_page_size_shift), + _mapped_pages_peak); + _rpmalloc_stat_add(&_mapped_total, (size >> _memory_page_size_shift)); + } + return address; +} + +//! 
Unmap virtual memory +// address is the memory address to unmap, as returned from _memory_map +// size is the number of bytes to unmap, which might be less than full region +// for a partial unmap offset is the offset in bytes to the actual mapped +// region, as set by _memory_map release is set to 0 for partial unmap, or size +// of entire range for a full unmap +static void _rpmalloc_unmap(void *address, size_t size, size_t offset, + size_t release) { + rpmalloc_assert(!release || (release >= size), "Invalid unmap size"); + rpmalloc_assert(!release || (release >= _memory_page_size), + "Invalid unmap size"); + if (release) { + rpmalloc_assert(!(release % _memory_page_size), "Invalid unmap size"); + _rpmalloc_stat_sub(&_mapped_pages, (release >> _memory_page_size_shift)); + _rpmalloc_stat_add(&_unmapped_total, (release >> _memory_page_size_shift)); + } + _memory_config.memory_unmap(address, size, offset, release); +} + +//! Default implementation to map new pages to virtual memory +static void *_rpmalloc_mmap_os(size_t size, size_t *offset) { + // Either size is a heap (a single page) or a (multiple) span - we only need + // to align spans, and only if larger than map granularity + size_t padding = ((size >= _memory_span_size) && + (_memory_span_size > _memory_map_granularity)) + ? _memory_span_size + : 0; + rpmalloc_assert(size >= _memory_page_size, "Invalid mmap size"); +#if PLATFORM_WINDOWS + // Ok to MEM_COMMIT - according to MSDN, "actual physical pages are not + // allocated unless/until the virtual addresses are actually accessed" + void *ptr = VirtualAlloc(0, size + padding, + (_memory_huge_pages ? 
MEM_LARGE_PAGES : 0) | + MEM_RESERVE | MEM_COMMIT, + PAGE_READWRITE); + if (!ptr) { + if (_memory_config.map_fail_callback) { + if (_memory_config.map_fail_callback(size + padding)) + return _rpmalloc_mmap_os(size, offset); + } else { + rpmalloc_assert(ptr, "Failed to map virtual memory block"); + } + return 0; + } +#else + int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_UNINITIALIZED; +#if defined(__APPLE__) && !TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR + int fd = (int)VM_MAKE_TAG(240U); + if (_memory_huge_pages) + fd |= VM_FLAGS_SUPERPAGE_SIZE_2MB; + void *ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, fd, 0); +#elif defined(MAP_HUGETLB) + void *ptr = mmap(0, size + padding, + PROT_READ | PROT_WRITE | PROT_MAX(PROT_READ | PROT_WRITE), + (_memory_huge_pages ? MAP_HUGETLB : 0) | flags, -1, 0); +#if defined(MADV_HUGEPAGE) + // In some configurations, huge pages allocations might fail thus + // we fallback to normal allocations and promote the region as transparent + // huge page + if ((ptr == MAP_FAILED || !ptr) && _memory_huge_pages) { + ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, -1, 0); + if (ptr && ptr != MAP_FAILED) { + int prm = madvise(ptr, size + padding, MADV_HUGEPAGE); + (void)prm; + rpmalloc_assert((prm == 0), "Failed to promote the page to THP"); + } + } +#endif + _rpmalloc_set_name(ptr, size + padding); +#elif defined(MAP_ALIGNED) + const size_t align = + (sizeof(size_t) * 8) - (size_t)(__builtin_clzl(size - 1)); + void *ptr = + mmap(0, size + padding, PROT_READ | PROT_WRITE, + (_memory_huge_pages ? MAP_ALIGNED(align) : 0) | flags, -1, 0); +#elif defined(MAP_ALIGN) + caddr_t base = (_memory_huge_pages ? (caddr_t)(4 << 20) : 0); + void *ptr = mmap(base, size + padding, PROT_READ | PROT_WRITE, + (_memory_huge_pages ? 
MAP_ALIGN : 0) | flags, -1, 0); +#else + void *ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, -1, 0); +#endif + if ((ptr == MAP_FAILED) || !ptr) { + if (_memory_config.map_fail_callback) { + if (_memory_config.map_fail_callback(size + padding)) + return _rpmalloc_mmap_os(size, offset); + } else if (errno != ENOMEM) { + rpmalloc_assert((ptr != MAP_FAILED) && ptr, + "Failed to map virtual memory block"); + } + return 0; + } +#endif + _rpmalloc_stat_add(&_mapped_pages_os, + (int32_t)((size + padding) >> _memory_page_size_shift)); + if (padding) { + size_t final_padding = padding - ((uintptr_t)ptr & ~_memory_span_mask); + rpmalloc_assert(final_padding <= _memory_span_size, + "Internal failure in padding"); + rpmalloc_assert(final_padding <= padding, "Internal failure in padding"); + rpmalloc_assert(!(final_padding % 8), "Internal failure in padding"); + ptr = pointer_offset(ptr, final_padding); + *offset = final_padding >> 3; + } + rpmalloc_assert((size < _memory_span_size) || + !((uintptr_t)ptr & ~_memory_span_mask), + "Internal failure in padding"); + return ptr; +} + +//! Default implementation to unmap pages from virtual memory +static void _rpmalloc_unmap_os(void *address, size_t size, size_t offset, + size_t release) { + rpmalloc_assert(release || (offset == 0), "Invalid unmap size"); + rpmalloc_assert(!release || (release >= _memory_page_size), + "Invalid unmap size"); + rpmalloc_assert(size >= _memory_page_size, "Invalid unmap size"); + if (release && offset) { + offset <<= 3; + address = pointer_offset(address, -(int32_t)offset); + if ((release >= _memory_span_size) && + (_memory_span_size > _memory_map_granularity)) { + // Padding is always one span size + release += _memory_span_size; + } + } +#if !DISABLE_UNMAP +#if PLATFORM_WINDOWS + if (!VirtualFree(address, release ? 0 : size, + release ? 
MEM_RELEASE : MEM_DECOMMIT)) { + rpmalloc_assert(0, "Failed to unmap virtual memory block"); + } +#else + if (release) { + if (munmap(address, release)) { + rpmalloc_assert(0, "Failed to unmap virtual memory block"); + } + } else { +#if defined(MADV_FREE_REUSABLE) + int ret; + while ((ret = madvise(address, size, MADV_FREE_REUSABLE)) == -1 && + (errno == EAGAIN)) + errno = 0; + if ((ret == -1) && (errno != 0)) { +#elif defined(MADV_DONTNEED) + if (madvise(address, size, MADV_DONTNEED)) { +#elif defined(MADV_PAGEOUT) + if (madvise(address, size, MADV_PAGEOUT)) { +#elif defined(MADV_FREE) + if (madvise(address, size, MADV_FREE)) { +#else + if (posix_madvise(address, size, POSIX_MADV_DONTNEED)) { +#endif + rpmalloc_assert(0, "Failed to madvise virtual memory block as free"); + } + } +#endif +#endif + if (release) + _rpmalloc_stat_sub(&_mapped_pages_os, release >> _memory_page_size_shift); +} + +static void _rpmalloc_span_mark_as_subspan_unless_master(span_t *master, + span_t *subspan, + size_t span_count); + +//! Use global reserved spans to fulfill a memory map request (reserve size must +//! be checked by caller) +static span_t *_rpmalloc_global_get_reserved_spans(size_t span_count) { + span_t *span = _memory_global_reserve; + _rpmalloc_span_mark_as_subspan_unless_master(_memory_global_reserve_master, + span, span_count); + _memory_global_reserve_count -= span_count; + if (_memory_global_reserve_count) + _memory_global_reserve = + (span_t *)pointer_offset(span, span_count << _memory_span_size_shift); + else + _memory_global_reserve = 0; + return span; +} + +//! Store the given spans as global reserve (must only be called from within new +//! 
heap allocation, not thread safe) +static void _rpmalloc_global_set_reserved_spans(span_t *master, span_t *reserve, + size_t reserve_span_count) { + _memory_global_reserve_master = master; + _memory_global_reserve_count = reserve_span_count; + _memory_global_reserve = reserve; +} + +//////////// +/// +/// Span linked list management +/// +////// + +//! Add a span to double linked list at the head +static void _rpmalloc_span_double_link_list_add(span_t **head, span_t *span) { + if (*head) + (*head)->prev = span; + span->next = *head; + *head = span; +} + +//! Pop head span from double linked list +static void _rpmalloc_span_double_link_list_pop_head(span_t **head, + span_t *span) { + rpmalloc_assert(*head == span, "Linked list corrupted"); + span = *head; + *head = span->next; +} + +//! Remove a span from double linked list +static void _rpmalloc_span_double_link_list_remove(span_t **head, + span_t *span) { + rpmalloc_assert(*head, "Linked list corrupted"); + if (*head == span) { + *head = span->next; + } else { + span_t *next_span = span->next; + span_t *prev_span = span->prev; + prev_span->next = next_span; + if (EXPECTED(next_span != 0)) + next_span->prev = prev_span; + } +} + +//////////// +/// +/// Span control +/// +////// + +static void _rpmalloc_heap_cache_insert(heap_t *heap, span_t *span); + +static void _rpmalloc_heap_finalize(heap_t *heap); + +static void _rpmalloc_heap_set_reserved_spans(heap_t *heap, span_t *master, + span_t *reserve, + size_t reserve_span_count); + +//! Declare the span to be a subspan and store distance from master span and +//! 
span count +static void _rpmalloc_span_mark_as_subspan_unless_master(span_t *master, + span_t *subspan, + size_t span_count) { + rpmalloc_assert((subspan != master) || (subspan->flags & SPAN_FLAG_MASTER), + "Span master pointer and/or flag mismatch"); + if (subspan != master) { + subspan->flags = SPAN_FLAG_SUBSPAN; + subspan->offset_from_master = + (uint32_t)((uintptr_t)pointer_diff(subspan, master) >> + _memory_span_size_shift); + subspan->align_offset = 0; + } + subspan->span_count = (uint32_t)span_count; +} + +//! Use reserved spans to fulfill a memory map request (reserve size must be +//! checked by caller) +static span_t *_rpmalloc_span_map_from_reserve(heap_t *heap, + size_t span_count) { + // Update the heap span reserve + span_t *span = heap->span_reserve; + heap->span_reserve = + (span_t *)pointer_offset(span, span_count * _memory_span_size); + heap->spans_reserved -= (uint32_t)span_count; + + _rpmalloc_span_mark_as_subspan_unless_master(heap->span_reserve_master, span, + span_count); + if (span_count <= LARGE_CLASS_COUNT) + _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_from_reserved); + + return span; +} + +//! Get the aligned number of spans to map in based on wanted count, configured +//! mapping granularity and the page size +static size_t _rpmalloc_span_align_count(size_t span_count) { + size_t request_count = (span_count > _memory_span_map_count) + ? span_count + : _memory_span_map_count; + if ((_memory_page_size > _memory_span_size) && + ((request_count * _memory_span_size) % _memory_page_size)) + request_count += + _memory_span_map_count - (request_count % _memory_span_map_count); + return request_count; +} + +//! 
Setup a newly mapped span +static void _rpmalloc_span_initialize(span_t *span, size_t total_span_count, + size_t span_count, size_t align_offset) { + span->total_spans = (uint32_t)total_span_count; + span->span_count = (uint32_t)span_count; + span->align_offset = (uint32_t)align_offset; + span->flags = SPAN_FLAG_MASTER; + atomic_store32(&span->remaining_spans, (int32_t)total_span_count); +} + +static void _rpmalloc_span_unmap(span_t *span); + +//! Map an aligned set of spans, taking configured mapping granularity and the +//! page size into account +static span_t *_rpmalloc_span_map_aligned_count(heap_t *heap, + size_t span_count) { + // If we already have some, but not enough, reserved spans, release those to + // heap cache and map a new full set of spans. Otherwise we would waste memory + // if page size > span size (huge pages) + size_t aligned_span_count = _rpmalloc_span_align_count(span_count); + size_t align_offset = 0; + span_t *span = (span_t *)_rpmalloc_mmap( + aligned_span_count * _memory_span_size, &align_offset); + if (!span) + return 0; + _rpmalloc_span_initialize(span, aligned_span_count, span_count, align_offset); + _rpmalloc_stat_inc(&_master_spans); + if (span_count <= LARGE_CLASS_COUNT) + _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_map_calls); + if (aligned_span_count > span_count) { + span_t *reserved_spans = + (span_t *)pointer_offset(span, span_count * _memory_span_size); + size_t reserved_count = aligned_span_count - span_count; + if (heap->spans_reserved) { + _rpmalloc_span_mark_as_subspan_unless_master( + heap->span_reserve_master, heap->span_reserve, heap->spans_reserved); + _rpmalloc_heap_cache_insert(heap, heap->span_reserve); + } + if (reserved_count > _memory_heap_reserve_count) { + // If huge pages or eager spam map count, the global reserve spin lock is + // held by caller, _rpmalloc_span_map + rpmalloc_assert(atomic_load32(&_memory_global_lock) == 1, + "Global spin lock not held as expected"); + size_t remain_count = 
reserved_count - _memory_heap_reserve_count; + reserved_count = _memory_heap_reserve_count; + span_t *remain_span = (span_t *)pointer_offset( + reserved_spans, reserved_count * _memory_span_size); + if (_memory_global_reserve) { + _rpmalloc_span_mark_as_subspan_unless_master( + _memory_global_reserve_master, _memory_global_reserve, + _memory_global_reserve_count); + _rpmalloc_span_unmap(_memory_global_reserve); + } + _rpmalloc_global_set_reserved_spans(span, remain_span, remain_count); + } + _rpmalloc_heap_set_reserved_spans(heap, span, reserved_spans, + reserved_count); + } + return span; +} + +//! Map in memory pages for the given number of spans (or use previously +//! reserved pages) +static span_t *_rpmalloc_span_map(heap_t *heap, size_t span_count) { + if (span_count <= heap->spans_reserved) + return _rpmalloc_span_map_from_reserve(heap, span_count); + span_t *span = 0; + int use_global_reserve = + (_memory_page_size > _memory_span_size) || + (_memory_span_map_count > _memory_heap_reserve_count); + if (use_global_reserve) { + // If huge pages, make sure only one thread maps more memory to avoid bloat + while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) + _rpmalloc_spin(); + if (_memory_global_reserve_count >= span_count) { + size_t reserve_count = + (!heap->spans_reserved ? 
_memory_heap_reserve_count : span_count); + if (_memory_global_reserve_count < reserve_count) + reserve_count = _memory_global_reserve_count; + span = _rpmalloc_global_get_reserved_spans(reserve_count); + if (span) { + if (reserve_count > span_count) { + span_t *reserved_span = (span_t *)pointer_offset( + span, span_count << _memory_span_size_shift); + _rpmalloc_heap_set_reserved_spans(heap, _memory_global_reserve_master, + reserved_span, + reserve_count - span_count); + } + // Already marked as subspan in _rpmalloc_global_get_reserved_spans + span->span_count = (uint32_t)span_count; + } + } + } + if (!span) + span = _rpmalloc_span_map_aligned_count(heap, span_count); + if (use_global_reserve) + atomic_store32_release(&_memory_global_lock, 0); + return span; +} + +//! Unmap memory pages for the given number of spans (or mark as unused if no +//! partial unmappings) +static void _rpmalloc_span_unmap(span_t *span) { + rpmalloc_assert((span->flags & SPAN_FLAG_MASTER) || + (span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + rpmalloc_assert(!(span->flags & SPAN_FLAG_MASTER) || + !(span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + + int is_master = !!(span->flags & SPAN_FLAG_MASTER); + span_t *master = + is_master ? 
span + : ((span_t *)pointer_offset( + span, -(intptr_t)((uintptr_t)span->offset_from_master * + _memory_span_size))); + rpmalloc_assert(is_master || (span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + rpmalloc_assert(master->flags & SPAN_FLAG_MASTER, "Span flag corrupted"); + + size_t span_count = span->span_count; + if (!is_master) { + // Directly unmap subspans (unless huge pages, in which case we defer and + // unmap entire page range with master) + rpmalloc_assert(span->align_offset == 0, "Span align offset corrupted"); + if (_memory_span_size >= _memory_page_size) + _rpmalloc_unmap(span, span_count * _memory_span_size, 0, 0); + } else { + // Special double flag to denote an unmapped master + // It must be kept in memory since span header must be used + span->flags |= + SPAN_FLAG_MASTER | SPAN_FLAG_SUBSPAN | SPAN_FLAG_UNMAPPED_MASTER; + _rpmalloc_stat_add(&_unmapped_master_spans, 1); + } + + if (atomic_add32(&master->remaining_spans, -(int32_t)span_count) <= 0) { + // Everything unmapped, unmap the master span with release flag to unmap the + // entire range of the super span + rpmalloc_assert(!!(master->flags & SPAN_FLAG_MASTER) && + !!(master->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + size_t unmap_count = master->span_count; + if (_memory_span_size < _memory_page_size) + unmap_count = master->total_spans; + _rpmalloc_stat_sub(&_master_spans, 1); + _rpmalloc_stat_sub(&_unmapped_master_spans, 1); + _rpmalloc_unmap(master, unmap_count * _memory_span_size, + master->align_offset, + (size_t)master->total_spans * _memory_span_size); + } +} + +//! Move the span (used for small or medium allocations) to the heap thread +//! 
cache +static void _rpmalloc_span_release_to_cache(heap_t *heap, span_t *span) { + rpmalloc_assert(heap == span->heap, "Span heap pointer corrupted"); + rpmalloc_assert(span->size_class < SIZE_CLASS_COUNT, + "Invalid span size class"); + rpmalloc_assert(span->span_count == 1, "Invalid span count"); +#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS + atomic_decr32(&heap->span_use[0].current); +#endif + _rpmalloc_stat_dec(&heap->size_class_use[span->size_class].spans_current); + if (!heap->finalize) { + _rpmalloc_stat_inc(&heap->span_use[0].spans_to_cache); + _rpmalloc_stat_inc(&heap->size_class_use[span->size_class].spans_to_cache); + if (heap->size_class[span->size_class].cache) + _rpmalloc_heap_cache_insert(heap, + heap->size_class[span->size_class].cache); + heap->size_class[span->size_class].cache = span; + } else { + _rpmalloc_span_unmap(span); + } +} + +//! Initialize a (partial) free list up to next system memory page, while +//! reserving the first block as allocated, returning number of blocks in list +static uint32_t free_list_partial_init(void **list, void **first_block, + void *page_start, void *block_start, + uint32_t block_count, + uint32_t block_size) { + rpmalloc_assert(block_count, "Internal failure"); + *first_block = block_start; + if (block_count > 1) { + void *free_block = pointer_offset(block_start, block_size); + void *block_end = + pointer_offset(block_start, (size_t)block_size * block_count); + // If block size is less than half a memory page, bound init to next memory + // page boundary + if (block_size < (_memory_page_size >> 1)) { + void *page_end = pointer_offset(page_start, _memory_page_size); + if (page_end < block_end) + block_end = page_end; + } + *list = free_block; + block_count = 2; + void *next_block = pointer_offset(free_block, block_size); + while (next_block < block_end) { + *((void **)free_block) = next_block; + free_block = next_block; + ++block_count; + next_block = pointer_offset(next_block, block_size); + } + *((void 
**)free_block) = 0; + } else { + *list = 0; + } + return block_count; +} + +//! Initialize an unused span (from cache or mapped) to be new active span, +//! putting the initial free list in heap class free list +static void *_rpmalloc_span_initialize_new(heap_t *heap, + heap_size_class_t *heap_size_class, + span_t *span, uint32_t class_idx) { + rpmalloc_assert(span->span_count == 1, "Internal failure"); + size_class_t *size_class = _memory_size_class + class_idx; + span->size_class = class_idx; + span->heap = heap; + span->flags &= ~SPAN_FLAG_ALIGNED_BLOCKS; + span->block_size = size_class->block_size; + span->block_count = size_class->block_count; + span->free_list = 0; + span->list_size = 0; + atomic_store_ptr_release(&span->free_list_deferred, 0); + + // Setup free list. Only initialize one system page worth of free blocks in + // list + void *block; + span->free_list_limit = + free_list_partial_init(&heap_size_class->free_list, &block, span, + pointer_offset(span, SPAN_HEADER_SIZE), + size_class->block_count, size_class->block_size); + // Link span as partial if there remains blocks to be initialized as free + // list, or full if fully initialized + if (span->free_list_limit < span->block_count) { + _rpmalloc_span_double_link_list_add(&heap_size_class->partial_span, span); + span->used_count = span->free_list_limit; + } else { +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->full_span[class_idx], span); +#endif + ++heap->full_span_count; + span->used_count = span->block_count; + } + return block; +} + +static void _rpmalloc_span_extract_free_list_deferred(span_t *span) { + // We need acquire semantics on the CAS operation since we are interested in + // the list size Refer to _rpmalloc_deallocate_defer_small_or_medium for + // further comments on this dependency + do { + span->free_list = + atomic_exchange_ptr_acquire(&span->free_list_deferred, INVALID_POINTER); + } while (span->free_list == INVALID_POINTER); + span->used_count -= 
span->list_size; + span->list_size = 0; + atomic_store_ptr_release(&span->free_list_deferred, 0); +} + +static int _rpmalloc_span_is_fully_utilized(span_t *span) { + rpmalloc_assert(span->free_list_limit <= span->block_count, + "Span free list corrupted"); + return !span->free_list && (span->free_list_limit >= span->block_count); +} + +static int _rpmalloc_span_finalize(heap_t *heap, size_t iclass, span_t *span, + span_t **list_head) { + void *free_list = heap->size_class[iclass].free_list; + span_t *class_span = (span_t *)((uintptr_t)free_list & _memory_span_mask); + if (span == class_span) { + // Adopt the heap class free list back into the span free list + void *block = span->free_list; + void *last_block = 0; + while (block) { + last_block = block; + block = *((void **)block); + } + uint32_t free_count = 0; + block = free_list; + while (block) { + ++free_count; + block = *((void **)block); + } + if (last_block) { + *((void **)last_block) = free_list; + } else { + span->free_list = free_list; + } + heap->size_class[iclass].free_list = 0; + span->used_count -= free_count; + } + // If this assert triggers you have memory leaks + rpmalloc_assert(span->list_size == span->used_count, "Memory leak detected"); + if (span->list_size == span->used_count) { + _rpmalloc_stat_dec(&heap->span_use[0].current); + _rpmalloc_stat_dec(&heap->size_class_use[iclass].spans_current); + // This function only used for spans in double linked lists + if (list_head) + _rpmalloc_span_double_link_list_remove(list_head, span); + _rpmalloc_span_unmap(span); + return 1; + } + return 0; +} + +//////////// +/// +/// Global cache +/// +////// + +#if ENABLE_GLOBAL_CACHE + +//! 
Finalize a global cache +static void _rpmalloc_global_cache_finalize(global_cache_t *cache) { + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + + for (size_t ispan = 0; ispan < cache->count; ++ispan) + _rpmalloc_span_unmap(cache->span[ispan]); + cache->count = 0; + + while (cache->overflow) { + span_t *span = cache->overflow; + cache->overflow = span->next; + _rpmalloc_span_unmap(span); + } + + atomic_store32_release(&cache->lock, 0); +} + +static void _rpmalloc_global_cache_insert_spans(span_t **span, + size_t span_count, + size_t count) { + const size_t cache_limit = + (span_count == 1) ? GLOBAL_CACHE_MULTIPLIER * MAX_THREAD_SPAN_CACHE + : GLOBAL_CACHE_MULTIPLIER * + (MAX_THREAD_SPAN_LARGE_CACHE - (span_count >> 1)); + + global_cache_t *cache = &_memory_span_cache[span_count - 1]; + + size_t insert_count = count; + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + +#if ENABLE_STATISTICS + cache->insert_count += count; +#endif + if ((cache->count + insert_count) > cache_limit) + insert_count = cache_limit - cache->count; + + memcpy(cache->span + cache->count, span, sizeof(span_t *) * insert_count); + cache->count += (uint32_t)insert_count; + +#if ENABLE_UNLIMITED_CACHE + while (insert_count < count) { +#else + // Enable unlimited cache if huge pages, or we will leak since it is unlikely + // that an entire huge page will be unmapped, and we're unable to partially + // decommit a huge page + while ((_memory_page_size > _memory_span_size) && (insert_count < count)) { +#endif + span_t *current_span = span[insert_count++]; + current_span->next = cache->overflow; + cache->overflow = current_span; + } + atomic_store32_release(&cache->lock, 0); + + span_t *keep = 0; + for (size_t ispan = insert_count; ispan < count; ++ispan) { + span_t *current_span = span[ispan]; + // Keep master spans that has remaining subspans to avoid dangling them + if ((current_span->flags & SPAN_FLAG_MASTER) && + 
(atomic_load32(&current_span->remaining_spans) > + (int32_t)current_span->span_count)) { + current_span->next = keep; + keep = current_span; + } else { + _rpmalloc_span_unmap(current_span); + } + } + + if (keep) { + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + + size_t islot = 0; + while (keep) { + for (; islot < cache->count; ++islot) { + span_t *current_span = cache->span[islot]; + if (!(current_span->flags & SPAN_FLAG_MASTER) || + ((current_span->flags & SPAN_FLAG_MASTER) && + (atomic_load32(&current_span->remaining_spans) <= + (int32_t)current_span->span_count))) { + _rpmalloc_span_unmap(current_span); + cache->span[islot] = keep; + break; + } + } + if (islot == cache->count) + break; + keep = keep->next; + } + + if (keep) { + span_t *tail = keep; + while (tail->next) + tail = tail->next; + tail->next = cache->overflow; + cache->overflow = keep; + } + + atomic_store32_release(&cache->lock, 0); + } +} + +static size_t _rpmalloc_global_cache_extract_spans(span_t **span, + size_t span_count, + size_t count) { + global_cache_t *cache = &_memory_span_cache[span_count - 1]; + + size_t extract_count = 0; + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + +#if ENABLE_STATISTICS + cache->extract_count += count; +#endif + size_t want = count - extract_count; + if (want > cache->count) + want = cache->count; + + memcpy(span + extract_count, cache->span + (cache->count - want), + sizeof(span_t *) * want); + cache->count -= (uint32_t)want; + extract_count += want; + + while ((extract_count < count) && cache->overflow) { + span_t *current_span = cache->overflow; + span[extract_count++] = current_span; + cache->overflow = current_span->next; + } + +#if ENABLE_ASSERTS + for (size_t ispan = 0; ispan < extract_count; ++ispan) { + rpmalloc_assert(span[ispan]->span_count == span_count, + "Global cache span count mismatch"); + } +#endif + + atomic_store32_release(&cache->lock, 0); + + return extract_count; +} + +#endif + +//////////// +/// +/// 
Heap control +/// +////// + +static void _rpmalloc_deallocate_huge(span_t *); + +//! Store the given spans as reserve in the given heap +static void _rpmalloc_heap_set_reserved_spans(heap_t *heap, span_t *master, + span_t *reserve, + size_t reserve_span_count) { + heap->span_reserve_master = master; + heap->span_reserve = reserve; + heap->spans_reserved = (uint32_t)reserve_span_count; +} + +//! Adopt the deferred span cache list, optionally extracting the first single +//! span for immediate re-use +static void _rpmalloc_heap_cache_adopt_deferred(heap_t *heap, + span_t **single_span) { + span_t *span = (span_t *)((void *)atomic_exchange_ptr_acquire( + &heap->span_free_deferred, 0)); + while (span) { + span_t *next_span = (span_t *)span->free_list; + rpmalloc_assert(span->heap == heap, "Span heap pointer corrupted"); + if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) { + rpmalloc_assert(heap->full_span_count, "Heap span counter corrupted"); + --heap->full_span_count; + _rpmalloc_stat_dec(&heap->span_use[0].spans_deferred); +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&heap->full_span[span->size_class], + span); +#endif + _rpmalloc_stat_dec(&heap->span_use[0].current); + _rpmalloc_stat_dec(&heap->size_class_use[span->size_class].spans_current); + if (single_span && !*single_span) + *single_span = span; + else + _rpmalloc_heap_cache_insert(heap, span); + } else { + if (span->size_class == SIZE_CLASS_HUGE) { + _rpmalloc_deallocate_huge(span); + } else { + rpmalloc_assert(span->size_class == SIZE_CLASS_LARGE, + "Span size class invalid"); + rpmalloc_assert(heap->full_span_count, "Heap span counter corrupted"); + --heap->full_span_count; +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&heap->large_huge_span, span); +#endif + uint32_t idx = span->span_count - 1; + _rpmalloc_stat_dec(&heap->span_use[idx].spans_deferred); + _rpmalloc_stat_dec(&heap->span_use[idx].current); + if (!idx && single_span && !*single_span) + 
*single_span = span; + else + _rpmalloc_heap_cache_insert(heap, span); + } + } + span = next_span; + } +} + +static void _rpmalloc_heap_unmap(heap_t *heap) { + if (!heap->master_heap) { + if ((heap->finalize > 1) && !atomic_load32(&heap->child_count)) { + span_t *span = (span_t *)((uintptr_t)heap & _memory_span_mask); + _rpmalloc_span_unmap(span); + } + } else { + if (atomic_decr32(&heap->master_heap->child_count) == 0) { + _rpmalloc_heap_unmap(heap->master_heap); + } + } +} + +static void _rpmalloc_heap_global_finalize(heap_t *heap) { + if (heap->finalize++ > 1) { + --heap->finalize; + return; + } + + _rpmalloc_heap_finalize(heap); + +#if ENABLE_THREAD_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); + span_cache->count = 0; + } +#endif + + if (heap->full_span_count) { + --heap->finalize; + return; + } + + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + if (heap->size_class[iclass].free_list || + heap->size_class[iclass].partial_span) { + --heap->finalize; + return; + } + } + // Heap is now completely free, unmap and remove from heap list + size_t list_idx = (size_t)heap->id % HEAP_ARRAY_SIZE; + heap_t *list_heap = _memory_heaps[list_idx]; + if (list_heap == heap) { + _memory_heaps[list_idx] = heap->next_heap; + } else { + while (list_heap->next_heap != heap) + list_heap = list_heap->next_heap; + list_heap->next_heap = heap->next_heap; + } + + _rpmalloc_heap_unmap(heap); +} + +//! Insert a single span into thread heap cache, releasing to global cache if +//! 
overflow +static void _rpmalloc_heap_cache_insert(heap_t *heap, span_t *span) { + if (UNEXPECTED(heap->finalize != 0)) { + _rpmalloc_span_unmap(span); + _rpmalloc_heap_global_finalize(heap); + return; + } +#if ENABLE_THREAD_CACHE + size_t span_count = span->span_count; + _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_to_cache); + if (span_count == 1) { + span_cache_t *span_cache = &heap->span_cache; + span_cache->span[span_cache->count++] = span; + if (span_cache->count == MAX_THREAD_SPAN_CACHE) { + const size_t remain_count = + MAX_THREAD_SPAN_CACHE - THREAD_SPAN_CACHE_TRANSFER; +#if ENABLE_GLOBAL_CACHE + _rpmalloc_stat_add64(&heap->thread_to_global, + THREAD_SPAN_CACHE_TRANSFER * _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_to_global, + THREAD_SPAN_CACHE_TRANSFER); + _rpmalloc_global_cache_insert_spans(span_cache->span + remain_count, + span_count, + THREAD_SPAN_CACHE_TRANSFER); +#else + for (size_t ispan = 0; ispan < THREAD_SPAN_CACHE_TRANSFER; ++ispan) + _rpmalloc_span_unmap(span_cache->span[remain_count + ispan]); +#endif + span_cache->count = remain_count; + } + } else { + size_t cache_idx = span_count - 2; + span_large_cache_t *span_cache = heap->span_large_cache + cache_idx; + span_cache->span[span_cache->count++] = span; + const size_t cache_limit = + (MAX_THREAD_SPAN_LARGE_CACHE - (span_count >> 1)); + if (span_cache->count == cache_limit) { + const size_t transfer_limit = 2 + (cache_limit >> 2); + const size_t transfer_count = + (THREAD_SPAN_LARGE_CACHE_TRANSFER <= transfer_limit + ? 
THREAD_SPAN_LARGE_CACHE_TRANSFER + : transfer_limit); + const size_t remain_count = cache_limit - transfer_count; +#if ENABLE_GLOBAL_CACHE + _rpmalloc_stat_add64(&heap->thread_to_global, + transfer_count * span_count * _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_to_global, + transfer_count); + _rpmalloc_global_cache_insert_spans(span_cache->span + remain_count, + span_count, transfer_count); +#else + for (size_t ispan = 0; ispan < transfer_count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[remain_count + ispan]); +#endif + span_cache->count = remain_count; + } + } +#else + (void)sizeof(heap); + _rpmalloc_span_unmap(span); +#endif +} + +//! Extract the given number of spans from the different cache levels +static span_t *_rpmalloc_heap_thread_cache_extract(heap_t *heap, + size_t span_count) { + span_t *span = 0; +#if ENABLE_THREAD_CACHE + span_cache_t *span_cache; + if (span_count == 1) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (span_count - 2)); + if (span_cache->count) { + _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_from_cache); + return span_cache->span[--span_cache->count]; + } +#endif + return span; +} + +static span_t *_rpmalloc_heap_thread_cache_deferred_extract(heap_t *heap, + size_t span_count) { + span_t *span = 0; + if (span_count == 1) { + _rpmalloc_heap_cache_adopt_deferred(heap, &span); + } else { + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + span = _rpmalloc_heap_thread_cache_extract(heap, span_count); + } + return span; +} + +static span_t *_rpmalloc_heap_reserved_extract(heap_t *heap, + size_t span_count) { + if (heap->spans_reserved >= span_count) + return _rpmalloc_span_map(heap, span_count); + return 0; +} + +//! 
Extract a span from the global cache +static span_t *_rpmalloc_heap_global_cache_extract(heap_t *heap, + size_t span_count) { +#if ENABLE_GLOBAL_CACHE +#if ENABLE_THREAD_CACHE + span_cache_t *span_cache; + size_t wanted_count; + if (span_count == 1) { + span_cache = &heap->span_cache; + wanted_count = THREAD_SPAN_CACHE_TRANSFER; + } else { + span_cache = (span_cache_t *)(heap->span_large_cache + (span_count - 2)); + wanted_count = THREAD_SPAN_LARGE_CACHE_TRANSFER; + } + span_cache->count = _rpmalloc_global_cache_extract_spans( + span_cache->span, span_count, wanted_count); + if (span_cache->count) { + _rpmalloc_stat_add64(&heap->global_to_thread, + span_count * span_cache->count * _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_from_global, + span_cache->count); + return span_cache->span[--span_cache->count]; + } +#else + span_t *span = 0; + size_t count = _rpmalloc_global_cache_extract_spans(&span, span_count, 1); + if (count) { + _rpmalloc_stat_add64(&heap->global_to_thread, + span_count * count * _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_from_global, + count); + return span; + } +#endif +#endif + (void)sizeof(heap); + (void)sizeof(span_count); + return 0; +} + +static void _rpmalloc_inc_span_statistics(heap_t *heap, size_t span_count, + uint32_t class_idx) { + (void)sizeof(heap); + (void)sizeof(span_count); + (void)sizeof(class_idx); +#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS + uint32_t idx = (uint32_t)span_count - 1; + uint32_t current_count = + (uint32_t)atomic_incr32(&heap->span_use[idx].current); + if (current_count > (uint32_t)atomic_load32(&heap->span_use[idx].high)) + atomic_store32(&heap->span_use[idx].high, (int32_t)current_count); + _rpmalloc_stat_add_peak(&heap->size_class_use[class_idx].spans_current, 1, + heap->size_class_use[class_idx].spans_peak); +#endif +} + +//! Get a span from one of the cache levels (thread cache, reserved, global +//! 
cache) or fallback to mapping more memory +static span_t * +_rpmalloc_heap_extract_new_span(heap_t *heap, + heap_size_class_t *heap_size_class, + size_t span_count, uint32_t class_idx) { + span_t *span; +#if ENABLE_THREAD_CACHE + if (heap_size_class && heap_size_class->cache) { + span = heap_size_class->cache; + heap_size_class->cache = + (heap->span_cache.count + ? heap->span_cache.span[--heap->span_cache.count] + : 0); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } +#endif + (void)sizeof(class_idx); + // Allow 50% overhead to increase cache hits + size_t base_span_count = span_count; + size_t limit_span_count = + (span_count > 2) ? (span_count + (span_count >> 1)) : span_count; + if (limit_span_count > LARGE_CLASS_COUNT) + limit_span_count = LARGE_CLASS_COUNT; + do { + span = _rpmalloc_heap_thread_cache_extract(heap, span_count); + if (EXPECTED(span != 0)) { + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } + span = _rpmalloc_heap_thread_cache_deferred_extract(heap, span_count); + if (EXPECTED(span != 0)) { + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } + span = _rpmalloc_heap_global_cache_extract(heap, span_count); + if (EXPECTED(span != 0)) { + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } + span = _rpmalloc_heap_reserved_extract(heap, span_count); + if (EXPECTED(span != 0)) { + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_reserved); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } + ++span_count; + } while (span_count <= limit_span_count); + // Final fallback, map in more virtual memory + span = _rpmalloc_span_map(heap, base_span_count); + 
_rpmalloc_inc_span_statistics(heap, base_span_count, class_idx); + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_map_calls); + return span; +} + +static void _rpmalloc_heap_initialize(heap_t *heap) { + _rpmalloc_memset_const(heap, 0, sizeof(heap_t)); + // Get a new heap ID + heap->id = 1 + atomic_incr32(&_memory_heap_id); + + // Link in heap in heap ID map + size_t list_idx = (size_t)heap->id % HEAP_ARRAY_SIZE; + heap->next_heap = _memory_heaps[list_idx]; + _memory_heaps[list_idx] = heap; +} + +static void _rpmalloc_heap_orphan(heap_t *heap, int first_class) { + heap->owner_thread = (uintptr_t)-1; +#if RPMALLOC_FIRST_CLASS_HEAPS + heap_t **heap_list = + (first_class ? &_memory_first_class_orphan_heaps : &_memory_orphan_heaps); +#else + (void)sizeof(first_class); + heap_t **heap_list = &_memory_orphan_heaps; +#endif + heap->next_orphan = *heap_list; + *heap_list = heap; +} + +//! Allocate a new heap from newly mapped memory pages +static heap_t *_rpmalloc_heap_allocate_new(void) { + // Map in pages for 16 heaps. If page size is greater than required size for + // this, map a page and use first part for heaps and remaining part for spans + // for allocations. 
Adds a lot of complexity, but saves a lot of memory on + // systems where page size > 64 spans (4MiB) + size_t heap_size = sizeof(heap_t); + size_t aligned_heap_size = 16 * ((heap_size + 15) / 16); + size_t request_heap_count = 16; + size_t heap_span_count = ((aligned_heap_size * request_heap_count) + + sizeof(span_t) + _memory_span_size - 1) / + _memory_span_size; + size_t block_size = _memory_span_size * heap_span_count; + size_t span_count = heap_span_count; + span_t *span = 0; + // If there are global reserved spans, use these first + if (_memory_global_reserve_count >= heap_span_count) { + span = _rpmalloc_global_get_reserved_spans(heap_span_count); + } + if (!span) { + if (_memory_page_size > block_size) { + span_count = _memory_page_size / _memory_span_size; + block_size = _memory_page_size; + // If using huge pages, make sure to grab enough heaps to avoid + // reallocating a huge page just to serve new heaps + size_t possible_heap_count = + (block_size - sizeof(span_t)) / aligned_heap_size; + if (possible_heap_count >= (request_heap_count * 16)) + request_heap_count *= 16; + else if (possible_heap_count < request_heap_count) + request_heap_count = possible_heap_count; + heap_span_count = ((aligned_heap_size * request_heap_count) + + sizeof(span_t) + _memory_span_size - 1) / + _memory_span_size; + } + + size_t align_offset = 0; + span = (span_t *)_rpmalloc_mmap(block_size, &align_offset); + if (!span) + return 0; + + // Master span will contain the heaps + _rpmalloc_stat_inc(&_master_spans); + _rpmalloc_span_initialize(span, span_count, heap_span_count, align_offset); + } + + size_t remain_size = _memory_span_size - sizeof(span_t); + heap_t *heap = (heap_t *)pointer_offset(span, sizeof(span_t)); + _rpmalloc_heap_initialize(heap); + + // Put extra heaps as orphans + size_t num_heaps = remain_size / aligned_heap_size; + if (num_heaps < request_heap_count) + num_heaps = request_heap_count; + atomic_store32(&heap->child_count, (int32_t)num_heaps - 1); + heap_t 
*extra_heap = (heap_t *)pointer_offset(heap, aligned_heap_size); + while (num_heaps > 1) { + _rpmalloc_heap_initialize(extra_heap); + extra_heap->master_heap = heap; + _rpmalloc_heap_orphan(extra_heap, 1); + extra_heap = (heap_t *)pointer_offset(extra_heap, aligned_heap_size); + --num_heaps; + } + + if (span_count > heap_span_count) { + // Cap reserved spans + size_t remain_count = span_count - heap_span_count; + size_t reserve_count = + (remain_count > _memory_heap_reserve_count ? _memory_heap_reserve_count + : remain_count); + span_t *remain_span = + (span_t *)pointer_offset(span, heap_span_count * _memory_span_size); + _rpmalloc_heap_set_reserved_spans(heap, span, remain_span, reserve_count); + + if (remain_count > reserve_count) { + // Set to global reserved spans + remain_span = (span_t *)pointer_offset(remain_span, + reserve_count * _memory_span_size); + reserve_count = remain_count - reserve_count; + _rpmalloc_global_set_reserved_spans(span, remain_span, reserve_count); + } + } + + return heap; +} + +static heap_t *_rpmalloc_heap_extract_orphan(heap_t **heap_list) { + heap_t *heap = *heap_list; + *heap_list = (heap ? heap->next_orphan : 0); + return heap; +} + +//! 
Allocate a new heap, potentially reusing a previously orphaned heap +static heap_t *_rpmalloc_heap_allocate(int first_class) { + heap_t *heap = 0; + while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) + _rpmalloc_spin(); + if (first_class == 0) + heap = _rpmalloc_heap_extract_orphan(&_memory_orphan_heaps); +#if RPMALLOC_FIRST_CLASS_HEAPS + if (!heap) + heap = _rpmalloc_heap_extract_orphan(&_memory_first_class_orphan_heaps); +#endif + if (!heap) + heap = _rpmalloc_heap_allocate_new(); + atomic_store32_release(&_memory_global_lock, 0); + if (heap) + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + return heap; +} + +static void _rpmalloc_heap_release(void *heapptr, int first_class, + int release_cache) { + heap_t *heap = (heap_t *)heapptr; + if (!heap) + return; + // Release thread cache spans back to global cache + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + if (release_cache || heap->finalize) { +#if ENABLE_THREAD_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + if (!span_cache->count) + continue; +#if ENABLE_GLOBAL_CACHE + if (heap->finalize) { + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); + } else { + _rpmalloc_stat_add64(&heap->thread_to_global, span_cache->count * + (iclass + 1) * + _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[iclass].spans_to_global, + span_cache->count); + _rpmalloc_global_cache_insert_spans(span_cache->span, iclass + 1, + span_cache->count); + } +#else + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); +#endif + span_cache->count = 0; + } +#endif + } + + if (get_thread_heap_raw() == heap) + set_thread_heap(0); + +#if ENABLE_STATISTICS + atomic_decr32(&_memory_active_heaps); + 
rpmalloc_assert(atomic_load32(&_memory_active_heaps) >= 0, + "Still active heaps during finalization"); +#endif + + // If we are forcibly terminating with _exit the state of the + // lock atomic is unknown and it's best to just go ahead and exit + if (get_thread_id() != _rpmalloc_main_thread_id) { + while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) + _rpmalloc_spin(); + } + _rpmalloc_heap_orphan(heap, first_class); + atomic_store32_release(&_memory_global_lock, 0); +} + +static void _rpmalloc_heap_release_raw(void *heapptr, int release_cache) { + _rpmalloc_heap_release(heapptr, 0, release_cache); +} + +static void _rpmalloc_heap_release_raw_fc(void *heapptr) { + _rpmalloc_heap_release_raw(heapptr, 1); +} + +static void _rpmalloc_heap_finalize(heap_t *heap) { + if (heap->spans_reserved) { + span_t *span = _rpmalloc_span_map(heap, heap->spans_reserved); + _rpmalloc_span_unmap(span); + heap->spans_reserved = 0; + } + + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + if (heap->size_class[iclass].cache) + _rpmalloc_span_unmap(heap->size_class[iclass].cache); + heap->size_class[iclass].cache = 0; + span_t *span = heap->size_class[iclass].partial_span; + while (span) { + span_t *next = span->next; + _rpmalloc_span_finalize(heap, iclass, span, + &heap->size_class[iclass].partial_span); + span = next; + } + // If class still has a free list it must be a full span + if (heap->size_class[iclass].free_list) { + span_t *class_span = + (span_t *)((uintptr_t)heap->size_class[iclass].free_list & + _memory_span_mask); + span_t **list = 0; +#if RPMALLOC_FIRST_CLASS_HEAPS + list = &heap->full_span[iclass]; +#endif + --heap->full_span_count; + if (!_rpmalloc_span_finalize(heap, iclass, class_span, list)) { + if (list) + _rpmalloc_span_double_link_list_remove(list, class_span); + _rpmalloc_span_double_link_list_add( + &heap->size_class[iclass].partial_span, class_span); + } + } + } + +#if ENABLE_THREAD_CACHE + 
for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); + span_cache->count = 0; + } +#endif + rpmalloc_assert(!atomic_load_ptr(&heap->span_free_deferred), + "Heaps still active during finalization"); +} + +//////////// +/// +/// Allocation entry points +/// +////// + +//! Pop first block from a free list +static void *free_list_pop(void **list) { + void *block = *list; + *list = *((void **)block); + return block; +} + +//! Allocate a small/medium sized memory block from the given heap +static void *_rpmalloc_allocate_from_heap_fallback( + heap_t *heap, heap_size_class_t *heap_size_class, uint32_t class_idx) { + span_t *span = heap_size_class->partial_span; + rpmalloc_assume(heap != 0); + if (EXPECTED(span != 0)) { + rpmalloc_assert(span->block_count == + _memory_size_class[span->size_class].block_count, + "Span block count corrupted"); + rpmalloc_assert(!_rpmalloc_span_is_fully_utilized(span), + "Internal failure"); + void *block; + if (span->free_list) { + // Span local free list is not empty, swap to size class free list + block = free_list_pop(&span->free_list); + heap_size_class->free_list = span->free_list; + span->free_list = 0; + } else { + // If the span did not fully initialize free list, link up another page + // worth of blocks + void *block_start = pointer_offset( + span, SPAN_HEADER_SIZE + + ((size_t)span->free_list_limit * span->block_size)); + span->free_list_limit += free_list_partial_init( + &heap_size_class->free_list, &block, + (void *)((uintptr_t)block_start & ~(_memory_page_size - 1)), + block_start, span->block_count - span->free_list_limit, + span->block_size); + } + rpmalloc_assert(span->free_list_limit <= span->block_count, + "Span block count corrupted"); + 
span->used_count = span->free_list_limit; + + // Swap in deferred free list if present + if (atomic_load_ptr(&span->free_list_deferred)) + _rpmalloc_span_extract_free_list_deferred(span); + + // If span is still not fully utilized keep it in partial list and early + // return block + if (!_rpmalloc_span_is_fully_utilized(span)) + return block; + + // The span is fully utilized, unlink from partial list and add to fully + // utilized list + _rpmalloc_span_double_link_list_pop_head(&heap_size_class->partial_span, + span); +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->full_span[class_idx], span); +#endif + ++heap->full_span_count; + return block; + } + + // Find a span in one of the cache levels + span = _rpmalloc_heap_extract_new_span(heap, heap_size_class, 1, class_idx); + if (EXPECTED(span != 0)) { + // Mark span as owned by this heap and set base data, return first block + return _rpmalloc_span_initialize_new(heap, heap_size_class, span, + class_idx); + } + + return 0; +} + +//! Allocate a small sized memory block from the given heap +static void *_rpmalloc_allocate_small(heap_t *heap, size_t size) { + rpmalloc_assert(heap, "No thread heap"); + // Small sizes have unique size classes + const uint32_t class_idx = + (uint32_t)((size + (SMALL_GRANULARITY - 1)) >> SMALL_GRANULARITY_SHIFT); + heap_size_class_t *heap_size_class = heap->size_class + class_idx; + _rpmalloc_stat_inc_alloc(heap, class_idx); + if (EXPECTED(heap_size_class->free_list != 0)) + return free_list_pop(&heap_size_class->free_list); + return _rpmalloc_allocate_from_heap_fallback(heap, heap_size_class, + class_idx); +} + +//! 
Allocate a medium sized memory block from the given heap +static void *_rpmalloc_allocate_medium(heap_t *heap, size_t size) { + rpmalloc_assert(heap, "No thread heap"); + // Calculate the size class index and do a dependent lookup of the final class + // index (in case of merged classes) + const uint32_t base_idx = + (uint32_t)(SMALL_CLASS_COUNT + + ((size - (SMALL_SIZE_LIMIT + 1)) >> MEDIUM_GRANULARITY_SHIFT)); + const uint32_t class_idx = _memory_size_class[base_idx].class_idx; + heap_size_class_t *heap_size_class = heap->size_class + class_idx; + _rpmalloc_stat_inc_alloc(heap, class_idx); + if (EXPECTED(heap_size_class->free_list != 0)) + return free_list_pop(&heap_size_class->free_list); + return _rpmalloc_allocate_from_heap_fallback(heap, heap_size_class, + class_idx); +} + +//! Allocate a large sized memory block from the given heap +static void *_rpmalloc_allocate_large(heap_t *heap, size_t size) { + rpmalloc_assert(heap, "No thread heap"); + // Calculate number of needed max sized spans (including header) + // Since this function is never called if size > LARGE_SIZE_LIMIT + // the span_count is guaranteed to be <= LARGE_CLASS_COUNT + size += SPAN_HEADER_SIZE; + size_t span_count = size >> _memory_span_size_shift; + if (size & (_memory_span_size - 1)) + ++span_count; + + // Find a span in one of the cache levels + span_t *span = + _rpmalloc_heap_extract_new_span(heap, 0, span_count, SIZE_CLASS_LARGE); + if (!span) + return span; + + // Mark span as owned by this heap and set base data + rpmalloc_assert(span->span_count >= span_count, "Internal failure"); + span->size_class = SIZE_CLASS_LARGE; + span->heap = heap; + +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); +#endif + ++heap->full_span_count; + + return pointer_offset(span, SPAN_HEADER_SIZE); +} + +//! 
Allocate a huge block by mapping memory pages directly +static void *_rpmalloc_allocate_huge(heap_t *heap, size_t size) { + rpmalloc_assert(heap, "No thread heap"); + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + size += SPAN_HEADER_SIZE; + size_t num_pages = size >> _memory_page_size_shift; + if (size & (_memory_page_size - 1)) + ++num_pages; + size_t align_offset = 0; + span_t *span = + (span_t *)_rpmalloc_mmap(num_pages * _memory_page_size, &align_offset); + if (!span) + return span; + + // Store page count in span_count + span->size_class = SIZE_CLASS_HUGE; + span->span_count = (uint32_t)num_pages; + span->align_offset = (uint32_t)align_offset; + span->heap = heap; + _rpmalloc_stat_add_peak(&_huge_pages_current, num_pages, _huge_pages_peak); + +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); +#endif + ++heap->full_span_count; + + return pointer_offset(span, SPAN_HEADER_SIZE); +} + +//! Allocate a block of the given size +static void *_rpmalloc_allocate(heap_t *heap, size_t size) { + _rpmalloc_stat_add64(&_allocation_counter, 1); + if (EXPECTED(size <= SMALL_SIZE_LIMIT)) + return _rpmalloc_allocate_small(heap, size); + else if (size <= _memory_medium_size_limit) + return _rpmalloc_allocate_medium(heap, size); + else if (size <= LARGE_SIZE_LIMIT) + return _rpmalloc_allocate_large(heap, size); + return _rpmalloc_allocate_huge(heap, size); +} + +static void *_rpmalloc_aligned_allocate(heap_t *heap, size_t alignment, + size_t size) { + if (alignment <= SMALL_GRANULARITY) + return _rpmalloc_allocate(heap, size); + +#if ENABLE_VALIDATE_ARGS + if ((size + alignment) < size) { + errno = EINVAL; + return 0; + } + if (alignment & (alignment - 1)) { + errno = EINVAL; + return 0; + } +#endif + + if ((alignment <= SPAN_HEADER_SIZE) && + ((size + SPAN_HEADER_SIZE) < _memory_medium_size_limit)) { + // If alignment is less or equal to span header size (which is power of + // two), and size aligned to span header size 
multiples is less than size + + // alignment, then use natural alignment of blocks to provide alignment + size_t multiple_size = size ? (size + (SPAN_HEADER_SIZE - 1)) & + ~(uintptr_t)(SPAN_HEADER_SIZE - 1) + : SPAN_HEADER_SIZE; + rpmalloc_assert(!(multiple_size % SPAN_HEADER_SIZE), + "Failed alignment calculation"); + if (multiple_size <= (size + alignment)) + return _rpmalloc_allocate(heap, multiple_size); + } + + void *ptr = 0; + size_t align_mask = alignment - 1; + if (alignment <= _memory_page_size) { + ptr = _rpmalloc_allocate(heap, size + alignment); + if ((uintptr_t)ptr & align_mask) { + ptr = (void *)(((uintptr_t)ptr & ~(uintptr_t)align_mask) + alignment); + // Mark as having aligned blocks + span_t *span = (span_t *)((uintptr_t)ptr & _memory_span_mask); + span->flags |= SPAN_FLAG_ALIGNED_BLOCKS; + } + return ptr; + } + + // Fallback to mapping new pages for this request. Since pointers passed + // to rpfree must be able to reach the start of the span by bitmasking of + // the address with the span size, the returned aligned pointer from this + // function must be within a span size of the start of the mapped area. + // In worst case this requires us to loop and map pages until we get a + // suitable memory address. 
It also means we can never align to span size + // or greater, since the span header will push alignment more than one + // span size away from span start (thus causing pointer mask to give us + // an invalid span start on free) + if (alignment & align_mask) { + errno = EINVAL; + return 0; + } + if (alignment >= _memory_span_size) { + errno = EINVAL; + return 0; + } + + size_t extra_pages = alignment / _memory_page_size; + + // Since each span has a header, we will at least need one extra memory page + size_t num_pages = 1 + (size / _memory_page_size); + if (size & (_memory_page_size - 1)) + ++num_pages; + + if (extra_pages > num_pages) + num_pages = 1 + extra_pages; + + size_t original_pages = num_pages; + size_t limit_pages = (_memory_span_size / _memory_page_size) * 2; + if (limit_pages < (original_pages * 2)) + limit_pages = original_pages * 2; + + size_t mapped_size, align_offset; + span_t *span; + +retry: + align_offset = 0; + mapped_size = num_pages * _memory_page_size; + + span = (span_t *)_rpmalloc_mmap(mapped_size, &align_offset); + if (!span) { + errno = ENOMEM; + return 0; + } + ptr = pointer_offset(span, SPAN_HEADER_SIZE); + + if ((uintptr_t)ptr & align_mask) + ptr = (void *)(((uintptr_t)ptr & ~(uintptr_t)align_mask) + alignment); + + if (((size_t)pointer_diff(ptr, span) >= _memory_span_size) || + (pointer_offset(ptr, size) > pointer_offset(span, mapped_size)) || + (((uintptr_t)ptr & _memory_span_mask) != (uintptr_t)span)) { + _rpmalloc_unmap(span, mapped_size, align_offset, mapped_size); + ++num_pages; + if (num_pages > limit_pages) { + errno = EINVAL; + return 0; + } + goto retry; + } + + // Store page count in span_count + span->size_class = SIZE_CLASS_HUGE; + span->span_count = (uint32_t)num_pages; + span->align_offset = (uint32_t)align_offset; + span->heap = heap; + _rpmalloc_stat_add_peak(&_huge_pages_current, num_pages, _huge_pages_peak); + +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); 
+#endif + ++heap->full_span_count; + + _rpmalloc_stat_add64(&_allocation_counter, 1); + + return ptr; +} + +//////////// +/// +/// Deallocation entry points +/// +////// + +//! Deallocate the given small/medium memory block in the current thread local +//! heap +static void _rpmalloc_deallocate_direct_small_or_medium(span_t *span, + void *block) { + heap_t *heap = span->heap; + rpmalloc_assert(heap->owner_thread == get_thread_id() || + !heap->owner_thread || heap->finalize, + "Internal failure"); + // Add block to free list + if (UNEXPECTED(_rpmalloc_span_is_fully_utilized(span))) { + span->used_count = span->block_count; +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&heap->full_span[span->size_class], + span); +#endif + _rpmalloc_span_double_link_list_add( + &heap->size_class[span->size_class].partial_span, span); + --heap->full_span_count; + } + *((void **)block) = span->free_list; + --span->used_count; + span->free_list = block; + if (UNEXPECTED(span->used_count == span->list_size)) { + // If there are no used blocks it is guaranteed that no other external + // thread is accessing the span + if (span->used_count) { + // Make sure we have synchronized the deferred list and list size by using + // acquire semantics and guarantee that no external thread is accessing + // span concurrently + void *free_list; + do { + free_list = atomic_exchange_ptr_acquire(&span->free_list_deferred, + INVALID_POINTER); + } while (free_list == INVALID_POINTER); + atomic_store_ptr_release(&span->free_list_deferred, free_list); + } + _rpmalloc_span_double_link_list_remove( + &heap->size_class[span->size_class].partial_span, span); + _rpmalloc_span_release_to_cache(heap, span); + } +} + +static void _rpmalloc_deallocate_defer_free_span(heap_t *heap, span_t *span) { + if (span->size_class != SIZE_CLASS_HUGE) + _rpmalloc_stat_inc(&heap->span_use[span->span_count - 1].spans_deferred); + // This list does not need ABA protection, no mutable side state + do { + 
span->free_list = (void *)atomic_load_ptr(&heap->span_free_deferred); + } while (!atomic_cas_ptr(&heap->span_free_deferred, span, span->free_list)); +} + +//! Put the block in the deferred free list of the owning span +static void _rpmalloc_deallocate_defer_small_or_medium(span_t *span, + void *block) { + // The memory ordering here is a bit tricky, to avoid having to ABA protect + // the deferred free list to avoid desynchronization of list and list size + // we need to have acquire semantics on successful CAS of the pointer to + // guarantee the list_size variable validity + release semantics on pointer + // store + void *free_list; + do { + free_list = + atomic_exchange_ptr_acquire(&span->free_list_deferred, INVALID_POINTER); + } while (free_list == INVALID_POINTER); + *((void **)block) = free_list; + uint32_t free_count = ++span->list_size; + int all_deferred_free = (free_count == span->block_count); + atomic_store_ptr_release(&span->free_list_deferred, block); + if (all_deferred_free) { + // Span was completely freed by this block. Due to the INVALID_POINTER spin + // lock no other thread can reach this state simultaneously on this span. 
+ // Safe to move to owner heap deferred cache + _rpmalloc_deallocate_defer_free_span(span->heap, span); + } +} + +static void _rpmalloc_deallocate_small_or_medium(span_t *span, void *p) { + _rpmalloc_stat_inc_free(span->heap, span->size_class); + if (span->flags & SPAN_FLAG_ALIGNED_BLOCKS) { + // Realign pointer to block start + void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); + uint32_t block_offset = (uint32_t)pointer_diff(p, blocks_start); + p = pointer_offset(p, -(int32_t)(block_offset % span->block_size)); + } + // Check if block belongs to this heap or if deallocation should be deferred +#if RPMALLOC_FIRST_CLASS_HEAPS + int defer = + (span->heap->owner_thread && + (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#else + int defer = + ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#endif + if (!defer) + _rpmalloc_deallocate_direct_small_or_medium(span, p); + else + _rpmalloc_deallocate_defer_small_or_medium(span, p); +} + +//! 
Deallocate the given large memory block to the current heap +static void _rpmalloc_deallocate_large(span_t *span) { + rpmalloc_assert(span->size_class == SIZE_CLASS_LARGE, "Bad span size class"); + rpmalloc_assert(!(span->flags & SPAN_FLAG_MASTER) || + !(span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + rpmalloc_assert((span->flags & SPAN_FLAG_MASTER) || + (span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + // We must always defer (unless finalizing) if from another heap since we + // cannot touch the list or counters of another heap +#if RPMALLOC_FIRST_CLASS_HEAPS + int defer = + (span->heap->owner_thread && + (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#else + int defer = + ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#endif + if (defer) { + _rpmalloc_deallocate_defer_free_span(span->heap, span); + return; + } + rpmalloc_assert(span->heap->full_span_count, "Heap span counter corrupted"); + --span->heap->full_span_count; +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&span->heap->large_huge_span, span); +#endif +#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS + // Decrease counter + size_t idx = span->span_count - 1; + atomic_decr32(&span->heap->span_use[idx].current); +#endif + heap_t *heap = span->heap; + rpmalloc_assert(heap, "No thread heap"); +#if ENABLE_THREAD_CACHE + const int set_as_reserved = + ((span->span_count > 1) && (heap->span_cache.count == 0) && + !heap->finalize && !heap->spans_reserved); +#else + const int set_as_reserved = + ((span->span_count > 1) && !heap->finalize && !heap->spans_reserved); +#endif + if (set_as_reserved) { + heap->span_reserve = span; + heap->spans_reserved = span->span_count; + if (span->flags & SPAN_FLAG_MASTER) { + heap->span_reserve_master = span; + } else { // SPAN_FLAG_SUBSPAN + span_t *master = (span_t *)pointer_offset( + span, + -(intptr_t)((size_t)span->offset_from_master * _memory_span_size)); + 
heap->span_reserve_master = master; + rpmalloc_assert(master->flags & SPAN_FLAG_MASTER, "Span flag corrupted"); + rpmalloc_assert(atomic_load32(&master->remaining_spans) >= + (int32_t)span->span_count, + "Master span count corrupted"); + } + _rpmalloc_stat_inc(&heap->span_use[idx].spans_to_reserved); + } else { + // Insert into cache list + _rpmalloc_heap_cache_insert(heap, span); + } +} + +//! Deallocate the given huge span +static void _rpmalloc_deallocate_huge(span_t *span) { + rpmalloc_assert(span->heap, "No span heap"); +#if RPMALLOC_FIRST_CLASS_HEAPS + int defer = + (span->heap->owner_thread && + (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#else + int defer = + ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#endif + if (defer) { + _rpmalloc_deallocate_defer_free_span(span->heap, span); + return; + } + rpmalloc_assert(span->heap->full_span_count, "Heap span counter corrupted"); + --span->heap->full_span_count; +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&span->heap->large_huge_span, span); +#endif + + // Oversized allocation, page count is stored in span_count + size_t num_pages = span->span_count; + _rpmalloc_unmap(span, num_pages * _memory_page_size, span->align_offset, + num_pages * _memory_page_size); + _rpmalloc_stat_sub(&_huge_pages_current, num_pages); +} + +//! 
Deallocate the given block +static void _rpmalloc_deallocate(void *p) { + _rpmalloc_stat_add64(&_deallocation_counter, 1); + // Grab the span (always at start of span, using span alignment) + span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); + if (UNEXPECTED(!span)) + return; + if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) + _rpmalloc_deallocate_small_or_medium(span, p); + else if (span->size_class == SIZE_CLASS_LARGE) + _rpmalloc_deallocate_large(span); + else + _rpmalloc_deallocate_huge(span); +} + +//////////// +/// +/// Reallocation entry points +/// +////// + +static size_t _rpmalloc_usable_size(void *p); + +//! Reallocate the given block to the given size +static void *_rpmalloc_reallocate(heap_t *heap, void *p, size_t size, + size_t oldsize, unsigned int flags) { + if (p) { + // Grab the span using guaranteed span alignment + span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); + if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) { + // Small/medium sized block + rpmalloc_assert(span->span_count == 1, "Span counter corrupted"); + void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); + uint32_t block_offset = (uint32_t)pointer_diff(p, blocks_start); + uint32_t block_idx = block_offset / span->block_size; + void *block = + pointer_offset(blocks_start, (size_t)block_idx * span->block_size); + if (!oldsize) + oldsize = + (size_t)((ptrdiff_t)span->block_size - pointer_diff(p, block)); + if ((size_t)span->block_size >= size) { + // Still fits in block, never mind trying to save memory, but preserve + // data if alignment changed + if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) + memmove(block, p, oldsize); + return block; + } + } else if (span->size_class == SIZE_CLASS_LARGE) { + // Large block + size_t total_size = size + SPAN_HEADER_SIZE; + size_t num_spans = total_size >> _memory_span_size_shift; + if (total_size & (_memory_span_size - 1)) + ++num_spans; + size_t current_spans = span->span_count; + void *block = 
pointer_offset(span, SPAN_HEADER_SIZE); + if (!oldsize) + oldsize = (current_spans * _memory_span_size) - + (size_t)pointer_diff(p, block) - SPAN_HEADER_SIZE; + if ((current_spans >= num_spans) && (total_size >= (oldsize / 2))) { + // Still fits in block, never mind trying to save memory, but preserve + // data if alignment changed + if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) + memmove(block, p, oldsize); + return block; + } + } else { + // Oversized block + size_t total_size = size + SPAN_HEADER_SIZE; + size_t num_pages = total_size >> _memory_page_size_shift; + if (total_size & (_memory_page_size - 1)) + ++num_pages; + // Page count is stored in span_count + size_t current_pages = span->span_count; + void *block = pointer_offset(span, SPAN_HEADER_SIZE); + if (!oldsize) + oldsize = (current_pages * _memory_page_size) - + (size_t)pointer_diff(p, block) - SPAN_HEADER_SIZE; + if ((current_pages >= num_pages) && (num_pages >= (current_pages / 2))) { + // Still fits in block, never mind trying to save memory, but preserve + // data if alignment changed + if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) + memmove(block, p, oldsize); + return block; + } + } + } else { + oldsize = 0; + } + + if (!!(flags & RPMALLOC_GROW_OR_FAIL)) + return 0; + + // Size is greater than block size, need to allocate a new block and + // deallocate the old. Avoid hysteresis by overallocating if increase is small + // (below 37%) + size_t lower_bound = oldsize + (oldsize >> 2) + (oldsize >> 3); + size_t new_size = + (size > lower_bound) ? size : ((size > oldsize) ? lower_bound : size); + void *block = _rpmalloc_allocate(heap, new_size); + if (p && block) { + if (!(flags & RPMALLOC_NO_PRESERVE)) + memcpy(block, p, oldsize < new_size ? 
oldsize : new_size); + _rpmalloc_deallocate(p); + } + + return block; +} + +static void *_rpmalloc_aligned_reallocate(heap_t *heap, void *ptr, + size_t alignment, size_t size, + size_t oldsize, unsigned int flags) { + if (alignment <= SMALL_GRANULARITY) + return _rpmalloc_reallocate(heap, ptr, size, oldsize, flags); + + int no_alloc = !!(flags & RPMALLOC_GROW_OR_FAIL); + size_t usablesize = (ptr ? _rpmalloc_usable_size(ptr) : 0); + if ((usablesize >= size) && !((uintptr_t)ptr & (alignment - 1))) { + if (no_alloc || (size >= (usablesize / 2))) + return ptr; + } + // Aligned alloc marks span as having aligned blocks + void *block = + (!no_alloc ? _rpmalloc_aligned_allocate(heap, alignment, size) : 0); + if (EXPECTED(block != 0)) { + if (!(flags & RPMALLOC_NO_PRESERVE) && ptr) { + if (!oldsize) + oldsize = usablesize; + memcpy(block, ptr, oldsize < size ? oldsize : size); + } + _rpmalloc_deallocate(ptr); + } + return block; +} + +//////////// +/// +/// Initialization, finalization and utility +/// +////// + +//! Get the usable size of the given block +static size_t _rpmalloc_usable_size(void *p) { + // Grab the span using guaranteed span alignment + span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); + if (span->size_class < SIZE_CLASS_COUNT) { + // Small/medium block + void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); + return span->block_size - + ((size_t)pointer_diff(p, blocks_start) % span->block_size); + } + if (span->size_class == SIZE_CLASS_LARGE) { + // Large block + size_t current_spans = span->span_count; + return (current_spans * _memory_span_size) - (size_t)pointer_diff(p, span); + } + // Oversized block, page count is stored in span_count + size_t current_pages = span->span_count; + return (current_pages * _memory_page_size) - (size_t)pointer_diff(p, span); +} + +//! 
Adjust and optimize the size class properties for the given class +static void _rpmalloc_adjust_size_class(size_t iclass) { + size_t block_size = _memory_size_class[iclass].block_size; + size_t block_count = (_memory_span_size - SPAN_HEADER_SIZE) / block_size; + + _memory_size_class[iclass].block_count = (uint16_t)block_count; + _memory_size_class[iclass].class_idx = (uint16_t)iclass; + + // Check if previous size classes can be merged + if (iclass >= SMALL_CLASS_COUNT) { + size_t prevclass = iclass; + while (prevclass > 0) { + --prevclass; + // A class can be merged if number of pages and number of blocks are equal + if (_memory_size_class[prevclass].block_count == + _memory_size_class[iclass].block_count) + _rpmalloc_memcpy_const(_memory_size_class + prevclass, + _memory_size_class + iclass, + sizeof(_memory_size_class[iclass])); + else + break; + } + } +} + +//! Initialize the allocator and setup global data +extern inline int rpmalloc_initialize(void) { + if (_rpmalloc_initialized) { + rpmalloc_thread_initialize(); + return 0; + } + return rpmalloc_initialize_config(0); +} + +int rpmalloc_initialize_config(const rpmalloc_config_t *config) { + if (_rpmalloc_initialized) { + rpmalloc_thread_initialize(); + return 0; + } + _rpmalloc_initialized = 1; + + if (config) + memcpy(&_memory_config, config, sizeof(rpmalloc_config_t)); + else + _rpmalloc_memset_const(&_memory_config, 0, sizeof(rpmalloc_config_t)); + + if (!_memory_config.memory_map || !_memory_config.memory_unmap) { + _memory_config.memory_map = _rpmalloc_mmap_os; + _memory_config.memory_unmap = _rpmalloc_unmap_os; + } + +#if PLATFORM_WINDOWS + SYSTEM_INFO system_info; + memset(&system_info, 0, sizeof(system_info)); + GetSystemInfo(&system_info); + _memory_map_granularity = system_info.dwAllocationGranularity; +#else + _memory_map_granularity = (size_t)sysconf(_SC_PAGESIZE); +#endif + +#if RPMALLOC_CONFIGURABLE + _memory_page_size = _memory_config.page_size; +#else + _memory_page_size = 0; +#endif + 
_memory_huge_pages = 0; + if (!_memory_page_size) { +#if PLATFORM_WINDOWS + _memory_page_size = system_info.dwPageSize; +#else + _memory_page_size = _memory_map_granularity; + if (_memory_config.enable_huge_pages) { +#if defined(__linux__) + size_t huge_page_size = 0; + FILE *meminfo = fopen("/proc/meminfo", "r"); + if (meminfo) { + char line[128]; + while (!huge_page_size && fgets(line, sizeof(line) - 1, meminfo)) { + line[sizeof(line) - 1] = 0; + if (strstr(line, "Hugepagesize:")) + huge_page_size = (size_t)strtol(line + 13, 0, 10) * 1024; + } + fclose(meminfo); + } + if (huge_page_size) { + _memory_huge_pages = 1; + _memory_page_size = huge_page_size; + _memory_map_granularity = huge_page_size; + } +#elif defined(__FreeBSD__) + int rc; + size_t sz = sizeof(rc); + + if (sysctlbyname("vm.pmap.pg_ps_enabled", &rc, &sz, NULL, 0) == 0 && + rc == 1) { + static size_t defsize = 2 * 1024 * 1024; + int nsize = 0; + size_t sizes[4] = {0}; + _memory_huge_pages = 1; + _memory_page_size = defsize; + if ((nsize = getpagesizes(sizes, 4)) >= 2) { + nsize--; + for (size_t csize = sizes[nsize]; nsize >= 0 && csize; + --nsize, csize = sizes[nsize]) { + //! Unlikely, but as a precaution.. 
+ rpmalloc_assert(!(csize & (csize - 1)) && !(csize % 1024), + "Invalid page size"); + if (defsize < csize) { + _memory_page_size = csize; + break; + } + } + } + _memory_map_granularity = _memory_page_size; + } +#elif defined(__APPLE__) || defined(__NetBSD__) + _memory_huge_pages = 1; + _memory_page_size = 2 * 1024 * 1024; + _memory_map_granularity = _memory_page_size; +#endif + } +#endif + } else { + if (_memory_config.enable_huge_pages) + _memory_huge_pages = 1; + } + +#if PLATFORM_WINDOWS + if (_memory_config.enable_huge_pages) { + HANDLE token = 0; + size_t large_page_minimum = GetLargePageMinimum(); + if (large_page_minimum) + OpenProcessToken(GetCurrentProcess(), + TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token); + if (token) { + LUID luid; + if (LookupPrivilegeValue(0, SE_LOCK_MEMORY_NAME, &luid)) { + TOKEN_PRIVILEGES token_privileges; + memset(&token_privileges, 0, sizeof(token_privileges)); + token_privileges.PrivilegeCount = 1; + token_privileges.Privileges[0].Luid = luid; + token_privileges.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED; + if (AdjustTokenPrivileges(token, FALSE, &token_privileges, 0, 0, 0)) { + if (GetLastError() == ERROR_SUCCESS) + _memory_huge_pages = 1; + } + } + CloseHandle(token); + } + if (_memory_huge_pages) { + if (large_page_minimum > _memory_page_size) + _memory_page_size = large_page_minimum; + if (large_page_minimum > _memory_map_granularity) + _memory_map_granularity = large_page_minimum; + } + } +#endif + + size_t min_span_size = 256; + size_t max_page_size; +#if UINTPTR_MAX > 0xFFFFFFFF + max_page_size = 4096ULL * 1024ULL * 1024ULL; +#else + max_page_size = 4 * 1024 * 1024; +#endif + if (_memory_page_size < min_span_size) + _memory_page_size = min_span_size; + if (_memory_page_size > max_page_size) + _memory_page_size = max_page_size; + _memory_page_size_shift = 0; + size_t page_size_bit = _memory_page_size; + while (page_size_bit != 1) { + ++_memory_page_size_shift; + page_size_bit >>= 1; + } + _memory_page_size = 
((size_t)1 << _memory_page_size_shift); + +#if RPMALLOC_CONFIGURABLE + if (!_memory_config.span_size) { + _memory_span_size = _memory_default_span_size; + _memory_span_size_shift = _memory_default_span_size_shift; + _memory_span_mask = _memory_default_span_mask; + } else { + size_t span_size = _memory_config.span_size; + if (span_size > (256 * 1024)) + span_size = (256 * 1024); + _memory_span_size = 4096; + _memory_span_size_shift = 12; + while (_memory_span_size < span_size) { + _memory_span_size <<= 1; + ++_memory_span_size_shift; + } + _memory_span_mask = ~(uintptr_t)(_memory_span_size - 1); + } +#endif + + _memory_span_map_count = + (_memory_config.span_map_count ? _memory_config.span_map_count + : DEFAULT_SPAN_MAP_COUNT); + if ((_memory_span_size * _memory_span_map_count) < _memory_page_size) + _memory_span_map_count = (_memory_page_size / _memory_span_size); + if ((_memory_page_size >= _memory_span_size) && + ((_memory_span_map_count * _memory_span_size) % _memory_page_size)) + _memory_span_map_count = (_memory_page_size / _memory_span_size); + _memory_heap_reserve_count = (_memory_span_map_count > DEFAULT_SPAN_MAP_COUNT) + ? 
DEFAULT_SPAN_MAP_COUNT + : _memory_span_map_count; + + _memory_config.page_size = _memory_page_size; + _memory_config.span_size = _memory_span_size; + _memory_config.span_map_count = _memory_span_map_count; + _memory_config.enable_huge_pages = _memory_huge_pages; + +#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ + defined(__TINYC__) + if (pthread_key_create(&_memory_thread_heap, _rpmalloc_heap_release_raw_fc)) + return -1; +#endif +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) + fls_key = FlsAlloc(&_rpmalloc_thread_destructor); +#endif + + // Setup all small and medium size classes + size_t iclass = 0; + _memory_size_class[iclass].block_size = SMALL_GRANULARITY; + _rpmalloc_adjust_size_class(iclass); + for (iclass = 1; iclass < SMALL_CLASS_COUNT; ++iclass) { + size_t size = iclass * SMALL_GRANULARITY; + _memory_size_class[iclass].block_size = (uint32_t)size; + _rpmalloc_adjust_size_class(iclass); + } + // At least two blocks per span, then fall back to large allocations + _memory_medium_size_limit = (_memory_span_size - SPAN_HEADER_SIZE) >> 1; + if (_memory_medium_size_limit > MEDIUM_SIZE_LIMIT) + _memory_medium_size_limit = MEDIUM_SIZE_LIMIT; + for (iclass = 0; iclass < MEDIUM_CLASS_COUNT; ++iclass) { + size_t size = SMALL_SIZE_LIMIT + ((iclass + 1) * MEDIUM_GRANULARITY); + if (size > _memory_medium_size_limit) { + _memory_medium_size_limit = + SMALL_SIZE_LIMIT + (iclass * MEDIUM_GRANULARITY); + break; + } + _memory_size_class[SMALL_CLASS_COUNT + iclass].block_size = (uint32_t)size; + _rpmalloc_adjust_size_class(SMALL_CLASS_COUNT + iclass); + } + + _memory_orphan_heaps = 0; +#if RPMALLOC_FIRST_CLASS_HEAPS + _memory_first_class_orphan_heaps = 0; +#endif +#if ENABLE_STATISTICS + atomic_store32(&_memory_active_heaps, 0); + atomic_store32(&_mapped_pages, 0); + _mapped_pages_peak = 0; + atomic_store32(&_master_spans, 0); + atomic_store32(&_mapped_total, 0); + atomic_store32(&_unmapped_total, 0); + 
atomic_store32(&_mapped_pages_os, 0); + atomic_store32(&_huge_pages_current, 0); + _huge_pages_peak = 0; +#endif + memset(_memory_heaps, 0, sizeof(_memory_heaps)); + atomic_store32_release(&_memory_global_lock, 0); + + rpmalloc_linker_reference(); + + // Initialize this thread + rpmalloc_thread_initialize(); + return 0; +} + +//! Finalize the allocator +void rpmalloc_finalize(void) { + rpmalloc_thread_finalize(1); + // rpmalloc_dump_statistics(stdout); + + if (_memory_global_reserve) { + atomic_add32(&_memory_global_reserve_master->remaining_spans, + -(int32_t)_memory_global_reserve_count); + _memory_global_reserve_master = 0; + _memory_global_reserve_count = 0; + _memory_global_reserve = 0; + } + atomic_store32_release(&_memory_global_lock, 0); + + // Free all thread caches and fully free spans + for (size_t list_idx = 0; list_idx < HEAP_ARRAY_SIZE; ++list_idx) { + heap_t *heap = _memory_heaps[list_idx]; + while (heap) { + heap_t *next_heap = heap->next_heap; + heap->finalize = 1; + _rpmalloc_heap_global_finalize(heap); + heap = next_heap; + } + } + +#if ENABLE_GLOBAL_CACHE + // Free global caches + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) + _rpmalloc_global_cache_finalize(&_memory_span_cache[iclass]); +#endif + +#if (defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD + pthread_key_delete(_memory_thread_heap); +#endif +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) + FlsFree(fls_key); + fls_key = 0; +#endif +#if ENABLE_STATISTICS + // If you hit these asserts you probably have memory leaks (perhaps global + // scope data doing dynamic allocations) or double frees in your code + rpmalloc_assert(atomic_load32(&_mapped_pages) == 0, "Memory leak detected"); + rpmalloc_assert(atomic_load32(&_mapped_pages_os) == 0, + "Memory leak detected"); +#endif + + _rpmalloc_initialized = 0; +} + +//! 
Initialize thread, assign heap +extern inline void rpmalloc_thread_initialize(void) { + if (!get_thread_heap_raw()) { + heap_t *heap = _rpmalloc_heap_allocate(0); + if (heap) { + _rpmalloc_stat_inc(&_memory_active_heaps); + set_thread_heap(heap); +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) + FlsSetValue(fls_key, heap); +#endif + } + } +} + +//! Finalize thread, orphan heap +void rpmalloc_thread_finalize(int release_caches) { + heap_t *heap = get_thread_heap_raw(); + if (heap) + _rpmalloc_heap_release_raw(heap, release_caches); + set_thread_heap(0); +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) + FlsSetValue(fls_key, 0); +#endif +} + +int rpmalloc_is_thread_initialized(void) { + return (get_thread_heap_raw() != 0) ? 1 : 0; +} + +const rpmalloc_config_t *rpmalloc_config(void) { return &_memory_config; } + +// Extern interface + +extern inline RPMALLOC_ALLOCATOR void *rpmalloc(size_t size) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return 0; + } +#endif + heap_t *heap = get_thread_heap(); + return _rpmalloc_allocate(heap, size); +} + +extern inline void rpfree(void *ptr) { _rpmalloc_deallocate(ptr); } + +extern inline RPMALLOC_ALLOCATOR void *rpcalloc(size_t num, size_t size) { + size_t total; +#if ENABLE_VALIDATE_ARGS +#if PLATFORM_WINDOWS + int err = SizeTMult(num, size, &total); + if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#else + int err = __builtin_umull_overflow(num, size, &total); + if (err || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#endif +#else + total = num * size; +#endif + heap_t *heap = get_thread_heap(); + void *block = _rpmalloc_allocate(heap, total); + if (block) + memset(block, 0, total); + return block; +} + +extern inline RPMALLOC_ALLOCATOR void *rprealloc(void *ptr, size_t size) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return ptr; + } +#endif + 
heap_t *heap = get_thread_heap(); + return _rpmalloc_reallocate(heap, ptr, size, 0, 0); +} + +extern RPMALLOC_ALLOCATOR void *rpaligned_realloc(void *ptr, size_t alignment, + size_t size, size_t oldsize, + unsigned int flags) { +#if ENABLE_VALIDATE_ARGS + if ((size + alignment < size) || (alignment > _memory_page_size)) { + errno = EINVAL; + return 0; + } +#endif + heap_t *heap = get_thread_heap(); + return _rpmalloc_aligned_reallocate(heap, ptr, alignment, size, oldsize, + flags); +} + +extern RPMALLOC_ALLOCATOR void *rpaligned_alloc(size_t alignment, size_t size) { + heap_t *heap = get_thread_heap(); + return _rpmalloc_aligned_allocate(heap, alignment, size); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpaligned_calloc(size_t alignment, size_t num, size_t size) { + size_t total; +#if ENABLE_VALIDATE_ARGS +#if PLATFORM_WINDOWS + int err = SizeTMult(num, size, &total); + if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#else + int err = __builtin_umull_overflow(num, size, &total); + if (err || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#endif +#else + total = num * size; +#endif + void *block = rpaligned_alloc(alignment, total); + if (block) + memset(block, 0, total); + return block; +} + +extern inline RPMALLOC_ALLOCATOR void *rpmemalign(size_t alignment, + size_t size) { + return rpaligned_alloc(alignment, size); +} + +extern inline int rpposix_memalign(void **memptr, size_t alignment, + size_t size) { + if (memptr) + *memptr = rpaligned_alloc(alignment, size); + else + return EINVAL; + return *memptr ? 0 : ENOMEM; +} + +extern inline size_t rpmalloc_usable_size(void *ptr) { + return (ptr ? 
_rpmalloc_usable_size(ptr) : 0); +} + +extern inline void rpmalloc_thread_collect(void) {} + +void rpmalloc_thread_statistics(rpmalloc_thread_statistics_t *stats) { + memset(stats, 0, sizeof(rpmalloc_thread_statistics_t)); + heap_t *heap = get_thread_heap_raw(); + if (!heap) + return; + + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + size_class_t *size_class = _memory_size_class + iclass; + span_t *span = heap->size_class[iclass].partial_span; + while (span) { + size_t free_count = span->list_size; + size_t block_count = size_class->block_count; + if (span->free_list_limit < block_count) + block_count = span->free_list_limit; + free_count += (block_count - span->used_count); + stats->sizecache += free_count * size_class->block_size; + span = span->next; + } + } + +#if ENABLE_THREAD_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + stats->spancache += span_cache->count * (iclass + 1) * _memory_span_size; + } +#endif + + span_t *deferred = (span_t *)atomic_load_ptr(&heap->span_free_deferred); + while (deferred) { + if (deferred->size_class != SIZE_CLASS_HUGE) + stats->spancache += (size_t)deferred->span_count * _memory_span_size; + deferred = (span_t *)deferred->free_list; + } + +#if ENABLE_STATISTICS + stats->thread_to_global = (size_t)atomic_load64(&heap->thread_to_global); + stats->global_to_thread = (size_t)atomic_load64(&heap->global_to_thread); + + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + stats->span_use[iclass].current = + (size_t)atomic_load32(&heap->span_use[iclass].current); + stats->span_use[iclass].peak = + (size_t)atomic_load32(&heap->span_use[iclass].high); + stats->span_use[iclass].to_global = + (size_t)atomic_load32(&heap->span_use[iclass].spans_to_global); + stats->span_use[iclass].from_global = + 
(size_t)atomic_load32(&heap->span_use[iclass].spans_from_global); + stats->span_use[iclass].to_cache = + (size_t)atomic_load32(&heap->span_use[iclass].spans_to_cache); + stats->span_use[iclass].from_cache = + (size_t)atomic_load32(&heap->span_use[iclass].spans_from_cache); + stats->span_use[iclass].to_reserved = + (size_t)atomic_load32(&heap->span_use[iclass].spans_to_reserved); + stats->span_use[iclass].from_reserved = + (size_t)atomic_load32(&heap->span_use[iclass].spans_from_reserved); + stats->span_use[iclass].map_calls = + (size_t)atomic_load32(&heap->span_use[iclass].spans_map_calls); + } + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + stats->size_use[iclass].alloc_current = + (size_t)atomic_load32(&heap->size_class_use[iclass].alloc_current); + stats->size_use[iclass].alloc_peak = + (size_t)heap->size_class_use[iclass].alloc_peak; + stats->size_use[iclass].alloc_total = + (size_t)atomic_load32(&heap->size_class_use[iclass].alloc_total); + stats->size_use[iclass].free_total = + (size_t)atomic_load32(&heap->size_class_use[iclass].free_total); + stats->size_use[iclass].spans_to_cache = + (size_t)atomic_load32(&heap->size_class_use[iclass].spans_to_cache); + stats->size_use[iclass].spans_from_cache = + (size_t)atomic_load32(&heap->size_class_use[iclass].spans_from_cache); + stats->size_use[iclass].spans_from_reserved = (size_t)atomic_load32( + &heap->size_class_use[iclass].spans_from_reserved); + stats->size_use[iclass].map_calls = + (size_t)atomic_load32(&heap->size_class_use[iclass].spans_map_calls); + } +#endif +} + +void rpmalloc_global_statistics(rpmalloc_global_statistics_t *stats) { + memset(stats, 0, sizeof(rpmalloc_global_statistics_t)); +#if ENABLE_STATISTICS + stats->mapped = (size_t)atomic_load32(&_mapped_pages) * _memory_page_size; + stats->mapped_peak = (size_t)_mapped_pages_peak * _memory_page_size; + stats->mapped_total = + (size_t)atomic_load32(&_mapped_total) * _memory_page_size; + stats->unmapped_total = + 
(size_t)atomic_load32(&_unmapped_total) * _memory_page_size; + stats->huge_alloc = + (size_t)atomic_load32(&_huge_pages_current) * _memory_page_size; + stats->huge_alloc_peak = (size_t)_huge_pages_peak * _memory_page_size; +#endif +#if ENABLE_GLOBAL_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + global_cache_t *cache = &_memory_span_cache[iclass]; + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + uint32_t count = cache->count; +#if ENABLE_UNLIMITED_CACHE + span_t *current_span = cache->overflow; + while (current_span) { + ++count; + current_span = current_span->next; + } +#endif + atomic_store32_release(&cache->lock, 0); + stats->cached += count * (iclass + 1) * _memory_span_size; + } +#endif +} + +#if ENABLE_STATISTICS + +static void _memory_heap_dump_statistics(heap_t *heap, void *file) { + fprintf(file, "Heap %d stats:\n", heap->id); + fprintf(file, "Class CurAlloc PeakAlloc TotAlloc TotFree BlkSize " + "BlkCount SpansCur SpansPeak PeakAllocMiB ToCacheMiB " + "FromCacheMiB FromReserveMiB MmapCalls\n"); + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + if (!atomic_load32(&heap->size_class_use[iclass].alloc_total)) + continue; + fprintf( + file, + "%3u: %10u %10u %10u %10u %8u %8u %8d %9d %13zu %11zu %12zu %14zu " + "%9u\n", + (uint32_t)iclass, + atomic_load32(&heap->size_class_use[iclass].alloc_current), + heap->size_class_use[iclass].alloc_peak, + atomic_load32(&heap->size_class_use[iclass].alloc_total), + atomic_load32(&heap->size_class_use[iclass].free_total), + _memory_size_class[iclass].block_size, + _memory_size_class[iclass].block_count, + atomic_load32(&heap->size_class_use[iclass].spans_current), + heap->size_class_use[iclass].spans_peak, + ((size_t)heap->size_class_use[iclass].alloc_peak * + (size_t)_memory_size_class[iclass].block_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->size_class_use[iclass].spans_to_cache) * + _memory_span_size) / + (size_t)(1024 * 1024), + 
((size_t)atomic_load32(&heap->size_class_use[iclass].spans_from_cache) * + _memory_span_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32( + &heap->size_class_use[iclass].spans_from_reserved) * + _memory_span_size) / + (size_t)(1024 * 1024), + atomic_load32(&heap->size_class_use[iclass].spans_map_calls)); + } + fprintf(file, "Spans Current Peak Deferred PeakMiB Cached ToCacheMiB " + "FromCacheMiB ToReserveMiB FromReserveMiB ToGlobalMiB " + "FromGlobalMiB MmapCalls\n"); + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + if (!atomic_load32(&heap->span_use[iclass].high) && + !atomic_load32(&heap->span_use[iclass].spans_map_calls)) + continue; + fprintf( + file, + "%4u: %8d %8u %8u %8zu %7u %11zu %12zu %12zu %14zu %11zu %13zu %10u\n", + (uint32_t)(iclass + 1), atomic_load32(&heap->span_use[iclass].current), + atomic_load32(&heap->span_use[iclass].high), + atomic_load32(&heap->span_use[iclass].spans_deferred), + ((size_t)atomic_load32(&heap->span_use[iclass].high) * + (size_t)_memory_span_size * (iclass + 1)) / + (size_t)(1024 * 1024), +#if ENABLE_THREAD_CACHE + (unsigned int)(!iclass ? 
heap->span_cache.count + : heap->span_large_cache[iclass - 1].count), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_cache) * + (iclass + 1) * _memory_span_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_cache) * + (iclass + 1) * _memory_span_size) / + (size_t)(1024 * 1024), +#else + 0, (size_t)0, (size_t)0, +#endif + ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_reserved) * + (iclass + 1) * _memory_span_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_reserved) * + (iclass + 1) * _memory_span_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_global) * + (size_t)_memory_span_size * (iclass + 1)) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_global) * + (size_t)_memory_span_size * (iclass + 1)) / + (size_t)(1024 * 1024), + atomic_load32(&heap->span_use[iclass].spans_map_calls)); + } + fprintf(file, "Full spans: %zu\n", heap->full_span_count); + fprintf(file, "ThreadToGlobalMiB GlobalToThreadMiB\n"); + fprintf( + file, "%17zu %17zu\n", + (size_t)atomic_load64(&heap->thread_to_global) / (size_t)(1024 * 1024), + (size_t)atomic_load64(&heap->global_to_thread) / (size_t)(1024 * 1024)); +} + +#endif + +void rpmalloc_dump_statistics(void *file) { +#if ENABLE_STATISTICS + for (size_t list_idx = 0; list_idx < HEAP_ARRAY_SIZE; ++list_idx) { + heap_t *heap = _memory_heaps[list_idx]; + while (heap) { + int need_dump = 0; + for (size_t iclass = 0; !need_dump && (iclass < SIZE_CLASS_COUNT); + ++iclass) { + if (!atomic_load32(&heap->size_class_use[iclass].alloc_total)) { + rpmalloc_assert( + !atomic_load32(&heap->size_class_use[iclass].free_total), + "Heap statistics counter mismatch"); + rpmalloc_assert( + !atomic_load32(&heap->size_class_use[iclass].spans_map_calls), + "Heap statistics counter mismatch"); + continue; + } + need_dump = 1; + } + for (size_t iclass = 0; !need_dump && (iclass < 
LARGE_CLASS_COUNT); + ++iclass) { + if (!atomic_load32(&heap->span_use[iclass].high) && + !atomic_load32(&heap->span_use[iclass].spans_map_calls)) + continue; + need_dump = 1; + } + if (need_dump) + _memory_heap_dump_statistics(heap, file); + heap = heap->next_heap; + } + } + fprintf(file, "Global stats:\n"); + size_t huge_current = + (size_t)atomic_load32(&_huge_pages_current) * _memory_page_size; + size_t huge_peak = (size_t)_huge_pages_peak * _memory_page_size; + fprintf(file, "HugeCurrentMiB HugePeakMiB\n"); + fprintf(file, "%14zu %11zu\n", huge_current / (size_t)(1024 * 1024), + huge_peak / (size_t)(1024 * 1024)); + +#if ENABLE_GLOBAL_CACHE + fprintf(file, "GlobalCacheMiB\n"); + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + global_cache_t *cache = _memory_span_cache + iclass; + size_t global_cache = + (size_t)cache->count * (iclass + 1) * _memory_span_size; + + size_t global_overflow_cache = 0; + span_t *span = cache->overflow; + while (span) { + global_overflow_cache += (iclass + 1) * _memory_span_size; + span = span->next; + } + if (global_cache || global_overflow_cache || cache->insert_count || + cache->extract_count) + fprintf(file, + "%4zu: %8zuMiB (%8zuMiB overflow) %14zu insert %14zu extract\n", + iclass + 1, global_cache / (size_t)(1024 * 1024), + global_overflow_cache / (size_t)(1024 * 1024), + cache->insert_count, cache->extract_count); + } +#endif + + size_t mapped = (size_t)atomic_load32(&_mapped_pages) * _memory_page_size; + size_t mapped_os = + (size_t)atomic_load32(&_mapped_pages_os) * _memory_page_size; + size_t mapped_peak = (size_t)_mapped_pages_peak * _memory_page_size; + size_t mapped_total = + (size_t)atomic_load32(&_mapped_total) * _memory_page_size; + size_t unmapped_total = + (size_t)atomic_load32(&_unmapped_total) * _memory_page_size; + fprintf( + file, + "MappedMiB MappedOSMiB MappedPeakMiB MappedTotalMiB UnmappedTotalMiB\n"); + fprintf(file, "%9zu %11zu %13zu %14zu %16zu\n", + mapped / (size_t)(1024 * 1024), mapped_os / 
(size_t)(1024 * 1024), + mapped_peak / (size_t)(1024 * 1024), + mapped_total / (size_t)(1024 * 1024), + unmapped_total / (size_t)(1024 * 1024)); + + fprintf(file, "\n"); +#if 0 + int64_t allocated = atomic_load64(&_allocation_counter); + int64_t deallocated = atomic_load64(&_deallocation_counter); + fprintf(file, "Allocation count: %lli\n", allocated); + fprintf(file, "Deallocation count: %lli\n", deallocated); + fprintf(file, "Current allocations: %lli\n", (allocated - deallocated)); + fprintf(file, "Master spans: %d\n", atomic_load32(&_master_spans)); + fprintf(file, "Dangling master spans: %d\n", atomic_load32(&_unmapped_master_spans)); +#endif +#endif + (void)sizeof(file); +} + +#if RPMALLOC_FIRST_CLASS_HEAPS + +extern inline rpmalloc_heap_t *rpmalloc_heap_acquire(void) { + // Must be a pristine heap from newly mapped memory pages, or else memory + // blocks could already be allocated from the heap which would (wrongly) be + // released when heap is cleared with rpmalloc_heap_free_all(). Also heaps + // guaranteed to be pristine from the dedicated orphan list can be used. 
+ heap_t *heap = _rpmalloc_heap_allocate(1); + rpmalloc_assume(heap != NULL); + heap->owner_thread = 0; + _rpmalloc_stat_inc(&_memory_active_heaps); + return heap; +} + +extern inline void rpmalloc_heap_release(rpmalloc_heap_t *heap) { + if (heap) + _rpmalloc_heap_release(heap, 1, 1); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_alloc(rpmalloc_heap_t *heap, size_t size) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return 0; + } +#endif + return _rpmalloc_allocate(heap, size); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_aligned_alloc(rpmalloc_heap_t *heap, size_t alignment, + size_t size) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return 0; + } +#endif + return _rpmalloc_aligned_allocate(heap, alignment, size); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_calloc(rpmalloc_heap_t *heap, size_t num, size_t size) { + return rpmalloc_heap_aligned_calloc(heap, 0, num, size); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_aligned_calloc(rpmalloc_heap_t *heap, size_t alignment, + size_t num, size_t size) { + size_t total; +#if ENABLE_VALIDATE_ARGS +#if PLATFORM_WINDOWS + int err = SizeTMult(num, size, &total); + if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#else + int err = __builtin_umull_overflow(num, size, &total); + if (err || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#endif +#else + total = num * size; +#endif + void *block = _rpmalloc_aligned_allocate(heap, alignment, total); + if (block) + memset(block, 0, total); + return block; +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_realloc(rpmalloc_heap_t *heap, void *ptr, size_t size, + unsigned int flags) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return ptr; + } +#endif + return _rpmalloc_reallocate(heap, ptr, size, 0, flags); +} + +extern inline RPMALLOC_ALLOCATOR void * 
+rpmalloc_heap_aligned_realloc(rpmalloc_heap_t *heap, void *ptr, + size_t alignment, size_t size, + unsigned int flags) { +#if ENABLE_VALIDATE_ARGS + if ((size + alignment < size) || (alignment > _memory_page_size)) { + errno = EINVAL; + return 0; + } +#endif + return _rpmalloc_aligned_reallocate(heap, ptr, alignment, size, 0, flags); +} + +extern inline void rpmalloc_heap_free(rpmalloc_heap_t *heap, void *ptr) { + (void)sizeof(heap); + _rpmalloc_deallocate(ptr); +} + +extern inline void rpmalloc_heap_free_all(rpmalloc_heap_t *heap) { + span_t *span; + span_t *next_span; + + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + span = heap->size_class[iclass].partial_span; + while (span) { + next_span = span->next; + _rpmalloc_heap_cache_insert(heap, span); + span = next_span; + } + heap->size_class[iclass].partial_span = 0; + span = heap->full_span[iclass]; + while (span) { + next_span = span->next; + _rpmalloc_heap_cache_insert(heap, span); + span = next_span; + } + + span = heap->size_class[iclass].cache; + if (span) + _rpmalloc_heap_cache_insert(heap, span); + heap->size_class[iclass].cache = 0; + } + memset(heap->size_class, 0, sizeof(heap->size_class)); + memset(heap->full_span, 0, sizeof(heap->full_span)); + + span = heap->large_huge_span; + while (span) { + next_span = span->next; + if (UNEXPECTED(span->size_class == SIZE_CLASS_HUGE)) + _rpmalloc_deallocate_huge(span); + else + _rpmalloc_heap_cache_insert(heap, span); + span = next_span; + } + heap->large_huge_span = 0; + heap->full_span_count = 0; + +#if ENABLE_THREAD_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + if (!span_cache->count) + continue; +#if ENABLE_GLOBAL_CACHE + _rpmalloc_stat_add64(&heap->thread_to_global, + span_cache->count * (iclass + 1) * 
_memory_span_size); + _rpmalloc_stat_add(&heap->span_use[iclass].spans_to_global, + span_cache->count); + _rpmalloc_global_cache_insert_spans(span_cache->span, iclass + 1, + span_cache->count); +#else + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); +#endif + span_cache->count = 0; + } +#endif + +#if ENABLE_STATISTICS + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + atomic_store32(&heap->size_class_use[iclass].alloc_current, 0); + atomic_store32(&heap->size_class_use[iclass].spans_current, 0); + } + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + atomic_store32(&heap->span_use[iclass].current, 0); + } +#endif +} + +extern inline void rpmalloc_heap_thread_set_current(rpmalloc_heap_t *heap) { + heap_t *prev_heap = get_thread_heap_raw(); + if (prev_heap != heap) { + set_thread_heap(heap); + if (prev_heap) + rpmalloc_heap_release(prev_heap); + } +} + +extern inline rpmalloc_heap_t *rpmalloc_get_heap_for_ptr(void *ptr) { + // Grab the span, and then the heap from the span + span_t *span = (span_t *)((uintptr_t)ptr & _memory_span_mask); + if (span) { + return span->heap; + } + return 0; +} + +#endif + +#if ENABLE_PRELOAD || ENABLE_OVERRIDE + +#include "malloc.c" + +#endif + +void rpmalloc_linker_reference(void) { (void)sizeof(_rpmalloc_initialized); } diff --git a/llvm/lib/Support/rpmalloc/rpmalloc.h b/llvm/lib/Support/rpmalloc/rpmalloc.h index 3911c53b779b36..5b7fe1ff4286ba 100644 --- a/llvm/lib/Support/rpmalloc/rpmalloc.h +++ b/llvm/lib/Support/rpmalloc/rpmalloc.h @@ -1,428 +1,428 @@ -//===---------------------- rpmalloc.h ------------------*- C -*-=============// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. 
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -// -// This library provides a cross-platform lock free thread caching malloc -// implementation in C11. -// -//===----------------------------------------------------------------------===// - -#pragma once - -#include <stddef.h> - -#ifdef __cplusplus -extern "C" { -#endif - -#if defined(__clang__) || defined(__GNUC__) -#define RPMALLOC_EXPORT __attribute__((visibility("default"))) -#define RPMALLOC_ALLOCATOR -#if (defined(__clang_major__) && (__clang_major__ < 4)) || \ - (defined(__GNUC__) && defined(ENABLE_PRELOAD) && ENABLE_PRELOAD) -#define RPMALLOC_ATTRIB_MALLOC -#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) -#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) -#else -#define RPMALLOC_ATTRIB_MALLOC __attribute__((__malloc__)) -#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) __attribute__((alloc_size(size))) -#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) \ - __attribute__((alloc_size(count, size))) -#endif -#define RPMALLOC_CDECL -#elif defined(_MSC_VER) -#define RPMALLOC_EXPORT -#define RPMALLOC_ALLOCATOR __declspec(allocator) __declspec(restrict) -#define RPMALLOC_ATTRIB_MALLOC -#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) -#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) -#define RPMALLOC_CDECL __cdecl -#else -#define RPMALLOC_EXPORT -#define RPMALLOC_ALLOCATOR -#define RPMALLOC_ATTRIB_MALLOC -#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) -#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) -#define RPMALLOC_CDECL -#endif - -//! Define RPMALLOC_CONFIGURABLE to enable configuring sizes. Will introduce -// a very small overhead due to some size calculations not being compile time -// constants -#ifndef RPMALLOC_CONFIGURABLE -#define RPMALLOC_CONFIGURABLE 0 -#endif - -//! Define RPMALLOC_FIRST_CLASS_HEAPS to enable heap based API (rpmalloc_heap_* -//! functions).
-// Will introduce a very small overhead to track fully allocated spans in heaps -#ifndef RPMALLOC_FIRST_CLASS_HEAPS -#define RPMALLOC_FIRST_CLASS_HEAPS 0 -#endif - -//! Flag to rpaligned_realloc to not preserve content in reallocation -#define RPMALLOC_NO_PRESERVE 1 -//! Flag to rpaligned_realloc to fail and return null pointer if grow cannot be -//! done in-place, -// in which case the original pointer is still valid (just like a call to -// realloc which failes to allocate a new block). -#define RPMALLOC_GROW_OR_FAIL 2 - -typedef struct rpmalloc_global_statistics_t { - //! Current amount of virtual memory mapped, all of which might not have been - //! committed (only if ENABLE_STATISTICS=1) - size_t mapped; - //! Peak amount of virtual memory mapped, all of which might not have been - //! committed (only if ENABLE_STATISTICS=1) - size_t mapped_peak; - //! Current amount of memory in global caches for small and medium sizes - //! (<32KiB) - size_t cached; - //! Current amount of memory allocated in huge allocations, i.e larger than - //! LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1) - size_t huge_alloc; - //! Peak amount of memory allocated in huge allocations, i.e larger than - //! LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1) - size_t huge_alloc_peak; - //! Total amount of memory mapped since initialization (only if - //! ENABLE_STATISTICS=1) - size_t mapped_total; - //! Total amount of memory unmapped since initialization (only if - //! ENABLE_STATISTICS=1) - size_t unmapped_total; -} rpmalloc_global_statistics_t; - -typedef struct rpmalloc_thread_statistics_t { - //! Current number of bytes available in thread size class caches for small - //! and medium sizes (<32KiB) - size_t sizecache; - //! Current number of bytes available in thread span caches for small and - //! medium sizes (<32KiB) - size_t spancache; - //! Total number of bytes transitioned from thread cache to global cache (only - //! 
if ENABLE_STATISTICS=1) - size_t thread_to_global; - //! Total number of bytes transitioned from global cache to thread cache (only - //! if ENABLE_STATISTICS=1) - size_t global_to_thread; - //! Per span count statistics (only if ENABLE_STATISTICS=1) - struct { - //! Currently used number of spans - size_t current; - //! High water mark of spans used - size_t peak; - //! Number of spans transitioned to global cache - size_t to_global; - //! Number of spans transitioned from global cache - size_t from_global; - //! Number of spans transitioned to thread cache - size_t to_cache; - //! Number of spans transitioned from thread cache - size_t from_cache; - //! Number of spans transitioned to reserved state - size_t to_reserved; - //! Number of spans transitioned from reserved state - size_t from_reserved; - //! Number of raw memory map calls (not hitting the reserve spans but - //! resulting in actual OS mmap calls) - size_t map_calls; - } span_use[64]; - //! Per size class statistics (only if ENABLE_STATISTICS=1) - struct { - //! Current number of allocations - size_t alloc_current; - //! Peak number of allocations - size_t alloc_peak; - //! Total number of allocations - size_t alloc_total; - //! Total number of frees - size_t free_total; - //! Number of spans transitioned to cache - size_t spans_to_cache; - //! Number of spans transitioned from cache - size_t spans_from_cache; - //! Number of spans transitioned from reserved state - size_t spans_from_reserved; - //! Number of raw memory map calls (not hitting the reserve spans but - //! resulting in actual OS mmap calls) - size_t map_calls; - } size_use[128]; -} rpmalloc_thread_statistics_t; - -typedef struct rpmalloc_config_t { - //! Map memory pages for the given number of bytes. The returned address MUST - //! be - // aligned to the rpmalloc span size, which will always be a power of two. 
- // Optionally the function can store an alignment offset in the offset - // variable in case it performs alignment and the returned pointer is offset - // from the actual start of the memory region due to this alignment. The - // alignment offset will be passed to the memory unmap function. The - // alignment offset MUST NOT be larger than 65535 (storable in an uint16_t), - // if it is you must use natural alignment to shift it into 16 bits. If you - // set a memory_map function, you must also set a memory_unmap function or - // else the default implementation will be used for both. This function must - // be thread safe, it can be called by multiple threads simultaneously. - void *(*memory_map)(size_t size, size_t *offset); - //! Unmap the memory pages starting at address and spanning the given number - //! of bytes. - // If release is set to non-zero, the unmap is for an entire span range as - // returned by a previous call to memory_map and that the entire range should - // be released. The release argument holds the size of the entire span range. - // If release is set to 0, the unmap is a partial decommit of a subset of the - // mapped memory range. If you set a memory_unmap function, you must also set - // a memory_map function or else the default implementation will be used for - // both. This function must be thread safe, it can be called by multiple - // threads simultaneously. - void (*memory_unmap)(void *address, size_t size, size_t offset, - size_t release); - //! Called when an assert fails, if asserts are enabled. Will use the standard - //! assert() - // if this is not set. - void (*error_callback)(const char *message); - //! Called when a call to map memory pages fails (out of memory). If this - //! callback is - // not set or returns zero the library will return a null pointer in the - // allocation call. If this callback returns non-zero the map call will be - // retried. 
The argument passed is the number of bytes that was requested in - // the map call. Only used if the default system memory map function is used - // (memory_map callback is not set). - int (*map_fail_callback)(size_t size); - //! Size of memory pages. The page size MUST be a power of two. All memory - //! mapping - // requests to memory_map will be made with size set to a multiple of the - // page size. Used if RPMALLOC_CONFIGURABLE is defined to 1, otherwise system - // page size is used. - size_t page_size; - //! Size of a span of memory blocks. MUST be a power of two, and in - //! [4096,262144] - // range (unless 0 - set to 0 to use the default span size). Used if - // RPMALLOC_CONFIGURABLE is defined to 1. - size_t span_size; - //! Number of spans to map at each request to map new virtual memory blocks. - //! This can - // be used to minimize the system call overhead at the cost of virtual memory - // address space. The extra mapped pages will not be written until actually - // used, so physical committed memory should not be affected in the default - // implementation. Will be aligned to a multiple of spans that match memory - // page size in case of huge pages. - size_t span_map_count; - //! Enable use of large/huge pages. If this flag is set to non-zero and page - //! size is - // zero, the allocator will try to enable huge pages and auto detect the - // configuration. If this is set to non-zero and page_size is also non-zero, - // the allocator will assume huge pages have been configured and enabled - // prior to initializing the allocator. For Windows, see - // https://docs.microsoft.com/en-us/windows/desktop/memory/large-page-support - // For Linux, see https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt - int enable_huge_pages; - //! Respectively allocated pages and huge allocated pages names for systems - // supporting it to be able to distinguish among anonymous regions. 
- const char *page_name; - const char *huge_page_name; -} rpmalloc_config_t; - -//! Initialize allocator with default configuration -RPMALLOC_EXPORT int rpmalloc_initialize(void); - -//! Initialize allocator with given configuration -RPMALLOC_EXPORT int rpmalloc_initialize_config(const rpmalloc_config_t *config); - -//! Get allocator configuration -RPMALLOC_EXPORT const rpmalloc_config_t *rpmalloc_config(void); - -//! Finalize allocator -RPMALLOC_EXPORT void rpmalloc_finalize(void); - -//! Initialize allocator for calling thread -RPMALLOC_EXPORT void rpmalloc_thread_initialize(void); - -//! Finalize allocator for calling thread -RPMALLOC_EXPORT void rpmalloc_thread_finalize(int release_caches); - -//! Perform deferred deallocations pending for the calling thread heap -RPMALLOC_EXPORT void rpmalloc_thread_collect(void); - -//! Query if allocator is initialized for calling thread -RPMALLOC_EXPORT int rpmalloc_is_thread_initialized(void); - -//! Get per-thread statistics -RPMALLOC_EXPORT void -rpmalloc_thread_statistics(rpmalloc_thread_statistics_t *stats); - -//! Get global statistics -RPMALLOC_EXPORT void -rpmalloc_global_statistics(rpmalloc_global_statistics_t *stats); - -//! Dump all statistics in human readable format to file (should be a FILE*) -RPMALLOC_EXPORT void rpmalloc_dump_statistics(void *file); - -//! Allocate a memory block of at least the given size -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc(size_t size) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(1); - -//! Free the given memory block -RPMALLOC_EXPORT void rpfree(void *ptr); - -//! Allocate a memory block of at least the given size and zero initialize it -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpcalloc(size_t num, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE2(1, 2); - -//! 
Reallocate the given block to at least the given size -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rprealloc(void *ptr, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2); - -//! Reallocate the given block to at least the given size and alignment, -// with optional control flags (see RPMALLOC_NO_PRESERVE). -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpaligned_realloc(void *ptr, size_t alignment, size_t size, size_t oldsize, - unsigned int flags) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(3); - -//! Allocate a memory block of at least the given size and alignment. -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpaligned_alloc(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2); - -//! Allocate a memory block of at least the given size and alignment, and zero -//! initialize it. -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpaligned_calloc(size_t alignment, size_t num, - size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); - -//! Allocate a memory block of at least the given size and alignment. -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. 
A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmemalign(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2); - -//! Allocate a memory block of at least the given size and alignment. -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT int rpposix_memalign(void **memptr, size_t alignment, - size_t size); - -//! Query the usable size of the given memory block (from given pointer to the -//! end of block) -RPMALLOC_EXPORT size_t rpmalloc_usable_size(void *ptr); - -//! Dummy empty function for forcing linker symbol inclusion -RPMALLOC_EXPORT void rpmalloc_linker_reference(void); - -#if RPMALLOC_FIRST_CLASS_HEAPS - -//! Heap type -typedef struct heap_t rpmalloc_heap_t; - -//! Acquire a new heap. Will reuse existing released heaps or allocate memory -//! for a new heap -// if none available. Heap API is implemented with the strict assumption that -// only one single thread will call heap functions for a given heap at any -// given time, no functions are thread safe. -RPMALLOC_EXPORT rpmalloc_heap_t *rpmalloc_heap_acquire(void); - -//! Release a heap (does NOT free the memory allocated by the heap, use -//! rpmalloc_heap_free_all before destroying the heap). -// Releasing a heap will enable it to be reused by other threads. Safe to pass -// a null pointer. -RPMALLOC_EXPORT void rpmalloc_heap_release(rpmalloc_heap_t *heap); - -//! Allocate a memory block of at least the given size using the given heap. -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_alloc(rpmalloc_heap_t *heap, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2); - -//! 
Allocate a memory block of at least the given size using the given heap. The -//! returned -// block will have the requested alignment. Alignment must be a power of two -// and a multiple of sizeof(void*), and should ideally be less than memory page -// size. A caveat of rpmalloc internals is that this must also be strictly less -// than the span size (default 64KiB). -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_aligned_alloc(rpmalloc_heap_t *heap, size_t alignment, - size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(3); - -//! Allocate a memory block of at least the given size using the given heap and -//! zero initialize it. -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_calloc(rpmalloc_heap_t *heap, size_t num, - size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); - -//! Allocate a memory block of at least the given size using the given heap and -//! zero initialize it. The returned -// block will have the requested alignment. Alignment must either be zero, or a -// power of two and a multiple of sizeof(void*), and should ideally be less -// than memory page size. A caveat of rpmalloc internals is that this must also -// be strictly less than the span size (default 64KiB). -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_aligned_calloc(rpmalloc_heap_t *heap, size_t alignment, - size_t num, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); - -//! Reallocate the given block to at least the given size. The memory block MUST -//! be allocated -// by the same heap given to this function. -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_realloc(rpmalloc_heap_t *heap, void *ptr, size_t size, - unsigned int flags) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(3); - -//! Reallocate the given block to at least the given size. The memory block MUST -//! be allocated -// by the same heap given to this function. The returned block will have the -// requested alignment. 
Alignment must be either zero, or a power of two and a -// multiple of sizeof(void*), and should ideally be less than memory page size. -// A caveat of rpmalloc internals is that this must also be strictly less than -// the span size (default 64KiB). -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void *rpmalloc_heap_aligned_realloc( - rpmalloc_heap_t *heap, void *ptr, size_t alignment, size_t size, - unsigned int flags) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(4); - -//! Free the given memory block from the given heap. The memory block MUST be -//! allocated -// by the same heap given to this function. -RPMALLOC_EXPORT void rpmalloc_heap_free(rpmalloc_heap_t *heap, void *ptr); - -//! Free all memory allocated by the heap -RPMALLOC_EXPORT void rpmalloc_heap_free_all(rpmalloc_heap_t *heap); - -//! Set the given heap as the current heap for the calling thread. A heap MUST -//! only be current heap -// for a single thread, a heap can never be shared between multiple threads. -// The previous current heap for the calling thread is released to be reused by -// other threads. -RPMALLOC_EXPORT void rpmalloc_heap_thread_set_current(rpmalloc_heap_t *heap); - -//! Returns which heap the given pointer is allocated on -RPMALLOC_EXPORT rpmalloc_heap_t *rpmalloc_get_heap_for_ptr(void *ptr); - -#endif - -#ifdef __cplusplus -} -#endif +//===---------------------- rpmalloc.h ------------------*- C -*-=============// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This library provides a cross-platform lock free thread caching malloc +// implementation in C11. 
+// +//===----------------------------------------------------------------------===// + +#pragma once + +#include <stddef.h> + +#ifdef __cplusplus +extern "C" { +#endif + +#if defined(__clang__) || defined(__GNUC__) +#define RPMALLOC_EXPORT __attribute__((visibility("default"))) +#define RPMALLOC_ALLOCATOR +#if (defined(__clang_major__) && (__clang_major__ < 4)) || \ + (defined(__GNUC__) && defined(ENABLE_PRELOAD) && ENABLE_PRELOAD) +#define RPMALLOC_ATTRIB_MALLOC +#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) +#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) +#else +#define RPMALLOC_ATTRIB_MALLOC __attribute__((__malloc__)) +#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) __attribute__((alloc_size(size))) +#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) \ + __attribute__((alloc_size(count, size))) +#endif +#define RPMALLOC_CDECL +#elif defined(_MSC_VER) +#define RPMALLOC_EXPORT +#define RPMALLOC_ALLOCATOR __declspec(allocator) __declspec(restrict) +#define RPMALLOC_ATTRIB_MALLOC +#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) +#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) +#define RPMALLOC_CDECL __cdecl +#else +#define RPMALLOC_EXPORT +#define RPMALLOC_ALLOCATOR +#define RPMALLOC_ATTRIB_MALLOC +#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) +#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) +#define RPMALLOC_CDECL +#endif + +//! Define RPMALLOC_CONFIGURABLE to enable configuring sizes. Will introduce +// a very small overhead due to some size calculations not being compile time +// constants +#ifndef RPMALLOC_CONFIGURABLE +#define RPMALLOC_CONFIGURABLE 0 +#endif + +//! Define RPMALLOC_FIRST_CLASS_HEAPS to enable heap based API (rpmalloc_heap_* +//! functions). +// Will introduce a very small overhead to track fully allocated spans in heaps +#ifndef RPMALLOC_FIRST_CLASS_HEAPS +#define RPMALLOC_FIRST_CLASS_HEAPS 0 +#endif + +//! Flag to rpaligned_realloc to not preserve content in reallocation +#define RPMALLOC_NO_PRESERVE 1 +//!
Flag to rpaligned_realloc to fail and return null pointer if grow cannot be +//! done in-place, +// in which case the original pointer is still valid (just like a call to +// realloc which failes to allocate a new block). +#define RPMALLOC_GROW_OR_FAIL 2 + +typedef struct rpmalloc_global_statistics_t { + //! Current amount of virtual memory mapped, all of which might not have been + //! committed (only if ENABLE_STATISTICS=1) + size_t mapped; + //! Peak amount of virtual memory mapped, all of which might not have been + //! committed (only if ENABLE_STATISTICS=1) + size_t mapped_peak; + //! Current amount of memory in global caches for small and medium sizes + //! (<32KiB) + size_t cached; + //! Current amount of memory allocated in huge allocations, i.e larger than + //! LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1) + size_t huge_alloc; + //! Peak amount of memory allocated in huge allocations, i.e larger than + //! LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1) + size_t huge_alloc_peak; + //! Total amount of memory mapped since initialization (only if + //! ENABLE_STATISTICS=1) + size_t mapped_total; + //! Total amount of memory unmapped since initialization (only if + //! ENABLE_STATISTICS=1) + size_t unmapped_total; +} rpmalloc_global_statistics_t; + +typedef struct rpmalloc_thread_statistics_t { + //! Current number of bytes available in thread size class caches for small + //! and medium sizes (<32KiB) + size_t sizecache; + //! Current number of bytes available in thread span caches for small and + //! medium sizes (<32KiB) + size_t spancache; + //! Total number of bytes transitioned from thread cache to global cache (only + //! if ENABLE_STATISTICS=1) + size_t thread_to_global; + //! Total number of bytes transitioned from global cache to thread cache (only + //! if ENABLE_STATISTICS=1) + size_t global_to_thread; + //! Per span count statistics (only if ENABLE_STATISTICS=1) + struct { + //! 
Currently used number of spans + size_t current; + //! High water mark of spans used + size_t peak; + //! Number of spans transitioned to global cache + size_t to_global; + //! Number of spans transitioned from global cache + size_t from_global; + //! Number of spans transitioned to thread cache + size_t to_cache; + //! Number of spans transitioned from thread cache + size_t from_cache; + //! Number of spans transitioned to reserved state + size_t to_reserved; + //! Number of spans transitioned from reserved state + size_t from_reserved; + //! Number of raw memory map calls (not hitting the reserve spans but + //! resulting in actual OS mmap calls) + size_t map_calls; + } span_use[64]; + //! Per size class statistics (only if ENABLE_STATISTICS=1) + struct { + //! Current number of allocations + size_t alloc_current; + //! Peak number of allocations + size_t alloc_peak; + //! Total number of allocations + size_t alloc_total; + //! Total number of frees + size_t free_total; + //! Number of spans transitioned to cache + size_t spans_to_cache; + //! Number of spans transitioned from cache + size_t spans_from_cache; + //! Number of spans transitioned from reserved state + size_t spans_from_reserved; + //! Number of raw memory map calls (not hitting the reserve spans but + //! resulting in actual OS mmap calls) + size_t map_calls; + } size_use[128]; +} rpmalloc_thread_statistics_t; + +typedef struct rpmalloc_config_t { + //! Map memory pages for the given number of bytes. The returned address MUST + //! be + // aligned to the rpmalloc span size, which will always be a power of two. + // Optionally the function can store an alignment offset in the offset + // variable in case it performs alignment and the returned pointer is offset + // from the actual start of the memory region due to this alignment. The + // alignment offset will be passed to the memory unmap function. 
The + // alignment offset MUST NOT be larger than 65535 (storable in an uint16_t), + // if it is you must use natural alignment to shift it into 16 bits. If you + // set a memory_map function, you must also set a memory_unmap function or + // else the default implementation will be used for both. This function must + // be thread safe, it can be called by multiple threads simultaneously. + void *(*memory_map)(size_t size, size_t *offset); + //! Unmap the memory pages starting at address and spanning the given number + //! of bytes. + // If release is set to non-zero, the unmap is for an entire span range as + // returned by a previous call to memory_map and that the entire range should + // be released. The release argument holds the size of the entire span range. + // If release is set to 0, the unmap is a partial decommit of a subset of the + // mapped memory range. If you set a memory_unmap function, you must also set + // a memory_map function or else the default implementation will be used for + // both. This function must be thread safe, it can be called by multiple + // threads simultaneously. + void (*memory_unmap)(void *address, size_t size, size_t offset, + size_t release); + //! Called when an assert fails, if asserts are enabled. Will use the standard + //! assert() + // if this is not set. + void (*error_callback)(const char *message); + //! Called when a call to map memory pages fails (out of memory). If this + //! callback is + // not set or returns zero the library will return a null pointer in the + // allocation call. If this callback returns non-zero the map call will be + // retried. The argument passed is the number of bytes that was requested in + // the map call. Only used if the default system memory map function is used + // (memory_map callback is not set). + int (*map_fail_callback)(size_t size); + //! Size of memory pages. The page size MUST be a power of two. All memory + //! 
mapping + // requests to memory_map will be made with size set to a multiple of the + // page size. Used if RPMALLOC_CONFIGURABLE is defined to 1, otherwise system + // page size is used. + size_t page_size; + //! Size of a span of memory blocks. MUST be a power of two, and in + //! [4096,262144] + // range (unless 0 - set to 0 to use the default span size). Used if + // RPMALLOC_CONFIGURABLE is defined to 1. + size_t span_size; + //! Number of spans to map at each request to map new virtual memory blocks. + //! This can + // be used to minimize the system call overhead at the cost of virtual memory + // address space. The extra mapped pages will not be written until actually + // used, so physical committed memory should not be affected in the default + // implementation. Will be aligned to a multiple of spans that match memory + // page size in case of huge pages. + size_t span_map_count; + //! Enable use of large/huge pages. If this flag is set to non-zero and page + //! size is + // zero, the allocator will try to enable huge pages and auto detect the + // configuration. If this is set to non-zero and page_size is also non-zero, + // the allocator will assume huge pages have been configured and enabled + // prior to initializing the allocator. For Windows, see + // https://docs.microsoft.com/en-us/windows/desktop/memory/large-page-support + // For Linux, see https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt + int enable_huge_pages; + //! Respectively allocated pages and huge allocated pages names for systems + // supporting it to be able to distinguish among anonymous regions. + const char *page_name; + const char *huge_page_name; +} rpmalloc_config_t; + +//! Initialize allocator with default configuration +RPMALLOC_EXPORT int rpmalloc_initialize(void); + +//! Initialize allocator with given configuration +RPMALLOC_EXPORT int rpmalloc_initialize_config(const rpmalloc_config_t *config); + +//! 
Get allocator configuration +RPMALLOC_EXPORT const rpmalloc_config_t *rpmalloc_config(void); + +//! Finalize allocator +RPMALLOC_EXPORT void rpmalloc_finalize(void); + +//! Initialize allocator for calling thread +RPMALLOC_EXPORT void rpmalloc_thread_initialize(void); + +//! Finalize allocator for calling thread +RPMALLOC_EXPORT void rpmalloc_thread_finalize(int release_caches); + +//! Perform deferred deallocations pending for the calling thread heap +RPMALLOC_EXPORT void rpmalloc_thread_collect(void); + +//! Query if allocator is initialized for calling thread +RPMALLOC_EXPORT int rpmalloc_is_thread_initialized(void); + +//! Get per-thread statistics +RPMALLOC_EXPORT void +rpmalloc_thread_statistics(rpmalloc_thread_statistics_t *stats); + +//! Get global statistics +RPMALLOC_EXPORT void +rpmalloc_global_statistics(rpmalloc_global_statistics_t *stats); + +//! Dump all statistics in human readable format to file (should be a FILE*) +RPMALLOC_EXPORT void rpmalloc_dump_statistics(void *file); + +//! Allocate a memory block of at least the given size +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc(size_t size) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(1); + +//! Free the given memory block +RPMALLOC_EXPORT void rpfree(void *ptr); + +//! Allocate a memory block of at least the given size and zero initialize it +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpcalloc(size_t num, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE2(1, 2); + +//! Reallocate the given block to at least the given size +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rprealloc(void *ptr, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2); + +//! Reallocate the given block to at least the given size and alignment, +// with optional control flags (see RPMALLOC_NO_PRESERVE). +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. 
A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpaligned_realloc(void *ptr, size_t alignment, size_t size, size_t oldsize, + unsigned int flags) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(3); + +//! Allocate a memory block of at least the given size and alignment. +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpaligned_alloc(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2); + +//! Allocate a memory block of at least the given size and alignment, and zero +//! initialize it. +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpaligned_calloc(size_t alignment, size_t num, + size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); + +//! Allocate a memory block of at least the given size and alignment. +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmemalign(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2); + +//! Allocate a memory block of at least the given size and alignment. +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. 
A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT int rpposix_memalign(void **memptr, size_t alignment, + size_t size); + +//! Query the usable size of the given memory block (from given pointer to the +//! end of block) +RPMALLOC_EXPORT size_t rpmalloc_usable_size(void *ptr); + +//! Dummy empty function for forcing linker symbol inclusion +RPMALLOC_EXPORT void rpmalloc_linker_reference(void); + +#if RPMALLOC_FIRST_CLASS_HEAPS + +//! Heap type +typedef struct heap_t rpmalloc_heap_t; + +//! Acquire a new heap. Will reuse existing released heaps or allocate memory +//! for a new heap +// if none available. Heap API is implemented with the strict assumption that +// only one single thread will call heap functions for a given heap at any +// given time, no functions are thread safe. +RPMALLOC_EXPORT rpmalloc_heap_t *rpmalloc_heap_acquire(void); + +//! Release a heap (does NOT free the memory allocated by the heap, use +//! rpmalloc_heap_free_all before destroying the heap). +// Releasing a heap will enable it to be reused by other threads. Safe to pass +// a null pointer. +RPMALLOC_EXPORT void rpmalloc_heap_release(rpmalloc_heap_t *heap); + +//! Allocate a memory block of at least the given size using the given heap. +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_alloc(rpmalloc_heap_t *heap, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2); + +//! Allocate a memory block of at least the given size using the given heap. The +//! returned +// block will have the requested alignment. Alignment must be a power of two +// and a multiple of sizeof(void*), and should ideally be less than memory page +// size. A caveat of rpmalloc internals is that this must also be strictly less +// than the span size (default 64KiB). 
+RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_aligned_alloc(rpmalloc_heap_t *heap, size_t alignment, + size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(3); + +//! Allocate a memory block of at least the given size using the given heap and +//! zero initialize it. +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_calloc(rpmalloc_heap_t *heap, size_t num, + size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); + +//! Allocate a memory block of at least the given size using the given heap and +//! zero initialize it. The returned +// block will have the requested alignment. Alignment must either be zero, or a +// power of two and a multiple of sizeof(void*), and should ideally be less +// than memory page size. A caveat of rpmalloc internals is that this must also +// be strictly less than the span size (default 64KiB). +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_aligned_calloc(rpmalloc_heap_t *heap, size_t alignment, + size_t num, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); + +//! Reallocate the given block to at least the given size. The memory block MUST +//! be allocated +// by the same heap given to this function. +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_realloc(rpmalloc_heap_t *heap, void *ptr, size_t size, + unsigned int flags) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(3); + +//! Reallocate the given block to at least the given size. The memory block MUST +//! be allocated +// by the same heap given to this function. The returned block will have the +// requested alignment. Alignment must be either zero, or a power of two and a +// multiple of sizeof(void*), and should ideally be less than memory page size. +// A caveat of rpmalloc internals is that this must also be strictly less than +// the span size (default 64KiB). 
+RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void *rpmalloc_heap_aligned_realloc( + rpmalloc_heap_t *heap, void *ptr, size_t alignment, size_t size, + unsigned int flags) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(4); + +//! Free the given memory block from the given heap. The memory block MUST be +//! allocated +// by the same heap given to this function. +RPMALLOC_EXPORT void rpmalloc_heap_free(rpmalloc_heap_t *heap, void *ptr); + +//! Free all memory allocated by the heap +RPMALLOC_EXPORT void rpmalloc_heap_free_all(rpmalloc_heap_t *heap); + +//! Set the given heap as the current heap for the calling thread. A heap MUST +//! only be current heap +// for a single thread, a heap can never be shared between multiple threads. +// The previous current heap for the calling thread is released to be reused by +// other threads. +RPMALLOC_EXPORT void rpmalloc_heap_thread_set_current(rpmalloc_heap_t *heap); + +//! Returns which heap the given pointer is allocated on +RPMALLOC_EXPORT rpmalloc_heap_t *rpmalloc_get_heap_for_ptr(void *ptr); + +#endif + +#ifdef __cplusplus +} +#endif diff --git a/llvm/lib/Support/rpmalloc/rpnew.h b/llvm/lib/Support/rpmalloc/rpnew.h index d8303c6f95652f..a18f0799d56d1f 100644 --- a/llvm/lib/Support/rpmalloc/rpnew.h +++ b/llvm/lib/Support/rpmalloc/rpnew.h @@ -1,113 +1,113 @@ -//===-------------------------- rpnew.h -----------------*- C -*-=============// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -// -// This library provides a cross-platform lock free thread caching malloc -// implementation in C11. 
-// -//===----------------------------------------------------------------------===// - -#ifdef __cplusplus - -#include <new> -#include <rpmalloc.h> - -#ifndef __CRTDECL -#define __CRTDECL -#endif - -extern void __CRTDECL operator delete(void *p) noexcept { rpfree(p); } - -extern void __CRTDECL operator delete[](void *p) noexcept { rpfree(p); } - -extern void *__CRTDECL operator new(std::size_t size) noexcept(false) { - return rpmalloc(size); -} - -extern void *__CRTDECL operator new[](std::size_t size) noexcept(false) { - return rpmalloc(size); -} - -extern void *__CRTDECL operator new(std::size_t size, - const std::nothrow_t &tag) noexcept { - (void)sizeof(tag); - return rpmalloc(size); -} - -extern void *__CRTDECL operator new[](std::size_t size, - const std::nothrow_t &tag) noexcept { - (void)sizeof(tag); - return rpmalloc(size); -} - -#if (__cplusplus >= 201402L || _MSC_VER >= 1916) - -extern void __CRTDECL operator delete(void *p, std::size_t size) noexcept { - (void)sizeof(size); - rpfree(p); -} - -extern void __CRTDECL operator delete[](void *p, std::size_t size) noexcept { - (void)sizeof(size); - rpfree(p); -} - -#endif - -#if (__cplusplus > 201402L || defined(__cpp_aligned_new)) - -extern void __CRTDECL operator delete(void *p, - std::align_val_t align) noexcept { - (void)sizeof(align); - rpfree(p); -} - -extern void __CRTDECL operator delete[](void *p, - std::align_val_t align) noexcept { - (void)sizeof(align); - rpfree(p); -} - -extern void __CRTDECL operator delete(void *p, std::size_t size, - std::align_val_t align) noexcept { - (void)sizeof(size); - (void)sizeof(align); - rpfree(p); -} - -extern void __CRTDECL operator delete[](void *p, std::size_t size, - std::align_val_t align) noexcept { - (void)sizeof(size); - (void)sizeof(align); - rpfree(p); -} - -extern void *__CRTDECL operator new(std::size_t size, - std::align_val_t align) noexcept(false) { - return rpaligned_alloc(static_cast<size_t>(align), size); -} - -extern void *__CRTDECL operator new[](std::size_t size, -
std::align_val_t align) noexcept(false) { - return rpaligned_alloc(static_cast<size_t>(align), size); -} - -extern void *__CRTDECL operator new(std::size_t size, std::align_val_t align, - const std::nothrow_t &tag) noexcept { - (void)sizeof(tag); - return rpaligned_alloc(static_cast<size_t>(align), size); -} - -extern void *__CRTDECL operator new[](std::size_t size, std::align_val_t align, - const std::nothrow_t &tag) noexcept { - (void)sizeof(tag); - return rpaligned_alloc(static_cast<size_t>(align), size); -} - -#endif - -#endif +//===-------------------------- rpnew.h -----------------*- C -*-=============// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This library provides a cross-platform lock free thread caching malloc +// implementation in C11. +// +//===----------------------------------------------------------------------===// + +#ifdef __cplusplus + +#include <new> +#include <rpmalloc.h> + +#ifndef __CRTDECL +#define __CRTDECL +#endif + +extern void __CRTDECL operator delete(void *p) noexcept { rpfree(p); } + +extern void __CRTDECL operator delete[](void *p) noexcept { rpfree(p); } + +extern void *__CRTDECL operator new(std::size_t size) noexcept(false) { + return rpmalloc(size); +} + +extern void *__CRTDECL operator new[](std::size_t size) noexcept(false) { + return rpmalloc(size); +} + +extern void *__CRTDECL operator new(std::size_t size, + const std::nothrow_t &tag) noexcept { + (void)sizeof(tag); + return rpmalloc(size); +} + +extern void *__CRTDECL operator new[](std::size_t size, + const std::nothrow_t &tag) noexcept { + (void)sizeof(tag); + return rpmalloc(size); +} + +#if (__cplusplus >= 201402L || _MSC_VER >= 1916) + +extern void __CRTDECL operator delete(void *p, std::size_t size) noexcept { + (void)sizeof(size); + rpfree(p); +} +
+extern void __CRTDECL operator delete[](void *p, std::size_t size) noexcept { + (void)sizeof(size); + rpfree(p); +} + +#endif + +#if (__cplusplus > 201402L || defined(__cpp_aligned_new)) + +extern void __CRTDECL operator delete(void *p, + std::align_val_t align) noexcept { + (void)sizeof(align); + rpfree(p); +} + +extern void __CRTDECL operator delete[](void *p, + std::align_val_t align) noexcept { + (void)sizeof(align); + rpfree(p); +} + +extern void __CRTDECL operator delete(void *p, std::size_t size, + std::align_val_t align) noexcept { + (void)sizeof(size); + (void)sizeof(align); + rpfree(p); +} + +extern void __CRTDECL operator delete[](void *p, std::size_t size, + std::align_val_t align) noexcept { + (void)sizeof(size); + (void)sizeof(align); + rpfree(p); +} + +extern void *__CRTDECL operator new(std::size_t size, + std::align_val_t align) noexcept(false) { + return rpaligned_alloc(static_cast<size_t>(align), size); +} + +extern void *__CRTDECL operator new[](std::size_t size, + std::align_val_t align) noexcept(false) { + return rpaligned_alloc(static_cast<size_t>(align), size); +} + +extern void *__CRTDECL operator new(std::size_t size, std::align_val_t align, + const std::nothrow_t &tag) noexcept { + (void)sizeof(tag); + return rpaligned_alloc(static_cast<size_t>(align), size); +} + +extern void *__CRTDECL operator new[](std::size_t size, std::align_val_t align, + const std::nothrow_t &tag) noexcept { + (void)sizeof(tag); + return rpaligned_alloc(static_cast<size_t>(align), size); +} + +#endif + +#endif diff --git a/llvm/lib/Target/DirectX/DXILFinalizeLinkage.cpp b/llvm/lib/Target/DirectX/DXILFinalizeLinkage.cpp index d315d9bd16f439..d32dda2a67c951 100644 --- a/llvm/lib/Target/DirectX/DXILFinalizeLinkage.cpp +++ b/llvm/lib/Target/DirectX/DXILFinalizeLinkage.cpp @@ -1,65 +1,65 @@ -//===- DXILFinalizeLinkage.cpp - Finalize linkage of functions ------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -#include "DXILFinalizeLinkage.h" -#include "DirectX.h" -#include "llvm/Analysis/DXILResource.h" -#include "llvm/IR/Function.h" -#include "llvm/IR/GlobalValue.h" -#include "llvm/IR/Metadata.h" -#include "llvm/IR/Module.h" - -#define DEBUG_TYPE "dxil-finalize-linkage" - -using namespace llvm; - -static bool finalizeLinkage(Module &M) { - SmallPtrSet EntriesAndExports; - - // Find all entry points and export functions - for (Function &EF : M.functions()) { - if (!EF.hasFnAttribute("hlsl.shader") && !EF.hasFnAttribute("hlsl.export")) - continue; - EntriesAndExports.insert(&EF); - } - - for (Function &F : M.functions()) { - if (F.getLinkage() == GlobalValue::ExternalLinkage && - !EntriesAndExports.contains(&F)) { - F.setLinkage(GlobalValue::InternalLinkage); - } - } - - return false; -} - -PreservedAnalyses DXILFinalizeLinkage::run(Module &M, - ModuleAnalysisManager &AM) { - if (finalizeLinkage(M)) - return PreservedAnalyses::none(); - return PreservedAnalyses::all(); -} - -bool DXILFinalizeLinkageLegacy::runOnModule(Module &M) { - return finalizeLinkage(M); -} - -void DXILFinalizeLinkageLegacy::getAnalysisUsage(AnalysisUsage &AU) const { - AU.addPreserved(); -} - -char DXILFinalizeLinkageLegacy::ID = 0; - -INITIALIZE_PASS_BEGIN(DXILFinalizeLinkageLegacy, DEBUG_TYPE, - "DXIL Finalize Linkage", false, false) -INITIALIZE_PASS_END(DXILFinalizeLinkageLegacy, DEBUG_TYPE, - "DXIL Finalize Linkage", false, false) - -ModulePass *llvm::createDXILFinalizeLinkageLegacyPass() { - return new DXILFinalizeLinkageLegacy(); -} +//===- DXILFinalizeLinkage.cpp - Finalize linkage of functions ------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "DXILFinalizeLinkage.h" +#include "DirectX.h" +#include "llvm/Analysis/DXILResource.h" +#include "llvm/IR/Function.h" +#include "llvm/IR/GlobalValue.h" +#include "llvm/IR/Metadata.h" +#include "llvm/IR/Module.h" + +#define DEBUG_TYPE "dxil-finalize-linkage" + +using namespace llvm; + +static bool finalizeLinkage(Module &M) { + SmallPtrSet EntriesAndExports; + + // Find all entry points and export functions + for (Function &EF : M.functions()) { + if (!EF.hasFnAttribute("hlsl.shader") && !EF.hasFnAttribute("hlsl.export")) + continue; + EntriesAndExports.insert(&EF); + } + + for (Function &F : M.functions()) { + if (F.getLinkage() == GlobalValue::ExternalLinkage && + !EntriesAndExports.contains(&F)) { + F.setLinkage(GlobalValue::InternalLinkage); + } + } + + return false; +} + +PreservedAnalyses DXILFinalizeLinkage::run(Module &M, + ModuleAnalysisManager &AM) { + if (finalizeLinkage(M)) + return PreservedAnalyses::none(); + return PreservedAnalyses::all(); +} + +bool DXILFinalizeLinkageLegacy::runOnModule(Module &M) { + return finalizeLinkage(M); +} + +void DXILFinalizeLinkageLegacy::getAnalysisUsage(AnalysisUsage &AU) const { + AU.addPreserved(); +} + +char DXILFinalizeLinkageLegacy::ID = 0; + +INITIALIZE_PASS_BEGIN(DXILFinalizeLinkageLegacy, DEBUG_TYPE, + "DXIL Finalize Linkage", false, false) +INITIALIZE_PASS_END(DXILFinalizeLinkageLegacy, DEBUG_TYPE, + "DXIL Finalize Linkage", false, false) + +ModulePass *llvm::createDXILFinalizeLinkageLegacyPass() { + return new DXILFinalizeLinkageLegacy(); +} diff --git a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp index be714b5c87895a..71573f9630f47d 100644 --- a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp +++ b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp @@ -1,35 +1,35 @@ -//===- 
DirectXTargetTransformInfo.cpp - DirectX TTI ---------------*- C++ -//-*-===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -/// -//===----------------------------------------------------------------------===// - -#include "DirectXTargetTransformInfo.h" -#include "llvm/IR/Intrinsics.h" -#include "llvm/IR/IntrinsicsDirectX.h" - -using namespace llvm; - -bool DirectXTTIImpl::isTargetIntrinsicWithScalarOpAtArg(Intrinsic::ID ID, - unsigned ScalarOpdIdx) { - switch (ID) { - default: - return false; - } -} - -bool DirectXTTIImpl::isTargetIntrinsicTriviallyScalarizable( - Intrinsic::ID ID) const { - switch (ID) { - case Intrinsic::dx_frac: - case Intrinsic::dx_rsqrt: - return true; - default: - return false; - } -} +//===- DirectXTargetTransformInfo.cpp - DirectX TTI ---------------*- C++ +//-*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +/// +//===----------------------------------------------------------------------===// + +#include "DirectXTargetTransformInfo.h" +#include "llvm/IR/Intrinsics.h" +#include "llvm/IR/IntrinsicsDirectX.h" + +using namespace llvm; + +bool DirectXTTIImpl::isTargetIntrinsicWithScalarOpAtArg(Intrinsic::ID ID, + unsigned ScalarOpdIdx) { + switch (ID) { + default: + return false; + } +} + +bool DirectXTTIImpl::isTargetIntrinsicTriviallyScalarizable( + Intrinsic::ID ID) const { + switch (ID) { + case Intrinsic::dx_frac: + case Intrinsic::dx_rsqrt: + return true; + default: + return false; + } +} diff --git a/llvm/test/CodeGen/DirectX/atan2.ll b/llvm/test/CodeGen/DirectX/atan2.ll index 9d86f87f3ed50e..b2c650d1162655 100644 --- a/llvm/test/CodeGen/DirectX/atan2.ll +++ b/llvm/test/CodeGen/DirectX/atan2.ll @@ -1,87 +1,87 @@ -; RUN: opt -S -dxil-intrinsic-expansion -mtriple=dxil-pc-shadermodel6.3-library %s | FileCheck %s --check-prefixes=CHECK,EXPCHECK -; RUN: opt -S -dxil-intrinsic-expansion -scalarizer -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s | FileCheck %s --check-prefixes=CHECK,DOPCHECK - -; Make sure correct dxil expansions for atan2 are generated for float and half. 
- -define noundef float @atan2_float(float noundef %y, float noundef %x) { -entry: -; CHECK: [[DIV:%.+]] = fdiv float %y, %x -; EXPCHECK: [[ATAN:%.+]] = call float @llvm.atan.f32(float [[DIV]]) -; DOPCHECK: [[ATAN:%.+]] = call float @dx.op.unary.f32(i32 17, float [[DIV]]) -; CHECK-DAG: [[ADD_PI:%.+]] = fadd float [[ATAN]], 0x400921FB60000000 -; CHECK-DAG: [[SUB_PI:%.+]] = fsub float [[ATAN]], 0x400921FB60000000 -; CHECK-DAG: [[X_LT_0:%.+]] = fcmp olt float %x, 0.000000e+00 -; CHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq float %x, 0.000000e+00 -; CHECK-DAG: [[Y_GE_0:%.+]] = fcmp oge float %y, 0.000000e+00 -; CHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt float %y, 0.000000e+00 -; CHECK: [[XLT0_AND_YGE0:%.+]] = and i1 [[X_LT_0]], [[Y_GE_0]] -; CHECK: [[SELECT_ADD_PI:%.+]] = select i1 [[XLT0_AND_YGE0]], float [[ADD_PI]], float [[ATAN]] -; CHECK: [[XLT0_AND_YLT0:%.+]] = and i1 [[X_LT_0]], [[Y_LT_0]] -; CHECK: [[SELECT_SUB_PI:%.+]] = select i1 [[XLT0_AND_YLT0]], float [[SUB_PI]], float [[SELECT_ADD_PI]] -; CHECK: [[XEQ0_AND_YLT0:%.+]] = and i1 [[X_EQ_0]], [[Y_LT_0]] -; CHECK: [[SELECT_NEGHPI:%.+]] = select i1 [[XEQ0_AND_YLT0]], float 0xBFF921FB60000000, float [[SELECT_SUB_PI]] -; CHECK: [[XEQ0_AND_YGE0:%.+]] = and i1 [[X_EQ_0]], [[Y_GE_0]] -; CHECK: [[SELECT_HPI:%.+]] = select i1 [[XEQ0_AND_YGE0]], float 0x3FF921FB60000000, float [[SELECT_NEGHPI]] -; CHECK: ret float [[SELECT_HPI]] - %elt.atan2 = call float @llvm.atan2.f32(float %y, float %x) - ret float %elt.atan2 -} - -define noundef half @atan2_half(half noundef %y, half noundef %x) { -entry: -; CHECK: [[DIV:%.+]] = fdiv half %y, %x -; EXPCHECK: [[ATAN:%.+]] = call half @llvm.atan.f16(half [[DIV]]) -; DOPCHECK: [[ATAN:%.+]] = call half @dx.op.unary.f16(i32 17, half [[DIV]]) -; CHECK-DAG: [[ADD_PI:%.+]] = fadd half [[ATAN]], 0xH4248 -; CHECK-DAG: [[SUB_PI:%.+]] = fsub half [[ATAN]], 0xH4248 -; CHECK-DAG: [[X_LT_0:%.+]] = fcmp olt half %x, 0xH0000 -; CHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq half %x, 0xH0000 -; CHECK-DAG: [[Y_GE_0:%.+]] = 
fcmp oge half %y, 0xH0000 -; CHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt half %y, 0xH0000 -; CHECK: [[XLT0_AND_YGE0:%.+]] = and i1 [[X_LT_0]], [[Y_GE_0]] -; CHECK: [[SELECT_ADD_PI:%.+]] = select i1 [[XLT0_AND_YGE0]], half [[ADD_PI]], half [[ATAN]] -; CHECK: [[XLT0_AND_YLT0:%.+]] = and i1 [[X_LT_0]], [[Y_LT_0]] -; CHECK: [[SELECT_SUB_PI:%.+]] = select i1 [[XLT0_AND_YLT0]], half [[SUB_PI]], half [[SELECT_ADD_PI]] -; CHECK: [[XEQ0_AND_YLT0:%.+]] = and i1 [[X_EQ_0]], [[Y_LT_0]] -; CHECK: [[SELECT_NEGHPI:%.+]] = select i1 [[XEQ0_AND_YLT0]], half 0xHBE48, half [[SELECT_SUB_PI]] -; CHECK: [[XEQ0_AND_YGE0:%.+]] = and i1 [[X_EQ_0]], [[Y_GE_0]] -; CHECK: [[SELECT_HPI:%.+]] = select i1 [[XEQ0_AND_YGE0]], half 0xH3E48, half [[SELECT_NEGHPI]] -; CHECK: ret half [[SELECT_HPI]] - %elt.atan2 = call half @llvm.atan2.f16(half %y, half %x) - ret half %elt.atan2 -} - -define noundef <4 x float> @atan2_float4(<4 x float> noundef %y, <4 x float> noundef %x) { -entry: -; Just Expansion, no scalarization or lowering: -; EXPCHECK: [[DIV:%.+]] = fdiv <4 x float> %y, %x -; EXPCHECK: [[ATAN:%.+]] = call <4 x float> @llvm.atan.v4f32(<4 x float> [[DIV]]) -; EXPCHECK-DAG: [[ADD_PI:%.+]] = fadd <4 x float> [[ATAN]], -; EXPCHECK-DAG: [[SUB_PI:%.+]] = fsub <4 x float> [[ATAN]], -; EXPCHECK-DAG: [[X_LT_0:%.+]] = fcmp olt <4 x float> %x, zeroinitializer -; EXPCHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq <4 x float> %x, zeroinitializer -; EXPCHECK-DAG: [[Y_GE_0:%.+]] = fcmp oge <4 x float> %y, zeroinitializer -; EXPCHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt <4 x float> %y, zeroinitializer -; EXPCHECK: [[XLT0_AND_YGE0:%.+]] = and <4 x i1> [[X_LT_0]], [[Y_GE_0]] -; EXPCHECK: [[SELECT_ADD_PI:%.+]] = select <4 x i1> [[XLT0_AND_YGE0]], <4 x float> [[ADD_PI]], <4 x float> [[ATAN]] -; EXPCHECK: [[XLT0_AND_YLT0:%.+]] = and <4 x i1> [[X_LT_0]], [[Y_LT_0]] -; EXPCHECK: [[SELECT_SUB_PI:%.+]] = select <4 x i1> [[XLT0_AND_YLT0]], <4 x float> [[SUB_PI]], <4 x float> [[SELECT_ADD_PI]] -; EXPCHECK: [[XEQ0_AND_YLT0:%.+]] = and <4 x i1> 
[[X_EQ_0]], [[Y_LT_0]] -; EXPCHECK: [[SELECT_NEGHPI:%.+]] = select <4 x i1> [[XEQ0_AND_YLT0]], <4 x float> , <4 x float> [[SELECT_SUB_PI]] -; EXPCHECK: [[XEQ0_AND_YGE0:%.+]] = and <4 x i1> [[X_EQ_0]], [[Y_GE_0]] -; EXPCHECK: [[SELECT_HPI:%.+]] = select <4 x i1> [[XEQ0_AND_YGE0]], <4 x float> , <4 x float> [[SELECT_NEGHPI]] -; EXPCHECK: ret <4 x float> [[SELECT_HPI]] - -; Scalarization occurs after expansion, so atan scalarization is tested separately. -; Expansion, scalarization and lowering: -; Just make sure this expands to exactly 4 scalar DXIL atan (OpCode=17) calls. -; DOPCHECK-COUNT-4: call float @dx.op.unary.f32(i32 17, float %{{.*}}) -; DOPCHECK-NOT: call float @dx.op.unary.f32(i32 17, - - %elt.atan2 = call <4 x float> @llvm.atan2.v4f32(<4 x float> %y, <4 x float> %x) - ret <4 x float> %elt.atan2 -} - -declare half @llvm.atan2.f16(half, half) -declare float @llvm.atan2.f32(float, float) -declare <4 x float> @llvm.atan2.v4f32(<4 x float>, <4 x float>) +; RUN: opt -S -dxil-intrinsic-expansion -mtriple=dxil-pc-shadermodel6.3-library %s | FileCheck %s --check-prefixes=CHECK,EXPCHECK +; RUN: opt -S -dxil-intrinsic-expansion -scalarizer -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s | FileCheck %s --check-prefixes=CHECK,DOPCHECK + +; Make sure correct dxil expansions for atan2 are generated for float and half. 
+ +define noundef float @atan2_float(float noundef %y, float noundef %x) { +entry: +; CHECK: [[DIV:%.+]] = fdiv float %y, %x +; EXPCHECK: [[ATAN:%.+]] = call float @llvm.atan.f32(float [[DIV]]) +; DOPCHECK: [[ATAN:%.+]] = call float @dx.op.unary.f32(i32 17, float [[DIV]]) +; CHECK-DAG: [[ADD_PI:%.+]] = fadd float [[ATAN]], 0x400921FB60000000 +; CHECK-DAG: [[SUB_PI:%.+]] = fsub float [[ATAN]], 0x400921FB60000000 +; CHECK-DAG: [[X_LT_0:%.+]] = fcmp olt float %x, 0.000000e+00 +; CHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq float %x, 0.000000e+00 +; CHECK-DAG: [[Y_GE_0:%.+]] = fcmp oge float %y, 0.000000e+00 +; CHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt float %y, 0.000000e+00 +; CHECK: [[XLT0_AND_YGE0:%.+]] = and i1 [[X_LT_0]], [[Y_GE_0]] +; CHECK: [[SELECT_ADD_PI:%.+]] = select i1 [[XLT0_AND_YGE0]], float [[ADD_PI]], float [[ATAN]] +; CHECK: [[XLT0_AND_YLT0:%.+]] = and i1 [[X_LT_0]], [[Y_LT_0]] +; CHECK: [[SELECT_SUB_PI:%.+]] = select i1 [[XLT0_AND_YLT0]], float [[SUB_PI]], float [[SELECT_ADD_PI]] +; CHECK: [[XEQ0_AND_YLT0:%.+]] = and i1 [[X_EQ_0]], [[Y_LT_0]] +; CHECK: [[SELECT_NEGHPI:%.+]] = select i1 [[XEQ0_AND_YLT0]], float 0xBFF921FB60000000, float [[SELECT_SUB_PI]] +; CHECK: [[XEQ0_AND_YGE0:%.+]] = and i1 [[X_EQ_0]], [[Y_GE_0]] +; CHECK: [[SELECT_HPI:%.+]] = select i1 [[XEQ0_AND_YGE0]], float 0x3FF921FB60000000, float [[SELECT_NEGHPI]] +; CHECK: ret float [[SELECT_HPI]] + %elt.atan2 = call float @llvm.atan2.f32(float %y, float %x) + ret float %elt.atan2 +} + +define noundef half @atan2_half(half noundef %y, half noundef %x) { +entry: +; CHECK: [[DIV:%.+]] = fdiv half %y, %x +; EXPCHECK: [[ATAN:%.+]] = call half @llvm.atan.f16(half [[DIV]]) +; DOPCHECK: [[ATAN:%.+]] = call half @dx.op.unary.f16(i32 17, half [[DIV]]) +; CHECK-DAG: [[ADD_PI:%.+]] = fadd half [[ATAN]], 0xH4248 +; CHECK-DAG: [[SUB_PI:%.+]] = fsub half [[ATAN]], 0xH4248 +; CHECK-DAG: [[X_LT_0:%.+]] = fcmp olt half %x, 0xH0000 +; CHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq half %x, 0xH0000 +; CHECK-DAG: [[Y_GE_0:%.+]] = 
fcmp oge half %y, 0xH0000 +; CHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt half %y, 0xH0000 +; CHECK: [[XLT0_AND_YGE0:%.+]] = and i1 [[X_LT_0]], [[Y_GE_0]] +; CHECK: [[SELECT_ADD_PI:%.+]] = select i1 [[XLT0_AND_YGE0]], half [[ADD_PI]], half [[ATAN]] +; CHECK: [[XLT0_AND_YLT0:%.+]] = and i1 [[X_LT_0]], [[Y_LT_0]] +; CHECK: [[SELECT_SUB_PI:%.+]] = select i1 [[XLT0_AND_YLT0]], half [[SUB_PI]], half [[SELECT_ADD_PI]] +; CHECK: [[XEQ0_AND_YLT0:%.+]] = and i1 [[X_EQ_0]], [[Y_LT_0]] +; CHECK: [[SELECT_NEGHPI:%.+]] = select i1 [[XEQ0_AND_YLT0]], half 0xHBE48, half [[SELECT_SUB_PI]] +; CHECK: [[XEQ0_AND_YGE0:%.+]] = and i1 [[X_EQ_0]], [[Y_GE_0]] +; CHECK: [[SELECT_HPI:%.+]] = select i1 [[XEQ0_AND_YGE0]], half 0xH3E48, half [[SELECT_NEGHPI]] +; CHECK: ret half [[SELECT_HPI]] + %elt.atan2 = call half @llvm.atan2.f16(half %y, half %x) + ret half %elt.atan2 +} + +define noundef <4 x float> @atan2_float4(<4 x float> noundef %y, <4 x float> noundef %x) { +entry: +; Just Expansion, no scalarization or lowering: +; EXPCHECK: [[DIV:%.+]] = fdiv <4 x float> %y, %x +; EXPCHECK: [[ATAN:%.+]] = call <4 x float> @llvm.atan.v4f32(<4 x float> [[DIV]]) +; EXPCHECK-DAG: [[ADD_PI:%.+]] = fadd <4 x float> [[ATAN]], +; EXPCHECK-DAG: [[SUB_PI:%.+]] = fsub <4 x float> [[ATAN]], +; EXPCHECK-DAG: [[X_LT_0:%.+]] = fcmp olt <4 x float> %x, zeroinitializer +; EXPCHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq <4 x float> %x, zeroinitializer +; EXPCHECK-DAG: [[Y_GE_0:%.+]] = fcmp oge <4 x float> %y, zeroinitializer +; EXPCHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt <4 x float> %y, zeroinitializer +; EXPCHECK: [[XLT0_AND_YGE0:%.+]] = and <4 x i1> [[X_LT_0]], [[Y_GE_0]] +; EXPCHECK: [[SELECT_ADD_PI:%.+]] = select <4 x i1> [[XLT0_AND_YGE0]], <4 x float> [[ADD_PI]], <4 x float> [[ATAN]] +; EXPCHECK: [[XLT0_AND_YLT0:%.+]] = and <4 x i1> [[X_LT_0]], [[Y_LT_0]] +; EXPCHECK: [[SELECT_SUB_PI:%.+]] = select <4 x i1> [[XLT0_AND_YLT0]], <4 x float> [[SUB_PI]], <4 x float> [[SELECT_ADD_PI]] +; EXPCHECK: [[XEQ0_AND_YLT0:%.+]] = and <4 x i1> 
[[X_EQ_0]], [[Y_LT_0]] +; EXPCHECK: [[SELECT_NEGHPI:%.+]] = select <4 x i1> [[XEQ0_AND_YLT0]], <4 x float> , <4 x float> [[SELECT_SUB_PI]] +; EXPCHECK: [[XEQ0_AND_YGE0:%.+]] = and <4 x i1> [[X_EQ_0]], [[Y_GE_0]] +; EXPCHECK: [[SELECT_HPI:%.+]] = select <4 x i1> [[XEQ0_AND_YGE0]], <4 x float> , <4 x float> [[SELECT_NEGHPI]] +; EXPCHECK: ret <4 x float> [[SELECT_HPI]] + +; Scalarization occurs after expansion, so atan scalarization is tested separately. +; Expansion, scalarization and lowering: +; Just make sure this expands to exactly 4 scalar DXIL atan (OpCode=17) calls. +; DOPCHECK-COUNT-4: call float @dx.op.unary.f32(i32 17, float %{{.*}}) +; DOPCHECK-NOT: call float @dx.op.unary.f32(i32 17, + + %elt.atan2 = call <4 x float> @llvm.atan2.v4f32(<4 x float> %y, <4 x float> %x) + ret <4 x float> %elt.atan2 +} + +declare half @llvm.atan2.f16(half, half) +declare float @llvm.atan2.f32(float, float) +declare <4 x float> @llvm.atan2.v4f32(<4 x float>, <4 x float>) diff --git a/llvm/test/CodeGen/DirectX/atan2_error.ll b/llvm/test/CodeGen/DirectX/atan2_error.ll index 372934098b7cab..9b66f9f1dd45a7 100644 --- a/llvm/test/CodeGen/DirectX/atan2_error.ll +++ b/llvm/test/CodeGen/DirectX/atan2_error.ll @@ -1,11 +1,11 @@ -; RUN: not opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s 2>&1 | FileCheck %s - -; DXIL operation atan does not support double overload type -; CHECK: in function atan2_double -; CHECK-SAME: Cannot create ATan operation: Invalid overload type - -define noundef double @atan2_double(double noundef %a, double noundef %b) #0 { -entry: - %1 = call double @llvm.atan2.f64(double %a, double %b) - ret double %1 -} +; RUN: not opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s 2>&1 | FileCheck %s + +; DXIL operation atan does not support double overload type +; CHECK: in function atan2_double +; CHECK-SAME: Cannot create ATan operation: Invalid overload type + +define noundef double 
@atan2_double(double noundef %a, double noundef %b) #0 { +entry: + %1 = call double @llvm.atan2.f64(double %a, double %b) + ret double %1 +} diff --git a/llvm/test/CodeGen/DirectX/cross.ll b/llvm/test/CodeGen/DirectX/cross.ll index 6ec3ec4d3594af..6153cf7cddc9d5 100644 --- a/llvm/test/CodeGen/DirectX/cross.ll +++ b/llvm/test/CodeGen/DirectX/cross.ll @@ -1,56 +1,56 @@ -; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s - -; Make sure dxil operation function calls for cross are generated for half/float. - -declare <3 x half> @llvm.dx.cross.v3f16(<3 x half>, <3 x half>) -declare <3 x float> @llvm.dx.cross.v3f32(<3 x float>, <3 x float>) - -define noundef <3 x half> @test_cross_half3(<3 x half> noundef %p0, <3 x half> noundef %p1) { -entry: - ; CHECK: %x0 = extractelement <3 x half> %p0, i64 0 - ; CHECK: %x1 = extractelement <3 x half> %p0, i64 1 - ; CHECK: %x2 = extractelement <3 x half> %p0, i64 2 - ; CHECK: %y0 = extractelement <3 x half> %p1, i64 0 - ; CHECK: %y1 = extractelement <3 x half> %p1, i64 1 - ; CHECK: %y2 = extractelement <3 x half> %p1, i64 2 - ; CHECK: %0 = fmul half %x1, %y2 - ; CHECK: %1 = fmul half %x2, %y1 - ; CHECK: %hlsl.cross1 = fsub half %0, %1 - ; CHECK: %2 = fmul half %x2, %y0 - ; CHECK: %3 = fmul half %x0, %y2 - ; CHECK: %hlsl.cross2 = fsub half %2, %3 - ; CHECK: %4 = fmul half %x0, %y1 - ; CHECK: %5 = fmul half %x1, %y0 - ; CHECK: %hlsl.cross3 = fsub half %4, %5 - ; CHECK: %6 = insertelement <3 x half> undef, half %hlsl.cross1, i64 0 - ; CHECK: %7 = insertelement <3 x half> %6, half %hlsl.cross2, i64 1 - ; CHECK: %8 = insertelement <3 x half> %7, half %hlsl.cross3, i64 2 - ; CHECK: ret <3 x half> %8 - %hlsl.cross = call <3 x half> @llvm.dx.cross.v3f16(<3 x half> %p0, <3 x half> %p1) - ret <3 x half> %hlsl.cross -} - -define noundef <3 x float> @test_cross_float3(<3 x float> noundef %p0, <3 x float> noundef %p1) { -entry: - ; CHECK: %x0 = extractelement <3 x float> %p0, i64 0 - ; CHECK: %x1 = extractelement <3 x float> %p0, i64 1 - 
; CHECK: %x2 = extractelement <3 x float> %p0, i64 2 - ; CHECK: %y0 = extractelement <3 x float> %p1, i64 0 - ; CHECK: %y1 = extractelement <3 x float> %p1, i64 1 - ; CHECK: %y2 = extractelement <3 x float> %p1, i64 2 - ; CHECK: %0 = fmul float %x1, %y2 - ; CHECK: %1 = fmul float %x2, %y1 - ; CHECK: %hlsl.cross1 = fsub float %0, %1 - ; CHECK: %2 = fmul float %x2, %y0 - ; CHECK: %3 = fmul float %x0, %y2 - ; CHECK: %hlsl.cross2 = fsub float %2, %3 - ; CHECK: %4 = fmul float %x0, %y1 - ; CHECK: %5 = fmul float %x1, %y0 - ; CHECK: %hlsl.cross3 = fsub float %4, %5 - ; CHECK: %6 = insertelement <3 x float> undef, float %hlsl.cross1, i64 0 - ; CHECK: %7 = insertelement <3 x float> %6, float %hlsl.cross2, i64 1 - ; CHECK: %8 = insertelement <3 x float> %7, float %hlsl.cross3, i64 2 - ; CHECK: ret <3 x float> %8 - %hlsl.cross = call <3 x float> @llvm.dx.cross.v3f32(<3 x float> %p0, <3 x float> %p1) - ret <3 x float> %hlsl.cross -} +; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s + +; Make sure dxil operation function calls for cross are generated for half/float. 
+ +declare <3 x half> @llvm.dx.cross.v3f16(<3 x half>, <3 x half>) +declare <3 x float> @llvm.dx.cross.v3f32(<3 x float>, <3 x float>) + +define noundef <3 x half> @test_cross_half3(<3 x half> noundef %p0, <3 x half> noundef %p1) { +entry: + ; CHECK: %x0 = extractelement <3 x half> %p0, i64 0 + ; CHECK: %x1 = extractelement <3 x half> %p0, i64 1 + ; CHECK: %x2 = extractelement <3 x half> %p0, i64 2 + ; CHECK: %y0 = extractelement <3 x half> %p1, i64 0 + ; CHECK: %y1 = extractelement <3 x half> %p1, i64 1 + ; CHECK: %y2 = extractelement <3 x half> %p1, i64 2 + ; CHECK: %0 = fmul half %x1, %y2 + ; CHECK: %1 = fmul half %x2, %y1 + ; CHECK: %hlsl.cross1 = fsub half %0, %1 + ; CHECK: %2 = fmul half %x2, %y0 + ; CHECK: %3 = fmul half %x0, %y2 + ; CHECK: %hlsl.cross2 = fsub half %2, %3 + ; CHECK: %4 = fmul half %x0, %y1 + ; CHECK: %5 = fmul half %x1, %y0 + ; CHECK: %hlsl.cross3 = fsub half %4, %5 + ; CHECK: %6 = insertelement <3 x half> undef, half %hlsl.cross1, i64 0 + ; CHECK: %7 = insertelement <3 x half> %6, half %hlsl.cross2, i64 1 + ; CHECK: %8 = insertelement <3 x half> %7, half %hlsl.cross3, i64 2 + ; CHECK: ret <3 x half> %8 + %hlsl.cross = call <3 x half> @llvm.dx.cross.v3f16(<3 x half> %p0, <3 x half> %p1) + ret <3 x half> %hlsl.cross +} + +define noundef <3 x float> @test_cross_float3(<3 x float> noundef %p0, <3 x float> noundef %p1) { +entry: + ; CHECK: %x0 = extractelement <3 x float> %p0, i64 0 + ; CHECK: %x1 = extractelement <3 x float> %p0, i64 1 + ; CHECK: %x2 = extractelement <3 x float> %p0, i64 2 + ; CHECK: %y0 = extractelement <3 x float> %p1, i64 0 + ; CHECK: %y1 = extractelement <3 x float> %p1, i64 1 + ; CHECK: %y2 = extractelement <3 x float> %p1, i64 2 + ; CHECK: %0 = fmul float %x1, %y2 + ; CHECK: %1 = fmul float %x2, %y1 + ; CHECK: %hlsl.cross1 = fsub float %0, %1 + ; CHECK: %2 = fmul float %x2, %y0 + ; CHECK: %3 = fmul float %x0, %y2 + ; CHECK: %hlsl.cross2 = fsub float %2, %3 + ; CHECK: %4 = fmul float %x0, %y1 + ; CHECK: %5 = fmul float 
%x1, %y0 + ; CHECK: %hlsl.cross3 = fsub float %4, %5 + ; CHECK: %6 = insertelement <3 x float> undef, float %hlsl.cross1, i64 0 + ; CHECK: %7 = insertelement <3 x float> %6, float %hlsl.cross2, i64 1 + ; CHECK: %8 = insertelement <3 x float> %7, float %hlsl.cross3, i64 2 + ; CHECK: ret <3 x float> %8 + %hlsl.cross = call <3 x float> @llvm.dx.cross.v3f32(<3 x float> %p0, <3 x float> %p1) + ret <3 x float> %hlsl.cross +} diff --git a/llvm/test/CodeGen/DirectX/finalize_linkage.ll b/llvm/test/CodeGen/DirectX/finalize_linkage.ll index 0ee8a5f44593ba..b6da9f6cb3926a 100644 --- a/llvm/test/CodeGen/DirectX/finalize_linkage.ll +++ b/llvm/test/CodeGen/DirectX/finalize_linkage.ll @@ -1,64 +1,64 @@ -; RUN: opt -S -dxil-finalize-linkage -mtriple=dxil-unknown-shadermodel6.5-compute %s | FileCheck %s -; RUN: llc %s --filetype=asm -o - | FileCheck %s --check-prefixes=CHECK-LLC - -target triple = "dxilv1.5-pc-shadermodel6.5-compute" - -; DXILFinalizeLinkage changes linkage of all functions that are not -; entry points or exported function to internal. 
- -; CHECK: define internal void @"?f1@@YAXXZ"() -define void @"?f1@@YAXXZ"() #0 { -entry: - ret void -} - -; CHECK: define internal void @"?f2@@YAXXZ"() -define void @"?f2@@YAXXZ"() #0 { -entry: - ret void -} - -; CHECK: define internal void @"?f3@@YAXXZ"() -define void @"?f3@@YAXXZ"() #0 { -entry: - ret void -} - -; CHECK: define internal void @"?foo@@YAXXZ"() -define void @"?foo@@YAXXZ"() #0 { -entry: - call void @"?f2@@YAXXZ"() #3 - ret void -} - -; Exported function - do not change linkage -; CHECK: define void @"?bar@@YAXXZ"() -define void @"?bar@@YAXXZ"() #1 { -entry: - call void @"?f3@@YAXXZ"() #3 - ret void -} - -; CHECK: define internal void @"?main@@YAXXZ"() #0 -define internal void @"?main@@YAXXZ"() #0 { -entry: - call void @"?foo@@YAXXZ"() #3 - call void @"?bar@@YAXXZ"() #3 - ret void -} - -; Entry point function - do not change linkage -; CHECK: define void @main() #2 -define void @main() #2 { -entry: - call void @"?main@@YAXXZ"() - ret void -} - -attributes #0 = { convergent noinline nounwind optnone} -attributes #1 = { convergent noinline nounwind optnone "hlsl.export"} -attributes #2 = { convergent "hlsl.numthreads"="4,1,1" "hlsl.shader"="compute"} -attributes #3 = { convergent } - -; Make sure "hlsl.export" attribute is stripped by llc -; CHECK-LLC-NOT: "hlsl.export" +; RUN: opt -S -dxil-finalize-linkage -mtriple=dxil-unknown-shadermodel6.5-compute %s | FileCheck %s +; RUN: llc %s --filetype=asm -o - | FileCheck %s --check-prefixes=CHECK-LLC + +target triple = "dxilv1.5-pc-shadermodel6.5-compute" + +; DXILFinalizeLinkage changes linkage of all functions that are not +; entry points or exported function to internal. 
+ +; CHECK: define internal void @"?f1@@YAXXZ"() +define void @"?f1@@YAXXZ"() #0 { +entry: + ret void +} + +; CHECK: define internal void @"?f2@@YAXXZ"() +define void @"?f2@@YAXXZ"() #0 { +entry: + ret void +} + +; CHECK: define internal void @"?f3@@YAXXZ"() +define void @"?f3@@YAXXZ"() #0 { +entry: + ret void +} + +; CHECK: define internal void @"?foo@@YAXXZ"() +define void @"?foo@@YAXXZ"() #0 { +entry: + call void @"?f2@@YAXXZ"() #3 + ret void +} + +; Exported function - do not change linkage +; CHECK: define void @"?bar@@YAXXZ"() +define void @"?bar@@YAXXZ"() #1 { +entry: + call void @"?f3@@YAXXZ"() #3 + ret void +} + +; CHECK: define internal void @"?main@@YAXXZ"() #0 +define internal void @"?main@@YAXXZ"() #0 { +entry: + call void @"?foo@@YAXXZ"() #3 + call void @"?bar@@YAXXZ"() #3 + ret void +} + +; Entry point function - do not change linkage +; CHECK: define void @main() #2 +define void @main() #2 { +entry: + call void @"?main@@YAXXZ"() + ret void +} + +attributes #0 = { convergent noinline nounwind optnone} +attributes #1 = { convergent noinline nounwind optnone "hlsl.export"} +attributes #2 = { convergent "hlsl.numthreads"="4,1,1" "hlsl.shader"="compute"} +attributes #3 = { convergent } + +; Make sure "hlsl.export" attribute is stripped by llc +; CHECK-LLC-NOT: "hlsl.export" diff --git a/llvm/test/CodeGen/DirectX/normalize.ll b/llvm/test/CodeGen/DirectX/normalize.ll index 2aba9d5f74d78e..de106be1243712 100644 --- a/llvm/test/CodeGen/DirectX/normalize.ll +++ b/llvm/test/CodeGen/DirectX/normalize.ll @@ -1,112 +1,112 @@ -; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s --check-prefixes=CHECK,EXPCHECK -; RUN: opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library < %s | FileCheck %s --check-prefixes=CHECK,DOPCHECK - -; Make sure dxil operation function calls for normalize are generated for half/float. 
- -declare half @llvm.dx.normalize.f16(half) -declare <2 x half> @llvm.dx.normalize.v2f16(<2 x half>) -declare <3 x half> @llvm.dx.normalize.v3f16(<3 x half>) -declare <4 x half> @llvm.dx.normalize.v4f16(<4 x half>) - -declare float @llvm.dx.normalize.f32(float) -declare <2 x float> @llvm.dx.normalize.v2f32(<2 x float>) -declare <3 x float> @llvm.dx.normalize.v3f32(<3 x float>) -declare <4 x float> @llvm.dx.normalize.v4f32(<4 x float>) - -define noundef half @test_normalize_half(half noundef %p0) { -entry: - ; CHECK: fdiv half %p0, %p0 - %hlsl.normalize = call half @llvm.dx.normalize.f16(half %p0) - ret half %hlsl.normalize -} - -define noundef <2 x half> @test_normalize_half2(<2 x half> noundef %p0) { -entry: - ; EXPCHECK: [[doth2:%.*]] = call half @llvm.dx.dot2.v2f16(<2 x half> %{{.*}}, <2 x half> %{{.*}}) - ; DOPCHECK: [[doth2:%.*]] = call half @dx.op.dot2.f16(i32 54, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth2]]) - ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth2]]) - ; CHECK: [[splatinserth2:%.*]] = insertelement <2 x half> poison, half [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] = shufflevector <2 x half> [[splatinserth2]], <2 x half> poison, <2 x i32> zeroinitializer - ; CHECK: fmul <2 x half> %p0, [[splat]] - - %hlsl.normalize = call <2 x half> @llvm.dx.normalize.v2f16(<2 x half> %p0) - ret <2 x half> %hlsl.normalize -} - -define noundef <3 x half> @test_normalize_half3(<3 x half> noundef %p0) { -entry: - ; EXPCHECK: [[doth3:%.*]] = call half @llvm.dx.dot3.v3f16(<3 x half> %{{.*}}, <3 x half> %{{.*}}) - ; DOPCHECK: [[doth3:%.*]] = call half @dx.op.dot3.f16(i32 55, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth3]]) - ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth3]]) - ; CHECK: [[splatinserth3:%.*]] = insertelement <3 x half> poison, half 
[[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <3 x half> [[splatinserth3]], <3 x half> poison, <3 x i32> zeroinitializer - ; CHECK: fmul <3 x half> %p0, %.splat - - %hlsl.normalize = call <3 x half> @llvm.dx.normalize.v3f16(<3 x half> %p0) - ret <3 x half> %hlsl.normalize -} - -define noundef <4 x half> @test_normalize_half4(<4 x half> noundef %p0) { -entry: - ; EXPCHECK: [[doth4:%.*]] = call half @llvm.dx.dot4.v4f16(<4 x half> %{{.*}}, <4 x half> %{{.*}}) - ; DOPCHECK: [[doth4:%.*]] = call half @dx.op.dot4.f16(i32 56, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth4]]) - ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth4]]) - ; CHECK: [[splatinserth4:%.*]] = insertelement <4 x half> poison, half [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <4 x half> [[splatinserth4]], <4 x half> poison, <4 x i32> zeroinitializer - ; CHECK: fmul <4 x half> %p0, %.splat - - %hlsl.normalize = call <4 x half> @llvm.dx.normalize.v4f16(<4 x half> %p0) - ret <4 x half> %hlsl.normalize -} - -define noundef float @test_normalize_float(float noundef %p0) { -entry: - ; CHECK: fdiv float %p0, %p0 - %hlsl.normalize = call float @llvm.dx.normalize.f32(float %p0) - ret float %hlsl.normalize -} - -define noundef <2 x float> @test_normalize_float2(<2 x float> noundef %p0) { -entry: - ; EXPCHECK: [[dotf2:%.*]] = call float @llvm.dx.dot2.v2f32(<2 x float> %{{.*}}, <2 x float> %{{.*}}) - ; DOPCHECK: [[dotf2:%.*]] = call float @dx.op.dot2.f32(i32 54, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf2]]) - ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf2]]) - ; CHECK: [[splatinsertf2:%.*]] = insertelement <2 x float> poison, float [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <2 x float> [[splatinsertf2]], <2 x float> poison, <2 x i32> zeroinitializer - ; 
CHECK: fmul <2 x float> %p0, %.splat - - %hlsl.normalize = call <2 x float> @llvm.dx.normalize.v2f32(<2 x float> %p0) - ret <2 x float> %hlsl.normalize -} - -define noundef <3 x float> @test_normalize_float3(<3 x float> noundef %p0) { -entry: - ; EXPCHECK: [[dotf3:%.*]] = call float @llvm.dx.dot3.v3f32(<3 x float> %{{.*}}, <3 x float> %{{.*}}) - ; DOPCHECK: [[dotf3:%.*]] = call float @dx.op.dot3.f32(i32 55, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf3]]) - ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf3]]) - ; CHECK: [[splatinsertf3:%.*]] = insertelement <3 x float> poison, float [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <3 x float> [[splatinsertf3]], <3 x float> poison, <3 x i32> zeroinitializer - ; CHECK: fmul <3 x float> %p0, %.splat - - %hlsl.normalize = call <3 x float> @llvm.dx.normalize.v3f32(<3 x float> %p0) - ret <3 x float> %hlsl.normalize -} - -define noundef <4 x float> @test_normalize_float4(<4 x float> noundef %p0) { -entry: - ; EXPCHECK: [[dotf4:%.*]] = call float @llvm.dx.dot4.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}) - ; DOPCHECK: [[dotf4:%.*]] = call float @dx.op.dot4.f32(i32 56, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf4]]) - ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf4]]) - ; CHECK: [[splatinsertf4:%.*]] = insertelement <4 x float> poison, float [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <4 x float> [[splatinsertf4]], <4 x float> poison, <4 x i32> zeroinitializer - ; CHECK: fmul <4 x float> %p0, %.splat - - %hlsl.normalize = call <4 x float> @llvm.dx.normalize.v4f32(<4 x float> %p0) - ret <4 x float> %hlsl.normalize -} +; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s --check-prefixes=CHECK,EXPCHECK +; RUN: opt -S -dxil-intrinsic-expansion -dxil-op-lower 
-mtriple=dxil-pc-shadermodel6.3-library < %s | FileCheck %s --check-prefixes=CHECK,DOPCHECK + +; Make sure dxil operation function calls for normalize are generated for half/float. + +declare half @llvm.dx.normalize.f16(half) +declare <2 x half> @llvm.dx.normalize.v2f16(<2 x half>) +declare <3 x half> @llvm.dx.normalize.v3f16(<3 x half>) +declare <4 x half> @llvm.dx.normalize.v4f16(<4 x half>) + +declare float @llvm.dx.normalize.f32(float) +declare <2 x float> @llvm.dx.normalize.v2f32(<2 x float>) +declare <3 x float> @llvm.dx.normalize.v3f32(<3 x float>) +declare <4 x float> @llvm.dx.normalize.v4f32(<4 x float>) + +define noundef half @test_normalize_half(half noundef %p0) { +entry: + ; CHECK: fdiv half %p0, %p0 + %hlsl.normalize = call half @llvm.dx.normalize.f16(half %p0) + ret half %hlsl.normalize +} + +define noundef <2 x half> @test_normalize_half2(<2 x half> noundef %p0) { +entry: + ; EXPCHECK: [[doth2:%.*]] = call half @llvm.dx.dot2.v2f16(<2 x half> %{{.*}}, <2 x half> %{{.*}}) + ; DOPCHECK: [[doth2:%.*]] = call half @dx.op.dot2.f16(i32 54, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth2]]) + ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth2]]) + ; CHECK: [[splatinserth2:%.*]] = insertelement <2 x half> poison, half [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] = shufflevector <2 x half> [[splatinserth2]], <2 x half> poison, <2 x i32> zeroinitializer + ; CHECK: fmul <2 x half> %p0, [[splat]] + + %hlsl.normalize = call <2 x half> @llvm.dx.normalize.v2f16(<2 x half> %p0) + ret <2 x half> %hlsl.normalize +} + +define noundef <3 x half> @test_normalize_half3(<3 x half> noundef %p0) { +entry: + ; EXPCHECK: [[doth3:%.*]] = call half @llvm.dx.dot3.v3f16(<3 x half> %{{.*}}, <3 x half> %{{.*}}) + ; DOPCHECK: [[doth3:%.*]] = call half @dx.op.dot3.f16(i32 55, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call half 
@llvm.dx.rsqrt.f16(half [[doth3]]) + ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth3]]) + ; CHECK: [[splatinserth3:%.*]] = insertelement <3 x half> poison, half [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <3 x half> [[splatinserth3]], <3 x half> poison, <3 x i32> zeroinitializer + ; CHECK: fmul <3 x half> %p0, %.splat + + %hlsl.normalize = call <3 x half> @llvm.dx.normalize.v3f16(<3 x half> %p0) + ret <3 x half> %hlsl.normalize +} + +define noundef <4 x half> @test_normalize_half4(<4 x half> noundef %p0) { +entry: + ; EXPCHECK: [[doth4:%.*]] = call half @llvm.dx.dot4.v4f16(<4 x half> %{{.*}}, <4 x half> %{{.*}}) + ; DOPCHECK: [[doth4:%.*]] = call half @dx.op.dot4.f16(i32 56, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth4]]) + ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth4]]) + ; CHECK: [[splatinserth4:%.*]] = insertelement <4 x half> poison, half [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <4 x half> [[splatinserth4]], <4 x half> poison, <4 x i32> zeroinitializer + ; CHECK: fmul <4 x half> %p0, %.splat + + %hlsl.normalize = call <4 x half> @llvm.dx.normalize.v4f16(<4 x half> %p0) + ret <4 x half> %hlsl.normalize +} + +define noundef float @test_normalize_float(float noundef %p0) { +entry: + ; CHECK: fdiv float %p0, %p0 + %hlsl.normalize = call float @llvm.dx.normalize.f32(float %p0) + ret float %hlsl.normalize +} + +define noundef <2 x float> @test_normalize_float2(<2 x float> noundef %p0) { +entry: + ; EXPCHECK: [[dotf2:%.*]] = call float @llvm.dx.dot2.v2f32(<2 x float> %{{.*}}, <2 x float> %{{.*}}) + ; DOPCHECK: [[dotf2:%.*]] = call float @dx.op.dot2.f32(i32 54, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf2]]) + ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf2]]) + ; CHECK: 
[[splatinsertf2:%.*]] = insertelement <2 x float> poison, float [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <2 x float> [[splatinsertf2]], <2 x float> poison, <2 x i32> zeroinitializer + ; CHECK: fmul <2 x float> %p0, %.splat + + %hlsl.normalize = call <2 x float> @llvm.dx.normalize.v2f32(<2 x float> %p0) + ret <2 x float> %hlsl.normalize +} + +define noundef <3 x float> @test_normalize_float3(<3 x float> noundef %p0) { +entry: + ; EXPCHECK: [[dotf3:%.*]] = call float @llvm.dx.dot3.v3f32(<3 x float> %{{.*}}, <3 x float> %{{.*}}) + ; DOPCHECK: [[dotf3:%.*]] = call float @dx.op.dot3.f32(i32 55, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf3]]) + ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf3]]) + ; CHECK: [[splatinsertf3:%.*]] = insertelement <3 x float> poison, float [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <3 x float> [[splatinsertf3]], <3 x float> poison, <3 x i32> zeroinitializer + ; CHECK: fmul <3 x float> %p0, %.splat + + %hlsl.normalize = call <3 x float> @llvm.dx.normalize.v3f32(<3 x float> %p0) + ret <3 x float> %hlsl.normalize +} + +define noundef <4 x float> @test_normalize_float4(<4 x float> noundef %p0) { +entry: + ; EXPCHECK: [[dotf4:%.*]] = call float @llvm.dx.dot4.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}) + ; DOPCHECK: [[dotf4:%.*]] = call float @dx.op.dot4.f32(i32 56, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf4]]) + ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf4]]) + ; CHECK: [[splatinsertf4:%.*]] = insertelement <4 x float> poison, float [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <4 x float> [[splatinsertf4]], <4 x float> poison, <4 x i32> zeroinitializer + ; CHECK: fmul <4 x float> %p0, %.splat + + %hlsl.normalize = call <4 x float> @llvm.dx.normalize.v4f32(<4 x float> 
%p0) + ret <4 x float> %hlsl.normalize +} diff --git a/llvm/test/CodeGen/DirectX/normalize_error.ll b/llvm/test/CodeGen/DirectX/normalize_error.ll index 35a91c0cdc24df..3041d2ecdd923a 100644 --- a/llvm/test/CodeGen/DirectX/normalize_error.ll +++ b/llvm/test/CodeGen/DirectX/normalize_error.ll @@ -1,10 +1,10 @@ -; RUN: not opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s 2>&1 | FileCheck %s - -; DXIL operation normalize does not support double overload type -; CHECK: Cannot create Dot2 operation: Invalid overload type - -define noundef <2 x double> @test_normalize_double2(<2 x double> noundef %p0) { -entry: - %hlsl.normalize = call <2 x double> @llvm.dx.normalize.v2f32(<2 x double> %p0) - ret <2 x double> %hlsl.normalize -} +; RUN: not opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s 2>&1 | FileCheck %s + +; DXIL operation normalize does not support double overload type +; CHECK: Cannot create Dot2 operation: Invalid overload type + +define noundef <2 x double> @test_normalize_double2(<2 x double> noundef %p0) { +entry: + %hlsl.normalize = call <2 x double> @llvm.dx.normalize.v2f32(<2 x double> %p0) + ret <2 x double> %hlsl.normalize +} diff --git a/llvm/test/CodeGen/DirectX/step.ll b/llvm/test/CodeGen/DirectX/step.ll index 1c9894026c62ec..6a9b5bf71da899 100644 --- a/llvm/test/CodeGen/DirectX/step.ll +++ b/llvm/test/CodeGen/DirectX/step.ll @@ -1,78 +1,78 @@ -; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s --check-prefix=CHECK -; RUN: opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library < %s | FileCheck %s --check-prefix=CHECK - -; Make sure dxil operation function calls for step are generated for half/float. 
- -declare half @llvm.dx.step.f16(half, half) -declare <2 x half> @llvm.dx.step.v2f16(<2 x half>, <2 x half>) -declare <3 x half> @llvm.dx.step.v3f16(<3 x half>, <3 x half>) -declare <4 x half> @llvm.dx.step.v4f16(<4 x half>, <4 x half>) - -declare float @llvm.dx.step.f32(float, float) -declare <2 x float> @llvm.dx.step.v2f32(<2 x float>, <2 x float>) -declare <3 x float> @llvm.dx.step.v3f32(<3 x float>, <3 x float>) -declare <4 x float> @llvm.dx.step.v4f32(<4 x float>, <4 x float>) - -define noundef half @test_step_half(half noundef %p0, half noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt half %p1, %p0 - ; CHECK: %1 = select i1 %0, half 0xH0000, half 0xH3C00 - %hlsl.step = call half @llvm.dx.step.f16(half %p0, half %p1) - ret half %hlsl.step -} - -define noundef <2 x half> @test_step_half2(<2 x half> noundef %p0, <2 x half> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <2 x half> %p1, %p0 - ; CHECK: %1 = select <2 x i1> %0, <2 x half> zeroinitializer, <2 x half> - %hlsl.step = call <2 x half> @llvm.dx.step.v2f16(<2 x half> %p0, <2 x half> %p1) - ret <2 x half> %hlsl.step -} - -define noundef <3 x half> @test_step_half3(<3 x half> noundef %p0, <3 x half> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <3 x half> %p1, %p0 - ; CHECK: %1 = select <3 x i1> %0, <3 x half> zeroinitializer, <3 x half> - %hlsl.step = call <3 x half> @llvm.dx.step.v3f16(<3 x half> %p0, <3 x half> %p1) - ret <3 x half> %hlsl.step -} - -define noundef <4 x half> @test_step_half4(<4 x half> noundef %p0, <4 x half> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <4 x half> %p1, %p0 - ; CHECK: %1 = select <4 x i1> %0, <4 x half> zeroinitializer, <4 x half> - %hlsl.step = call <4 x half> @llvm.dx.step.v4f16(<4 x half> %p0, <4 x half> %p1) - ret <4 x half> %hlsl.step -} - -define noundef float @test_step_float(float noundef %p0, float noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt float %p1, %p0 - ; CHECK: %1 = select i1 %0, float 0.000000e+00, float 1.000000e+00 - %hlsl.step = call float 
@llvm.dx.step.f32(float %p0, float %p1) - ret float %hlsl.step -} - -define noundef <2 x float> @test_step_float2(<2 x float> noundef %p0, <2 x float> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <2 x float> %p1, %p0 - ; CHECK: %1 = select <2 x i1> %0, <2 x float> zeroinitializer, <2 x float> - %hlsl.step = call <2 x float> @llvm.dx.step.v2f32(<2 x float> %p0, <2 x float> %p1) - ret <2 x float> %hlsl.step -} - -define noundef <3 x float> @test_step_float3(<3 x float> noundef %p0, <3 x float> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <3 x float> %p1, %p0 - ; CHECK: %1 = select <3 x i1> %0, <3 x float> zeroinitializer, <3 x float> - %hlsl.step = call <3 x float> @llvm.dx.step.v3f32(<3 x float> %p0, <3 x float> %p1) - ret <3 x float> %hlsl.step -} - -define noundef <4 x float> @test_step_float4(<4 x float> noundef %p0, <4 x float> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <4 x float> %p1, %p0 - ; CHECK: %1 = select <4 x i1> %0, <4 x float> zeroinitializer, <4 x float> - %hlsl.step = call <4 x float> @llvm.dx.step.v4f32(<4 x float> %p0, <4 x float> %p1) - ret <4 x float> %hlsl.step -} +; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s --check-prefix=CHECK +; RUN: opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library < %s | FileCheck %s --check-prefix=CHECK + +; Make sure dxil operation function calls for step are generated for half/float. 
+ +declare half @llvm.dx.step.f16(half, half) +declare <2 x half> @llvm.dx.step.v2f16(<2 x half>, <2 x half>) +declare <3 x half> @llvm.dx.step.v3f16(<3 x half>, <3 x half>) +declare <4 x half> @llvm.dx.step.v4f16(<4 x half>, <4 x half>) + +declare float @llvm.dx.step.f32(float, float) +declare <2 x float> @llvm.dx.step.v2f32(<2 x float>, <2 x float>) +declare <3 x float> @llvm.dx.step.v3f32(<3 x float>, <3 x float>) +declare <4 x float> @llvm.dx.step.v4f32(<4 x float>, <4 x float>) + +define noundef half @test_step_half(half noundef %p0, half noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt half %p1, %p0 + ; CHECK: %1 = select i1 %0, half 0xH0000, half 0xH3C00 + %hlsl.step = call half @llvm.dx.step.f16(half %p0, half %p1) + ret half %hlsl.step +} + +define noundef <2 x half> @test_step_half2(<2 x half> noundef %p0, <2 x half> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <2 x half> %p1, %p0 + ; CHECK: %1 = select <2 x i1> %0, <2 x half> zeroinitializer, <2 x half> + %hlsl.step = call <2 x half> @llvm.dx.step.v2f16(<2 x half> %p0, <2 x half> %p1) + ret <2 x half> %hlsl.step +} + +define noundef <3 x half> @test_step_half3(<3 x half> noundef %p0, <3 x half> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <3 x half> %p1, %p0 + ; CHECK: %1 = select <3 x i1> %0, <3 x half> zeroinitializer, <3 x half> + %hlsl.step = call <3 x half> @llvm.dx.step.v3f16(<3 x half> %p0, <3 x half> %p1) + ret <3 x half> %hlsl.step +} + +define noundef <4 x half> @test_step_half4(<4 x half> noundef %p0, <4 x half> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <4 x half> %p1, %p0 + ; CHECK: %1 = select <4 x i1> %0, <4 x half> zeroinitializer, <4 x half> + %hlsl.step = call <4 x half> @llvm.dx.step.v4f16(<4 x half> %p0, <4 x half> %p1) + ret <4 x half> %hlsl.step +} + +define noundef float @test_step_float(float noundef %p0, float noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt float %p1, %p0 + ; CHECK: %1 = select i1 %0, float 0.000000e+00, float 1.000000e+00 + %hlsl.step = call float 
@llvm.dx.step.f32(float %p0, float %p1) + ret float %hlsl.step +} + +define noundef <2 x float> @test_step_float2(<2 x float> noundef %p0, <2 x float> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <2 x float> %p1, %p0 + ; CHECK: %1 = select <2 x i1> %0, <2 x float> zeroinitializer, <2 x float> + %hlsl.step = call <2 x float> @llvm.dx.step.v2f32(<2 x float> %p0, <2 x float> %p1) + ret <2 x float> %hlsl.step +} + +define noundef <3 x float> @test_step_float3(<3 x float> noundef %p0, <3 x float> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <3 x float> %p1, %p0 + ; CHECK: %1 = select <3 x i1> %0, <3 x float> zeroinitializer, <3 x float> + %hlsl.step = call <3 x float> @llvm.dx.step.v3f32(<3 x float> %p0, <3 x float> %p1) + ret <3 x float> %hlsl.step +} + +define noundef <4 x float> @test_step_float4(<4 x float> noundef %p0, <4 x float> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <4 x float> %p1, %p0 + ; CHECK: %1 = select <4 x i1> %0, <4 x float> zeroinitializer, <4 x float> + %hlsl.step = call <4 x float> @llvm.dx.step.v4f32(<4 x float> %p0, <4 x float> %p1) + ret <4 x float> %hlsl.step +} diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/atan2.ll b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/atan2.ll index bdbfc133efa29b..a0306bae4a22de 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/atan2.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/atan2.ll @@ -1,49 +1,49 @@ -; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 -; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 - -define noundef float @atan2_float(float noundef %a, float noundef %b) { -entry: -; CHECK: %[[#arg0:]] = 
OpFunctionParameter %[[#]] -; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] -; CHECK: %[[#]] = OpExtInst %[[#float_32]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] - %elt.atan2 = call float @llvm.atan2.f32(float %a, float %b) - ret float %elt.atan2 -} - -define noundef half @atan2_half(half noundef %a, half noundef %b) { -entry: -; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] -; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] -; CHECK: %[[#]] = OpExtInst %[[#float_16]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] - %elt.atan2 = call half @llvm.atan2.f16(half %a, half %b) - ret half %elt.atan2 -} - -define noundef <4 x float> @atan2_float4(<4 x float> noundef %a, <4 x float> noundef %b) { -entry: - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] - %elt.atan2 = call <4 x float> @llvm.atan2.v4f32(<4 x float> %a, <4 x float> %b) - ret <4 x float> %elt.atan2 -} - -define noundef <4 x half> @atan2_half4(<4 x half> noundef %a, <4 x half> noundef %b) { -entry: - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] - %elt.atan2 = call <4 x half> @llvm.atan2.v4f16(<4 x half> %a, <4 x half> %b) - ret <4 x half> %elt.atan2 -} - -declare half @llvm.atan2.f16(half, half) -declare float @llvm.atan2.f32(float, float) -declare <4 x half> @llvm.atan2.v4f16(<4 x half>, <4 x half>) -declare <4 x float> @llvm.atan2.v4f32(<4 x float>, <4 x float>) +; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = 
OpTypeFloat 16 +; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 +; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 + +define noundef float @atan2_float(float noundef %a, float noundef %b) { +entry: +; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] +; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] +; CHECK: %[[#]] = OpExtInst %[[#float_32]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] + %elt.atan2 = call float @llvm.atan2.f32(float %a, float %b) + ret float %elt.atan2 +} + +define noundef half @atan2_half(half noundef %a, half noundef %b) { +entry: +; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] +; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] +; CHECK: %[[#]] = OpExtInst %[[#float_16]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] + %elt.atan2 = call half @llvm.atan2.f16(half %a, half %b) + ret half %elt.atan2 +} + +define noundef <4 x float> @atan2_float4(<4 x float> noundef %a, <4 x float> noundef %b) { +entry: + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] + %elt.atan2 = call <4 x float> @llvm.atan2.v4f32(<4 x float> %a, <4 x float> %b) + ret <4 x float> %elt.atan2 +} + +define noundef <4 x half> @atan2_half4(<4 x half> noundef %a, <4 x half> noundef %b) { +entry: + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] + %elt.atan2 = call <4 x half> @llvm.atan2.v4f16(<4 x half> %a, <4 x half> %b) + ret <4 x half> %elt.atan2 +} + +declare half @llvm.atan2.f16(half, half) +declare float @llvm.atan2.f32(float, float) +declare <4 x half> @llvm.atan2.v4f16(<4 x half>, <4 x half>) +declare <4 x float> @llvm.atan2.v4f32(<4 x float>, <4 x float>) diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/cross.ll 
b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/cross.ll index 2e0eb8c429ac27..7c06c14bb968d1 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/cross.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/cross.ll @@ -1,33 +1,33 @@ -; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; Make sure SPIRV operation function calls for cross are lowered correctly. - -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec3_float_16:]] = OpTypeVector %[[#float_16]] 3 -; CHECK-DAG: %[[#vec3_float_32:]] = OpTypeVector %[[#float_32]] 3 - -define noundef <3 x half> @cross_half4(<3 x half> noundef %a, <3 x half> noundef %b) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec3_float_16]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec3_float_16]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec3_float_16]] - ; CHECK: %[[#]] = OpExtInst %[[#vec3_float_16]] %[[#op_ext_glsl]] Cross %[[#arg0]] %[[#arg1]] - %hlsl.cross = call <3 x half> @llvm.spv.cross.v4f16(<3 x half> %a, <3 x half> %b) - ret <3 x half> %hlsl.cross -} - -define noundef <3 x float> @cross_float4(<3 x float> noundef %a, <3 x float> noundef %b) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec3_float_32]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec3_float_32]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec3_float_32]] - ; CHECK: %[[#]] = OpExtInst %[[#vec3_float_32]] %[[#op_ext_glsl]] Cross %[[#arg0]] %[[#arg1]] - %hlsl.cross = call <3 x float> @llvm.spv.cross.v4f32(<3 x float> %a, <3 x float> %b) - ret <3 x float> %hlsl.cross -} - -declare <3 x half> @llvm.spv.cross.v4f16(<3 x half>, <3 x half>) -declare <3 x float> @llvm.spv.cross.v4f32(<3 x float>, <3 x float>) +; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: 
%if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; Make sure SPIRV operation function calls for cross are lowered correctly. + +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 +; CHECK-DAG: %[[#vec3_float_16:]] = OpTypeVector %[[#float_16]] 3 +; CHECK-DAG: %[[#vec3_float_32:]] = OpTypeVector %[[#float_32]] 3 + +define noundef <3 x half> @cross_half4(<3 x half> noundef %a, <3 x half> noundef %b) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec3_float_16]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec3_float_16]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec3_float_16]] + ; CHECK: %[[#]] = OpExtInst %[[#vec3_float_16]] %[[#op_ext_glsl]] Cross %[[#arg0]] %[[#arg1]] + %hlsl.cross = call <3 x half> @llvm.spv.cross.v4f16(<3 x half> %a, <3 x half> %b) + ret <3 x half> %hlsl.cross +} + +define noundef <3 x float> @cross_float4(<3 x float> noundef %a, <3 x float> noundef %b) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec3_float_32]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec3_float_32]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec3_float_32]] + ; CHECK: %[[#]] = OpExtInst %[[#vec3_float_32]] %[[#op_ext_glsl]] Cross %[[#arg0]] %[[#arg1]] + %hlsl.cross = call <3 x float> @llvm.spv.cross.v4f32(<3 x float> %a, <3 x float> %b) + ret <3 x float> %hlsl.cross +} + +declare <3 x half> @llvm.spv.cross.v4f16(<3 x half>, <3 x half>) +declare <3 x float> @llvm.spv.cross.v4f32(<3 x float>, <3 x float>) diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/length.ll b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/length.ll index b4a9d8e0664b7e..df1ef3a7287c3b 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/length.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/length.ll @@ -1,29 +1,29 @@ -; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if 
spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; Make sure SPIRV operation function calls for length are lowered correctly. - -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 -; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 - -define noundef half @length_half4(<4 x half> noundef %a) { -entry: - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#]] = OpExtInst %[[#float_16]] %[[#op_ext_glsl]] Length %[[#arg0]] - %hlsl.length = call half @llvm.spv.length.v4f16(<4 x half> %a) - ret half %hlsl.length -} - -define noundef float @length_float4(<4 x float> noundef %a) { -entry: - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#]] = OpExtInst %[[#float_32]] %[[#op_ext_glsl]] Length %[[#arg0]] - %hlsl.length = call float @llvm.spv.length.v4f32(<4 x float> %a) - ret float %hlsl.length -} - -declare half @llvm.spv.length.v4f16(<4 x half>) -declare float @llvm.spv.length.v4f32(<4 x float>) +; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; Make sure SPIRV operation function calls for length are lowered correctly. 
+ +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 +; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 +; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 + +define noundef half @length_half4(<4 x half> noundef %a) { +entry: + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#]] = OpExtInst %[[#float_16]] %[[#op_ext_glsl]] Length %[[#arg0]] + %hlsl.length = call half @llvm.spv.length.v4f16(<4 x half> %a) + ret half %hlsl.length +} + +define noundef float @length_float4(<4 x float> noundef %a) { +entry: + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#]] = OpExtInst %[[#float_32]] %[[#op_ext_glsl]] Length %[[#arg0]] + %hlsl.length = call float @llvm.spv.length.v4f32(<4 x float> %a) + ret float %hlsl.length +} + +declare half @llvm.spv.length.v4f16(<4 x half>) +declare float @llvm.spv.length.v4f32(<4 x float>) diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/normalize.ll b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/normalize.ll index fa73b9c2a4d3ab..4659b5146e4327 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/normalize.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/normalize.ll @@ -1,31 +1,31 @@ -; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; Make sure SPIRV operation function calls for normalize are lowered correctly. 
- -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 -; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 - -define noundef <4 x half> @normalize_half4(<4 x half> noundef %a) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec4_float_16]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_16]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Normalize %[[#arg0]] - %hlsl.normalize = call <4 x half> @llvm.spv.normalize.v4f16(<4 x half> %a) - ret <4 x half> %hlsl.normalize -} - -define noundef <4 x float> @normalize_float4(<4 x float> noundef %a) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec4_float_32]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_32]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Normalize %[[#arg0]] - %hlsl.normalize = call <4 x float> @llvm.spv.normalize.v4f32(<4 x float> %a) - ret <4 x float> %hlsl.normalize -} - -declare <4 x half> @llvm.spv.normalize.v4f16(<4 x half>) -declare <4 x float> @llvm.spv.normalize.v4f32(<4 x float>) +; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; Make sure SPIRV operation function calls for normalize are lowered correctly. 
+ +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 +; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 +; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 + +define noundef <4 x half> @normalize_half4(<4 x half> noundef %a) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec4_float_16]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_16]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Normalize %[[#arg0]] + %hlsl.normalize = call <4 x half> @llvm.spv.normalize.v4f16(<4 x half> %a) + ret <4 x half> %hlsl.normalize +} + +define noundef <4 x float> @normalize_float4(<4 x float> noundef %a) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec4_float_32]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_32]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Normalize %[[#arg0]] + %hlsl.normalize = call <4 x float> @llvm.spv.normalize.v4f32(<4 x float> %a) + ret <4 x float> %hlsl.normalize +} + +declare <4 x half> @llvm.spv.normalize.v4f16(<4 x half>) +declare <4 x float> @llvm.spv.normalize.v4f32(<4 x float>) diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/step.ll b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/step.ll index bb50d8c790f8ad..7c0ee9398d15fc 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/step.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/step.ll @@ -1,33 +1,33 @@ -; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; Make sure SPIRV operation function calls for step are lowered correctly. 
- -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 -; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 - -define noundef <4 x half> @step_half4(<4 x half> noundef %a, <4 x half> noundef %b) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec4_float_16]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_16]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec4_float_16]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Step %[[#arg0]] %[[#arg1]] - %hlsl.step = call <4 x half> @llvm.spv.step.v4f16(<4 x half> %a, <4 x half> %b) - ret <4 x half> %hlsl.step -} - -define noundef <4 x float> @step_float4(<4 x float> noundef %a, <4 x float> noundef %b) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec4_float_32]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_32]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec4_float_32]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Step %[[#arg0]] %[[#arg1]] - %hlsl.step = call <4 x float> @llvm.spv.step.v4f32(<4 x float> %a, <4 x float> %b) - ret <4 x float> %hlsl.step -} - -declare <4 x half> @llvm.spv.step.v4f16(<4 x half>, <4 x half>) -declare <4 x float> @llvm.spv.step.v4f32(<4 x float>, <4 x float>) +; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; Make sure SPIRV operation function calls for step are lowered correctly. 
+ +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 +; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 +; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 + +define noundef <4 x half> @step_half4(<4 x half> noundef %a, <4 x half> noundef %b) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec4_float_16]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_16]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec4_float_16]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Step %[[#arg0]] %[[#arg1]] + %hlsl.step = call <4 x half> @llvm.spv.step.v4f16(<4 x half> %a, <4 x half> %b) + ret <4 x half> %hlsl.step +} + +define noundef <4 x float> @step_float4(<4 x float> noundef %a, <4 x float> noundef %b) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec4_float_32]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_32]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec4_float_32]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Step %[[#arg0]] %[[#arg1]] + %hlsl.step = call <4 x float> @llvm.spv.step.v4f32(<4 x float> %a, <4 x float> %b) + ret <4 x float> %hlsl.step +} + +declare <4 x half> @llvm.spv.step.v4f16(<4 x half>, <4 x half>) +declare <4 x float> @llvm.spv.step.v4f32(<4 x float>, <4 x float>) diff --git a/llvm/test/Demangle/ms-placeholder-return-type.test b/llvm/test/Demangle/ms-placeholder-return-type.test index 18038e636c8d5a..a656400fe140fb 100644 --- a/llvm/test/Demangle/ms-placeholder-return-type.test +++ b/llvm/test/Demangle/ms-placeholder-return-type.test @@ -1,18 +1,18 @@ -; RUN: llvm-undname < %s | FileCheck %s - -; CHECK-NOT: Invalid mangled name - -?TestNonTemplateAuto@@YA@XZ -; CHECK: __cdecl TestNonTemplateAuto(void) - -??$AutoT@X@@YA?A_PXZ -; CHECK: auto __cdecl AutoT(void) - -??$AutoT@X@@YA?B_PXZ -; CHECK: auto const 
__cdecl AutoT(void) - -??$AutoT@X@@YA?A_TXZ -; CHECK: decltype(auto) __cdecl AutoT(void) - -??$AutoT@X@@YA?B_TXZ -; CHECK: decltype(auto) const __cdecl AutoT(void) +; RUN: llvm-undname < %s | FileCheck %s + +; CHECK-NOT: Invalid mangled name + +?TestNonTemplateAuto@@YA@XZ +; CHECK: __cdecl TestNonTemplateAuto(void) + +??$AutoT@X@@YA?A_PXZ +; CHECK: auto __cdecl AutoT(void) + +??$AutoT@X@@YA?B_PXZ +; CHECK: auto const __cdecl AutoT(void) + +??$AutoT@X@@YA?A_TXZ +; CHECK: decltype(auto) __cdecl AutoT(void) + +??$AutoT@X@@YA?B_TXZ +; CHECK: decltype(auto) const __cdecl AutoT(void) diff --git a/llvm/test/FileCheck/dos-style-eol.txt b/llvm/test/FileCheck/dos-style-eol.txt index 4252aad4d3e7bf..52184f465c3fdf 100644 --- a/llvm/test/FileCheck/dos-style-eol.txt +++ b/llvm/test/FileCheck/dos-style-eol.txt @@ -1,11 +1,11 @@ -// Test for using FileCheck on DOS style end-of-line -// This test was deliberately committed with DOS style end of line. -// Don't change line endings! 
+// RUN: FileCheck -input-file %s %s +// RUN: FileCheck --strict-whitespace -input-file %s %s + +LINE 1 +; CHECK: {{^}}LINE 1{{$}} + +LINE 2 ; CHECK: {{^}}LINE 2{{$}} \ No newline at end of file diff --git a/llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri b/llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri index 72d23d041ae807..857c4ff87b6cf8 100644 --- a/llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri +++ b/llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri @@ -1,4 +1,4 @@ -; this file intentionally has crlf line endings -create crlf.a -addmod foo.txt -end +; this file intentionally has crlf line endings +create crlf.a +addmod foo.txt +end diff --git a/llvm/test/tools/llvm-cvtres/Inputs/languages.rc b/llvm/test/tools/llvm-cvtres/Inputs/languages.rc index 081b3a77bebc10..82031d0e208395 100644 --- a/llvm/test/tools/llvm-cvtres/Inputs/languages.rc +++ b/llvm/test/tools/llvm-cvtres/Inputs/languages.rc @@ -1,36 +1,36 @@ -#include "windows.h" - -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US -randomdat RCDATA -{ - "this is a random bit of data that means nothing\0", - 0x23a9, - 0x140e, - 194292, -} - -LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED -randomdat RCDATA -{ - "zhe4 shi4 yi1ge4 sui2ji1 de shu4ju4, zhe4 yi4wei4zhe shen2me\0", - 0x23a9, - 0x140e, - 194292, -} - -LANGUAGE LANG_GERMAN, SUBLANG_GERMAN_LUXEMBOURG -randomdat RCDATA -{ - "Dies ist ein zufälliges Bit von Daten, die nichts bedeutet\0", - 0x23a9, - 0x140e, - 194292, -} - -LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED -myaccelerators ACCELERATORS -{ - "^C", 999, VIRTKEY, ALT - "D", 1100, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} +#include "windows.h" + +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US +randomdat RCDATA +{ + "this is a random bit of data that means nothing\0", + 0x23a9, + 0x140e, + 194292, +} + +LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED +randomdat RCDATA +{ + "zhe4 shi4 yi1ge4 sui2ji1 de shu4ju4, zhe4 yi4wei4zhe shen2me\0", + 0x23a9, + 0x140e, + 194292, +} + +LANGUAGE LANG_GERMAN, 
SUBLANG_GERMAN_LUXEMBOURG +randomdat RCDATA +{ + "Dies ist ein zufälliges Bit von Daten, die nichts bedeutet\0", + 0x23a9, + 0x140e, + 194292, +} + +LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED +myaccelerators ACCELERATORS +{ + "^C", 999, VIRTKEY, ALT + "D", 1100, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} diff --git a/llvm/test/tools/llvm-cvtres/Inputs/test_resource.rc b/llvm/test/tools/llvm-cvtres/Inputs/test_resource.rc index 5ca097baa0f736..494849f57a0a9e 100644 --- a/llvm/test/tools/llvm-cvtres/Inputs/test_resource.rc +++ b/llvm/test/tools/llvm-cvtres/Inputs/test_resource.rc @@ -1,50 +1,50 @@ -#include "windows.h" - -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US - -myaccelerators ACCELERATORS -{ - "^C", 999, VIRTKEY, ALT - "D", 1100, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} - -cursor BITMAP "cursor_small.bmp" -okay BITMAP "okay_small.bmp" - -14432 MENU -LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED -{ - MENUITEM "yu", 100 - MENUITEM "shala", 101 - MENUITEM "kaoya", 102 -} - -testdialog DIALOG 10, 10, 200, 300 -STYLE WS_POPUP | WS_BORDER -CAPTION "Test" -{ - CTEXT "Continue:", 1, 10, 10, 230, 14 - PUSHBUTTON "&OK", 2, 66, 134, 161, 13 -} - -12 ACCELERATORS -{ - "X", 164, VIRTKEY, ALT - "H", 5678, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} - -"eat" MENU -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_AUS -{ - MENUITEM "fish", 100 - MENUITEM "salad", 101 - MENUITEM "duck", 102 -} - - -myresource stringarray { - "this is a user defined resource\0", - "it contains many strings\0", +#include "windows.h" + +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US + +myaccelerators ACCELERATORS +{ + "^C", 999, VIRTKEY, ALT + "D", 1100, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} + +cursor BITMAP "cursor_small.bmp" +okay BITMAP "okay_small.bmp" + +14432 MENU +LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED +{ + MENUITEM "yu", 100 + MENUITEM "shala", 101 + MENUITEM "kaoya", 102 +} + +testdialog DIALOG 10, 10, 200, 300 +STYLE WS_POPUP 
| WS_BORDER +CAPTION "Test" +{ + CTEXT "Continue:", 1, 10, 10, 230, 14 + PUSHBUTTON "&OK", 2, 66, 134, 161, 13 +} + +12 ACCELERATORS +{ + "X", 164, VIRTKEY, ALT + "H", 5678, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} + +"eat" MENU +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_AUS +{ + MENUITEM "fish", 100 + MENUITEM "salad", 101 + MENUITEM "duck", 102 +} + + +myresource stringarray { + "this is a user defined resource\0", + "it contains many strings\0", } \ No newline at end of file diff --git a/llvm/test/tools/llvm-rc/Inputs/dialog-with-menu.rc b/llvm/test/tools/llvm-rc/Inputs/dialog-with-menu.rc index bb79dca399c219..c700b587af6483 100644 --- a/llvm/test/tools/llvm-rc/Inputs/dialog-with-menu.rc +++ b/llvm/test/tools/llvm-rc/Inputs/dialog-with-menu.rc @@ -1,16 +1,16 @@ -101 DIALOG 0, 0, 362, 246 -STYLE 0x40l | 0x0004l | 0x0008l | 0x0800l | 0x00020000l | - 0x00010000l | 0x80000000l | 0x10000000l | 0x02000000l | 0x00C00000l | - 0x00080000l | 0x00040000l -CAPTION "MakeNSISW" -MENU 104 -FONT 8, "MS Shell Dlg" -BEGIN - CONTROL "",202,"RichEdit20A",0x0004l | 0x0040l | - 0x0100l | 0x0800l | 0x00008000 | - 0x00010000l | 0x00800000l | 0x00200000l,7,22,348,190 - CONTROL "",-1,"Static",0x00000010l,7,220,346,1 - LTEXT "",200,7,230,200,12,0x08000000l - DEFPUSHBUTTON "Test &Installer",203,230,226,60,15,0x08000000l | 0x00010000l - PUSHBUTTON "&Close",2,296,226,49,15,0x00010000l -END +101 DIALOG 0, 0, 362, 246 +STYLE 0x40l | 0x0004l | 0x0008l | 0x0800l | 0x00020000l | + 0x00010000l | 0x80000000l | 0x10000000l | 0x02000000l | 0x00C00000l | + 0x00080000l | 0x00040000l +CAPTION "MakeNSISW" +MENU 104 +FONT 8, "MS Shell Dlg" +BEGIN + CONTROL "",202,"RichEdit20A",0x0004l | 0x0040l | + 0x0100l | 0x0800l | 0x00008000 | + 0x00010000l | 0x00800000l | 0x00200000l,7,22,348,190 + CONTROL "",-1,"Static",0x00000010l,7,220,346,1 + LTEXT "",200,7,230,200,12,0x08000000l + DEFPUSHBUTTON "Test &Installer",203,230,226,60,15,0x08000000l | 0x00010000l + PUSHBUTTON 
"&Close",2,296,226,49,15,0x00010000l +END diff --git a/llvm/test/tools/llvm-readobj/COFF/Inputs/resources/test_resource.rc b/llvm/test/tools/llvm-readobj/COFF/Inputs/resources/test_resource.rc index fd616520dbe1b3..6ad56bc02d73ca 100644 --- a/llvm/test/tools/llvm-readobj/COFF/Inputs/resources/test_resource.rc +++ b/llvm/test/tools/llvm-readobj/COFF/Inputs/resources/test_resource.rc @@ -1,44 +1,44 @@ -#include "windows.h" - -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US - -myaccelerators ACCELERATORS -{ - "^C", 999, VIRTKEY, ALT - "D", 1100, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} - -cursor BITMAP "cursor_small.bmp" -okay BITMAP "okay_small.bmp" - -14432 MENU -LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED -{ - MENUITEM "yu", 100 - MENUITEM "shala", 101 - MENUITEM "kaoya", 102 -} - -testdialog DIALOG 10, 10, 200, 300 -STYLE WS_POPUP | WS_BORDER -CAPTION "Test" -{ - CTEXT "Continue:", 1, 10, 10, 230, 14 - PUSHBUTTON "&OK", 2, 66, 134, 161, 13 -} - -12 ACCELERATORS -{ - "X", 164, VIRTKEY, ALT - "H", 5678, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} - -"eat" MENU -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_AUS -{ - MENUITEM "fish", 100 - MENUITEM "salad", 101 - MENUITEM "duck", 102 -} +#include "windows.h" + +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US + +myaccelerators ACCELERATORS +{ + "^C", 999, VIRTKEY, ALT + "D", 1100, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} + +cursor BITMAP "cursor_small.bmp" +okay BITMAP "okay_small.bmp" + +14432 MENU +LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED +{ + MENUITEM "yu", 100 + MENUITEM "shala", 101 + MENUITEM "kaoya", 102 +} + +testdialog DIALOG 10, 10, 200, 300 +STYLE WS_POPUP | WS_BORDER +CAPTION "Test" +{ + CTEXT "Continue:", 1, 10, 10, 230, 14 + PUSHBUTTON "&OK", 2, 66, 134, 161, 13 +} + +12 ACCELERATORS +{ + "X", 164, VIRTKEY, ALT + "H", 5678, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} + +"eat" MENU +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_AUS +{ + MENUITEM "fish", 100 + 
MENUITEM "salad", 101 + MENUITEM "duck", 102 +} diff --git a/llvm/unittests/Support/ModRefTest.cpp b/llvm/unittests/Support/ModRefTest.cpp index 35107e50b32db7..f77e7e39e14eab 100644 --- a/llvm/unittests/Support/ModRefTest.cpp +++ b/llvm/unittests/Support/ModRefTest.cpp @@ -1,27 +1,27 @@ -//===- llvm/unittest/Support/ModRefTest.cpp - ModRef tests ----------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -#include "llvm/Support/ModRef.h" -#include "llvm/ADT/SmallString.h" -#include "llvm/Support/raw_ostream.h" -#include "gtest/gtest.h" -#include - -using namespace llvm; - -namespace { - -// Verify that printing a MemoryEffects does not end with a ,. -TEST(ModRefTest, PrintMemoryEffects) { - std::string S; - raw_string_ostream OS(S); - OS << MemoryEffects::none(); - EXPECT_EQ(S, "ArgMem: NoModRef, InaccessibleMem: NoModRef, Other: NoModRef"); -} - -} // namespace +//===- llvm/unittest/Support/ModRefTest.cpp - ModRef tests ----------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "llvm/Support/ModRef.h" +#include "llvm/ADT/SmallString.h" +#include "llvm/Support/raw_ostream.h" +#include "gtest/gtest.h" +#include + +using namespace llvm; + +namespace { + +// Verify that printing a MemoryEffects does not end with a ,. 
+TEST(ModRefTest, PrintMemoryEffects) { + std::string S; + raw_string_ostream OS(S); + OS << MemoryEffects::none(); + EXPECT_EQ(S, "ArgMem: NoModRef, InaccessibleMem: NoModRef, Other: NoModRef"); +} + +} // namespace diff --git a/llvm/utils/LLVMVisualizers/llvm.natvis b/llvm/utils/LLVMVisualizers/llvm.natvis index d83ae8013c51e2..03ca2d33a80ba6 100644 --- a/llvm/utils/LLVMVisualizers/llvm.natvis +++ b/llvm/utils/LLVMVisualizers/llvm.natvis @@ -1,408 +1,408 @@ - - - - - empty - {(value_type*)BeginX,[Size]} - {Size} elements - Uninitialized - - Size - Capacity - - Size - (value_type*)BeginX - - - - - - {U.VAL} - Cannot visualize APInts longer than 64 bits - - - {Data,[Length]} - {Length} elements - Uninitialized - - Length - - Length - Data - - - - - {(const char*)BeginX,[Size]s8} - (const char*)BeginX,[Size] - - Size - Capacity - - Size - (char*)BeginX - - - - - - {First,[Last - First]s8} - - - - {Data,[Length]s8} - Data,[Length]s8 - - Length - - Length - Data - - - - - - {($T1)*(intptr_t *)Data} - - - - - - {($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask)} - {($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask)} - {$T6::IntMask}: {($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask)} [{($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask)}] - - ($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask) - ($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask) - - - - - {($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask)} - {((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask)} - {$T5::IntMask}: {($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask)} [{((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask)}] - - ($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask) - ((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask) - - - - - - {($T4)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask)} - - - {($T5)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask)} - - Unexpected index in PointerUnion: 
{(*(intptr_t *)Val.Value.Data>>$T2::InfoTy::IntShift) & $T2::InfoTy::IntMask} - - "$T4",s8b - - ($T4)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask) - - "$T5",s8b - - ($T5)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask) - - - - - - {{ empty }} - {{ head={Head} }} - - - Head - Next - this - - - - - - empty - RefPtr [1 ref] {*Obj} - RefPtr [{Obj->RefCount} refs] {*Obj} - - Obj->RefCount - Obj - - - - - {{ [Small Mode] size={NumNonEmpty}, capacity={CurArraySize} }} - {{ [Big Mode] size={NumNonEmpty}, capacity={CurArraySize} }} - - NumNonEmpty - CurArraySize - - NumNonEmpty - ($T1*)CurArray - - - - - - empty - {{ size={NumEntries}, buckets={NumBuckets} }} - - NumEntries - NumBuckets - - NumBuckets - Buckets - - - - - - {{ size={NumItems}, buckets={NumBuckets} }} - - NumItems - NumBuckets - - NumBuckets - (MapEntryTy**)TheTable - - - - - - empty - ({this+1,s8}, {second}) - - this+1,s - second - - - - - {Data} - - - - None - {Storage.value} - - Storage.value - - - - - Error - {*((storage_type *)TStorage.buffer)} - - *((storage_type *)TStorage.buffer) - *((error_type *)ErrorStorage.buffer) - - - - - - - {{little endian value = {*(($T1*)(unsigned char *)Value.buffer)} }} - - (unsigned char *)Value.buffer,1 - (unsigned char *)Value.buffer,2 - (unsigned char *)Value.buffer,4 - (unsigned char *)Value.buffer,8 - - - - - - {{ big endian value = {*(unsigned char *)Value.buffer} }} - {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 8) - | ($T1)(*((unsigned char *)Value.buffer+1))} }} - {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 24) - | (($T1)(*((unsigned char *)Value.buffer+1)) << 16) - | (($T1)(*((unsigned char *)Value.buffer+2)) << 8) - | ($T1)(*((unsigned char *)Value.buffer+3))} }} - {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 56) - | (($T1)(*((unsigned char *)Value.buffer+1)) << 48) - | (($T1)(*((unsigned char *)Value.buffer+2)) << 40) - | (($T1)(*((unsigned char *)Value.buffer+3)) << 32) - | 
(($T1)(*((unsigned char *)Value.buffer+4)) << 24) - | (($T1)(*((unsigned char *)Value.buffer+5)) << 16) - | (($T1)(*((unsigned char *)Value.buffer+6)) << 8) - | ($T1)(*((unsigned char *)Value.buffer+7))} }} - - (unsigned char *)Value.buffer,1 - (unsigned char *)Value.buffer,2 - (unsigned char *)Value.buffer,4 - (unsigned char *)Value.buffer,8 - - - - - {ID} - - ID - - SubclassData - - *ContainedTys - - {NumContainedTys - 1} - - - NumContainedTys - 1 - ContainedTys + 1 - - - - SubclassData == 1 - - (SubclassData & llvm::StructType::SCDB_HasBody) != 0 - (SubclassData & llvm::StructType::SCDB_Packed) != 0 - (SubclassData & llvm::StructType::SCDB_IsLiteral) != 0 - (SubclassData & llvm::StructType::SCDB_IsSized) != 0 - - {NumContainedTys} - - - NumContainedTys - ContainedTys - - - - - *ContainedTys - ((llvm::ArrayType*)this)->NumElements - - *ContainedTys - ((llvm::VectorType*)this)->ElementQuantity - - *ContainedTys - ((llvm::VectorType*)this)->ElementQuantity - - SubclassData - *ContainedTys - - Context - - - - - $(Type) {*Value} - - - - $(Type) {(llvm::ISD::NodeType)this->NodeType} - - - NumOperands - OperandList - - - - - - i{Val.BitWidth} {Val.VAL} - - - - {IDAndSubclassData >> 8}bit integer type - - - - $(Type) {*VTy} {this->getName()} {SubclassData} - $(Type) {*VTy} anon {SubclassData} - - (Instruction*)this - (User*)this - - UseList - Next - Prev.Value & 3 == 3 ? 
(User*)(this + 1) : (User*)(this + 2) - - - - - - - Val - - - - - - - $(Type) {*VTy} {this->getName()} {SubclassData} - $(Type) {*VTy} anon {SubclassData} - - (Value*)this,nd - *VTy - - NumUserOperands - (llvm::Use*)this - NumUserOperands - - - NumUserOperands - *((llvm::Use**)this - 1) - - - - - - {getOpcodeName(SubclassID - InstructionVal)} - - (User*)this,nd - - - - - {this->getName()} {(LinkageTypes)Linkage} {(VisibilityTypes)Visibility} {(DLLStorageClassTypes)DllStorageClass} {(llvm::GlobalValue::ThreadLocalMode) ThreadLocal} - - - - - - - this - Next - this - - - - - - - pImpl - - - - - {ModuleID,s8} {TargetTriple} - - - - $(Type) {PassID} {Kind} - - + + + + + empty + {(value_type*)BeginX,[Size]} + {Size} elements + Uninitialized + + Size + Capacity + + Size + (value_type*)BeginX + + + + + + {U.VAL} + Cannot visualize APInts longer than 64 bits + + + {Data,[Length]} + {Length} elements + Uninitialized + + Length + + Length + Data + + + + + {(const char*)BeginX,[Size]s8} + (const char*)BeginX,[Size] + + Size + Capacity + + Size + (char*)BeginX + + + + + + {First,[Last - First]s8} + + + + {Data,[Length]s8} + Data,[Length]s8 + + Length + + Length + Data + + + + + + {($T1)*(intptr_t *)Data} + + + + + + {($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask)} + {($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask)} + {$T6::IntMask}: {($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask)} [{($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask)}] + + ($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask) + ($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask) + + + + + {($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask)} + {((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask)} + {$T5::IntMask}: {($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask)} [{((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask)}] + + ($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask) + ((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask) 
+ + + + + + {($T4)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask)} + + + {($T5)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask)} + + Unexpected index in PointerUnion: {(*(intptr_t *)Val.Value.Data>>$T2::InfoTy::IntShift) & $T2::InfoTy::IntMask} + + "$T4",s8b + + ($T4)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask) + + "$T5",s8b + + ($T5)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask) + + + + + + {{ empty }} + {{ head={Head} }} + + + Head + Next + this + + + + + + empty + RefPtr [1 ref] {*Obj} + RefPtr [{Obj->RefCount} refs] {*Obj} + + Obj->RefCount + Obj + + + + + {{ [Small Mode] size={NumNonEmpty}, capacity={CurArraySize} }} + {{ [Big Mode] size={NumNonEmpty}, capacity={CurArraySize} }} + + NumNonEmpty + CurArraySize + + NumNonEmpty + ($T1*)CurArray + + + + + + empty + {{ size={NumEntries}, buckets={NumBuckets} }} + + NumEntries + NumBuckets + + NumBuckets + Buckets + + + + + + {{ size={NumItems}, buckets={NumBuckets} }} + + NumItems + NumBuckets + + NumBuckets + (MapEntryTy**)TheTable + + + + + + empty + ({this+1,s8}, {second}) + + this+1,s + second + + + + + {Data} + + + + None + {Storage.value} + + Storage.value + + + + + Error + {*((storage_type *)TStorage.buffer)} + + *((storage_type *)TStorage.buffer) + *((error_type *)ErrorStorage.buffer) + + + + + + + {{little endian value = {*(($T1*)(unsigned char *)Value.buffer)} }} + + (unsigned char *)Value.buffer,1 + (unsigned char *)Value.buffer,2 + (unsigned char *)Value.buffer,4 + (unsigned char *)Value.buffer,8 + + + + + + {{ big endian value = {*(unsigned char *)Value.buffer} }} + {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 8) + | ($T1)(*((unsigned char *)Value.buffer+1))} }} + {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 24) + | (($T1)(*((unsigned char *)Value.buffer+1)) << 16) + | (($T1)(*((unsigned char *)Value.buffer+2)) << 8) + | ($T1)(*((unsigned char *)Value.buffer+3))} }} + {{ big endian value = {(($T1)(*(unsigned char 
*)Value.buffer) << 56) + | (($T1)(*((unsigned char *)Value.buffer+1)) << 48) + | (($T1)(*((unsigned char *)Value.buffer+2)) << 40) + | (($T1)(*((unsigned char *)Value.buffer+3)) << 32) + | (($T1)(*((unsigned char *)Value.buffer+4)) << 24) + | (($T1)(*((unsigned char *)Value.buffer+5)) << 16) + | (($T1)(*((unsigned char *)Value.buffer+6)) << 8) + | ($T1)(*((unsigned char *)Value.buffer+7))} }} + + (unsigned char *)Value.buffer,1 + (unsigned char *)Value.buffer,2 + (unsigned char *)Value.buffer,4 + (unsigned char *)Value.buffer,8 + + + + + {ID} + + ID + + SubclassData + + *ContainedTys + + {NumContainedTys - 1} + + + NumContainedTys - 1 + ContainedTys + 1 + + + + SubclassData == 1 + + (SubclassData & llvm::StructType::SCDB_HasBody) != 0 + (SubclassData & llvm::StructType::SCDB_Packed) != 0 + (SubclassData & llvm::StructType::SCDB_IsLiteral) != 0 + (SubclassData & llvm::StructType::SCDB_IsSized) != 0 + + {NumContainedTys} + + + NumContainedTys + ContainedTys + + + + + *ContainedTys + ((llvm::ArrayType*)this)->NumElements + + *ContainedTys + ((llvm::VectorType*)this)->ElementQuantity + + *ContainedTys + ((llvm::VectorType*)this)->ElementQuantity + + SubclassData + *ContainedTys + + Context + + + + + $(Type) {*Value} + + + + $(Type) {(llvm::ISD::NodeType)this->NodeType} + + + NumOperands + OperandList + + + + + + i{Val.BitWidth} {Val.VAL} + + + + {IDAndSubclassData >> 8}bit integer type + + + + $(Type) {*VTy} {this->getName()} {SubclassData} + $(Type) {*VTy} anon {SubclassData} + + (Instruction*)this + (User*)this + + UseList + Next + Prev.Value & 3 == 3 ? 
(User*)(this + 1) : (User*)(this + 2) + + + + + + + Val + + + + + + + $(Type) {*VTy} {this->getName()} {SubclassData} + $(Type) {*VTy} anon {SubclassData} + + (Value*)this,nd + *VTy + + NumUserOperands + (llvm::Use*)this - NumUserOperands + + + NumUserOperands + *((llvm::Use**)this - 1) + + + + + + {getOpcodeName(SubclassID - InstructionVal)} + + (User*)this,nd + + + + + {this->getName()} {(LinkageTypes)Linkage} {(VisibilityTypes)Visibility} {(DLLStorageClassTypes)DllStorageClass} {(llvm::GlobalValue::ThreadLocalMode) ThreadLocal} + + + + + + + this + Next + this + + + + + + + pImpl + + + + + {ModuleID,s8} {TargetTriple} + + + + $(Type) {PassID} {Kind} + + diff --git a/llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.dos b/llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.dos index 7a0560654c5c70..0f25621c787ed3 100644 --- a/llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.dos +++ b/llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.dos @@ -1,3 +1,3 @@ -In this file, the -sequence "\r\n" -terminates lines. +In this file, the +sequence "\r\n" +terminates lines. diff --git a/llvm/utils/release/build_llvm_release.bat b/llvm/utils/release/build_llvm_release.bat index dd041d7d384ec4..3718673ae7a28d 100755 --- a/llvm/utils/release/build_llvm_release.bat +++ b/llvm/utils/release/build_llvm_release.bat @@ -1,515 +1,515 @@ - at echo off -setlocal enabledelayedexpansion - -goto begin - -:usage -echo Script for building the LLVM installer on Windows, -echo used for the releases at https://github.com/llvm/llvm-project/releases -echo. -echo Usage: build_llvm_release.bat --version ^ [--x86,--x64, --arm64] [--skip-checkout] [--local-python] -echo. 
-echo Options: -echo --version: [required] version to build -echo --help: display this help -echo --x86: build and test x86 variant -echo --x64: build and test x64 variant -echo --arm64: build and test arm64 variant -echo --skip-checkout: use local git checkout instead of downloading src.zip -echo --local-python: use installed Python and does not try to use a specific version (3.10) -echo. -echo Note: At least one variant to build is required. -echo. -echo Example: build_llvm_release.bat --version 15.0.0 --x86 --x64 -exit /b 1 - -:begin - -::============================================================================== -:: parse args -set version= -set help= -set x86= -set x64= -set arm64= -set skip-checkout= -set local-python= -call :parse_args %* - -if "%help%" NEQ "" goto usage - -if "%version%" == "" ( - echo --version option is required - echo ============================= - goto usage -) - -if "%arm64%" == "" if "%x64%" == "" if "%x86%" == "" ( - echo nothing to build! - echo choose one or several variants from: --x86 --x64 --arm64 - exit /b 1 -) - -::============================================================================== -:: check prerequisites -REM Note: -REM 7zip versions 21.x and higher will try to extract the symlinks in -REM llvm's git archive, which requires running as administrator. - -REM Check 7-zip version and/or administrator permissions. -for /f "delims=" %%i in ('7z.exe ^| findstr /r "2[1-9].[0-9][0-9]"') do set version_7z=%%i -if not "%version_7z%"=="" ( - REM Unique temporary filename to use by the 'mklink' command. - set "link_name=%temp%\%username%_%random%_%random%.tmp" - - REM As the 'mklink' requires elevated permissions, the symbolic link - REM creation will fail if the script is not running as administrator. - mklink /d "!link_name!" . 1>nul 2>nul - if errorlevel 1 ( - echo. - echo Script requires administrator permissions, or a 7-zip version 20.x or older. 
- echo Current version is "%version_7z%" - exit /b 1 - ) else ( - REM Remove the temporary symbolic link. - rd "!link_name!" - ) -) - -REM Prerequisites: -REM -REM Visual Studio 2019, CMake, Ninja, GNUWin32, SWIG, Python 3, -REM NSIS with the strlen_8192 patch, -REM Perl (for the OpenMP run-time). -REM -REM -REM For LLDB, SWIG version 4.1.1 should be used. -REM - -:: Detect Visual Studio -set vsinstall= -set vswhere=%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe - -if "%VSINSTALLDIR%" NEQ "" ( - echo using enabled Visual Studio installation - set "vsinstall=%VSINSTALLDIR%" -) else ( - echo using vswhere to detect Visual Studio installation - FOR /F "delims=" %%r IN ('^""%vswhere%" -nologo -latest -products "*" -all -property installationPath^"') DO set vsinstall=%%r -) -set "vsdevcmd=%vsinstall%\Common7\Tools\VsDevCmd.bat" - -if not exist "%vsdevcmd%" ( - echo Can't find any installation of Visual Studio - exit /b 1 -) -echo Using VS devcmd: %vsdevcmd% - -::============================================================================== -:: start echoing what we do - at echo on - -set python32_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python310-32 -set python64_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python310 -set pythonarm64_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python311-arm64 - -set revision=llvmorg-%version% -set package_version=%version% -set build_dir=%cd%\llvm_package_%package_version% - -echo Revision: %revision% -echo Package version: %package_version% -echo Build dir: %build_dir% -echo. - -if exist %build_dir% ( - echo Build directory already exists: %build_dir% - exit /b 1 -) -mkdir %build_dir% -cd %build_dir% || exit /b 1 - -if "%skip-checkout%" == "true" ( - echo Using local source - set llvm_src=%~dp0..\..\.. 
-) else ( - echo Checking out %revision% - curl -L https://github.com/llvm/llvm-project/archive/%revision%.zip -o src.zip || exit /b 1 - 7z x src.zip || exit /b 1 - mv llvm-project-* llvm-project || exit /b 1 - set llvm_src=%build_dir%\llvm-project -) - -curl -O https://gitlab.gnome.org/GNOME/libxml2/-/archive/v2.9.12/libxml2-v2.9.12.tar.gz || exit /b 1 -tar zxf libxml2-v2.9.12.tar.gz - -REM Setting CMAKE_CL_SHOWINCLUDES_PREFIX to work around PR27226. -REM Common flags for all builds. -set common_compiler_flags=-DLIBXML_STATIC -set common_cmake_flags=^ - -DCMAKE_BUILD_TYPE=Release ^ - -DLLVM_ENABLE_ASSERTIONS=OFF ^ - -DLLVM_INSTALL_TOOLCHAIN_ONLY=ON ^ - -DLLVM_TARGETS_TO_BUILD="AArch64;ARM;X86" ^ - -DLLVM_BUILD_LLVM_C_DYLIB=ON ^ - -DCMAKE_INSTALL_UCRT_LIBRARIES=ON ^ - -DPython3_FIND_REGISTRY=NEVER ^ - -DPACKAGE_VERSION=%package_version% ^ - -DLLDB_RELOCATABLE_PYTHON=1 ^ - -DLLDB_EMBED_PYTHON_HOME=OFF ^ - -DCMAKE_CL_SHOWINCLUDES_PREFIX="Note: including file: " ^ - -DLLVM_ENABLE_LIBXML2=FORCE_ON ^ - -DLLDB_ENABLE_LIBXML2=OFF ^ - -DCLANG_ENABLE_LIBXML2=OFF ^ - -DCMAKE_C_FLAGS="%common_compiler_flags%" ^ - -DCMAKE_CXX_FLAGS="%common_compiler_flags%" ^ - -DLLVM_ENABLE_RPMALLOC=ON ^ - -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;compiler-rt;lldb;openmp" - -set cmake_profile_flags="" - -REM Preserve original path -set OLDPATH=%PATH% - -REM Build the 32-bits and/or 64-bits binaries. -if "%x86%" == "true" call :do_build_32 || exit /b 1 -if "%x64%" == "true" call :do_build_64 || exit /b 1 -if "%arm64%" == "true" call :do_build_arm64 || exit /b 1 -exit /b 0 - -::============================================================================== -:: Build 32-bits binaries. 
-::============================================================================== -:do_build_32 -call :set_environment %python32_dir% || exit /b 1 -call "%vsdevcmd%" -arch=x86 || exit /b 1 -@echo on -mkdir build32_stage0 -cd build32_stage0 -call :do_build_libxml || exit /b 1 - -REM Stage0 binaries directory; used in stage1. -set "stage0_bin_dir=%build_dir%/build32_stage0/bin" -set cmake_flags=^ - %common_cmake_flags% ^ - -DLLVM_ENABLE_RPMALLOC=OFF ^ - -DLLDB_TEST_COMPILER=%stage0_bin_dir%/clang.exe ^ - -DPYTHON_HOME=%PYTHONHOME% ^ - -DPython3_ROOT_DIR=%PYTHONHOME% ^ - -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ - -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib - -cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 -ninja || ninja || ninja || exit /b 1 -REM ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 -REM ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 -ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 -ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 -REM ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 -cd.. - -REM CMake expects the paths that specifies the compiler and linker to be -REM with forward slash. 
-set all_cmake_flags=^ - %cmake_flags% ^ - -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ - -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ - -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe -set cmake_flags=%all_cmake_flags:\=/% - -mkdir build32 -cd build32 -cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 -ninja || ninja || ninja || exit /b 1 -REM ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 -REM ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 -ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 -ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 -REM ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 -ninja package || exit /b 1 -cd .. - -exit /b 0 -::============================================================================== - -::============================================================================== -:: Build 64-bits binaries. -::============================================================================== -:do_build_64 -call :set_environment %python64_dir% || exit /b 1 -call "%vsdevcmd%" -arch=amd64 || exit /b 1 -@echo on -mkdir build64_stage0 -cd build64_stage0 -call :do_build_libxml || exit /b 1 - -REM Stage0 binaries directory; used in stage1. 
-set "stage0_bin_dir=%build_dir%/build64_stage0/bin" -set cmake_flags=^ - %common_cmake_flags% ^ - -DLLDB_TEST_COMPILER=%stage0_bin_dir%/clang.exe ^ - -DPYTHON_HOME=%PYTHONHOME% ^ - -DPython3_ROOT_DIR=%PYTHONHOME% ^ - -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ - -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib - -cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 -ninja || ninja || ninja || exit /b 1 -ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 -ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 -ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 -ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 -ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 -ninja check-clangd || ninja check-clangd || ninja check-clangd || exit /b 1 -cd.. - -REM CMake expects the paths that specifies the compiler and linker to be -REM with forward slash. -set all_cmake_flags=^ - %cmake_flags% ^ - -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ - -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ - -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe -set cmake_flags=%all_cmake_flags:\=/% - - -mkdir build64 -cd build64 -call :do_generate_profile || exit /b 1 -cmake -GNinja %cmake_flags% %cmake_profile_flags% %llvm_src%\llvm || exit /b 1 -ninja || ninja || ninja || exit /b 1 -ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 -ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 -ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 -ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 -ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 -ninja check-clangd || ninja check-clangd || ninja check-clangd || exit /b 1 -ninja package || 
exit /b 1 - -:: generate tarball with install toolchain only off -set filename=clang+llvm-%version%-x86_64-pc-windows-msvc -cmake -GNinja %cmake_flags% %cmake_profile_flags% -DLLVM_INSTALL_TOOLCHAIN_ONLY=OFF ^ - -DCMAKE_INSTALL_PREFIX=%build_dir%/%filename% ..\llvm-project\llvm || exit /b 1 -ninja install || exit /b 1 -:: check llvm_config is present & returns something -%build_dir%/%filename%/bin/llvm-config.exe --bindir || exit /b 1 -cd .. -7z a -ttar -so %filename%.tar %filename% | 7z a -txz -si %filename%.tar.xz - -exit /b 0 -::============================================================================== - -::============================================================================== -:: Build arm64 binaries. -::============================================================================== -:do_build_arm64 -call :set_environment %pythonarm64_dir% || exit /b 1 -call "%vsdevcmd%" -host_arch=x64 -arch=arm64 || exit /b 1 - at echo on -mkdir build_arm64_stage0 -cd build_arm64_stage0 -call :do_build_libxml || exit /b 1 - -REM Stage0 binaries directory; used in stage1. -set "stage0_bin_dir=%build_dir%/build_arm64_stage0/bin" -set cmake_flags=^ - %common_cmake_flags% ^ - -DCLANG_DEFAULT_LINKER=lld ^ - -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ - -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib ^ - -DPython3_ROOT_DIR=%PYTHONHOME% ^ - -DCOMPILER_RT_BUILD_PROFILE=OFF ^ - -DCOMPILER_RT_BUILD_SANITIZERS=OFF - -REM We need to build stage0 compiler-rt with clang-cl (msvc lacks some builtins). -cmake -GNinja %cmake_flags% ^ - -DCMAKE_C_COMPILER=clang-cl.exe ^ - -DCMAKE_CXX_COMPILER=clang-cl.exe ^ - %llvm_src%\llvm || exit /b 1 -ninja || exit /b 1 -::ninja check-llvm || exit /b 1 -::ninja check-clang || exit /b 1 -::ninja check-lld || exit /b 1 -::ninja check-sanitizer || exit /b 1 -::ninja check-clang-tools || exit /b 1 -::ninja check-clangd || exit /b 1 -cd.. - -REM CMake expects the paths that specifies the compiler and linker to be -REM with forward slash. 
-REM CPACK_SYSTEM_NAME is set to have a correct name for installer generated. -set all_cmake_flags=^ - %cmake_flags% ^ - -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ - -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ - -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe ^ - -DCPACK_SYSTEM_NAME=woa64 -set cmake_flags=%all_cmake_flags:\=/% - -mkdir build_arm64 -cd build_arm64 -cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 -ninja || exit /b 1 -REM Check but do not fail on errors. -ninja check-lldb -::ninja check-llvm || exit /b 1 -::ninja check-clang || exit /b 1 -::ninja check-lld || exit /b 1 -::ninja check-sanitizer || exit /b 1 -::ninja check-clang-tools || exit /b 1 -::ninja check-clangd || exit /b 1 -ninja package || exit /b 1 -cd .. - -exit /b 0 -::============================================================================== -:: -::============================================================================== -:: Set PATH and some environment variables. -::============================================================================== -:set_environment -REM Restore original path -set PATH=%OLDPATH% - -set python_dir=%1 - -REM Set Python environment -if "%local-python%" == "true" ( - FOR /F "delims=" %%i IN ('where python.exe ^| head -1') DO set python_exe=%%i - set PYTHONHOME=!python_exe:~0,-11! -) else ( - %python_dir%/python.exe --version || exit /b 1 - set PYTHONHOME=%python_dir% -) -set PATH=%PYTHONHOME%;%PATH% - -set "VSCMD_START_DIR=%build_dir%" - -exit /b 0 - -::============================================================================= - -::============================================================================== -:: Build libxml. 
-::============================================================================== -:do_build_libxml -mkdir libxmlbuild -cd libxmlbuild -cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install ^ - -DBUILD_SHARED_LIBS=OFF -DLIBXML2_WITH_C14N=OFF -DLIBXML2_WITH_CATALOG=OFF ^ - -DLIBXML2_WITH_DEBUG=OFF -DLIBXML2_WITH_DOCB=OFF -DLIBXML2_WITH_FTP=OFF ^ - -DLIBXML2_WITH_HTML=OFF -DLIBXML2_WITH_HTTP=OFF -DLIBXML2_WITH_ICONV=OFF ^ - -DLIBXML2_WITH_ICU=OFF -DLIBXML2_WITH_ISO8859X=OFF -DLIBXML2_WITH_LEGACY=OFF ^ - -DLIBXML2_WITH_LZMA=OFF -DLIBXML2_WITH_MEM_DEBUG=OFF -DLIBXML2_WITH_MODULES=OFF ^ - -DLIBXML2_WITH_OUTPUT=ON -DLIBXML2_WITH_PATTERN=OFF -DLIBXML2_WITH_PROGRAMS=OFF ^ - -DLIBXML2_WITH_PUSH=OFF -DLIBXML2_WITH_PYTHON=OFF -DLIBXML2_WITH_READER=OFF ^ - -DLIBXML2_WITH_REGEXPS=OFF -DLIBXML2_WITH_RUN_DEBUG=OFF -DLIBXML2_WITH_SAX1=OFF ^ - -DLIBXML2_WITH_SCHEMAS=OFF -DLIBXML2_WITH_SCHEMATRON=OFF -DLIBXML2_WITH_TESTS=OFF ^ - -DLIBXML2_WITH_THREADS=ON -DLIBXML2_WITH_THREAD_ALLOC=OFF -DLIBXML2_WITH_TREE=ON ^ - -DLIBXML2_WITH_VALID=OFF -DLIBXML2_WITH_WRITER=OFF -DLIBXML2_WITH_XINCLUDE=OFF ^ - -DLIBXML2_WITH_XPATH=OFF -DLIBXML2_WITH_XPTR=OFF -DLIBXML2_WITH_ZLIB=OFF ^ - -DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded ^ - ../../libxml2-v2.9.12 || exit /b 1 -ninja install || exit /b 1 -set libxmldir=%cd%\install -set "libxmldir=%libxmldir:\=/%" -cd .. -exit /b 0 - -::============================================================================== -:: Generate a PGO profile. -::============================================================================== -:do_generate_profile -REM Build Clang with instrumentation. -mkdir instrument -cd instrument -cmake -GNinja %cmake_flags% -DLLVM_TARGETS_TO_BUILD=Native ^ - -DLLVM_BUILD_INSTRUMENTED=IR %llvm_src%\llvm || exit /b 1 -ninja clang || ninja clang || ninja clang || exit /b 1 -set instrumented_clang=%cd:\=/%/bin/clang-cl.exe -cd .. -REM Use that to build part of llvm to generate a profile. 
-mkdir train -cd train -cmake -GNinja %cmake_flags% ^ - -DCMAKE_C_COMPILER=%instrumented_clang% ^ - -DCMAKE_CXX_COMPILER=%instrumented_clang% ^ - -DLLVM_ENABLE_PROJECTS=clang ^ - -DLLVM_TARGETS_TO_BUILD=Native ^ - %llvm_src%\llvm || exit /b 1 -REM Drop profiles generated from running cmake; those are not representative. -del ..\instrument\profiles\*.profraw -ninja tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/Sema.cpp.obj -cd .. -set profile=%cd:\=/%/profile.profdata -%stage0_bin_dir%\llvm-profdata merge -output=%profile% instrument\profiles\*.profraw || exit /b 1 -set common_compiler_flags=%common_compiler_flags% -Wno-backend-plugin -set cmake_profile_flags=-DLLVM_PROFDATA_FILE=%profile% ^ - -DCMAKE_C_FLAGS="%common_compiler_flags%" ^ - -DCMAKE_CXX_FLAGS="%common_compiler_flags%" -exit /b 0 - -::============================================================================= -:: Parse command line arguments. -:: The format for the arguments is: -:: Boolean: --option -:: Value: --optionvalue -:: with being: space, colon, semicolon or equal sign -:: -:: Command line usage example: -:: my-batch-file.bat --build --type=release --version 123 -:: It will create 3 variables: -:: 'build' with the value 'true' -:: 'type' with the value 'release' -:: 'version' with the value '123' -:: -:: Usage: -:: set "build=" -:: set "type=" -:: set "version=" -:: -:: REM Parse arguments. -:: call :parse_args %* -:: -:: if defined build ( -:: ... -:: ) -:: if %type%=='release' ( -:: ... -:: ) -:: if %version%=='123' ( -:: ... -:: ) -::============================================================================= -:parse_args - set "arg_name=" - :parse_args_start - if "%1" == "" ( - :: Set a seen boolean argument. - if "%arg_name%" neq "" ( - set "%arg_name%=true" - ) - goto :parse_args_done - ) - set aux=%1 - if "%aux:~0,2%" == "--" ( - :: Set a seen boolean argument. 
- if "%arg_name%" neq "" ( - set "%arg_name%=true" - ) - set "arg_name=%aux:~2,250%" - ) else ( - set "%arg_name%=%1" - set "arg_name=" - ) - shift - goto :parse_args_start - -:parse_args_done -exit /b 0 + at echo off +setlocal enabledelayedexpansion + +goto begin + +:usage +echo Script for building the LLVM installer on Windows, +echo used for the releases at https://github.com/llvm/llvm-project/releases +echo. +echo Usage: build_llvm_release.bat --version ^ [--x86,--x64, --arm64] [--skip-checkout] [--local-python] +echo. +echo Options: +echo --version: [required] version to build +echo --help: display this help +echo --x86: build and test x86 variant +echo --x64: build and test x64 variant +echo --arm64: build and test arm64 variant +echo --skip-checkout: use local git checkout instead of downloading src.zip +echo --local-python: use installed Python and does not try to use a specific version (3.10) +echo. +echo Note: At least one variant to build is required. +echo. +echo Example: build_llvm_release.bat --version 15.0.0 --x86 --x64 +exit /b 1 + +:begin + +::============================================================================== +:: parse args +set version= +set help= +set x86= +set x64= +set arm64= +set skip-checkout= +set local-python= +call :parse_args %* + +if "%help%" NEQ "" goto usage + +if "%version%" == "" ( + echo --version option is required + echo ============================= + goto usage +) + +if "%arm64%" == "" if "%x64%" == "" if "%x86%" == "" ( + echo nothing to build! + echo choose one or several variants from: --x86 --x64 --arm64 + exit /b 1 +) + +::============================================================================== +:: check prerequisites +REM Note: +REM 7zip versions 21.x and higher will try to extract the symlinks in +REM llvm's git archive, which requires running as administrator. + +REM Check 7-zip version and/or administrator permissions. 
+for /f "delims=" %%i in ('7z.exe ^| findstr /r "2[1-9].[0-9][0-9]"') do set version_7z=%%i +if not "%version_7z%"=="" ( + REM Unique temporary filename to use by the 'mklink' command. + set "link_name=%temp%\%username%_%random%_%random%.tmp" + + REM As the 'mklink' requires elevated permissions, the symbolic link + REM creation will fail if the script is not running as administrator. + mklink /d "!link_name!" . 1>nul 2>nul + if errorlevel 1 ( + echo. + echo Script requires administrator permissions, or a 7-zip version 20.x or older. + echo Current version is "%version_7z%" + exit /b 1 + ) else ( + REM Remove the temporary symbolic link. + rd "!link_name!" + ) +) + +REM Prerequisites: +REM +REM Visual Studio 2019, CMake, Ninja, GNUWin32, SWIG, Python 3, +REM NSIS with the strlen_8192 patch, +REM Perl (for the OpenMP run-time). +REM +REM +REM For LLDB, SWIG version 4.1.1 should be used. +REM + +:: Detect Visual Studio +set vsinstall= +set vswhere=%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe + +if "%VSINSTALLDIR%" NEQ "" ( + echo using enabled Visual Studio installation + set "vsinstall=%VSINSTALLDIR%" +) else ( + echo using vswhere to detect Visual Studio installation + FOR /F "delims=" %%r IN ('^""%vswhere%" -nologo -latest -products "*" -all -property installationPath^"') DO set vsinstall=%%r +) +set "vsdevcmd=%vsinstall%\Common7\Tools\VsDevCmd.bat" + +if not exist "%vsdevcmd%" ( + echo Can't find any installation of Visual Studio + exit /b 1 +) +echo Using VS devcmd: %vsdevcmd% + +::============================================================================== +:: start echoing what we do + at echo on + +set python32_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python310-32 +set python64_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python310 +set pythonarm64_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python311-arm64 + +set revision=llvmorg-%version% +set package_version=%version% +set 
build_dir=%cd%\llvm_package_%package_version% + +echo Revision: %revision% +echo Package version: %package_version% +echo Build dir: %build_dir% +echo. + +if exist %build_dir% ( + echo Build directory already exists: %build_dir% + exit /b 1 +) +mkdir %build_dir% +cd %build_dir% || exit /b 1 + +if "%skip-checkout%" == "true" ( + echo Using local source + set llvm_src=%~dp0..\..\.. +) else ( + echo Checking out %revision% + curl -L https://github.com/llvm/llvm-project/archive/%revision%.zip -o src.zip || exit /b 1 + 7z x src.zip || exit /b 1 + mv llvm-project-* llvm-project || exit /b 1 + set llvm_src=%build_dir%\llvm-project +) + +curl -O https://gitlab.gnome.org/GNOME/libxml2/-/archive/v2.9.12/libxml2-v2.9.12.tar.gz || exit /b 1 +tar zxf libxml2-v2.9.12.tar.gz + +REM Setting CMAKE_CL_SHOWINCLUDES_PREFIX to work around PR27226. +REM Common flags for all builds. +set common_compiler_flags=-DLIBXML_STATIC +set common_cmake_flags=^ + -DCMAKE_BUILD_TYPE=Release ^ + -DLLVM_ENABLE_ASSERTIONS=OFF ^ + -DLLVM_INSTALL_TOOLCHAIN_ONLY=ON ^ + -DLLVM_TARGETS_TO_BUILD="AArch64;ARM;X86" ^ + -DLLVM_BUILD_LLVM_C_DYLIB=ON ^ + -DCMAKE_INSTALL_UCRT_LIBRARIES=ON ^ + -DPython3_FIND_REGISTRY=NEVER ^ + -DPACKAGE_VERSION=%package_version% ^ + -DLLDB_RELOCATABLE_PYTHON=1 ^ + -DLLDB_EMBED_PYTHON_HOME=OFF ^ + -DCMAKE_CL_SHOWINCLUDES_PREFIX="Note: including file: " ^ + -DLLVM_ENABLE_LIBXML2=FORCE_ON ^ + -DLLDB_ENABLE_LIBXML2=OFF ^ + -DCLANG_ENABLE_LIBXML2=OFF ^ + -DCMAKE_C_FLAGS="%common_compiler_flags%" ^ + -DCMAKE_CXX_FLAGS="%common_compiler_flags%" ^ + -DLLVM_ENABLE_RPMALLOC=ON ^ + -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;compiler-rt;lldb;openmp" + +set cmake_profile_flags="" + +REM Preserve original path +set OLDPATH=%PATH% + +REM Build the 32-bits and/or 64-bits binaries. 
+if "%x86%" == "true" call :do_build_32 || exit /b 1
+if "%x64%" == "true" call :do_build_64 || exit /b 1
+if "%arm64%" == "true" call :do_build_arm64 || exit /b 1
+exit /b 0
+
+::==============================================================================
+:: Build 32-bits binaries.
+::==============================================================================
+:do_build_32
+call :set_environment %python32_dir% || exit /b 1
+call "%vsdevcmd%" -arch=x86 || exit /b 1
+@echo on
+mkdir build32_stage0
+cd build32_stage0
+call :do_build_libxml || exit /b 1
+
+REM Stage0 binaries directory; used in stage1.
+set "stage0_bin_dir=%build_dir%/build32_stage0/bin"
+set cmake_flags=^
+  %common_cmake_flags% ^
+  -DLLVM_ENABLE_RPMALLOC=OFF ^
+  -DLLDB_TEST_COMPILER=%stage0_bin_dir%/clang.exe ^
+  -DPYTHON_HOME=%PYTHONHOME% ^
+  -DPython3_ROOT_DIR=%PYTHONHOME% ^
+  -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^
+  -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib
+
+cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1
+ninja || ninja || ninja || exit /b 1
+REM ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1
+REM ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1
+ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1
+ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1
+REM ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1
+cd..
+
+REM CMake expects the paths that specifies the compiler and linker to be
+REM with forward slash.
+set all_cmake_flags=^
+  %cmake_flags% ^
+  -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^
+  -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^
+  -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^
+  -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^
+  -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe
+set cmake_flags=%all_cmake_flags:\=/%
+
+mkdir build32
+cd build32
+cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1
+ninja || ninja || ninja || exit /b 1
+REM ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1
+REM ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1
+ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1
+ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1
+REM ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1
+ninja package || exit /b 1
+cd ..
+
+exit /b 0
+::==============================================================================
+
+::==============================================================================
+:: Build 64-bits binaries.
+::==============================================================================
+:do_build_64
+call :set_environment %python64_dir% || exit /b 1
+call "%vsdevcmd%" -arch=amd64 || exit /b 1
+@echo on
+mkdir build64_stage0
+cd build64_stage0
+call :do_build_libxml || exit /b 1
+
+REM Stage0 binaries directory; used in stage1.
+set "stage0_bin_dir=%build_dir%/build64_stage0/bin"
+set cmake_flags=^
+  %common_cmake_flags% ^
+  -DLLDB_TEST_COMPILER=%stage0_bin_dir%/clang.exe ^
+  -DPYTHON_HOME=%PYTHONHOME% ^
+  -DPython3_ROOT_DIR=%PYTHONHOME% ^
+  -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^
+  -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib
+
+cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1
+ninja || ninja || ninja || exit /b 1
+ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1
+ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1
+ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1
+ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1
+ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1
+ninja check-clangd || ninja check-clangd || ninja check-clangd || exit /b 1
+cd..
+
+REM CMake expects the paths that specifies the compiler and linker to be
+REM with forward slash.
+set all_cmake_flags=^
+  %cmake_flags% ^
+  -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^
+  -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^
+  -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^
+  -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^
+  -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe
+set cmake_flags=%all_cmake_flags:\=/%
+
+
+mkdir build64
+cd build64
+call :do_generate_profile || exit /b 1
+cmake -GNinja %cmake_flags% %cmake_profile_flags% %llvm_src%\llvm || exit /b 1
+ninja || ninja || ninja || exit /b 1
+ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1
+ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1
+ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1
+ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1
+ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1
+ninja check-clangd || ninja check-clangd || ninja check-clangd || exit /b 1
+ninja package || exit /b 1
+
+:: generate tarball with install toolchain only off
+set filename=clang+llvm-%version%-x86_64-pc-windows-msvc
+cmake -GNinja %cmake_flags% %cmake_profile_flags% -DLLVM_INSTALL_TOOLCHAIN_ONLY=OFF ^
+  -DCMAKE_INSTALL_PREFIX=%build_dir%/%filename% ..\llvm-project\llvm || exit /b 1
+ninja install || exit /b 1
+:: check llvm_config is present & returns something
+%build_dir%/%filename%/bin/llvm-config.exe --bindir || exit /b 1
+cd ..
+7z a -ttar -so %filename%.tar %filename% | 7z a -txz -si %filename%.tar.xz
+
+exit /b 0
+::==============================================================================
+
+::==============================================================================
+:: Build arm64 binaries.
+::==============================================================================
+:do_build_arm64
+call :set_environment %pythonarm64_dir% || exit /b 1
+call "%vsdevcmd%" -host_arch=x64 -arch=arm64 || exit /b 1
+@echo on
+mkdir build_arm64_stage0
+cd build_arm64_stage0
+call :do_build_libxml || exit /b 1
+
+REM Stage0 binaries directory; used in stage1.
+set "stage0_bin_dir=%build_dir%/build_arm64_stage0/bin"
+set cmake_flags=^
+  %common_cmake_flags% ^
+  -DCLANG_DEFAULT_LINKER=lld ^
+  -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^
+  -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib ^
+  -DPython3_ROOT_DIR=%PYTHONHOME% ^
+  -DCOMPILER_RT_BUILD_PROFILE=OFF ^
+  -DCOMPILER_RT_BUILD_SANITIZERS=OFF
+
+REM We need to build stage0 compiler-rt with clang-cl (msvc lacks some builtins).
+cmake -GNinja %cmake_flags% ^
+  -DCMAKE_C_COMPILER=clang-cl.exe ^
+  -DCMAKE_CXX_COMPILER=clang-cl.exe ^
+  %llvm_src%\llvm || exit /b 1
+ninja || exit /b 1
+::ninja check-llvm || exit /b 1
+::ninja check-clang || exit /b 1
+::ninja check-lld || exit /b 1
+::ninja check-sanitizer || exit /b 1
+::ninja check-clang-tools || exit /b 1
+::ninja check-clangd || exit /b 1
+cd..
+
+REM CMake expects the paths that specifies the compiler and linker to be
+REM with forward slash.
+REM CPACK_SYSTEM_NAME is set to have a correct name for installer generated.
+set all_cmake_flags=^
+  %cmake_flags% ^
+  -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^
+  -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^
+  -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^
+  -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^
+  -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe ^
+  -DCPACK_SYSTEM_NAME=woa64
+set cmake_flags=%all_cmake_flags:\=/%
+
+mkdir build_arm64
+cd build_arm64
+cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1
+ninja || exit /b 1
+REM Check but do not fail on errors.
+ninja check-lldb
+::ninja check-llvm || exit /b 1
+::ninja check-clang || exit /b 1
+::ninja check-lld || exit /b 1
+::ninja check-sanitizer || exit /b 1
+::ninja check-clang-tools || exit /b 1
+::ninja check-clangd || exit /b 1
+ninja package || exit /b 1
+cd ..
+
+exit /b 0
+::==============================================================================
+::
+::==============================================================================
+:: Set PATH and some environment variables.
+::==============================================================================
+:set_environment
+REM Restore original path
+set PATH=%OLDPATH%
+
+set python_dir=%1
+
+REM Set Python environment
+if "%local-python%" == "true" (
+  FOR /F "delims=" %%i IN ('where python.exe ^| head -1') DO set python_exe=%%i
+  set PYTHONHOME=!python_exe:~0,-11!
+) else (
+  %python_dir%/python.exe --version || exit /b 1
+  set PYTHONHOME=%python_dir%
+)
+set PATH=%PYTHONHOME%;%PATH%
+
+set "VSCMD_START_DIR=%build_dir%"
+
+exit /b 0
+
+::=============================================================================
+
+::==============================================================================
+:: Build libxml.
+::==============================================================================
+:do_build_libxml
+mkdir libxmlbuild
+cd libxmlbuild
+cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install ^
+  -DBUILD_SHARED_LIBS=OFF -DLIBXML2_WITH_C14N=OFF -DLIBXML2_WITH_CATALOG=OFF ^
+  -DLIBXML2_WITH_DEBUG=OFF -DLIBXML2_WITH_DOCB=OFF -DLIBXML2_WITH_FTP=OFF ^
+  -DLIBXML2_WITH_HTML=OFF -DLIBXML2_WITH_HTTP=OFF -DLIBXML2_WITH_ICONV=OFF ^
+  -DLIBXML2_WITH_ICU=OFF -DLIBXML2_WITH_ISO8859X=OFF -DLIBXML2_WITH_LEGACY=OFF ^
+  -DLIBXML2_WITH_LZMA=OFF -DLIBXML2_WITH_MEM_DEBUG=OFF -DLIBXML2_WITH_MODULES=OFF ^
+  -DLIBXML2_WITH_OUTPUT=ON -DLIBXML2_WITH_PATTERN=OFF -DLIBXML2_WITH_PROGRAMS=OFF ^
+  -DLIBXML2_WITH_PUSH=OFF -DLIBXML2_WITH_PYTHON=OFF -DLIBXML2_WITH_READER=OFF ^
+  -DLIBXML2_WITH_REGEXPS=OFF -DLIBXML2_WITH_RUN_DEBUG=OFF -DLIBXML2_WITH_SAX1=OFF ^
+  -DLIBXML2_WITH_SCHEMAS=OFF -DLIBXML2_WITH_SCHEMATRON=OFF -DLIBXML2_WITH_TESTS=OFF ^
+  -DLIBXML2_WITH_THREADS=ON -DLIBXML2_WITH_THREAD_ALLOC=OFF -DLIBXML2_WITH_TREE=ON ^
+  -DLIBXML2_WITH_VALID=OFF -DLIBXML2_WITH_WRITER=OFF -DLIBXML2_WITH_XINCLUDE=OFF ^
+  -DLIBXML2_WITH_XPATH=OFF -DLIBXML2_WITH_XPTR=OFF -DLIBXML2_WITH_ZLIB=OFF ^
+  -DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded ^
+  ../../libxml2-v2.9.12 || exit /b 1
+ninja install || exit /b 1
+set libxmldir=%cd%\install
+set "libxmldir=%libxmldir:\=/%"
+cd ..
+exit /b 0
+
+::==============================================================================
+:: Generate a PGO profile.
+::==============================================================================
+:do_generate_profile
+REM Build Clang with instrumentation.
+mkdir instrument
+cd instrument
+cmake -GNinja %cmake_flags% -DLLVM_TARGETS_TO_BUILD=Native ^
+  -DLLVM_BUILD_INSTRUMENTED=IR %llvm_src%\llvm || exit /b 1
+ninja clang || ninja clang || ninja clang || exit /b 1
+set instrumented_clang=%cd:\=/%/bin/clang-cl.exe
+cd ..
+REM Use that to build part of llvm to generate a profile.
+mkdir train
+cd train
+cmake -GNinja %cmake_flags% ^
+  -DCMAKE_C_COMPILER=%instrumented_clang% ^
+  -DCMAKE_CXX_COMPILER=%instrumented_clang% ^
+  -DLLVM_ENABLE_PROJECTS=clang ^
+  -DLLVM_TARGETS_TO_BUILD=Native ^
+  %llvm_src%\llvm || exit /b 1
+REM Drop profiles generated from running cmake; those are not representative.
+del ..\instrument\profiles\*.profraw
+ninja tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/Sema.cpp.obj
+cd ..
+set profile=%cd:\=/%/profile.profdata
+%stage0_bin_dir%\llvm-profdata merge -output=%profile% instrument\profiles\*.profraw || exit /b 1
+set common_compiler_flags=%common_compiler_flags% -Wno-backend-plugin
+set cmake_profile_flags=-DLLVM_PROFDATA_FILE=%profile% ^
+  -DCMAKE_C_FLAGS="%common_compiler_flags%" ^
+  -DCMAKE_CXX_FLAGS="%common_compiler_flags%"
+exit /b 0
+
+::=============================================================================
+:: Parse command line arguments.
+:: The format for the arguments is:
+::   Boolean: --option
+::   Value:   --option<separator>value
+::     with <separator> being: space, colon, semicolon or equal sign
+::
+:: Command line usage example:
+::   my-batch-file.bat --build --type=release --version 123
+:: It will create 3 variables:
+::   'build' with the value 'true'
+::   'type' with the value 'release'
+::   'version' with the value '123'
+::
+:: Usage:
+::   set "build="
+::   set "type="
+::   set "version="
+::
+::   REM Parse arguments.
+::   call :parse_args %*
+::
+::   if defined build (
+::     ...
+::   )
+::   if %type%=='release' (
+::     ...
+::   )
+::   if %version%=='123' (
+::     ...
+::   )
+::=============================================================================
+:parse_args
+  set "arg_name="
+  :parse_args_start
+  if "%1" == "" (
+    :: Set a seen boolean argument.
+    if "%arg_name%" neq "" (
+      set "%arg_name%=true"
+    )
+    goto :parse_args_done
+  )
+  set aux=%1
+  if "%aux:~0,2%" == "--" (
+    :: Set a seen boolean argument.
+    if "%arg_name%" neq "" (
+      set "%arg_name%=true"
+    )
+    set "arg_name=%aux:~2,250%"
+  ) else (
+    set "%arg_name%=%1"
+    set "arg_name="
+  )
+  shift
+  goto :parse_args_start
+
+:parse_args_done
+exit /b 0
diff --git a/openmp/runtime/doc/doxygen/config b/openmp/runtime/doc/doxygen/config
index 04c966766ba6ef..8d79dc143cc1a0 100644
--- a/openmp/runtime/doc/doxygen/config
+++ b/openmp/runtime/doc/doxygen/config
@@ -1,1822 +1,1822 @@
-# Doxyfile 1.o8.2
-
-# This file describes the settings to be used by the documentation system
-# doxygen (www.doxygen.org) for a project.
-#
-# All text after a hash (#) is considered a comment and will be ignored.
-# The format is:
-# TAG = value [value, ...]
-# For lists items can also be appended using:
-# TAG += value [value, ...]
-# Values that contain spaces should be placed between quotes (" ").
-
-#---------------------------------------------------------------------------
-# Project related configuration options
-#---------------------------------------------------------------------------
-
-# This tag specifies the encoding used for all characters in the config file
-# that follow. The default is UTF-8 which is also the encoding used for all
-# text before the first occurrence of this tag. Doxygen uses libiconv (or the
-# iconv built into libc) for the transcoding. See
-# http://www.gnu.org/software/libiconv for the list of possible encodings.
-
-DOXYFILE_ENCODING = UTF-8
-
-# The PROJECT_NAME tag is a single word (or sequence of words) that should
-# identify the project. Note that if you do not use Doxywizard you need
-# to put quotes around the project name if it contains spaces.
-
-PROJECT_NAME = "LLVM OpenMP* Runtime Library"
-
-# The PROJECT_NUMBER tag can be used to enter a project or revision number.
-# This could be handy for archiving the generated documentation or
-# if some version control system is used.
-
-PROJECT_NUMBER =
-
-# Using the PROJECT_BRIEF tag one can provide an optional one line description
-# for a project that appears at the top of each page and should give viewer
-# a quick idea about the purpose of the project. Keep the description short.
-
-PROJECT_BRIEF =
-
-# With the PROJECT_LOGO tag one can specify an logo or icon that is
-# included in the documentation. The maximum height of the logo should not
-# exceed 55 pixels and the maximum width should not exceed 200 pixels.
-# Doxygen will copy the logo to the output directory.
-
-PROJECT_LOGO =
-
-# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute)
-# base path where the generated documentation will be put.
-# If a relative path is entered, it will be relative to the location
-# where doxygen was started. If left blank the current directory will be used.
-
-OUTPUT_DIRECTORY = doc/doxygen/generated
-
-# If the CREATE_SUBDIRS tag is set to YES, then doxygen will create
-# 4096 sub-directories (in 2 levels) under the output directory of each output
-# format and will distribute the generated files over these directories.
-# Enabling this option can be useful when feeding doxygen a huge amount of
-# source files, where putting all generated files in the same directory would
-# otherwise cause performance problems for the file system.
-
-CREATE_SUBDIRS = NO
-
-# The OUTPUT_LANGUAGE tag is used to specify the language in which all
-# documentation generated by doxygen is written. Doxygen will use this
-# information to generate all constant output in the proper language.
-# The default language is English, other supported languages are:
-# Afrikaans, Arabic, Brazilian, Catalan, Chinese, Chinese-Traditional,
-# Croatian, Czech, Danish, Dutch, Esperanto, Farsi, Finnish, French, German,
-# Greek, Hungarian, Italian, Japanese, Japanese-en (Japanese with English
-# messages), Korean, Korean-en, Lithuanian, Norwegian, Macedonian, Persian,
-# Polish, Portuguese, Romanian, Russian, Serbian, Serbian-Cyrillic, Slovak,
-# Slovene, Spanish, Swedish, Ukrainian, and Vietnamese.
-
-OUTPUT_LANGUAGE = English
-
-# If the BRIEF_MEMBER_DESC tag is set to YES (the default) Doxygen will
-# include brief member descriptions after the members that are listed in
-# the file and class documentation (similar to JavaDoc).
-# Set to NO to disable this.
-
-BRIEF_MEMBER_DESC = YES
-
-# If the REPEAT_BRIEF tag is set to YES (the default) Doxygen will prepend
-# the brief description of a member or function before the detailed description.
-# Note: if both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the
-# brief descriptions will be completely suppressed.
-
-REPEAT_BRIEF = YES
-
-# This tag implements a quasi-intelligent brief description abbreviator
-# that is used to form the text in various listings. Each string
-# in this list, if found as the leading text of the brief description, will be
-# stripped from the text and the result after processing the whole list, is
-# used as the annotated text. Otherwise, the brief description is used as-is.
-# If left blank, the following values are used ("$name" is automatically
-# replaced with the name of the entity): "The $name class" "The $name widget"
-# "The $name file" "is" "provides" "specifies" "contains"
-# "represents" "a" "an" "the"
-
-ABBREVIATE_BRIEF =
-
-# If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then
-# Doxygen will generate a detailed section even if there is only a brief
-# description.
-
-ALWAYS_DETAILED_SEC = NO
-
-# If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all
-# inherited members of a class in the documentation of that class as if those
-# members were ordinary class members. Constructors, destructors and assignment
-# operators of the base classes will not be shown.
-
-INLINE_INHERITED_MEMB = NO
-
-# If the FULL_PATH_NAMES tag is set to YES then Doxygen will prepend the full
-# path before files name in the file list and in the header files. If set
-# to NO the shortest path that makes the file name unique will be used.
-
-FULL_PATH_NAMES = NO
-
-# If the FULL_PATH_NAMES tag is set to YES then the STRIP_FROM_PATH tag
-# can be used to strip a user-defined part of the path. Stripping is
-# only done if one of the specified strings matches the left-hand part of
-# the path. The tag can be used to show relative paths in the file list.
-# If left blank the directory from which doxygen is run is used as the
-# path to strip. Note that you specify absolute paths here, but also
-# relative paths, which will be relative from the directory where doxygen is
-# started.
-
-STRIP_FROM_PATH =
-
-# The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of
-# the path mentioned in the documentation of a class, which tells
-# the reader which header file to include in order to use a class.
-# If left blank only the name of the header file containing the class
-# definition is used. Otherwise one should specify the include paths that
-# are normally passed to the compiler using the -I flag.
-
-STRIP_FROM_INC_PATH =
-
-# If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter
-# (but less readable) file names. This can be useful if your file system
-# doesn't support long names like on DOS, Mac, or CD-ROM.
-
-SHORT_NAMES = NO
-
-# If the JAVADOC_AUTOBRIEF tag is set to YES then Doxygen
-# will interpret the first line (until the first dot) of a JavaDoc-style
-# comment as the brief description. If set to NO, the JavaDoc
-# comments will behave just like regular Qt-style comments
-# (thus requiring an explicit @brief command for a brief description.)
-
-JAVADOC_AUTOBRIEF = NO
-
-# If the QT_AUTOBRIEF tag is set to YES then Doxygen will
-# interpret the first line (until the first dot) of a Qt-style
-# comment as the brief description. If set to NO, the comments
-# will behave just like regular Qt-style comments (thus requiring
-# an explicit \brief command for a brief description.)
-
-QT_AUTOBRIEF = NO
-
-# The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make Doxygen
-# treat a multi-line C++ special comment block (i.e. a block of //! or ///
-# comments) as a brief description. This used to be the default behaviour.
-# The new default is to treat a multi-line C++ comment block as a detailed
-# description. Set this tag to YES if you prefer the old behaviour instead.
-
-MULTILINE_CPP_IS_BRIEF = NO
-
-# If the INHERIT_DOCS tag is set to YES (the default) then an undocumented
-# member inherits the documentation from any documented member that it
-# re-implements.
-
-INHERIT_DOCS = YES
-
-# If the SEPARATE_MEMBER_PAGES tag is set to YES, then doxygen will produce
-# a new page for each member. If set to NO, the documentation of a member will
-# be part of the file/class/namespace that contains it.
-
-SEPARATE_MEMBER_PAGES = NO
-
-# The TAB_SIZE tag can be used to set the number of spaces in a tab.
-# Doxygen uses this value to replace tabs by spaces in code fragments.
-
-TAB_SIZE = 8
-
-# This tag can be used to specify a number of aliases that acts
-# as commands in the documentation. An alias has the form "name=value".
-# For example adding "sideeffect=\par Side Effects:\n" will allow you to
-# put the command \sideeffect (or @sideeffect) in the documentation, which
-# will result in a user-defined paragraph with heading "Side Effects:".
-# You can put \n's in the value part of an alias to insert newlines.
-
-ALIASES = "other=*"
-
-# This tag can be used to specify a number of word-keyword mappings (TCL only).
-# A mapping has the form "name=value". For example adding
-# "class=itcl::class" will allow you to use the command class in the
-# itcl::class meaning.
-
-TCL_SUBST =
-
-# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C
-# sources only. Doxygen will then generate output that is more tailored for C.
-# For instance, some of the names that are used will be different. The list
-# of all members will be omitted, etc.
-
-OPTIMIZE_OUTPUT_FOR_C = NO
-
-# Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java
-# sources only. Doxygen will then generate output that is more tailored for
-# Java. For instance, namespaces will be presented as packages, qualified
-# scopes will look different, etc.
-
-OPTIMIZE_OUTPUT_JAVA = NO
-
-# Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran
-# sources only. Doxygen will then generate output that is more tailored for
-# Fortran.
-
-OPTIMIZE_FOR_FORTRAN = NO
-
-# Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL
-# sources. Doxygen will then generate output that is tailored for
-# VHDL.
-
-OPTIMIZE_OUTPUT_VHDL = NO
-
-# Doxygen selects the parser to use depending on the extension of the files it
-# parses. With this tag you can assign which parser to use for a given
-# extension. Doxygen has a built-in mapping, but you can override or extend it
-# using this tag. The format is ext=language, where ext is a file extension,
-# and language is one of the parsers supported by doxygen: IDL, Java,
-# Javascript, CSharp, C, C++, D, PHP, Objective-C, Python, Fortran, VHDL, C,
-# C++. For instance to make doxygen treat .inc files as Fortran files (default
-# is PHP), and .f files as C (default is Fortran), use: inc=Fortran f=C. Note
-# that for custom extensions you also need to set FILE_PATTERNS otherwise the
-# files are not read by doxygen.
-
-EXTENSION_MAPPING =
-
-# If MARKDOWN_SUPPORT is enabled (the default) then doxygen pre-processes all
-# comments according to the Markdown format, which allows for more readable
-# documentation. See http://daringfireball.net/projects/markdown/ for details.
-# The output of markdown processing is further processed by doxygen, so you
-# can mix doxygen, HTML, and XML commands with Markdown formatting.
-# Disable only in case of backward compatibilities issues.
-
-MARKDOWN_SUPPORT = YES
-
-# When enabled doxygen tries to link words that correspond to documented classes,
-# or namespaces to their corresponding documentation. Such a link can be
-# prevented in individual cases by by putting a % sign in front of the word or
-# globally by setting AUTOLINK_SUPPORT to NO.
-
-AUTOLINK_SUPPORT = YES
-
-# If you use STL classes (i.e. std::string, std::vector, etc.) but do not want
-# to include (a tag file for) the STL sources as input, then you should
-# set this tag to YES in order to let doxygen match functions declarations and
-# definitions whose arguments contain STL classes (e.g. func(std::string); v.s.
-# func(std::string) {}). This also makes the inheritance and collaboration
-# diagrams that involve STL classes more complete and accurate.
-
-BUILTIN_STL_SUPPORT = NO
-
-# If you use Microsoft's C++/CLI language, you should set this option to YES to
-# enable parsing support.
-
-CPP_CLI_SUPPORT = NO
-
-# Set the SIP_SUPPORT tag to YES if your project consists of sip sources only.
-# Doxygen will parse them like normal C++ but will assume all classes use public
-# instead of private inheritance when no explicit protection keyword is present.
-
-SIP_SUPPORT = NO
-
-# For Microsoft's IDL there are propget and propput attributes to
-# indicate getter and setter methods for a property. Setting this
-# option to YES (the default) will make doxygen replace the get and
-# set methods by a property in the documentation. This will only work
-# if the methods are indeed getting or setting a simple type. If this
-# is not the case, or you want to show the methods anyway, you should
-# set this option to NO.
-
-IDL_PROPERTY_SUPPORT = YES
-
-# If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC
-# tag is set to YES, then doxygen will reuse the documentation of the first
-# member in the group (if any) for the other members of the group. By default
-# all members of a group must be documented explicitly.
-
-DISTRIBUTE_GROUP_DOC = NO
-
-# Set the SUBGROUPING tag to YES (the default) to allow class member groups of
-# the same type (for instance a group of public functions) to be put as a
-# subgroup of that type (e.g. under the Public Functions section). Set it to
-# NO to prevent subgrouping. Alternatively, this can be done per class using
-# the \nosubgrouping command.
-
-SUBGROUPING = YES
-
-# When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and
-# unions are shown inside the group in which they are included (e.g. using
-# @ingroup) instead of on a separate page (for HTML and Man pages) or
-# section (for LaTeX and RTF).
-
-INLINE_GROUPED_CLASSES = NO
-
-# When the INLINE_SIMPLE_STRUCTS tag is set to YES, structs, classes, and
-# unions with only public data fields will be shown inline in the documentation
-# of the scope in which they are defined (i.e. file, namespace, or group
-# documentation), provided this scope is documented. If set to NO (the default),
-# structs, classes, and unions are shown on a separate page (for HTML and Man
-# pages) or section (for LaTeX and RTF).
-
-INLINE_SIMPLE_STRUCTS = NO
-
-# When TYPEDEF_HIDES_STRUCT is enabled, a typedef of a struct, union, or enum
-# is documented as struct, union, or enum with the name of the typedef. So
-# typedef struct TypeS {} TypeT, will appear in the documentation as a struct
-# with name TypeT. When disabled the typedef will appear as a member of a file,
-# namespace, or class. And the struct will be named TypeS. This can typically
-# be useful for C code in case the coding convention dictates that all compound
-# types are typedef'ed and only the typedef is referenced, never the tag name.
-
-TYPEDEF_HIDES_STRUCT = NO
-
-# The SYMBOL_CACHE_SIZE determines the size of the internal cache use to
-# determine which symbols to keep in memory and which to flush to disk.
-# When the cache is full, less often used symbols will be written to disk.
-# For small to medium size projects (<1000 input files) the default value is
-# probably good enough. For larger projects a too small cache size can cause
-# doxygen to be busy swapping symbols to and from disk most of the time
-# causing a significant performance penalty.
-# If the system has enough physical memory increasing the cache will improve the
-# performance by keeping more symbols in memory. Note that the value works on
-# a logarithmic scale so increasing the size by one will roughly double the
-# memory usage. The cache size is given by this formula:
-# 2^(16+SYMBOL_CACHE_SIZE). The valid range is 0..9, the default is 0,
-# corresponding to a cache size of 2^16 = 65536 symbols.
-
-SYMBOL_CACHE_SIZE = 0
-
-# Similar to the SYMBOL_CACHE_SIZE the size of the symbol lookup cache can be
-# set using LOOKUP_CACHE_SIZE. This cache is used to resolve symbols given
-# their name and scope. Since this can be an expensive process and often the
-# same symbol appear multiple times in the code, doxygen keeps a cache of
-# pre-resolved symbols. If the cache is too small doxygen will become slower.
-# If the cache is too large, memory is wasted. The cache size is given by this
-# formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range is 0..9, the default is 0,
-# corresponding to a cache size of 2^16 = 65536 symbols.
-
-LOOKUP_CACHE_SIZE = 0
-
-#---------------------------------------------------------------------------
-# Build related configuration options
-#---------------------------------------------------------------------------
-
-# If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in
-# documentation are documented, even if no documentation was available.
-# Private class members and static file members will be hidden unless
-# the EXTRACT_PRIVATE and EXTRACT_STATIC tags are set to YES
-
-EXTRACT_ALL = NO
-
-# If the EXTRACT_PRIVATE tag is set to YES all private members of a class
-# will be included in the documentation.
-
-EXTRACT_PRIVATE = YES
-
-# If the EXTRACT_PACKAGE tag is set to YES all members with package or internal
-# scope will be included in the documentation.
-
-EXTRACT_PACKAGE = NO
-
-# If the EXTRACT_STATIC tag is set to YES all static members of a file
-# will be included in the documentation.
-
-EXTRACT_STATIC = YES
-
-# If the EXTRACT_LOCAL_CLASSES tag is set to YES classes (and structs)
-# defined locally in source files will be included in the documentation.
-# If set to NO only classes defined in header files are included.
-
-EXTRACT_LOCAL_CLASSES = YES
-
-# This flag is only useful for Objective-C code. When set to YES local
-# methods, which are defined in the implementation section but not in
-# the interface are included in the documentation.
-# If set to NO (the default) only methods in the interface are included.
-
-EXTRACT_LOCAL_METHODS = NO
-
-# If this flag is set to YES, the members of anonymous namespaces will be
-# extracted and appear in the documentation as a namespace called
-# 'anonymous_namespace{file}', where file will be replaced with the base
-# name of the file that contains the anonymous namespace. By default
-# anonymous namespaces are hidden.
-
-EXTRACT_ANON_NSPACES = NO
-
-# If the HIDE_UNDOC_MEMBERS tag is set to YES, Doxygen will hide all
-# undocumented members of documented classes, files or namespaces.
-# If set to NO (the default) these members will be included in the
-# various overviews, but no documentation section is generated.
-# This option has no effect if EXTRACT_ALL is enabled.
-
-HIDE_UNDOC_MEMBERS = YES
-
-# If the HIDE_UNDOC_CLASSES tag is set to YES, Doxygen will hide all
-# undocumented classes that are normally visible in the class hierarchy.
-# If set to NO (the default) these classes will be included in the various
-# overviews. This option has no effect if EXTRACT_ALL is enabled.
-
-HIDE_UNDOC_CLASSES = YES
-
-# If the HIDE_FRIEND_COMPOUNDS tag is set to YES, Doxygen will hide all
-# friend (class|struct|union) declarations.
-# If set to NO (the default) these declarations will be included in the
-# documentation.
-
-HIDE_FRIEND_COMPOUNDS = NO
-
-# If the HIDE_IN_BODY_DOCS tag is set to YES, Doxygen will hide any
-# documentation blocks found inside the body of a function.
-# If set to NO (the default) these blocks will be appended to the
-# function's detailed documentation block.
-
-HIDE_IN_BODY_DOCS = NO
-
-# The INTERNAL_DOCS tag determines if documentation
-# that is typed after a \internal command is included. If the tag is set
-# to NO (the default) then the documentation will be excluded.
-# Set it to YES to include the internal documentation.
-
-INTERNAL_DOCS = NO
-
-# If the CASE_SENSE_NAMES tag is set to NO then Doxygen will only generate
-# file names in lower-case letters. If set to YES upper-case letters are also
-# allowed. This is useful if you have classes or files whose names only differ
-# in case and if your file system supports case sensitive file names. Windows
-# and Mac users are advised to set this option to NO.
- -CASE_SENSE_NAMES = YES - -# If the HIDE_SCOPE_NAMES tag is set to NO (the default) then Doxygen -# will show members with their full class and namespace scopes in the -# documentation. If set to YES the scope will be hidden. - -HIDE_SCOPE_NAMES = NO - -# If the SHOW_INCLUDE_FILES tag is set to YES (the default) then Doxygen -# will put a list of the files that are included by a file in the documentation -# of that file. - -SHOW_INCLUDE_FILES = YES - -# If the FORCE_LOCAL_INCLUDES tag is set to YES then Doxygen -# will list include files with double quotes in the documentation -# rather than with sharp brackets. - -FORCE_LOCAL_INCLUDES = NO - -# If the INLINE_INFO tag is set to YES (the default) then a tag [inline] -# is inserted in the documentation for inline members. - -INLINE_INFO = YES - -# If the SORT_MEMBER_DOCS tag is set to YES (the default) then doxygen -# will sort the (detailed) documentation of file and class members -# alphabetically by member name. If set to NO the members will appear in -# declaration order. - -SORT_MEMBER_DOCS = YES - -# If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the -# brief documentation of file, namespace and class members alphabetically -# by member name. If set to NO (the default) the members will appear in -# declaration order. - -SORT_BRIEF_DOCS = NO - -# If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen -# will sort the (brief and detailed) documentation of class members so that -# constructors and destructors are listed first. If set to NO (the default) -# the constructors will appear in the respective orders defined by -# SORT_MEMBER_DOCS and SORT_BRIEF_DOCS. -# This tag will be ignored for brief docs if SORT_BRIEF_DOCS is set to NO -# and ignored for detailed docs if SORT_MEMBER_DOCS is set to NO. - -SORT_MEMBERS_CTORS_1ST = NO - -# If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the -# hierarchy of group names into alphabetical order. 
If set to NO (the default) -# the group names will appear in their defined order. - -SORT_GROUP_NAMES = NO - -# If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be -# sorted by fully-qualified names, including namespaces. If set to -# NO (the default), the class list will be sorted only by class name, -# not including the namespace part. -# Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES. -# Note: This option applies only to the class list, not to the -# alphabetical list. - -SORT_BY_SCOPE_NAME = NO - -# If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to -# do proper type resolution of all parameters of a function it will reject a -# match between the prototype and the implementation of a member function even -# if there is only one candidate or it is obvious which candidate to choose -# by doing a simple string match. By disabling STRICT_PROTO_MATCHING doxygen -# will still accept a match between prototype and implementation in such cases. - -STRICT_PROTO_MATCHING = NO - -# The GENERATE_TODOLIST tag can be used to enable (YES) or -# disable (NO) the todo list. This list is created by putting \todo -# commands in the documentation. - -GENERATE_TODOLIST = YES - -# The GENERATE_TESTLIST tag can be used to enable (YES) or -# disable (NO) the test list. This list is created by putting \test -# commands in the documentation. - -GENERATE_TESTLIST = YES - -# The GENERATE_BUGLIST tag can be used to enable (YES) or -# disable (NO) the bug list. This list is created by putting \bug -# commands in the documentation. - -GENERATE_BUGLIST = YES - -# The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or -# disable (NO) the deprecated list. This list is created by putting -# \deprecated commands in the documentation. - -GENERATE_DEPRECATEDLIST= YES - -# The ENABLED_SECTIONS tag can be used to enable conditional -# documentation sections, marked by \if sectionname ... \endif. 
- -ENABLED_SECTIONS = - -# The MAX_INITIALIZER_LINES tag determines the maximum number of lines -# the initial value of a variable or macro consists of for it to appear in -# the documentation. If the initializer consists of more lines than specified -# here it will be hidden. Use a value of 0 to hide initializers completely. -# The appearance of the initializer of individual variables and macros in the -# documentation can be controlled using \showinitializer or \hideinitializer -# command in the documentation regardless of this setting. - -MAX_INITIALIZER_LINES = 30 - -# Set the SHOW_USED_FILES tag to NO to disable the list of files generated -# at the bottom of the documentation of classes and structs. If set to YES the -# list will mention the files that were used to generate the documentation. - -SHOW_USED_FILES = YES - -# Set the SHOW_FILES tag to NO to disable the generation of the Files page. -# This will remove the Files entry from the Quick Index and from the -# Folder Tree View (if specified). The default is YES. - -# We probably will want this, but we have no file documentation yet so it's simpler to remove -# it for now. -SHOW_FILES = NO - -# Set the SHOW_NAMESPACES tag to NO to disable the generation of the -# Namespaces page. -# This will remove the Namespaces entry from the Quick Index -# and from the Folder Tree View (if specified). The default is YES. - -SHOW_NAMESPACES = YES - -# The FILE_VERSION_FILTER tag can be used to specify a program or script that -# doxygen should invoke to get the current version for each file (typically from -# the version control system). Doxygen will invoke the program by executing (via -# popen()) the command <command> <input-file>, where <command> is the value of -# the FILE_VERSION_FILTER tag, and <input-file> is the name of an input file -# provided by doxygen. Whatever the program writes to standard output -# is used as the file version. See the manual for examples.
- -FILE_VERSION_FILTER = - -# The LAYOUT_FILE tag can be used to specify a layout file which will be parsed -# by doxygen. The layout file controls the global structure of the generated -# output files in an output format independent way. To create the layout file -# that represents doxygen's defaults, run doxygen with the -l option. -# You can optionally specify a file name after the option, if omitted -# DoxygenLayout.xml will be used as the name of the layout file. - -LAYOUT_FILE = - -# The CITE_BIB_FILES tag can be used to specify one or more bib files -# containing the references data. This must be a list of .bib files. The -# .bib extension is automatically appended if omitted. Using this command -# requires the bibtex tool to be installed. See also -# http://en.wikipedia.org/wiki/BibTeX for more info. For LaTeX the style -# of the bibliography can be controlled using LATEX_BIB_STYLE. To use this -# feature you need bibtex and perl available in the search path. - -CITE_BIB_FILES = - -#--------------------------------------------------------------------------- -# configuration options related to warning and progress messages -#--------------------------------------------------------------------------- - -# The QUIET tag can be used to turn on/off the messages that are generated -# by doxygen. Possible values are YES and NO. If left blank NO is used. - -QUIET = NO - -# The WARNINGS tag can be used to turn on/off the warning messages that are -# generated by doxygen. Possible values are YES and NO. If left blank -# NO is used. - -WARNINGS = YES - -# If WARN_IF_UNDOCUMENTED is set to YES, then doxygen will generate warnings -# for undocumented members. If EXTRACT_ALL is set to YES then this flag will -# automatically be disabled. 
- -WARN_IF_UNDOCUMENTED = YES - -# If WARN_IF_DOC_ERROR is set to YES, doxygen will generate warnings for -# potential errors in the documentation, such as not documenting some -# parameters in a documented function, or documenting parameters that -# don't exist or using markup commands wrongly. - -WARN_IF_DOC_ERROR = YES - -# The WARN_NO_PARAMDOC option can be enabled to get warnings for -# functions that are documented, but have no documentation for their parameters -# or return value. If set to NO (the default) doxygen will only warn about -# wrong or incomplete parameter documentation, but not about the absence of -# documentation. - -WARN_NO_PARAMDOC = NO - -# The WARN_FORMAT tag determines the format of the warning messages that -# doxygen can produce. The string should contain the $file, $line, and $text -# tags, which will be replaced by the file and line number from which the -# warning originated and the warning text. Optionally the format may contain -# $version, which will be replaced by the version of the file (if it could -# be obtained via FILE_VERSION_FILTER) - -WARN_FORMAT = - -# The WARN_LOGFILE tag can be used to specify a file to which warning -# and error messages should be written. If left blank the output is written -# to stderr. - -WARN_LOGFILE = - -#--------------------------------------------------------------------------- -# configuration options related to the input files -#--------------------------------------------------------------------------- - -# The INPUT tag can be used to specify the files and/or directories that contain -# documented source files. You may enter file names like "myfile.cpp" or -# directories like "/usr/src/myproject". Separate the files or directories -# with spaces. - -INPUT = src doc/doxygen/libomp_interface.h -# The ittnotify code also has doxygen documentation, but if we include it here -# it takes over from us! 
-# src/thirdparty/ittnotify - -# This tag can be used to specify the character encoding of the source files -# that doxygen parses. Internally doxygen uses the UTF-8 encoding, which is -# also the default input encoding. Doxygen uses libiconv (or the iconv built -# into libc) for the transcoding. See http://www.gnu.org/software/libiconv for -# the list of possible encodings. - -INPUT_ENCODING = UTF-8 - -# If the value of the INPUT tag contains directories, you can use the -# FILE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp -# and *.h) to filter out the source-files in the directories. If left -# blank the following patterns are tested: -# *.c *.cc *.cxx *.cpp *.c++ *.d *.java *.ii *.ixx *.ipp *.i++ *.inl *.h *.hh -# *.hxx *.hpp *.h++ *.idl *.odl *.cs *.php *.php3 *.inc *.m *.mm *.dox *.py -# *.f90 *.f *.for *.vhd *.vhdl - -FILE_PATTERNS = *.c *.h *.cpp -# We may also want to include the asm files with appropriate ifdef to ensure -# doxygen doesn't see the content, just the documentation... - -# The RECURSIVE tag can be used to turn specify whether or not subdirectories -# should be searched for input files as well. Possible values are YES and NO. -# If left blank NO is used. - -# Only look in the one directory. -RECURSIVE = NO - -# The EXCLUDE tag can be used to specify files and/or directories that should be -# excluded from the INPUT source files. This way you can easily exclude a -# subdirectory from a directory tree whose root is specified with the INPUT tag. -# Note that relative paths are relative to the directory from which doxygen is -# run. - -EXCLUDE = src/test-touch.c - -# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or -# directories that are symbolic links (a Unix file system feature) are excluded -# from the input. 
- -EXCLUDE_SYMLINKS = NO - -# If the value of the INPUT tag contains directories, you can use the -# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude -# certain files from those directories. Note that the wildcards are matched -# against the file with absolute path, so to exclude all test directories -# for example use the pattern */test/* - -EXCLUDE_PATTERNS = - -# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names -# (namespaces, classes, functions, etc.) that should be excluded from the -# output. The symbol name can be a fully qualified name, a word, or if the -# wildcard * is used, a substring. Examples: ANamespace, AClass, -# AClass::ANamespace, ANamespace::*Test - -EXCLUDE_SYMBOLS = - -# The EXAMPLE_PATH tag can be used to specify one or more files or -# directories that contain example code fragments that are included (see -# the \include command). - -EXAMPLE_PATH = - -# If the value of the EXAMPLE_PATH tag contains directories, you can use the -# EXAMPLE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp -# and *.h) to filter out the source-files in the directories. If left -# blank all files are included. - -EXAMPLE_PATTERNS = - -# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be -# searched for input files to be used with the \include or \dontinclude -# commands irrespective of the value of the RECURSIVE tag. -# Possible values are YES and NO. If left blank NO is used. - -EXAMPLE_RECURSIVE = NO - -# The IMAGE_PATH tag can be used to specify one or more files or -# directories that contain image that are included in the documentation (see -# the \image command). - -IMAGE_PATH = - -# The INPUT_FILTER tag can be used to specify a program that doxygen should -# invoke to filter for each input file. Doxygen will invoke the filter program -# by executing (via popen()) the command <filter> <input-file>, where <filter> -# is the value of the INPUT_FILTER tag, and <input-file> is the name of an -# input file.
Doxygen will then use the output that the filter program writes -# to standard output. -# If FILTER_PATTERNS is specified, this tag will be -# ignored. - -INPUT_FILTER = - -# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern -# basis. -# Doxygen will compare the file name with each pattern and apply the -# filter if there is a match. -# The filters are a list of the form: -# pattern=filter (like *.cpp=my_cpp_filter). See INPUT_FILTER for further -# info on how filters are used. If FILTER_PATTERNS is empty or if -# non of the patterns match the file name, INPUT_FILTER is applied. - -FILTER_PATTERNS = - -# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using -# INPUT_FILTER) will be used to filter the input files when producing source -# files to browse (i.e. when SOURCE_BROWSER is set to YES). - -FILTER_SOURCE_FILES = NO - -# The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file -# pattern. A pattern will override the setting for FILTER_PATTERN (if any) -# and it is also possible to disable source filtering for a specific pattern -# using *.ext= (so without naming a filter). This option only has effect when -# FILTER_SOURCE_FILES is enabled. - -FILTER_SOURCE_PATTERNS = - -#--------------------------------------------------------------------------- -# configuration options related to source browsing -#--------------------------------------------------------------------------- - -# If the SOURCE_BROWSER tag is set to YES then a list of source files will -# be generated. Documented entities will be cross-referenced with these sources. -# Note: To get rid of all source code in the generated output, make sure also -# VERBATIM_HEADERS is set to NO. - -SOURCE_BROWSER = YES - -# Setting the INLINE_SOURCES tag to YES will include the body -# of functions and classes directly in the documentation. 
- -INLINE_SOURCES = NO - -# Setting the STRIP_CODE_COMMENTS tag to YES (the default) will instruct -# doxygen to hide any special comment blocks from generated source code -# fragments. Normal C, C++ and Fortran comments will always remain visible. - -STRIP_CODE_COMMENTS = YES - -# If the REFERENCED_BY_RELATION tag is set to YES -# then for each documented function all documented -# functions referencing it will be listed. - -REFERENCED_BY_RELATION = YES - -# If the REFERENCES_RELATION tag is set to YES -# then for each documented function all documented entities -# called/used by that function will be listed. - -REFERENCES_RELATION = NO - -# If the REFERENCES_LINK_SOURCE tag is set to YES (the default) -# and SOURCE_BROWSER tag is set to YES, then the hyperlinks from -# functions in REFERENCES_RELATION and REFERENCED_BY_RELATION lists will -# link to the source code. -# Otherwise they will link to the documentation. - -REFERENCES_LINK_SOURCE = YES - -# If the USE_HTAGS tag is set to YES then the references to source code -# will point to the HTML generated by the htags(1) tool instead of doxygen -# built-in source browser. The htags tool is part of GNU's global source -# tagging system (see http://www.gnu.org/software/global/global.html). You -# will need version 4.8.6 or higher. - -USE_HTAGS = NO - -# If the VERBATIM_HEADERS tag is set to YES (the default) then Doxygen -# will generate a verbatim copy of the header file for each class for -# which an include is specified. Set to NO to disable this. - -VERBATIM_HEADERS = YES - -#--------------------------------------------------------------------------- -# configuration options related to the alphabetical class index -#--------------------------------------------------------------------------- - -# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index -# of all compounds will be generated. Enable this if the project -# contains a lot of classes, structs, unions or interfaces. 
- -ALPHABETICAL_INDEX = YES - -# If the alphabetical index is enabled (see ALPHABETICAL_INDEX) then -# the COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns -# in which this list will be split (can be a number in the range [1..20]) - -COLS_IN_ALPHA_INDEX = 5 - -# In case all classes in a project start with a common prefix, all -# classes will be put under the same header in the alphabetical index. -# The IGNORE_PREFIX tag can be used to specify one or more prefixes that -# should be ignored while generating the index headers. - -IGNORE_PREFIX = - -#--------------------------------------------------------------------------- -# configuration options related to the HTML output -#--------------------------------------------------------------------------- - -# If the GENERATE_HTML tag is set to YES (the default) Doxygen will -# generate HTML output. - -GENERATE_HTML = YES - -# The HTML_OUTPUT tag is used to specify where the HTML docs will be put. -# If a relative path is entered the value of OUTPUT_DIRECTORY will be -# put in front of it. If left blank `html' will be used as the default path. - -HTML_OUTPUT = - -# The HTML_FILE_EXTENSION tag can be used to specify the file extension for -# each generated HTML page (for example: .htm,.php,.asp). If it is left blank -# doxygen will generate files with .html extension. - -HTML_FILE_EXTENSION = .html - -# The HTML_HEADER tag can be used to specify a personal HTML header for -# each generated HTML page. If it is left blank doxygen will generate a -# standard header. Note that when using a custom header you are responsible -# for the proper inclusion of any scripts and style sheets that doxygen -# needs, which is dependent on the configuration options used. -# It is advised to generate a default header using "doxygen -w html -# header.html footer.html stylesheet.css YourConfigFile" and then modify -# that header. 
Note that the header is subject to change so you typically -# have to redo this when upgrading to a newer version of doxygen or when -# changing the value of configuration settings such as GENERATE_TREEVIEW! - -HTML_HEADER = - -# The HTML_FOOTER tag can be used to specify a personal HTML footer for -# each generated HTML page. If it is left blank doxygen will generate a -# standard footer. - -HTML_FOOTER = - -# The HTML_STYLESHEET tag can be used to specify a user-defined cascading -# style sheet that is used by each HTML page. It can be used to -# fine-tune the look of the HTML output. If left blank doxygen will -# generate a default style sheet. Note that it is recommended to use -# HTML_EXTRA_STYLESHEET instead of this one, as it is more robust and this -# tag will in the future become obsolete. - -HTML_STYLESHEET = - -# The HTML_EXTRA_STYLESHEET tag can be used to specify an additional -# user-defined cascading style sheet that is included after the standard -# style sheets created by doxygen. Using this option one can overrule -# certain style aspects. This is preferred over using HTML_STYLESHEET -# since it does not replace the standard style sheet and is therefor more -# robust against future updates. Doxygen will copy the style sheet file to -# the output directory. - -HTML_EXTRA_STYLESHEET = - -# The HTML_EXTRA_FILES tag can be used to specify one or more extra images or -# other source files which should be copied to the HTML output directory. Note -# that these files will be copied to the base HTML output directory. Use the -# $relpath$ marker in the HTML_HEADER and/or HTML_FOOTER files to load these -# files. In the HTML_STYLESHEET file, use the file name only. Also note that -# the files will be copied as-is; there are no commands or markers available. - -HTML_EXTRA_FILES = - -# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. -# Doxygen will adjust the colors in the style sheet and background images -# according to this color. 
Hue is specified as an angle on a colorwheel, -# see http://en.wikipedia.org/wiki/Hue for more information. -# For instance the value 0 represents red, 60 is yellow, 120 is green, -# 180 is cyan, 240 is blue, 300 purple, and 360 is red again. -# The allowed range is 0 to 359. - -HTML_COLORSTYLE_HUE = 220 - -# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of -# the colors in the HTML output. For a value of 0 the output will use -# grayscales only. A value of 255 will produce the most vivid colors. - -HTML_COLORSTYLE_SAT = 100 - -# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to -# the luminance component of the colors in the HTML output. Values below -# 100 gradually make the output lighter, whereas values above 100 make -# the output darker. The value divided by 100 is the actual gamma applied, -# so 80 represents a gamma of 0.8, The value 220 represents a gamma of 2.2, -# and 100 does not change the gamma. - -HTML_COLORSTYLE_GAMMA = 80 - -# If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML -# page will contain the date and time when the page was generated. Setting -# this to NO can help when comparing the output of multiple runs. - -HTML_TIMESTAMP = NO - -# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML -# documentation will contain sections that can be hidden and shown after the -# page has loaded. - -HTML_DYNAMIC_SECTIONS = NO - -# With HTML_INDEX_NUM_ENTRIES one can control the preferred number of -# entries shown in the various tree structured indices initially; the user -# can expand and collapse entries dynamically later on. Doxygen will expand -# the tree to such a level that at most the specified number of entries are -# visible (unless a fully collapsed tree already exceeds this amount). -# So setting the number of entries 1 will produce a full collapsed tree by -# default. 
0 is a special value representing an infinite number of entries -# and will result in a full expanded tree by default. - -HTML_INDEX_NUM_ENTRIES = 100 - -# If the GENERATE_DOCSET tag is set to YES, additional index files -# will be generated that can be used as input for Apple's Xcode 3 -# integrated development environment, introduced with OSX 10.5 (Leopard). -# To create a documentation set, doxygen will generate a Makefile in the -# HTML output directory. Running make will produce the docset in that -# directory and running "make install" will install the docset in -# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find -# it at startup. -# See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html -# for more information. - -GENERATE_DOCSET = NO - -# When GENERATE_DOCSET tag is set to YES, this tag determines the name of the -# feed. A documentation feed provides an umbrella under which multiple -# documentation sets from a single provider (such as a company or product suite) -# can be grouped. - -DOCSET_FEEDNAME = "Doxygen generated docs" - -# When GENERATE_DOCSET tag is set to YES, this tag specifies a string that -# should uniquely identify the documentation set bundle. This should be a -# reverse domain-name style string, e.g. com.mycompany.MyDocSet. Doxygen -# will append .docset to the name. - -DOCSET_BUNDLE_ID = org.doxygen.Project - -# When GENERATE_PUBLISHER_ID tag specifies a string that should uniquely -# identify the documentation publisher. This should be a reverse domain-name -# style string, e.g. com.mycompany.MyDocSet.documentation. - -DOCSET_PUBLISHER_ID = org.doxygen.Publisher - -# The GENERATE_PUBLISHER_NAME tag identifies the documentation publisher. 
- -DOCSET_PUBLISHER_NAME = Publisher - -# If the GENERATE_HTMLHELP tag is set to YES, additional index files -# will be generated that can be used as input for tools like the -# Microsoft HTML help workshop to generate a compiled HTML help file (.chm) -# of the generated HTML documentation. - -GENERATE_HTMLHELP = NO - -# If the GENERATE_HTMLHELP tag is set to YES, the CHM_FILE tag can -# be used to specify the file name of the resulting .chm file. You -# can add a path in front of the file if the result should not be -# written to the html output directory. - -CHM_FILE = - -# If the GENERATE_HTMLHELP tag is set to YES, the HHC_LOCATION tag can -# be used to specify the location (absolute path including file name) of -# the HTML help compiler (hhc.exe). If non-empty doxygen will try to run -# the HTML help compiler on the generated index.hhp. - -HHC_LOCATION = - -# If the GENERATE_HTMLHELP tag is set to YES, the GENERATE_CHI flag -# controls if a separate .chi index file is generated (YES) or that -# it should be included in the main .chm file (NO). - -GENERATE_CHI = NO - -# If the GENERATE_HTMLHELP tag is set to YES, the CHM_INDEX_ENCODING -# is used to encode HtmlHelp index (hhk), content (hhc) and project file -# content. - -CHM_INDEX_ENCODING = - -# If the GENERATE_HTMLHELP tag is set to YES, the BINARY_TOC flag -# controls whether a binary table of contents is generated (YES) or a -# normal table of contents (NO) in the .chm file. - -BINARY_TOC = NO - -# The TOC_EXPAND flag can be set to YES to add extra items for group members -# to the contents of the HTML help documentation and to the tree view. - -TOC_EXPAND = NO - -# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and -# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated -# that can be used as input for Qt's qhelpgenerator to generate a -# Qt Compressed Help (.qch) of the generated HTML documentation. 
- -GENERATE_QHP = NO - -# If the QHG_LOCATION tag is specified, the QCH_FILE tag can -# be used to specify the file name of the resulting .qch file. -# The path specified is relative to the HTML output folder. - -QCH_FILE = - -# The QHP_NAMESPACE tag specifies the namespace to use when generating -# Qt Help Project output. For more information please see -# http://doc.trolltech.com/qthelpproject.html#namespace - -QHP_NAMESPACE = org.doxygen.Project - -# The QHP_VIRTUAL_FOLDER tag specifies the namespace to use when generating -# Qt Help Project output. For more information please see -# http://doc.trolltech.com/qthelpproject.html#virtual-folders - -QHP_VIRTUAL_FOLDER = doc - -# If QHP_CUST_FILTER_NAME is set, it specifies the name of a custom filter to -# add. For more information please see -# http://doc.trolltech.com/qthelpproject.html#custom-filters - -QHP_CUST_FILTER_NAME = - -# The QHP_CUST_FILT_ATTRS tag specifies the list of the attributes of the -# custom filter to add. For more information please see -# -# Qt Help Project / Custom Filters. - -QHP_CUST_FILTER_ATTRS = - -# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this -# project's -# filter section matches. -# -# Qt Help Project / Filter Attributes. - -QHP_SECT_FILTER_ATTRS = - -# If the GENERATE_QHP tag is set to YES, the QHG_LOCATION tag can -# be used to specify the location of Qt's qhelpgenerator. -# If non-empty doxygen will try to run qhelpgenerator on the generated -# .qhp file. - -QHG_LOCATION = - -# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files -# will be generated, which together with the HTML files, form an Eclipse help -# plugin. To install this plugin and make it available under the help contents -# menu in Eclipse, the contents of the directory containing the HTML and XML -# files needs to be copied into the plugins directory of eclipse. The name of -# the directory within the plugins directory should be the same as -# the ECLIPSE_DOC_ID value. 
After copying Eclipse needs to be restarted before -# the help appears. - -GENERATE_ECLIPSEHELP = NO - -# A unique identifier for the eclipse help plugin. When installing the plugin -# the directory name containing the HTML and XML files should also have -# this name. - -ECLIPSE_DOC_ID = org.doxygen.Project - -# The DISABLE_INDEX tag can be used to turn on/off the condensed index (tabs) -# at top of each HTML page. The value NO (the default) enables the index and -# the value YES disables it. Since the tabs have the same information as the -# navigation tree you can set this option to NO if you already set -# GENERATE_TREEVIEW to YES. - -DISABLE_INDEX = NO - -# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index -# structure should be generated to display hierarchical information. -# If the tag value is set to YES, a side panel will be generated -# containing a tree-like index structure (just like the one that -# is generated for HTML Help). For this to work a browser that supports -# JavaScript, DHTML, CSS and frames is required (i.e. any modern browser). -# Windows users are probably better off using the HTML help feature. -# Since the tree basically has the same information as the tab index you -# could consider to set DISABLE_INDEX to NO when enabling this option. - -GENERATE_TREEVIEW = NO - -# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values -# (range [0,1..20]) that doxygen will group on one line in the generated HTML -# documentation. Note that a value of 0 will completely suppress the enum -# values from appearing in the overview section. - -ENUM_VALUES_PER_LINE = 4 - -# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be -# used to set the initial width (in pixels) of the frame in which the tree -# is shown. - -TREEVIEW_WIDTH = 250 - -# When the EXT_LINKS_IN_WINDOW option is set to YES doxygen will open -# links to external symbols imported via tag files in a separate window. 
- -EXT_LINKS_IN_WINDOW = NO - -# Use this tag to change the font size of Latex formulas included -# as images in the HTML documentation. The default is 10. Note that -# when you change the font size after a successful doxygen run you need -# to manually remove any form_*.png images from the HTML output directory -# to force them to be regenerated. - -FORMULA_FONTSIZE = 10 - -# Use the FORMULA_TRANPARENT tag to determine whether or not the images -# generated for formulas are transparent PNGs. Transparent PNGs are -# not supported properly for IE 6.0, but are supported on all modern browsers. -# Note that when changing this option you need to delete any form_*.png files -# in the HTML output before the changes have effect. - -FORMULA_TRANSPARENT = YES - -# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax -# (see http://www.mathjax.org) which uses client side Javascript for the -# rendering instead of using prerendered bitmaps. Use this if you do not -# have LaTeX installed or if you want to formulas look prettier in the HTML -# output. When enabled you may also need to install MathJax separately and -# configure the path to it using the MATHJAX_RELPATH option. - -USE_MATHJAX = NO - -# When MathJax is enabled you need to specify the location relative to the -# HTML output directory using the MATHJAX_RELPATH option. The destination -# directory should contain the MathJax.js script. For instance, if the mathjax -# directory is located at the same level as the HTML output directory, then -# MATHJAX_RELPATH should be ../mathjax. The default value points to -# the MathJax Content Delivery Network so you can quickly see the result without -# installing MathJax. -# However, it is strongly recommended to install a local -# copy of MathJax from http://www.mathjax.org before deployment. 
- -MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest - -# The MATHJAX_EXTENSIONS tag can be used to specify one or MathJax extension -# names that should be enabled during MathJax rendering. - -MATHJAX_EXTENSIONS = - -# When the SEARCHENGINE tag is enabled doxygen will generate a search box -# for the HTML output. The underlying search engine uses javascript -# and DHTML and should work on any modern browser. Note that when using -# HTML help (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets -# (GENERATE_DOCSET) there is already a search function so this one should -# typically be disabled. For large projects the javascript based search engine -# can be slow, then enabling SERVER_BASED_SEARCH may provide a better solution. - -SEARCHENGINE = YES - -# When the SERVER_BASED_SEARCH tag is enabled the search engine will be -# implemented using a PHP enabled web server instead of at the web client -# using Javascript. Doxygen will generate the search PHP script and index -# file to put on the web server. The advantage of the server -# based approach is that it scales better to large projects and allows -# full text search. The disadvantages are that it is more difficult to setup -# and does not have live searching capabilities. - -SERVER_BASED_SEARCH = NO - -#--------------------------------------------------------------------------- -# configuration options related to the LaTeX output -#--------------------------------------------------------------------------- - -# If the GENERATE_LATEX tag is set to YES (the default) Doxygen will -# generate Latex output. - -GENERATE_LATEX = YES - -# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put. -# If a relative path is entered the value of OUTPUT_DIRECTORY will be -# put in front of it. If left blank `latex' will be used as the default path. - -LATEX_OUTPUT = - -# The LATEX_CMD_NAME tag can be used to specify the LaTeX command name to be -# invoked. 
If left blank `latex' will be used as the default command name. -# Note that when enabling USE_PDFLATEX this option is only used for -# generating bitmaps for formulas in the HTML output, but not in the -# Makefile that is written to the output directory. - -LATEX_CMD_NAME = latex - -# The MAKEINDEX_CMD_NAME tag can be used to specify the command name to -# generate index for LaTeX. If left blank `makeindex' will be used as the -# default command name. - -MAKEINDEX_CMD_NAME = makeindex - -# If the COMPACT_LATEX tag is set to YES Doxygen generates more compact -# LaTeX documents. This may be useful for small projects and may help to -# save some trees in general. - -COMPACT_LATEX = NO - -# The PAPER_TYPE tag can be used to set the paper type that is used -# by the printer. Possible values are: a4, letter, legal and -# executive. If left blank a4wide will be used. - -PAPER_TYPE = a4wide - -# The EXTRA_PACKAGES tag can be to specify one or more names of LaTeX -# packages that should be included in the LaTeX output. - -EXTRA_PACKAGES = - -# The LATEX_HEADER tag can be used to specify a personal LaTeX header for -# the generated latex document. The header should contain everything until -# the first chapter. If it is left blank doxygen will generate a -# standard header. Notice: only use this tag if you know what you are doing! - -LATEX_HEADER = doc/doxygen/header.tex - -# The LATEX_FOOTER tag can be used to specify a personal LaTeX footer for -# the generated latex document. The footer should contain everything after -# the last chapter. If it is left blank doxygen will generate a -# standard footer. Notice: only use this tag if you know what you are doing! - -LATEX_FOOTER = - -# If the PDF_HYPERLINKS tag is set to YES, the LaTeX that is generated -# is prepared for conversion to pdf (using ps2pdf). The pdf file will -# contain links (just like the HTML output) instead of page references -# This makes the output suitable for online browsing using a pdf viewer. 
- -PDF_HYPERLINKS = YES - -# If the USE_PDFLATEX tag is set to YES, pdflatex will be used instead of -# plain latex in the generated Makefile. Set this option to YES to get a -# higher quality PDF documentation. - -USE_PDFLATEX = YES - -# If the LATEX_BATCHMODE tag is set to YES, doxygen will add the \\batchmode. -# command to the generated LaTeX files. This will instruct LaTeX to keep -# running if errors occur, instead of asking the user for help. -# This option is also used when generating formulas in HTML. - -LATEX_BATCHMODE = NO - -# If LATEX_HIDE_INDICES is set to YES then doxygen will not -# include the index chapters (such as File Index, Compound Index, etc.) -# in the output. - -LATEX_HIDE_INDICES = NO - -# If LATEX_SOURCE_CODE is set to YES then doxygen will include -# source code with syntax highlighting in the LaTeX output. -# Note that which sources are shown also depends on other settings -# such as SOURCE_BROWSER. - -LATEX_SOURCE_CODE = NO - -# The LATEX_BIB_STYLE tag can be used to specify the style to use for the -# bibliography, e.g. plainnat, or ieeetr. The default style is "plain". See -# http://en.wikipedia.org/wiki/BibTeX for more info. - -LATEX_BIB_STYLE = plain - -#--------------------------------------------------------------------------- -# configuration options related to the RTF output -#--------------------------------------------------------------------------- - -# If the GENERATE_RTF tag is set to YES Doxygen will generate RTF output -# The RTF output is optimized for Word 97 and may not look very pretty with -# other RTF readers or editors. - -GENERATE_RTF = NO - -# The RTF_OUTPUT tag is used to specify where the RTF docs will be put. -# If a relative path is entered the value of OUTPUT_DIRECTORY will be -# put in front of it. If left blank `rtf' will be used as the default path. - -RTF_OUTPUT = - -# If the COMPACT_RTF tag is set to YES Doxygen generates more compact -# RTF documents. 
This may be useful for small projects and may help to -# save some trees in general. - -COMPACT_RTF = NO - -# If the RTF_HYPERLINKS tag is set to YES, the RTF that is generated -# will contain hyperlink fields. The RTF file will -# contain links (just like the HTML output) instead of page references. -# This makes the output suitable for online browsing using WORD or other -# programs which support those fields. -# Note: wordpad (write) and others do not support links. - -RTF_HYPERLINKS = NO - -# Load style sheet definitions from file. Syntax is similar to doxygen's -# config file, i.e. a series of assignments. You only have to provide -# replacements, missing definitions are set to their default value. - -RTF_STYLESHEET_FILE = - -# Set optional variables used in the generation of an rtf document. -# Syntax is similar to doxygen's config file. - -RTF_EXTENSIONS_FILE = - -#--------------------------------------------------------------------------- -# configuration options related to the man page output -#--------------------------------------------------------------------------- - -# If the GENERATE_MAN tag is set to YES (the default) Doxygen will -# generate man pages - -GENERATE_MAN = NO - -# The MAN_OUTPUT tag is used to specify where the man pages will be put. -# If a relative path is entered the value of OUTPUT_DIRECTORY will be -# put in front of it. If left blank `man' will be used as the default path. - -MAN_OUTPUT = - -# The MAN_EXTENSION tag determines the extension that is added to -# the generated man pages (default is the subroutine's section .3) - -MAN_EXTENSION = - -# If the MAN_LINKS tag is set to YES and Doxygen generates man output, -# then it will generate one additional man file for each entity -# documented in the real man page(s). These additional files -# only source the real man page, but without them the man command -# would be unable to find the correct page. The default is NO. 
- -MAN_LINKS = NO - -#--------------------------------------------------------------------------- -# configuration options related to the XML output -#--------------------------------------------------------------------------- - -# If the GENERATE_XML tag is set to YES Doxygen will -# generate an XML file that captures the structure of -# the code including all documentation. - -GENERATE_XML = NO - -# The XML_OUTPUT tag is used to specify where the XML pages will be put. -# If a relative path is entered the value of OUTPUT_DIRECTORY will be -# put in front of it. If left blank `xml' will be used as the default path. - -XML_OUTPUT = xml - -# The XML_SCHEMA tag can be used to specify an XML schema, -# which can be used by a validating XML parser to check the -# syntax of the XML files. - -XML_SCHEMA = - -# The XML_DTD tag can be used to specify an XML DTD, -# which can be used by a validating XML parser to check the -# syntax of the XML files. - -XML_DTD = - -# If the XML_PROGRAMLISTING tag is set to YES Doxygen will -# dump the program listings (including syntax highlighting -# and cross-referencing information) to the XML output. Note that -# enabling this will significantly increase the size of the XML output. - -XML_PROGRAMLISTING = YES - -#--------------------------------------------------------------------------- -# configuration options for the AutoGen Definitions output -#--------------------------------------------------------------------------- - -# If the GENERATE_AUTOGEN_DEF tag is set to YES Doxygen will -# generate an AutoGen Definitions (see autogen.sf.net) file -# that captures the structure of the code including all -# documentation. Note that this feature is still experimental -# and incomplete at the moment. 
- -GENERATE_AUTOGEN_DEF = NO - -#--------------------------------------------------------------------------- -# configuration options related to the Perl module output -#--------------------------------------------------------------------------- - -# If the GENERATE_PERLMOD tag is set to YES Doxygen will -# generate a Perl module file that captures the structure of -# the code including all documentation. Note that this -# feature is still experimental and incomplete at the -# moment. - -GENERATE_PERLMOD = NO - -# If the PERLMOD_LATEX tag is set to YES Doxygen will generate -# the necessary Makefile rules, Perl scripts and LaTeX code to be able -# to generate PDF and DVI output from the Perl module output. - -PERLMOD_LATEX = NO - -# If the PERLMOD_PRETTY tag is set to YES the Perl module output will be -# nicely formatted so it can be parsed by a human reader. -# This is useful -# if you want to understand what is going on. -# On the other hand, if this -# tag is set to NO the size of the Perl module output will be much smaller -# and Perl will parse it just the same. - -PERLMOD_PRETTY = YES - -# The names of the make variables in the generated doxyrules.make file -# are prefixed with the string contained in PERLMOD_MAKEVAR_PREFIX. -# This is useful so different doxyrules.make files included by the same -# Makefile don't overwrite each other's variables. - -PERLMOD_MAKEVAR_PREFIX = - -#--------------------------------------------------------------------------- -# Configuration options related to the preprocessor -#--------------------------------------------------------------------------- - -# If the ENABLE_PREPROCESSING tag is set to YES (the default) Doxygen will -# evaluate all C-preprocessor directives found in the sources and include -# files. - -ENABLE_PREPROCESSING = YES - -# If the MACRO_EXPANSION tag is set to YES Doxygen will expand all macro -# names in the source code. If set to NO (the default) only conditional -# compilation will be performed. 
Macro expansion can be done in a controlled -# way by setting EXPAND_ONLY_PREDEF to YES. - -MACRO_EXPANSION = YES - -# If the EXPAND_ONLY_PREDEF and MACRO_EXPANSION tags are both set to YES -# then the macro expansion is limited to the macros specified with the -# PREDEFINED and EXPAND_AS_DEFINED tags. - -EXPAND_ONLY_PREDEF = YES - -# If the SEARCH_INCLUDES tag is set to YES (the default) the includes files -# pointed to by INCLUDE_PATH will be searched when a #include is found. - -SEARCH_INCLUDES = YES - -# The INCLUDE_PATH tag can be used to specify one or more directories that -# contain include files that are not input files but should be processed by -# the preprocessor. - -INCLUDE_PATH = - -# You can use the INCLUDE_FILE_PATTERNS tag to specify one or more wildcard -# patterns (like *.h and *.hpp) to filter out the header-files in the -# directories. If left blank, the patterns specified with FILE_PATTERNS will -# be used. - -INCLUDE_FILE_PATTERNS = - -# The PREDEFINED tag can be used to specify one or more macro names that -# are defined before the preprocessor is started (similar to the -D option of -# gcc). The argument of the tag is a list of macros of the form: name -# or name=definition (no spaces). If the definition and the = are -# omitted =1 is assumed. To prevent a macro definition from being -# undefined via #undef or recursively expanded use the := operator -# instead of the = operator. - -PREDEFINED = OMP_30_ENABLED=1, OMP_40_ENABLED=1, KMP_STATS_ENABLED=1 - -# If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then -# this tag can be used to specify a list of macro names that should be expanded. -# The macro definition that is found in the sources will be used. -# Use the PREDEFINED tag if you want to use a different macro definition that -# overrules the definition found in the source code. 
- -EXPAND_AS_DEFINED = - -# If the SKIP_FUNCTION_MACROS tag is set to YES (the default) then -# doxygen's preprocessor will remove all references to function-like macros -# that are alone on a line, have an all uppercase name, and do not end with a -# semicolon, because these will confuse the parser if not removed. - -SKIP_FUNCTION_MACROS = YES - -#--------------------------------------------------------------------------- -# Configuration::additions related to external references -#--------------------------------------------------------------------------- - -# The TAGFILES option can be used to specify one or more tagfiles. For each -# tag file the location of the external documentation should be added. The -# format of a tag file without this location is as follows: -# -# TAGFILES = file1 file2 ... -# Adding location for the tag files is done as follows: -# -# TAGFILES = file1=loc1 "file2 = loc2" ... -# where "loc1" and "loc2" can be relative or absolute paths -# or URLs. Note that each tag file must have a unique name (where the name does -# NOT include the path). If a tag file is not located in the directory in which -# doxygen is run, you must also specify the path to the tagfile here. - -TAGFILES = - -# When a file name is specified after GENERATE_TAGFILE, doxygen will create -# a tag file that is based on the input files it reads. - -GENERATE_TAGFILE = - -# If the ALLEXTERNALS tag is set to YES all external classes will be listed -# in the class index. If set to NO only the inherited external classes -# will be listed. - -ALLEXTERNALS = NO - -# If the EXTERNAL_GROUPS tag is set to YES all external groups will be listed -# in the modules index. If set to NO, only the current project's groups will -# be listed. - -EXTERNAL_GROUPS = YES - -# The PERL_PATH should be the absolute path and name of the perl script -# interpreter (i.e. the result of `which perl'). 
- -PERL_PATH = - -#--------------------------------------------------------------------------- -# Configuration options related to the dot tool -#--------------------------------------------------------------------------- - -# If the CLASS_DIAGRAMS tag is set to YES (the default) Doxygen will -# generate a inheritance diagram (in HTML, RTF and LaTeX) for classes with base -# or super classes. Setting the tag to NO turns the diagrams off. Note that -# this option also works with HAVE_DOT disabled, but it is recommended to -# install and use dot, since it yields more powerful graphs. - -CLASS_DIAGRAMS = YES - -# You can define message sequence charts within doxygen comments using the \msc -# command. Doxygen will then run the mscgen tool (see -# http://www.mcternan.me.uk/mscgen/) to produce the chart and insert it in the -# documentation. The MSCGEN_PATH tag allows you to specify the directory where -# the mscgen tool resides. If left empty the tool is assumed to be found in the -# default search path. - -MSCGEN_PATH = - -# If set to YES, the inheritance and collaboration graphs will hide -# inheritance and usage relations if the target is undocumented -# or is not a class. - -HIDE_UNDOC_RELATIONS = YES - -# If you set the HAVE_DOT tag to YES then doxygen will assume the dot tool is -# available from the path. This tool is part of Graphviz, a graph visualization -# toolkit from AT&T and Lucent Bell Labs. The other options in this section -# have no effect if this option is set to NO (the default) - -HAVE_DOT = NO - -# The DOT_NUM_THREADS specifies the number of dot invocations doxygen is -# allowed to run in parallel. When set to 0 (the default) doxygen will -# base this on the number of processors available in the system. You can set it -# explicitly to a value larger than 0 to get control over the balance -# between CPU load and processing speed. - -DOT_NUM_THREADS = 0 - -# By default doxygen will use the Helvetica font for all dot files that -# doxygen generates. 
When you want a differently looking font you can specify -# the font name using DOT_FONTNAME. You need to make sure dot is able to find -# the font, which can be done by putting it in a standard location or by setting -# the DOTFONTPATH environment variable or by setting DOT_FONTPATH to the -# directory containing the font. - -DOT_FONTNAME = Helvetica - -# The DOT_FONTSIZE tag can be used to set the size of the font of dot graphs. -# The default size is 10pt. - -DOT_FONTSIZE = 10 - -# By default doxygen will tell dot to use the Helvetica font. -# If you specify a different font using DOT_FONTNAME you can use DOT_FONTPATH to -# set the path where dot can find it. - -DOT_FONTPATH = - -# If the CLASS_GRAPH and HAVE_DOT tags are set to YES then doxygen -# will generate a graph for each documented class showing the direct and -# indirect inheritance relations. Setting this tag to YES will force the -# CLASS_DIAGRAMS tag to NO. - -CLASS_GRAPH = YES - -# If the COLLABORATION_GRAPH and HAVE_DOT tags are set to YES then doxygen -# will generate a graph for each documented class showing the direct and -# indirect implementation dependencies (inheritance, containment, and -# class references variables) of the class with other documented classes. - -COLLABORATION_GRAPH = NO - -# If the GROUP_GRAPHS and HAVE_DOT tags are set to YES then doxygen -# will generate a graph for groups, showing the direct groups dependencies - -GROUP_GRAPHS = YES - -# If the UML_LOOK tag is set to YES doxygen will generate inheritance and -# collaboration diagrams in a style similar to the OMG's Unified Modeling -# Language. - -UML_LOOK = NO - -# If the UML_LOOK tag is enabled, the fields and methods are shown inside -# the class node. If there are many fields or methods and many nodes the -# graph may become too big to be useful. The UML_LIMIT_NUM_FIELDS -# threshold limits the number of items for each type to make the size more -# manageable. Set this to 0 for no limit. 
Note that the threshold may be -# exceeded by 50% before the limit is enforced. - -UML_LIMIT_NUM_FIELDS = 10 - -# If set to YES, the inheritance and collaboration graphs will show the -# relations between templates and their instances. - -TEMPLATE_RELATIONS = YES - -# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDE_GRAPH, and HAVE_DOT -# tags are set to YES then doxygen will generate a graph for each documented -# file showing the direct and indirect include dependencies of the file with -# other documented files. - -INCLUDE_GRAPH = NO - -# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDED_BY_GRAPH, and -# HAVE_DOT tags are set to YES then doxygen will generate a graph for each -# documented header file showing the documented files that directly or -# indirectly include this file. - -INCLUDED_BY_GRAPH = NO - -# If the CALL_GRAPH and HAVE_DOT options are set to YES then -# doxygen will generate a call dependency graph for every global function -# or class method. Note that enabling this option will significantly increase -# the time of a run. So in most cases it will be better to enable call graphs -# for selected functions only using the \callgraph command. - -CALL_GRAPH = NO - -# If the CALLER_GRAPH and HAVE_DOT tags are set to YES then -# doxygen will generate a caller dependency graph for every global function -# or class method. Note that enabling this option will significantly increase -# the time of a run. So in most cases it will be better to enable caller -# graphs for selected functions only using the \callergraph command. - -CALLER_GRAPH = NO - -# If the GRAPHICAL_HIERARCHY and HAVE_DOT tags are set to YES then doxygen -# will generate a graphical hierarchy of all classes instead of a textual one. - -GRAPHICAL_HIERARCHY = YES - -# If the DIRECTORY_GRAPH and HAVE_DOT tags are set to YES -# then doxygen will show the dependencies a directory has on other directories -# in a graphical way. 
The dependency relations are determined by the #include -# relations between the files in the directories. - -DIRECTORY_GRAPH = YES - -# The DOT_IMAGE_FORMAT tag can be used to set the image format of the images -# generated by dot. Possible values are svg, png, jpg, or gif. -# If left blank png will be used. If you choose svg you need to set -# HTML_FILE_EXTENSION to xhtml in order to make the SVG files -# visible in IE 9+ (other browsers do not have this requirement). - -DOT_IMAGE_FORMAT = png - -# If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to -# enable generation of interactive SVG images that allow zooming and panning. -# Note that this requires a modern browser other than Internet Explorer. -# Tested and working are Firefox, Chrome, Safari, and Opera. For IE 9+ you -# need to set HTML_FILE_EXTENSION to xhtml in order to make the SVG files -# visible. Older versions of IE do not have SVG support. - -INTERACTIVE_SVG = NO - -# The tag DOT_PATH can be used to specify the path where the dot tool can be -# found. If left blank, it is assumed the dot tool can be found in the path. - -DOT_PATH = - -# The DOTFILE_DIRS tag can be used to specify one or more directories that -# contain dot files that are included in the documentation (see the -# \dotfile command). - -DOTFILE_DIRS = - -# The MSCFILE_DIRS tag can be used to specify one or more directories that -# contain msc files that are included in the documentation (see the -# \mscfile command). - -MSCFILE_DIRS = - -# The DOT_GRAPH_MAX_NODES tag can be used to set the maximum number of -# nodes that will be shown in the graph. If the number of nodes in a graph -# becomes larger than this value, doxygen will truncate the graph, which is -# visualized by representing a node as a red box. Note that doxygen if the -# number of direct children of the root node in a graph is already larger than -# DOT_GRAPH_MAX_NODES then the graph will not be shown at all. 
Also note -# that the size of a graph can be further restricted by MAX_DOT_GRAPH_DEPTH. - -DOT_GRAPH_MAX_NODES = 50 - -# The MAX_DOT_GRAPH_DEPTH tag can be used to set the maximum depth of the -# graphs generated by dot. A depth value of 3 means that only nodes reachable -# from the root by following a path via at most 3 edges will be shown. Nodes -# that lay further from the root node will be omitted. Note that setting this -# option to 1 or 2 may greatly reduce the computation time needed for large -# code bases. Also note that the size of a graph can be further restricted by -# DOT_GRAPH_MAX_NODES. Using a depth of 0 means no depth restriction. - -MAX_DOT_GRAPH_DEPTH = 0 - -# Set the DOT_TRANSPARENT tag to YES to generate images with a transparent -# background. This is disabled by default, because dot on Windows does not -# seem to support this out of the box. Warning: Depending on the platform used, -# enabling this option may lead to badly anti-aliased labels on the edges of -# a graph (i.e. they become hard to read). - -DOT_TRANSPARENT = NO - -# Set the DOT_MULTI_TARGETS tag to YES allow dot to generate multiple output -# files in one run (i.e. multiple -o and -T options on the command line). This -# makes dot run faster, but since only newer versions of dot (>1.8.10) -# support this, this feature is disabled by default. - -DOT_MULTI_TARGETS = NO - -# If the GENERATE_LEGEND tag is set to YES (the default) Doxygen will -# generate a legend page explaining the meaning of the various boxes and -# arrows in the dot generated graphs. - -GENERATE_LEGEND = YES - -# If the DOT_CLEANUP tag is set to YES (the default) Doxygen will -# remove the intermediate dot files that are used to generate -# the various graphs. - -DOT_CLEANUP = YES +# Doxyfile 1.o8.2 + +# This file describes the settings to be used by the documentation system +# doxygen (www.doxygen.org) for a project. +# +# All text after a hash (#) is considered a comment and will be ignored. 
+# The format is: +# TAG = value [value, ...] +# For lists items can also be appended using: +# TAG += value [value, ...] +# Values that contain spaces should be placed between quotes (" "). + +#--------------------------------------------------------------------------- +# Project related configuration options +#--------------------------------------------------------------------------- + +# This tag specifies the encoding used for all characters in the config file +# that follow. The default is UTF-8 which is also the encoding used for all +# text before the first occurrence of this tag. Doxygen uses libiconv (or the +# iconv built into libc) for the transcoding. See +# http://www.gnu.org/software/libiconv for the list of possible encodings. + +DOXYFILE_ENCODING = UTF-8 + +# The PROJECT_NAME tag is a single word (or sequence of words) that should +# identify the project. Note that if you do not use Doxywizard you need +# to put quotes around the project name if it contains spaces. + +PROJECT_NAME = "LLVM OpenMP* Runtime Library" + +# The PROJECT_NUMBER tag can be used to enter a project or revision number. +# This could be handy for archiving the generated documentation or +# if some version control system is used. + +PROJECT_NUMBER = + +# Using the PROJECT_BRIEF tag one can provide an optional one line description +# for a project that appears at the top of each page and should give viewer +# a quick idea about the purpose of the project. Keep the description short. + +PROJECT_BRIEF = + +# With the PROJECT_LOGO tag one can specify an logo or icon that is +# included in the documentation. The maximum height of the logo should not +# exceed 55 pixels and the maximum width should not exceed 200 pixels. +# Doxygen will copy the logo to the output directory. + +PROJECT_LOGO = + +# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) +# base path where the generated documentation will be put. 
+# If a relative path is entered, it will be relative to the location +# where doxygen was started. If left blank the current directory will be used. + +OUTPUT_DIRECTORY = doc/doxygen/generated + +# If the CREATE_SUBDIRS tag is set to YES, then doxygen will create +# 4096 sub-directories (in 2 levels) under the output directory of each output +# format and will distribute the generated files over these directories. +# Enabling this option can be useful when feeding doxygen a huge amount of +# source files, where putting all generated files in the same directory would +# otherwise cause performance problems for the file system. + +CREATE_SUBDIRS = NO + +# The OUTPUT_LANGUAGE tag is used to specify the language in which all +# documentation generated by doxygen is written. Doxygen will use this +# information to generate all constant output in the proper language. +# The default language is English, other supported languages are: +# Afrikaans, Arabic, Brazilian, Catalan, Chinese, Chinese-Traditional, +# Croatian, Czech, Danish, Dutch, Esperanto, Farsi, Finnish, French, German, +# Greek, Hungarian, Italian, Japanese, Japanese-en (Japanese with English +# messages), Korean, Korean-en, Lithuanian, Norwegian, Macedonian, Persian, +# Polish, Portuguese, Romanian, Russian, Serbian, Serbian-Cyrillic, Slovak, +# Slovene, Spanish, Swedish, Ukrainian, and Vietnamese. + +OUTPUT_LANGUAGE = English + +# If the BRIEF_MEMBER_DESC tag is set to YES (the default) Doxygen will +# include brief member descriptions after the members that are listed in +# the file and class documentation (similar to JavaDoc). +# Set to NO to disable this. + +BRIEF_MEMBER_DESC = YES + +# If the REPEAT_BRIEF tag is set to YES (the default) Doxygen will prepend +# the brief description of a member or function before the detailed description. +# Note: if both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the +# brief descriptions will be completely suppressed. 
+ +REPEAT_BRIEF = YES + +# This tag implements a quasi-intelligent brief description abbreviator +# that is used to form the text in various listings. Each string +# in this list, if found as the leading text of the brief description, will be +# stripped from the text and the result after processing the whole list, is +# used as the annotated text. Otherwise, the brief description is used as-is. +# If left blank, the following values are used ("$name" is automatically +# replaced with the name of the entity): "The $name class" "The $name widget" +# "The $name file" "is" "provides" "specifies" "contains" +# "represents" "a" "an" "the" + +ABBREVIATE_BRIEF = + +# If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then +# Doxygen will generate a detailed section even if there is only a brief +# description. + +ALWAYS_DETAILED_SEC = NO + +# If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all +# inherited members of a class in the documentation of that class as if those +# members were ordinary class members. Constructors, destructors and assignment +# operators of the base classes will not be shown. + +INLINE_INHERITED_MEMB = NO + +# If the FULL_PATH_NAMES tag is set to YES then Doxygen will prepend the full +# path before files name in the file list and in the header files. If set +# to NO the shortest path that makes the file name unique will be used. + +FULL_PATH_NAMES = NO + +# If the FULL_PATH_NAMES tag is set to YES then the STRIP_FROM_PATH tag +# can be used to strip a user-defined part of the path. Stripping is +# only done if one of the specified strings matches the left-hand part of +# the path. The tag can be used to show relative paths in the file list. +# If left blank the directory from which doxygen is run is used as the +# path to strip. Note that you specify absolute paths here, but also +# relative paths, which will be relative from the directory where doxygen is +# started. 
+ +STRIP_FROM_PATH = + +# The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of +# the path mentioned in the documentation of a class, which tells +# the reader which header file to include in order to use a class. +# If left blank only the name of the header file containing the class +# definition is used. Otherwise one should specify the include paths that +# are normally passed to the compiler using the -I flag. + +STRIP_FROM_INC_PATH = + +# If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter +# (but less readable) file names. This can be useful if your file system +# doesn't support long names like on DOS, Mac, or CD-ROM. + +SHORT_NAMES = NO + +# If the JAVADOC_AUTOBRIEF tag is set to YES then Doxygen +# will interpret the first line (until the first dot) of a JavaDoc-style +# comment as the brief description. If set to NO, the JavaDoc +# comments will behave just like regular Qt-style comments +# (thus requiring an explicit @brief command for a brief description.) + +JAVADOC_AUTOBRIEF = NO + +# If the QT_AUTOBRIEF tag is set to YES then Doxygen will +# interpret the first line (until the first dot) of a Qt-style +# comment as the brief description. If set to NO, the comments +# will behave just like regular Qt-style comments (thus requiring +# an explicit \brief command for a brief description.) + +QT_AUTOBRIEF = NO + +# The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make Doxygen +# treat a multi-line C++ special comment block (i.e. a block of //! or /// +# comments) as a brief description. This used to be the default behaviour. +# The new default is to treat a multi-line C++ comment block as a detailed +# description. Set this tag to YES if you prefer the old behaviour instead. + +MULTILINE_CPP_IS_BRIEF = NO + +# If the INHERIT_DOCS tag is set to YES (the default) then an undocumented +# member inherits the documentation from any documented member that it +# re-implements. 
+ +INHERIT_DOCS = YES + +# If the SEPARATE_MEMBER_PAGES tag is set to YES, then doxygen will produce +# a new page for each member. If set to NO, the documentation of a member will +# be part of the file/class/namespace that contains it. + +SEPARATE_MEMBER_PAGES = NO + +# The TAB_SIZE tag can be used to set the number of spaces in a tab. +# Doxygen uses this value to replace tabs by spaces in code fragments. + +TAB_SIZE = 8 + +# This tag can be used to specify a number of aliases that acts +# as commands in the documentation. An alias has the form "name=value". +# For example adding "sideeffect=\par Side Effects:\n" will allow you to +# put the command \sideeffect (or @sideeffect) in the documentation, which +# will result in a user-defined paragraph with heading "Side Effects:". +# You can put \n's in the value part of an alias to insert newlines. + +ALIASES = "other=*" + +# This tag can be used to specify a number of word-keyword mappings (TCL only). +# A mapping has the form "name=value". For example adding +# "class=itcl::class" will allow you to use the command class in the +# itcl::class meaning. + +TCL_SUBST = + +# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C +# sources only. Doxygen will then generate output that is more tailored for C. +# For instance, some of the names that are used will be different. The list +# of all members will be omitted, etc. + +OPTIMIZE_OUTPUT_FOR_C = NO + +# Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java +# sources only. Doxygen will then generate output that is more tailored for +# Java. For instance, namespaces will be presented as packages, qualified +# scopes will look different, etc. + +OPTIMIZE_OUTPUT_JAVA = NO + +# Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran +# sources only. Doxygen will then generate output that is more tailored for +# Fortran. 
+ +OPTIMIZE_FOR_FORTRAN = NO + +# Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL +# sources. Doxygen will then generate output that is tailored for +# VHDL. + +OPTIMIZE_OUTPUT_VHDL = NO + +# Doxygen selects the parser to use depending on the extension of the files it +# parses. With this tag you can assign which parser to use for a given +# extension. Doxygen has a built-in mapping, but you can override or extend it +# using this tag. The format is ext=language, where ext is a file extension, +# and language is one of the parsers supported by doxygen: IDL, Java, +# Javascript, CSharp, C, C++, D, PHP, Objective-C, Python, Fortran, VHDL, C, +# C++. For instance to make doxygen treat .inc files as Fortran files (default +# is PHP), and .f files as C (default is Fortran), use: inc=Fortran f=C. Note +# that for custom extensions you also need to set FILE_PATTERNS otherwise the +# files are not read by doxygen. + +EXTENSION_MAPPING = + +# If MARKDOWN_SUPPORT is enabled (the default) then doxygen pre-processes all +# comments according to the Markdown format, which allows for more readable +# documentation. See http://daringfireball.net/projects/markdown/ for details. +# The output of markdown processing is further processed by doxygen, so you +# can mix doxygen, HTML, and XML commands with Markdown formatting. +# Disable only in case of backward compatibility issues. + +MARKDOWN_SUPPORT = YES + +# When enabled doxygen tries to link words that correspond to documented classes, +# or namespaces to their corresponding documentation. Such a link can be +# prevented in individual cases by putting a % sign in front of the word or +# globally by setting AUTOLINK_SUPPORT to NO. + +AUTOLINK_SUPPORT = YES + +# If you use STL classes (i.e. std::string, std::vector, etc.)
but do not want +# to include (a tag file for) the STL sources as input, then you should +# set this tag to YES in order to let doxygen match functions declarations and +# definitions whose arguments contain STL classes (e.g. func(std::string); v.s. +# func(std::string) {}). This also makes the inheritance and collaboration +# diagrams that involve STL classes more complete and accurate. + +BUILTIN_STL_SUPPORT = NO + +# If you use Microsoft's C++/CLI language, you should set this option to YES to +# enable parsing support. + +CPP_CLI_SUPPORT = NO + +# Set the SIP_SUPPORT tag to YES if your project consists of sip sources only. +# Doxygen will parse them like normal C++ but will assume all classes use public +# instead of private inheritance when no explicit protection keyword is present. + +SIP_SUPPORT = NO + +# For Microsoft's IDL there are propget and propput attributes to +# indicate getter and setter methods for a property. Setting this +# option to YES (the default) will make doxygen replace the get and +# set methods by a property in the documentation. This will only work +# if the methods are indeed getting or setting a simple type. If this +# is not the case, or you want to show the methods anyway, you should +# set this option to NO. + +IDL_PROPERTY_SUPPORT = YES + +# If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC +# tag is set to YES, then doxygen will reuse the documentation of the first +# member in the group (if any) for the other members of the group. By default +# all members of a group must be documented explicitly. + +DISTRIBUTE_GROUP_DOC = NO + +# Set the SUBGROUPING tag to YES (the default) to allow class member groups of +# the same type (for instance a group of public functions) to be put as a +# subgroup of that type (e.g. under the Public Functions section). Set it to +# NO to prevent subgrouping. Alternatively, this can be done per class using +# the \nosubgrouping command. 
+ +SUBGROUPING = YES + +# When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and +# unions are shown inside the group in which they are included (e.g. using +# @ingroup) instead of on a separate page (for HTML and Man pages) or +# section (for LaTeX and RTF). + +INLINE_GROUPED_CLASSES = NO + +# When the INLINE_SIMPLE_STRUCTS tag is set to YES, structs, classes, and +# unions with only public data fields will be shown inline in the documentation +# of the scope in which they are defined (i.e. file, namespace, or group +# documentation), provided this scope is documented. If set to NO (the default), +# structs, classes, and unions are shown on a separate page (for HTML and Man +# pages) or section (for LaTeX and RTF). + +INLINE_SIMPLE_STRUCTS = NO + +# When TYPEDEF_HIDES_STRUCT is enabled, a typedef of a struct, union, or enum +# is documented as struct, union, or enum with the name of the typedef. So +# typedef struct TypeS {} TypeT, will appear in the documentation as a struct +# with name TypeT. When disabled the typedef will appear as a member of a file, +# namespace, or class. And the struct will be named TypeS. This can typically +# be useful for C code in case the coding convention dictates that all compound +# types are typedef'ed and only the typedef is referenced, never the tag name. + +TYPEDEF_HIDES_STRUCT = NO + +# The SYMBOL_CACHE_SIZE determines the size of the internal cache use to +# determine which symbols to keep in memory and which to flush to disk. +# When the cache is full, less often used symbols will be written to disk. +# For small to medium size projects (<1000 input files) the default value is +# probably good enough. For larger projects a too small cache size can cause +# doxygen to be busy swapping symbols to and from disk most of the time +# causing a significant performance penalty. +# If the system has enough physical memory increasing the cache will improve the +# performance by keeping more symbols in memory. 
Note that the value works on +# a logarithmic scale so increasing the size by one will roughly double the +# memory usage. The cache size is given by this formula: +# 2^(16+SYMBOL_CACHE_SIZE). The valid range is 0..9, the default is 0, +# corresponding to a cache size of 2^16 = 65536 symbols. + +SYMBOL_CACHE_SIZE = 0 + +# Similar to the SYMBOL_CACHE_SIZE the size of the symbol lookup cache can be +# set using LOOKUP_CACHE_SIZE. This cache is used to resolve symbols given +# their name and scope. Since this can be an expensive process and often the +# same symbol appear multiple times in the code, doxygen keeps a cache of +# pre-resolved symbols. If the cache is too small doxygen will become slower. +# If the cache is too large, memory is wasted. The cache size is given by this +# formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range is 0..9, the default is 0, +# corresponding to a cache size of 2^16 = 65536 symbols. + +LOOKUP_CACHE_SIZE = 0 + +#--------------------------------------------------------------------------- +# Build related configuration options +#--------------------------------------------------------------------------- + +# If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in +# documentation are documented, even if no documentation was available. +# Private class members and static file members will be hidden unless +# the EXTRACT_PRIVATE and EXTRACT_STATIC tags are set to YES + +EXTRACT_ALL = NO + +# If the EXTRACT_PRIVATE tag is set to YES all private members of a class +# will be included in the documentation. + +EXTRACT_PRIVATE = YES + +# If the EXTRACT_PACKAGE tag is set to YES all members with package or internal +# scope will be included in the documentation. + +EXTRACT_PACKAGE = NO + +# If the EXTRACT_STATIC tag is set to YES all static members of a file +# will be included in the documentation. 
+ +EXTRACT_STATIC = YES + +# If the EXTRACT_LOCAL_CLASSES tag is set to YES classes (and structs) +# defined locally in source files will be included in the documentation. +# If set to NO only classes defined in header files are included. + +EXTRACT_LOCAL_CLASSES = YES + +# This flag is only useful for Objective-C code. When set to YES local +# methods, which are defined in the implementation section but not in +# the interface are included in the documentation. +# If set to NO (the default) only methods in the interface are included. + +EXTRACT_LOCAL_METHODS = NO + +# If this flag is set to YES, the members of anonymous namespaces will be +# extracted and appear in the documentation as a namespace called +# 'anonymous_namespace{file}', where file will be replaced with the base +# name of the file that contains the anonymous namespace. By default +# anonymous namespaces are hidden. + +EXTRACT_ANON_NSPACES = NO + +# If the HIDE_UNDOC_MEMBERS tag is set to YES, Doxygen will hide all +# undocumented members of documented classes, files or namespaces. +# If set to NO (the default) these members will be included in the +# various overviews, but no documentation section is generated. +# This option has no effect if EXTRACT_ALL is enabled. + +HIDE_UNDOC_MEMBERS = YES + +# If the HIDE_UNDOC_CLASSES tag is set to YES, Doxygen will hide all +# undocumented classes that are normally visible in the class hierarchy. +# If set to NO (the default) these classes will be included in the various +# overviews. This option has no effect if EXTRACT_ALL is enabled. + +HIDE_UNDOC_CLASSES = YES + +# If the HIDE_FRIEND_COMPOUNDS tag is set to YES, Doxygen will hide all +# friend (class|struct|union) declarations. +# If set to NO (the default) these declarations will be included in the +# documentation. + +HIDE_FRIEND_COMPOUNDS = NO + +# If the HIDE_IN_BODY_DOCS tag is set to YES, Doxygen will hide any +# documentation blocks found inside the body of a function. 
+# If set to NO (the default) these blocks will be appended to the +# function's detailed documentation block. + +HIDE_IN_BODY_DOCS = NO + +# The INTERNAL_DOCS tag determines if documentation +# that is typed after a \internal command is included. If the tag is set +# to NO (the default) then the documentation will be excluded. +# Set it to YES to include the internal documentation. + +INTERNAL_DOCS = NO + +# If the CASE_SENSE_NAMES tag is set to NO then Doxygen will only generate +# file names in lower-case letters. If set to YES upper-case letters are also +# allowed. This is useful if you have classes or files whose names only differ +# in case and if your file system supports case sensitive file names. Windows +# and Mac users are advised to set this option to NO. + +CASE_SENSE_NAMES = YES + +# If the HIDE_SCOPE_NAMES tag is set to NO (the default) then Doxygen +# will show members with their full class and namespace scopes in the +# documentation. If set to YES the scope will be hidden. + +HIDE_SCOPE_NAMES = NO + +# If the SHOW_INCLUDE_FILES tag is set to YES (the default) then Doxygen +# will put a list of the files that are included by a file in the documentation +# of that file. + +SHOW_INCLUDE_FILES = YES + +# If the FORCE_LOCAL_INCLUDES tag is set to YES then Doxygen +# will list include files with double quotes in the documentation +# rather than with sharp brackets. + +FORCE_LOCAL_INCLUDES = NO + +# If the INLINE_INFO tag is set to YES (the default) then a tag [inline] +# is inserted in the documentation for inline members. + +INLINE_INFO = YES + +# If the SORT_MEMBER_DOCS tag is set to YES (the default) then doxygen +# will sort the (detailed) documentation of file and class members +# alphabetically by member name. If set to NO the members will appear in +# declaration order. 
+ +SORT_MEMBER_DOCS = YES + +# If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the +# brief documentation of file, namespace and class members alphabetically +# by member name. If set to NO (the default) the members will appear in +# declaration order. + +SORT_BRIEF_DOCS = NO + +# If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen +# will sort the (brief and detailed) documentation of class members so that +# constructors and destructors are listed first. If set to NO (the default) +# the constructors will appear in the respective orders defined by +# SORT_MEMBER_DOCS and SORT_BRIEF_DOCS. +# This tag will be ignored for brief docs if SORT_BRIEF_DOCS is set to NO +# and ignored for detailed docs if SORT_MEMBER_DOCS is set to NO. + +SORT_MEMBERS_CTORS_1ST = NO + +# If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the +# hierarchy of group names into alphabetical order. If set to NO (the default) +# the group names will appear in their defined order. + +SORT_GROUP_NAMES = NO + +# If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be +# sorted by fully-qualified names, including namespaces. If set to +# NO (the default), the class list will be sorted only by class name, +# not including the namespace part. +# Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES. +# Note: This option applies only to the class list, not to the +# alphabetical list. + +SORT_BY_SCOPE_NAME = NO + +# If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to +# do proper type resolution of all parameters of a function it will reject a +# match between the prototype and the implementation of a member function even +# if there is only one candidate or it is obvious which candidate to choose +# by doing a simple string match. By disabling STRICT_PROTO_MATCHING doxygen +# will still accept a match between prototype and implementation in such cases. 
+ +STRICT_PROTO_MATCHING = NO + +# The GENERATE_TODOLIST tag can be used to enable (YES) or +# disable (NO) the todo list. This list is created by putting \todo +# commands in the documentation. + +GENERATE_TODOLIST = YES + +# The GENERATE_TESTLIST tag can be used to enable (YES) or +# disable (NO) the test list. This list is created by putting \test +# commands in the documentation. + +GENERATE_TESTLIST = YES + +# The GENERATE_BUGLIST tag can be used to enable (YES) or +# disable (NO) the bug list. This list is created by putting \bug +# commands in the documentation. + +GENERATE_BUGLIST = YES + +# The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or +# disable (NO) the deprecated list. This list is created by putting +# \deprecated commands in the documentation. + +GENERATE_DEPRECATEDLIST= YES + +# The ENABLED_SECTIONS tag can be used to enable conditional +# documentation sections, marked by \if sectionname ... \endif. + +ENABLED_SECTIONS = + +# The MAX_INITIALIZER_LINES tag determines the maximum number of lines +# the initial value of a variable or macro consists of for it to appear in +# the documentation. If the initializer consists of more lines than specified +# here it will be hidden. Use a value of 0 to hide initializers completely. +# The appearance of the initializer of individual variables and macros in the +# documentation can be controlled using \showinitializer or \hideinitializer +# command in the documentation regardless of this setting. + +MAX_INITIALIZER_LINES = 30 + +# Set the SHOW_USED_FILES tag to NO to disable the list of files generated +# at the bottom of the documentation of classes and structs. If set to YES the +# list will mention the files that were used to generate the documentation. + +SHOW_USED_FILES = YES + +# Set the SHOW_FILES tag to NO to disable the generation of the Files page. +# This will remove the Files entry from the Quick Index and from the +# Folder Tree View (if specified). The default is YES. 
+ +# We probably will want this, but we have no file documentation yet so it's simpler to remove +# it for now. +SHOW_FILES = NO + +# Set the SHOW_NAMESPACES tag to NO to disable the generation of the +# Namespaces page. +# This will remove the Namespaces entry from the Quick Index +# and from the Folder Tree View (if specified). The default is YES. + +SHOW_NAMESPACES = YES + +# The FILE_VERSION_FILTER tag can be used to specify a program or script that +# doxygen should invoke to get the current version for each file (typically from +# the version control system). Doxygen will invoke the program by executing (via +# popen()) the command <command> <input-file>, where <command> is the value of +# the FILE_VERSION_FILTER tag, and <input-file> is the name of an input file +# provided by doxygen. Whatever the program writes to standard output +# is used as the file version. See the manual for examples. + +FILE_VERSION_FILTER = + +# The LAYOUT_FILE tag can be used to specify a layout file which will be parsed +# by doxygen. The layout file controls the global structure of the generated +# output files in an output format independent way. To create the layout file +# that represents doxygen's defaults, run doxygen with the -l option. +# You can optionally specify a file name after the option, if omitted +# DoxygenLayout.xml will be used as the name of the layout file. + +LAYOUT_FILE = + +# The CITE_BIB_FILES tag can be used to specify one or more bib files +# containing the references data. This must be a list of .bib files. The +# .bib extension is automatically appended if omitted. Using this command +# requires the bibtex tool to be installed. See also +# http://en.wikipedia.org/wiki/BibTeX for more info. For LaTeX the style +# of the bibliography can be controlled using LATEX_BIB_STYLE. To use this +# feature you need bibtex and perl available in the search path.
+ +CITE_BIB_FILES = + +#--------------------------------------------------------------------------- +# configuration options related to warning and progress messages +#--------------------------------------------------------------------------- + +# The QUIET tag can be used to turn on/off the messages that are generated +# by doxygen. Possible values are YES and NO. If left blank NO is used. + +QUIET = NO + +# The WARNINGS tag can be used to turn on/off the warning messages that are +# generated by doxygen. Possible values are YES and NO. If left blank +# NO is used. + +WARNINGS = YES + +# If WARN_IF_UNDOCUMENTED is set to YES, then doxygen will generate warnings +# for undocumented members. If EXTRACT_ALL is set to YES then this flag will +# automatically be disabled. + +WARN_IF_UNDOCUMENTED = YES + +# If WARN_IF_DOC_ERROR is set to YES, doxygen will generate warnings for +# potential errors in the documentation, such as not documenting some +# parameters in a documented function, or documenting parameters that +# don't exist or using markup commands wrongly. + +WARN_IF_DOC_ERROR = YES + +# The WARN_NO_PARAMDOC option can be enabled to get warnings for +# functions that are documented, but have no documentation for their parameters +# or return value. If set to NO (the default) doxygen will only warn about +# wrong or incomplete parameter documentation, but not about the absence of +# documentation. + +WARN_NO_PARAMDOC = NO + +# The WARN_FORMAT tag determines the format of the warning messages that +# doxygen can produce. The string should contain the $file, $line, and $text +# tags, which will be replaced by the file and line number from which the +# warning originated and the warning text. 
Optionally the format may contain +# $version, which will be replaced by the version of the file (if it could +# be obtained via FILE_VERSION_FILTER) + +WARN_FORMAT = + +# The WARN_LOGFILE tag can be used to specify a file to which warning +# and error messages should be written. If left blank the output is written +# to stderr. + +WARN_LOGFILE = + +#--------------------------------------------------------------------------- +# configuration options related to the input files +#--------------------------------------------------------------------------- + +# The INPUT tag can be used to specify the files and/or directories that contain +# documented source files. You may enter file names like "myfile.cpp" or +# directories like "/usr/src/myproject". Separate the files or directories +# with spaces. + +INPUT = src doc/doxygen/libomp_interface.h +# The ittnotify code also has doxygen documentation, but if we include it here +# it takes over from us! +# src/thirdparty/ittnotify + +# This tag can be used to specify the character encoding of the source files +# that doxygen parses. Internally doxygen uses the UTF-8 encoding, which is +# also the default input encoding. Doxygen uses libiconv (or the iconv built +# into libc) for the transcoding. See http://www.gnu.org/software/libiconv for +# the list of possible encodings. + +INPUT_ENCODING = UTF-8 + +# If the value of the INPUT tag contains directories, you can use the +# FILE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp +# and *.h) to filter out the source-files in the directories. If left +# blank the following patterns are tested: +# *.c *.cc *.cxx *.cpp *.c++ *.d *.java *.ii *.ixx *.ipp *.i++ *.inl *.h *.hh +# *.hxx *.hpp *.h++ *.idl *.odl *.cs *.php *.php3 *.inc *.m *.mm *.dox *.py +# *.f90 *.f *.for *.vhd *.vhdl + +FILE_PATTERNS = *.c *.h *.cpp +# We may also want to include the asm files with appropriate ifdef to ensure +# doxygen doesn't see the content, just the documentation... 
+ +# The RECURSIVE tag can be used to specify whether or not subdirectories +# should be searched for input files as well. Possible values are YES and NO. +# If left blank NO is used. + +# Only look in the one directory. +RECURSIVE = NO + +# The EXCLUDE tag can be used to specify files and/or directories that should be +# excluded from the INPUT source files. This way you can easily exclude a +# subdirectory from a directory tree whose root is specified with the INPUT tag. +# Note that relative paths are relative to the directory from which doxygen is +# run. + +EXCLUDE = src/test-touch.c + +# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or +# directories that are symbolic links (a Unix file system feature) are excluded +# from the input. + +EXCLUDE_SYMLINKS = NO + +# If the value of the INPUT tag contains directories, you can use the +# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude +# certain files from those directories. Note that the wildcards are matched +# against the file with absolute path, so to exclude all test directories +# for example use the pattern */test/* + +EXCLUDE_PATTERNS = + +# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names +# (namespaces, classes, functions, etc.) that should be excluded from the +# output. The symbol name can be a fully qualified name, a word, or if the +# wildcard * is used, a substring. Examples: ANamespace, AClass, +# AClass::ANamespace, ANamespace::*Test + +EXCLUDE_SYMBOLS = + +# The EXAMPLE_PATH tag can be used to specify one or more files or +# directories that contain example code fragments that are included (see +# the \include command). + +EXAMPLE_PATH = + +# If the value of the EXAMPLE_PATH tag contains directories, you can use the +# EXAMPLE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp +# and *.h) to filter out the source-files in the directories. If left +# blank all files are included.
+ +EXAMPLE_PATTERNS = + +# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be +# searched for input files to be used with the \include or \dontinclude +# commands irrespective of the value of the RECURSIVE tag. +# Possible values are YES and NO. If left blank NO is used. + +EXAMPLE_RECURSIVE = NO + +# The IMAGE_PATH tag can be used to specify one or more files or +# directories that contain images that are included in the documentation (see +# the \image command). + +IMAGE_PATH = + +# The INPUT_FILTER tag can be used to specify a program that doxygen should +# invoke to filter for each input file. Doxygen will invoke the filter program +# by executing (via popen()) the command <filter> <input-file>, where <filter> +# is the value of the INPUT_FILTER tag, and <input-file> is the name of an +# input file. Doxygen will then use the output that the filter program writes +# to standard output. +# If FILTER_PATTERNS is specified, this tag will be +# ignored. + +INPUT_FILTER = + +# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern +# basis. +# Doxygen will compare the file name with each pattern and apply the +# filter if there is a match. +# The filters are a list of the form: +# pattern=filter (like *.cpp=my_cpp_filter). See INPUT_FILTER for further +# info on how filters are used. If FILTER_PATTERNS is empty or if +# none of the patterns match the file name, INPUT_FILTER is applied. + +FILTER_PATTERNS = + +# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using +# INPUT_FILTER) will be used to filter the input files when producing source +# files to browse (i.e. when SOURCE_BROWSER is set to YES). + +FILTER_SOURCE_FILES = NO + +# The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file +# pattern. A pattern will override the setting for FILTER_PATTERNS (if any) +# and it is also possible to disable source filtering for a specific pattern +# using *.ext= (so without naming a filter).
This option only has effect when +# FILTER_SOURCE_FILES is enabled. + +FILTER_SOURCE_PATTERNS = + +#--------------------------------------------------------------------------- +# configuration options related to source browsing +#--------------------------------------------------------------------------- + +# If the SOURCE_BROWSER tag is set to YES then a list of source files will +# be generated. Documented entities will be cross-referenced with these sources. +# Note: To get rid of all source code in the generated output, make sure also +# VERBATIM_HEADERS is set to NO. + +SOURCE_BROWSER = YES + +# Setting the INLINE_SOURCES tag to YES will include the body +# of functions and classes directly in the documentation. + +INLINE_SOURCES = NO + +# Setting the STRIP_CODE_COMMENTS tag to YES (the default) will instruct +# doxygen to hide any special comment blocks from generated source code +# fragments. Normal C, C++ and Fortran comments will always remain visible. + +STRIP_CODE_COMMENTS = YES + +# If the REFERENCED_BY_RELATION tag is set to YES +# then for each documented function all documented +# functions referencing it will be listed. + +REFERENCED_BY_RELATION = YES + +# If the REFERENCES_RELATION tag is set to YES +# then for each documented function all documented entities +# called/used by that function will be listed. + +REFERENCES_RELATION = NO + +# If the REFERENCES_LINK_SOURCE tag is set to YES (the default) +# and SOURCE_BROWSER tag is set to YES, then the hyperlinks from +# functions in REFERENCES_RELATION and REFERENCED_BY_RELATION lists will +# link to the source code. +# Otherwise they will link to the documentation. + +REFERENCES_LINK_SOURCE = YES + +# If the USE_HTAGS tag is set to YES then the references to source code +# will point to the HTML generated by the htags(1) tool instead of doxygen +# built-in source browser. The htags tool is part of GNU's global source +# tagging system (see http://www.gnu.org/software/global/global.html). 
You +# will need version 4.8.6 or higher. + +USE_HTAGS = NO + +# If the VERBATIM_HEADERS tag is set to YES (the default) then Doxygen +# will generate a verbatim copy of the header file for each class for +# which an include is specified. Set to NO to disable this. + +VERBATIM_HEADERS = YES + +#--------------------------------------------------------------------------- +# configuration options related to the alphabetical class index +#--------------------------------------------------------------------------- + +# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index +# of all compounds will be generated. Enable this if the project +# contains a lot of classes, structs, unions or interfaces. + +ALPHABETICAL_INDEX = YES + +# If the alphabetical index is enabled (see ALPHABETICAL_INDEX) then +# the COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns +# in which this list will be split (can be a number in the range [1..20]) + +COLS_IN_ALPHA_INDEX = 5 + +# In case all classes in a project start with a common prefix, all +# classes will be put under the same header in the alphabetical index. +# The IGNORE_PREFIX tag can be used to specify one or more prefixes that +# should be ignored while generating the index headers. + +IGNORE_PREFIX = + +#--------------------------------------------------------------------------- +# configuration options related to the HTML output +#--------------------------------------------------------------------------- + +# If the GENERATE_HTML tag is set to YES (the default) Doxygen will +# generate HTML output. + +GENERATE_HTML = YES + +# The HTML_OUTPUT tag is used to specify where the HTML docs will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `html' will be used as the default path. + +HTML_OUTPUT = + +# The HTML_FILE_EXTENSION tag can be used to specify the file extension for +# each generated HTML page (for example: .htm,.php,.asp). 
If it is left blank +# doxygen will generate files with .html extension. + +HTML_FILE_EXTENSION = .html + +# The HTML_HEADER tag can be used to specify a personal HTML header for +# each generated HTML page. If it is left blank doxygen will generate a +# standard header. Note that when using a custom header you are responsible +# for the proper inclusion of any scripts and style sheets that doxygen +# needs, which is dependent on the configuration options used. +# It is advised to generate a default header using "doxygen -w html +# header.html footer.html stylesheet.css YourConfigFile" and then modify +# that header. Note that the header is subject to change so you typically +# have to redo this when upgrading to a newer version of doxygen or when +# changing the value of configuration settings such as GENERATE_TREEVIEW! + +HTML_HEADER = + +# The HTML_FOOTER tag can be used to specify a personal HTML footer for +# each generated HTML page. If it is left blank doxygen will generate a +# standard footer. + +HTML_FOOTER = + +# The HTML_STYLESHEET tag can be used to specify a user-defined cascading +# style sheet that is used by each HTML page. It can be used to +# fine-tune the look of the HTML output. If left blank doxygen will +# generate a default style sheet. Note that it is recommended to use +# HTML_EXTRA_STYLESHEET instead of this one, as it is more robust and this +# tag will in the future become obsolete. + +HTML_STYLESHEET = + +# The HTML_EXTRA_STYLESHEET tag can be used to specify an additional +# user-defined cascading style sheet that is included after the standard +# style sheets created by doxygen. Using this option one can overrule +# certain style aspects. This is preferred over using HTML_STYLESHEET +# since it does not replace the standard style sheet and is therefore more +# robust against future updates. Doxygen will copy the style sheet file to +# the output directory.
+ +HTML_EXTRA_STYLESHEET = + +# The HTML_EXTRA_FILES tag can be used to specify one or more extra images or +# other source files which should be copied to the HTML output directory. Note +# that these files will be copied to the base HTML output directory. Use the +# $relpath$ marker in the HTML_HEADER and/or HTML_FOOTER files to load these +# files. In the HTML_STYLESHEET file, use the file name only. Also note that +# the files will be copied as-is; there are no commands or markers available. + +HTML_EXTRA_FILES = + +# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. +# Doxygen will adjust the colors in the style sheet and background images +# according to this color. Hue is specified as an angle on a colorwheel, +# see http://en.wikipedia.org/wiki/Hue for more information. +# For instance the value 0 represents red, 60 is yellow, 120 is green, +# 180 is cyan, 240 is blue, 300 purple, and 360 is red again. +# The allowed range is 0 to 359. + +HTML_COLORSTYLE_HUE = 220 + +# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of +# the colors in the HTML output. For a value of 0 the output will use +# grayscales only. A value of 255 will produce the most vivid colors. + +HTML_COLORSTYLE_SAT = 100 + +# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to +# the luminance component of the colors in the HTML output. Values below +# 100 gradually make the output lighter, whereas values above 100 make +# the output darker. The value divided by 100 is the actual gamma applied, +# so 80 represents a gamma of 0.8, The value 220 represents a gamma of 2.2, +# and 100 does not change the gamma. + +HTML_COLORSTYLE_GAMMA = 80 + +# If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML +# page will contain the date and time when the page was generated. Setting +# this to NO can help when comparing the output of multiple runs. 
+ +HTML_TIMESTAMP = NO + +# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML +# documentation will contain sections that can be hidden and shown after the +# page has loaded. + +HTML_DYNAMIC_SECTIONS = NO + +# With HTML_INDEX_NUM_ENTRIES one can control the preferred number of +# entries shown in the various tree structured indices initially; the user +# can expand and collapse entries dynamically later on. Doxygen will expand +# the tree to such a level that at most the specified number of entries are +# visible (unless a fully collapsed tree already exceeds this amount). +# So setting the number of entries 1 will produce a full collapsed tree by +# default. 0 is a special value representing an infinite number of entries +# and will result in a full expanded tree by default. + +HTML_INDEX_NUM_ENTRIES = 100 + +# If the GENERATE_DOCSET tag is set to YES, additional index files +# will be generated that can be used as input for Apple's Xcode 3 +# integrated development environment, introduced with OSX 10.5 (Leopard). +# To create a documentation set, doxygen will generate a Makefile in the +# HTML output directory. Running make will produce the docset in that +# directory and running "make install" will install the docset in +# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find +# it at startup. +# See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html +# for more information. + +GENERATE_DOCSET = NO + +# When GENERATE_DOCSET tag is set to YES, this tag determines the name of the +# feed. A documentation feed provides an umbrella under which multiple +# documentation sets from a single provider (such as a company or product suite) +# can be grouped. + +DOCSET_FEEDNAME = "Doxygen generated docs" + +# When GENERATE_DOCSET tag is set to YES, this tag specifies a string that +# should uniquely identify the documentation set bundle. This should be a +# reverse domain-name style string, e.g. com.mycompany.MyDocSet. 
Doxygen +# will append .docset to the name. + +DOCSET_BUNDLE_ID = org.doxygen.Project + +# The DOCSET_PUBLISHER_ID tag specifies a string that should uniquely +# identify the documentation publisher. This should be a reverse domain-name +# style string, e.g. com.mycompany.MyDocSet.documentation. + +DOCSET_PUBLISHER_ID = org.doxygen.Publisher + +# The DOCSET_PUBLISHER_NAME tag identifies the documentation publisher. + +DOCSET_PUBLISHER_NAME = Publisher + +# If the GENERATE_HTMLHELP tag is set to YES, additional index files +# will be generated that can be used as input for tools like the +# Microsoft HTML help workshop to generate a compiled HTML help file (.chm) +# of the generated HTML documentation. + +GENERATE_HTMLHELP = NO + +# If the GENERATE_HTMLHELP tag is set to YES, the CHM_FILE tag can +# be used to specify the file name of the resulting .chm file. You +# can add a path in front of the file if the result should not be +# written to the html output directory. + +CHM_FILE = + +# If the GENERATE_HTMLHELP tag is set to YES, the HHC_LOCATION tag can +# be used to specify the location (absolute path including file name) of +# the HTML help compiler (hhc.exe). If non-empty doxygen will try to run +# the HTML help compiler on the generated index.hhp. + +HHC_LOCATION = + +# If the GENERATE_HTMLHELP tag is set to YES, the GENERATE_CHI flag +# controls if a separate .chi index file is generated (YES) or that +# it should be included in the main .chm file (NO). + +GENERATE_CHI = NO + +# If the GENERATE_HTMLHELP tag is set to YES, the CHM_INDEX_ENCODING +# is used to encode HtmlHelp index (hhk), content (hhc) and project file +# content. + +CHM_INDEX_ENCODING = + +# If the GENERATE_HTMLHELP tag is set to YES, the BINARY_TOC flag +# controls whether a binary table of contents is generated (YES) or a +# normal table of contents (NO) in the .chm file.
+ +BINARY_TOC = NO + +# The TOC_EXPAND flag can be set to YES to add extra items for group members +# to the contents of the HTML help documentation and to the tree view. + +TOC_EXPAND = NO + +# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and +# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated +# that can be used as input for Qt's qhelpgenerator to generate a +# Qt Compressed Help (.qch) of the generated HTML documentation. + +GENERATE_QHP = NO + +# If the QHG_LOCATION tag is specified, the QCH_FILE tag can +# be used to specify the file name of the resulting .qch file. +# The path specified is relative to the HTML output folder. + +QCH_FILE = + +# The QHP_NAMESPACE tag specifies the namespace to use when generating +# Qt Help Project output. For more information please see +# http://doc.trolltech.com/qthelpproject.html#namespace + +QHP_NAMESPACE = org.doxygen.Project + +# The QHP_VIRTUAL_FOLDER tag specifies the namespace to use when generating +# Qt Help Project output. For more information please see +# http://doc.trolltech.com/qthelpproject.html#virtual-folders + +QHP_VIRTUAL_FOLDER = doc + +# If QHP_CUST_FILTER_NAME is set, it specifies the name of a custom filter to +# add. For more information please see +# http://doc.trolltech.com/qthelpproject.html#custom-filters + +QHP_CUST_FILTER_NAME = + +# The QHP_CUST_FILT_ATTRS tag specifies the list of the attributes of the +# custom filter to add. For more information please see +# +# Qt Help Project / Custom Filters. + +QHP_CUST_FILTER_ATTRS = + +# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this +# project's +# filter section matches. +# +# Qt Help Project / Filter Attributes. + +QHP_SECT_FILTER_ATTRS = + +# If the GENERATE_QHP tag is set to YES, the QHG_LOCATION tag can +# be used to specify the location of Qt's qhelpgenerator. +# If non-empty doxygen will try to run qhelpgenerator on the generated +# .qhp file. 
+ +QHG_LOCATION = + +# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files +# will be generated, which together with the HTML files, form an Eclipse help +# plugin. To install this plugin and make it available under the help contents +# menu in Eclipse, the contents of the directory containing the HTML and XML +# files needs to be copied into the plugins directory of eclipse. The name of +# the directory within the plugins directory should be the same as +# the ECLIPSE_DOC_ID value. After copying Eclipse needs to be restarted before +# the help appears. + +GENERATE_ECLIPSEHELP = NO + +# A unique identifier for the eclipse help plugin. When installing the plugin +# the directory name containing the HTML and XML files should also have +# this name. + +ECLIPSE_DOC_ID = org.doxygen.Project + +# The DISABLE_INDEX tag can be used to turn on/off the condensed index (tabs) +# at top of each HTML page. The value NO (the default) enables the index and +# the value YES disables it. Since the tabs have the same information as the +# navigation tree you can set this option to NO if you already set +# GENERATE_TREEVIEW to YES. + +DISABLE_INDEX = NO + +# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index +# structure should be generated to display hierarchical information. +# If the tag value is set to YES, a side panel will be generated +# containing a tree-like index structure (just like the one that +# is generated for HTML Help). For this to work a browser that supports +# JavaScript, DHTML, CSS and frames is required (i.e. any modern browser). +# Windows users are probably better off using the HTML help feature. +# Since the tree basically has the same information as the tab index you +# could consider to set DISABLE_INDEX to NO when enabling this option. 
+ +GENERATE_TREEVIEW = NO + +# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values +# (range [0,1..20]) that doxygen will group on one line in the generated HTML +# documentation. Note that a value of 0 will completely suppress the enum +# values from appearing in the overview section. + +ENUM_VALUES_PER_LINE = 4 + +# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be +# used to set the initial width (in pixels) of the frame in which the tree +# is shown. + +TREEVIEW_WIDTH = 250 + +# When the EXT_LINKS_IN_WINDOW option is set to YES doxygen will open +# links to external symbols imported via tag files in a separate window. + +EXT_LINKS_IN_WINDOW = NO + +# Use this tag to change the font size of Latex formulas included +# as images in the HTML documentation. The default is 10. Note that +# when you change the font size after a successful doxygen run you need +# to manually remove any form_*.png images from the HTML output directory +# to force them to be regenerated. + +FORMULA_FONTSIZE = 10 + +# Use the FORMULA_TRANSPARENT tag to determine whether or not the images +# generated for formulas are transparent PNGs. Transparent PNGs are +# not supported properly for IE 6.0, but are supported on all modern browsers. +# Note that when changing this option you need to delete any form_*.png files +# in the HTML output before the changes have effect. + +FORMULA_TRANSPARENT = YES + +# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax +# (see http://www.mathjax.org) which uses client side Javascript for the +# rendering instead of using prerendered bitmaps. Use this if you do not +# have LaTeX installed or if you want formulas to look prettier in the HTML +# output. When enabled you may also need to install MathJax separately and +# configure the path to it using the MATHJAX_RELPATH option.
+ +USE_MATHJAX = NO + +# When MathJax is enabled you need to specify the location relative to the +# HTML output directory using the MATHJAX_RELPATH option. The destination +# directory should contain the MathJax.js script. For instance, if the mathjax +# directory is located at the same level as the HTML output directory, then +# MATHJAX_RELPATH should be ../mathjax. The default value points to +# the MathJax Content Delivery Network so you can quickly see the result without +# installing MathJax. +# However, it is strongly recommended to install a local +# copy of MathJax from http://www.mathjax.org before deployment. + +MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest + +# The MATHJAX_EXTENSIONS tag can be used to specify one or more MathJax extension +# names that should be enabled during MathJax rendering. + +MATHJAX_EXTENSIONS = + +# When the SEARCHENGINE tag is enabled doxygen will generate a search box +# for the HTML output. The underlying search engine uses javascript +# and DHTML and should work on any modern browser. Note that when using +# HTML help (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets +# (GENERATE_DOCSET) there is already a search function so this one should +# typically be disabled. For large projects the javascript based search engine +# can be slow, then enabling SERVER_BASED_SEARCH may provide a better solution. + +SEARCHENGINE = YES + +# When the SERVER_BASED_SEARCH tag is enabled the search engine will be +# implemented using a PHP enabled web server instead of at the web client +# using Javascript. Doxygen will generate the search PHP script and index +# file to put on the web server. The advantage of the server +# based approach is that it scales better to large projects and allows +# full text search. The disadvantages are that it is more difficult to set up +# and does not have live searching capabilities.
+ +SERVER_BASED_SEARCH = NO + +#--------------------------------------------------------------------------- +# configuration options related to the LaTeX output +#--------------------------------------------------------------------------- + +# If the GENERATE_LATEX tag is set to YES (the default) Doxygen will +# generate Latex output. + +GENERATE_LATEX = YES + +# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `latex' will be used as the default path. + +LATEX_OUTPUT = + +# The LATEX_CMD_NAME tag can be used to specify the LaTeX command name to be +# invoked. If left blank `latex' will be used as the default command name. +# Note that when enabling USE_PDFLATEX this option is only used for +# generating bitmaps for formulas in the HTML output, but not in the +# Makefile that is written to the output directory. + +LATEX_CMD_NAME = latex + +# The MAKEINDEX_CMD_NAME tag can be used to specify the command name to +# generate index for LaTeX. If left blank `makeindex' will be used as the +# default command name. + +MAKEINDEX_CMD_NAME = makeindex + +# If the COMPACT_LATEX tag is set to YES Doxygen generates more compact +# LaTeX documents. This may be useful for small projects and may help to +# save some trees in general. + +COMPACT_LATEX = NO + +# The PAPER_TYPE tag can be used to set the paper type that is used +# by the printer. Possible values are: a4, letter, legal and +# executive. If left blank a4wide will be used. + +PAPER_TYPE = a4wide + +# The EXTRA_PACKAGES tag can be used to specify one or more names of LaTeX +# packages that should be included in the LaTeX output. + +EXTRA_PACKAGES = + +# The LATEX_HEADER tag can be used to specify a personal LaTeX header for +# the generated latex document. The header should contain everything until +# the first chapter. If it is left blank doxygen will generate a +# standard header.
Notice: only use this tag if you know what you are doing! + +LATEX_HEADER = doc/doxygen/header.tex + +# The LATEX_FOOTER tag can be used to specify a personal LaTeX footer for +# the generated latex document. The footer should contain everything after +# the last chapter. If it is left blank doxygen will generate a +# standard footer. Notice: only use this tag if you know what you are doing! + +LATEX_FOOTER = + +# If the PDF_HYPERLINKS tag is set to YES, the LaTeX that is generated +# is prepared for conversion to pdf (using ps2pdf). The pdf file will +# contain links (just like the HTML output) instead of page references +# This makes the output suitable for online browsing using a pdf viewer. + +PDF_HYPERLINKS = YES + +# If the USE_PDFLATEX tag is set to YES, pdflatex will be used instead of +# plain latex in the generated Makefile. Set this option to YES to get a +# higher quality PDF documentation. + +USE_PDFLATEX = YES + +# If the LATEX_BATCHMODE tag is set to YES, doxygen will add the \\batchmode. +# command to the generated LaTeX files. This will instruct LaTeX to keep +# running if errors occur, instead of asking the user for help. +# This option is also used when generating formulas in HTML. + +LATEX_BATCHMODE = NO + +# If LATEX_HIDE_INDICES is set to YES then doxygen will not +# include the index chapters (such as File Index, Compound Index, etc.) +# in the output. + +LATEX_HIDE_INDICES = NO + +# If LATEX_SOURCE_CODE is set to YES then doxygen will include +# source code with syntax highlighting in the LaTeX output. +# Note that which sources are shown also depends on other settings +# such as SOURCE_BROWSER. + +LATEX_SOURCE_CODE = NO + +# The LATEX_BIB_STYLE tag can be used to specify the style to use for the +# bibliography, e.g. plainnat, or ieeetr. The default style is "plain". See +# http://en.wikipedia.org/wiki/BibTeX for more info. 
+ +LATEX_BIB_STYLE = plain + +#--------------------------------------------------------------------------- +# configuration options related to the RTF output +#--------------------------------------------------------------------------- + +# If the GENERATE_RTF tag is set to YES Doxygen will generate RTF output +# The RTF output is optimized for Word 97 and may not look very pretty with +# other RTF readers or editors. + +GENERATE_RTF = NO + +# The RTF_OUTPUT tag is used to specify where the RTF docs will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `rtf' will be used as the default path. + +RTF_OUTPUT = + +# If the COMPACT_RTF tag is set to YES Doxygen generates more compact +# RTF documents. This may be useful for small projects and may help to +# save some trees in general. + +COMPACT_RTF = NO + +# If the RTF_HYPERLINKS tag is set to YES, the RTF that is generated +# will contain hyperlink fields. The RTF file will +# contain links (just like the HTML output) instead of page references. +# This makes the output suitable for online browsing using WORD or other +# programs which support those fields. +# Note: wordpad (write) and others do not support links. + +RTF_HYPERLINKS = NO + +# Load style sheet definitions from file. Syntax is similar to doxygen's +# config file, i.e. a series of assignments. You only have to provide +# replacements, missing definitions are set to their default value. + +RTF_STYLESHEET_FILE = + +# Set optional variables used in the generation of an rtf document. +# Syntax is similar to doxygen's config file. 
+ +RTF_EXTENSIONS_FILE = + +#--------------------------------------------------------------------------- +# configuration options related to the man page output +#--------------------------------------------------------------------------- + +# If the GENERATE_MAN tag is set to YES (the default) Doxygen will +# generate man pages + +GENERATE_MAN = NO + +# The MAN_OUTPUT tag is used to specify where the man pages will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `man' will be used as the default path. + +MAN_OUTPUT = + +# The MAN_EXTENSION tag determines the extension that is added to +# the generated man pages (default is the subroutine's section .3) + +MAN_EXTENSION = + +# If the MAN_LINKS tag is set to YES and Doxygen generates man output, +# then it will generate one additional man file for each entity +# documented in the real man page(s). These additional files +# only source the real man page, but without them the man command +# would be unable to find the correct page. The default is NO. + +MAN_LINKS = NO + +#--------------------------------------------------------------------------- +# configuration options related to the XML output +#--------------------------------------------------------------------------- + +# If the GENERATE_XML tag is set to YES Doxygen will +# generate an XML file that captures the structure of +# the code including all documentation. + +GENERATE_XML = NO + +# The XML_OUTPUT tag is used to specify where the XML pages will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `xml' will be used as the default path. + +XML_OUTPUT = xml + +# The XML_SCHEMA tag can be used to specify an XML schema, +# which can be used by a validating XML parser to check the +# syntax of the XML files. 
+ +XML_SCHEMA = + +# The XML_DTD tag can be used to specify an XML DTD, +# which can be used by a validating XML parser to check the +# syntax of the XML files. + +XML_DTD = + +# If the XML_PROGRAMLISTING tag is set to YES Doxygen will +# dump the program listings (including syntax highlighting +# and cross-referencing information) to the XML output. Note that +# enabling this will significantly increase the size of the XML output. + +XML_PROGRAMLISTING = YES + +#--------------------------------------------------------------------------- +# configuration options for the AutoGen Definitions output +#--------------------------------------------------------------------------- + +# If the GENERATE_AUTOGEN_DEF tag is set to YES Doxygen will +# generate an AutoGen Definitions (see autogen.sf.net) file +# that captures the structure of the code including all +# documentation. Note that this feature is still experimental +# and incomplete at the moment. + +GENERATE_AUTOGEN_DEF = NO + +#--------------------------------------------------------------------------- +# configuration options related to the Perl module output +#--------------------------------------------------------------------------- + +# If the GENERATE_PERLMOD tag is set to YES Doxygen will +# generate a Perl module file that captures the structure of +# the code including all documentation. Note that this +# feature is still experimental and incomplete at the +# moment. + +GENERATE_PERLMOD = NO + +# If the PERLMOD_LATEX tag is set to YES Doxygen will generate +# the necessary Makefile rules, Perl scripts and LaTeX code to be able +# to generate PDF and DVI output from the Perl module output. + +PERLMOD_LATEX = NO + +# If the PERLMOD_PRETTY tag is set to YES the Perl module output will be +# nicely formatted so it can be parsed by a human reader. +# This is useful +# if you want to understand what is going on. 
+# On the other hand, if this +# tag is set to NO the size of the Perl module output will be much smaller +# and Perl will parse it just the same. + +PERLMOD_PRETTY = YES + +# The names of the make variables in the generated doxyrules.make file +# are prefixed with the string contained in PERLMOD_MAKEVAR_PREFIX. +# This is useful so different doxyrules.make files included by the same +# Makefile don't overwrite each other's variables. + +PERLMOD_MAKEVAR_PREFIX = + +#--------------------------------------------------------------------------- +# Configuration options related to the preprocessor +#--------------------------------------------------------------------------- + +# If the ENABLE_PREPROCESSING tag is set to YES (the default) Doxygen will +# evaluate all C-preprocessor directives found in the sources and include +# files. + +ENABLE_PREPROCESSING = YES + +# If the MACRO_EXPANSION tag is set to YES Doxygen will expand all macro +# names in the source code. If set to NO (the default) only conditional +# compilation will be performed. Macro expansion can be done in a controlled +# way by setting EXPAND_ONLY_PREDEF to YES. + +MACRO_EXPANSION = YES + +# If the EXPAND_ONLY_PREDEF and MACRO_EXPANSION tags are both set to YES +# then the macro expansion is limited to the macros specified with the +# PREDEFINED and EXPAND_AS_DEFINED tags. + +EXPAND_ONLY_PREDEF = YES + +# If the SEARCH_INCLUDES tag is set to YES (the default) the includes files +# pointed to by INCLUDE_PATH will be searched when a #include is found. + +SEARCH_INCLUDES = YES + +# The INCLUDE_PATH tag can be used to specify one or more directories that +# contain include files that are not input files but should be processed by +# the preprocessor. + +INCLUDE_PATH = + +# You can use the INCLUDE_FILE_PATTERNS tag to specify one or more wildcard +# patterns (like *.h and *.hpp) to filter out the header-files in the +# directories. If left blank, the patterns specified with FILE_PATTERNS will +# be used. 
+ +INCLUDE_FILE_PATTERNS = + +# The PREDEFINED tag can be used to specify one or more macro names that +# are defined before the preprocessor is started (similar to the -D option of +# gcc). The argument of the tag is a list of macros of the form: name +# or name=definition (no spaces). If the definition and the = are +# omitted =1 is assumed. To prevent a macro definition from being +# undefined via #undef or recursively expanded use the := operator +# instead of the = operator. + +PREDEFINED = OMP_30_ENABLED=1, OMP_40_ENABLED=1, KMP_STATS_ENABLED=1 + +# If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then +# this tag can be used to specify a list of macro names that should be expanded. +# The macro definition that is found in the sources will be used. +# Use the PREDEFINED tag if you want to use a different macro definition that +# overrules the definition found in the source code. + +EXPAND_AS_DEFINED = + +# If the SKIP_FUNCTION_MACROS tag is set to YES (the default) then +# doxygen's preprocessor will remove all references to function-like macros +# that are alone on a line, have an all uppercase name, and do not end with a +# semicolon, because these will confuse the parser if not removed. + +SKIP_FUNCTION_MACROS = YES + +#--------------------------------------------------------------------------- +# Configuration::additions related to external references +#--------------------------------------------------------------------------- + +# The TAGFILES option can be used to specify one or more tagfiles. For each +# tag file the location of the external documentation should be added. The +# format of a tag file without this location is as follows: +# +# TAGFILES = file1 file2 ... +# Adding location for the tag files is done as follows: +# +# TAGFILES = file1=loc1 "file2 = loc2" ... +# where "loc1" and "loc2" can be relative or absolute paths +# or URLs. Note that each tag file must have a unique name (where the name does +# NOT include the path). 
If a tag file is not located in the directory in which +# doxygen is run, you must also specify the path to the tagfile here. + +TAGFILES = + +# When a file name is specified after GENERATE_TAGFILE, doxygen will create +# a tag file that is based on the input files it reads. + +GENERATE_TAGFILE = + +# If the ALLEXTERNALS tag is set to YES all external classes will be listed +# in the class index. If set to NO only the inherited external classes +# will be listed. + +ALLEXTERNALS = NO + +# If the EXTERNAL_GROUPS tag is set to YES all external groups will be listed +# in the modules index. If set to NO, only the current project's groups will +# be listed. + +EXTERNAL_GROUPS = YES + +# The PERL_PATH should be the absolute path and name of the perl script +# interpreter (i.e. the result of `which perl'). + +PERL_PATH = + +#--------------------------------------------------------------------------- +# Configuration options related to the dot tool +#--------------------------------------------------------------------------- + +# If the CLASS_DIAGRAMS tag is set to YES (the default) Doxygen will +# generate a inheritance diagram (in HTML, RTF and LaTeX) for classes with base +# or super classes. Setting the tag to NO turns the diagrams off. Note that +# this option also works with HAVE_DOT disabled, but it is recommended to +# install and use dot, since it yields more powerful graphs. + +CLASS_DIAGRAMS = YES + +# You can define message sequence charts within doxygen comments using the \msc +# command. Doxygen will then run the mscgen tool (see +# http://www.mcternan.me.uk/mscgen/) to produce the chart and insert it in the +# documentation. The MSCGEN_PATH tag allows you to specify the directory where +# the mscgen tool resides. If left empty the tool is assumed to be found in the +# default search path. 
+ +MSCGEN_PATH = + +# If set to YES, the inheritance and collaboration graphs will hide +# inheritance and usage relations if the target is undocumented +# or is not a class. + +HIDE_UNDOC_RELATIONS = YES + +# If you set the HAVE_DOT tag to YES then doxygen will assume the dot tool is +# available from the path. This tool is part of Graphviz, a graph visualization +# toolkit from AT&T and Lucent Bell Labs. The other options in this section +# have no effect if this option is set to NO (the default) + +HAVE_DOT = NO + +# The DOT_NUM_THREADS specifies the number of dot invocations doxygen is +# allowed to run in parallel. When set to 0 (the default) doxygen will +# base this on the number of processors available in the system. You can set it +# explicitly to a value larger than 0 to get control over the balance +# between CPU load and processing speed. + +DOT_NUM_THREADS = 0 + +# By default doxygen will use the Helvetica font for all dot files that +# doxygen generates. When you want a differently looking font you can specify +# the font name using DOT_FONTNAME. You need to make sure dot is able to find +# the font, which can be done by putting it in a standard location or by setting +# the DOTFONTPATH environment variable or by setting DOT_FONTPATH to the +# directory containing the font. + +DOT_FONTNAME = Helvetica + +# The DOT_FONTSIZE tag can be used to set the size of the font of dot graphs. +# The default size is 10pt. + +DOT_FONTSIZE = 10 + +# By default doxygen will tell dot to use the Helvetica font. +# If you specify a different font using DOT_FONTNAME you can use DOT_FONTPATH to +# set the path where dot can find it. + +DOT_FONTPATH = + +# If the CLASS_GRAPH and HAVE_DOT tags are set to YES then doxygen +# will generate a graph for each documented class showing the direct and +# indirect inheritance relations. Setting this tag to YES will force the +# CLASS_DIAGRAMS tag to NO. 
+ +CLASS_GRAPH = YES + +# If the COLLABORATION_GRAPH and HAVE_DOT tags are set to YES then doxygen +# will generate a graph for each documented class showing the direct and +# indirect implementation dependencies (inheritance, containment, and +# class references variables) of the class with other documented classes. + +COLLABORATION_GRAPH = NO + +# If the GROUP_GRAPHS and HAVE_DOT tags are set to YES then doxygen +# will generate a graph for groups, showing the direct groups dependencies + +GROUP_GRAPHS = YES + +# If the UML_LOOK tag is set to YES doxygen will generate inheritance and +# collaboration diagrams in a style similar to the OMG's Unified Modeling +# Language. + +UML_LOOK = NO + +# If the UML_LOOK tag is enabled, the fields and methods are shown inside +# the class node. If there are many fields or methods and many nodes the +# graph may become too big to be useful. The UML_LIMIT_NUM_FIELDS +# threshold limits the number of items for each type to make the size more +# manageable. Set this to 0 for no limit. Note that the threshold may be +# exceeded by 50% before the limit is enforced. + +UML_LIMIT_NUM_FIELDS = 10 + +# If set to YES, the inheritance and collaboration graphs will show the +# relations between templates and their instances. + +TEMPLATE_RELATIONS = YES + +# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDE_GRAPH, and HAVE_DOT +# tags are set to YES then doxygen will generate a graph for each documented +# file showing the direct and indirect include dependencies of the file with +# other documented files. + +INCLUDE_GRAPH = NO + +# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDED_BY_GRAPH, and +# HAVE_DOT tags are set to YES then doxygen will generate a graph for each +# documented header file showing the documented files that directly or +# indirectly include this file. 
+ +INCLUDED_BY_GRAPH = NO + +# If the CALL_GRAPH and HAVE_DOT options are set to YES then +# doxygen will generate a call dependency graph for every global function +# or class method. Note that enabling this option will significantly increase +# the time of a run. So in most cases it will be better to enable call graphs +# for selected functions only using the \callgraph command. + +CALL_GRAPH = NO + +# If the CALLER_GRAPH and HAVE_DOT tags are set to YES then +# doxygen will generate a caller dependency graph for every global function +# or class method. Note that enabling this option will significantly increase +# the time of a run. So in most cases it will be better to enable caller +# graphs for selected functions only using the \callergraph command. + +CALLER_GRAPH = NO + +# If the GRAPHICAL_HIERARCHY and HAVE_DOT tags are set to YES then doxygen +# will generate a graphical hierarchy of all classes instead of a textual one. + +GRAPHICAL_HIERARCHY = YES + +# If the DIRECTORY_GRAPH and HAVE_DOT tags are set to YES +# then doxygen will show the dependencies a directory has on other directories +# in a graphical way. The dependency relations are determined by the #include +# relations between the files in the directories. + +DIRECTORY_GRAPH = YES + +# The DOT_IMAGE_FORMAT tag can be used to set the image format of the images +# generated by dot. Possible values are svg, png, jpg, or gif. +# If left blank png will be used. If you choose svg you need to set +# HTML_FILE_EXTENSION to xhtml in order to make the SVG files +# visible in IE 9+ (other browsers do not have this requirement). + +DOT_IMAGE_FORMAT = png + +# If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to +# enable generation of interactive SVG images that allow zooming and panning. +# Note that this requires a modern browser other than Internet Explorer. +# Tested and working are Firefox, Chrome, Safari, and Opera. 
For IE 9+ you +# need to set HTML_FILE_EXTENSION to xhtml in order to make the SVG files +# visible. Older versions of IE do not have SVG support. + +INTERACTIVE_SVG = NO + +# The tag DOT_PATH can be used to specify the path where the dot tool can be +# found. If left blank, it is assumed the dot tool can be found in the path. + +DOT_PATH = + +# The DOTFILE_DIRS tag can be used to specify one or more directories that +# contain dot files that are included in the documentation (see the +# \dotfile command). + +DOTFILE_DIRS = + +# The MSCFILE_DIRS tag can be used to specify one or more directories that +# contain msc files that are included in the documentation (see the +# \mscfile command). + +MSCFILE_DIRS = + +# The DOT_GRAPH_MAX_NODES tag can be used to set the maximum number of +# nodes that will be shown in the graph. If the number of nodes in a graph +# becomes larger than this value, doxygen will truncate the graph, which is +# visualized by representing a node as a red box. Note that doxygen if the +# number of direct children of the root node in a graph is already larger than +# DOT_GRAPH_MAX_NODES then the graph will not be shown at all. Also note +# that the size of a graph can be further restricted by MAX_DOT_GRAPH_DEPTH. + +DOT_GRAPH_MAX_NODES = 50 + +# The MAX_DOT_GRAPH_DEPTH tag can be used to set the maximum depth of the +# graphs generated by dot. A depth value of 3 means that only nodes reachable +# from the root by following a path via at most 3 edges will be shown. Nodes +# that lay further from the root node will be omitted. Note that setting this +# option to 1 or 2 may greatly reduce the computation time needed for large +# code bases. Also note that the size of a graph can be further restricted by +# DOT_GRAPH_MAX_NODES. Using a depth of 0 means no depth restriction. + +MAX_DOT_GRAPH_DEPTH = 0 + +# Set the DOT_TRANSPARENT tag to YES to generate images with a transparent +# background. 
This is disabled by default, because dot on Windows does not +# seem to support this out of the box. Warning: Depending on the platform used, +# enabling this option may lead to badly anti-aliased labels on the edges of +# a graph (i.e. they become hard to read). + +DOT_TRANSPARENT = NO + +# Set the DOT_MULTI_TARGETS tag to YES allow dot to generate multiple output +# files in one run (i.e. multiple -o and -T options on the command line). This +# makes dot run faster, but since only newer versions of dot (>1.8.10) +# support this, this feature is disabled by default. + +DOT_MULTI_TARGETS = NO + +# If the GENERATE_LEGEND tag is set to YES (the default) Doxygen will +# generate a legend page explaining the meaning of the various boxes and +# arrows in the dot generated graphs. + +GENERATE_LEGEND = YES + +# If the DOT_CLEANUP tag is set to YES (the default) Doxygen will +# remove the intermediate dot files that are used to generate +# the various graphs. + +DOT_CLEANUP = YES diff --git a/pstl/CREDITS.txt b/pstl/CREDITS.txt index 4945fd5ad308be..174722510fdea4 100644 --- a/pstl/CREDITS.txt +++ b/pstl/CREDITS.txt @@ -1,21 +1,21 @@ -This file is a partial list of people who have contributed to the LLVM/pstl -(Parallel STL) project. If you have contributed a patch or made some other -contribution to LLVM/pstl, please submit a patch to this file to add yourself, -and it will be done! - -The list is sorted by surname and formatted to allow easy grepping and -beautification by scripts. The fields are: name (N), email (E), web-address -(W), PGP key ID and fingerprint (P), description (D), and snail-mail address -(S). - -N: Intel Corporation -W: http://www.intel.com -D: Created the initial implementation. - -N: Thomas Rodgers -E: trodgers at redhat.com -D: Identifier name transformation for inclusion in a Standard C++ library. - -N: Christopher Nelson -E: nadiasvertex at gmail.com -D: Add support for an OpenMP backend. 
+This file is a partial list of people who have contributed to the LLVM/pstl
+(Parallel STL) project. If you have contributed a patch or made some other
+contribution to LLVM/pstl, please submit a patch to this file to add yourself,
+and it will be done!
+
+The list is sorted by surname and formatted to allow easy grepping and
+beautification by scripts. The fields are: name (N), email (E), web-address
+(W), PGP key ID and fingerprint (P), description (D), and snail-mail address
+(S).
+
+N: Intel Corporation
+W: http://www.intel.com
+D: Created the initial implementation.
+
+N: Thomas Rodgers
+E: trodgers at redhat.com
+D: Identifier name transformation for inclusion in a Standard C++ library.
+
+N: Christopher Nelson
+E: nadiasvertex at gmail.com
+D: Add support for an OpenMP backend.

From openmp-commits at lists.llvm.org Thu Oct 17 06:47:17 2024
From: openmp-commits at lists.llvm.org (via Openmp-commits)
Date: Thu, 17 Oct 2024 06:47:17 -0700 (PDT)
Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318)
In-Reply-To:
Message-ID: <67111565.050a0220.76e4e.3bf9@mx.google.com>

ldrumm wrote:

I've let this patch stew for long enough, and I think it's now time. The spurious test failures in files unchanged by this patch have gone away. After re-reading the discussion above I'm ready to merge these changes.

There's been a good discussion, which helped me polish this a fair bit, and a few contentions which needed clarifying or ironing out. The main contention seems to have been that what's stored in git's database is always LF - even for Windows users or files that *need* CRLF endings (e.g. `.bat`).
As I pointed out in a few comments (e.g. [1](https://github.com/llvm/llvm-project/pull/86318#issuecomment-2078216855), [2](https://github.com/llvm/llvm-project/pull/86318#issuecomment-2078219678), [3](https://github.com/llvm/llvm-project/pull/86318#issuecomment-2078259364)), it may be unintuitive to store everything in git's internal database with normalized LF, but it's how git *checks them out* that's important, and I believe I've ironed out any issues here by tracking down files and testcases that depend on a particular line-ending style and adding rules for them. For those who had such concerns, I believe I've clarified the principle and pointed out the practicalities that will make life *easier* for Windows *and* Unix contributors.

To summarize:

**No user should need to adjust their local config as a result of this change**

- Adrian is in favour
- Fangrui Song has approved
- Chris Bienemann has approved
- Reid Kleckner had some concerns but they were clarified
- Mehdi Amini approves in principle
- Florian had some concerns I *believe* we addressed.
- Aaron and Saleem had some objections that I *believe* I addressed in [this comment](https://github.com/llvm/llvm-project/pull/86318#issuecomment-2078259364) since I pointed out that the files will be checked out with the correct line endings on *all* systems - regardless of local config.

I'll keep an eye on this, and will be happy to react to any issues that arise over the next week or so. Thanks for all the input.
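The distinction drawn above, between what git stores and what it checks out, can be sketched in a few lines of Python. This is only an illustrative model under simplifying assumptions, not git's implementation: the function names `to_repository` and `to_working_tree` are made up for this example, and real git applies additional checks that are omitted here.

```python
def to_repository(data: bytes) -> bytes:
    """Model check-in for a text file: CRLF is stored as LF in history."""
    return data.replace(b"\r\n", b"\n")


def to_working_tree(blob: bytes, eol: str) -> bytes:
    """Model checkout for a text file.

    `eol` is "lf" or "crlf" and stands in for whatever the repository
    rules (or, failing those, the local defaults) select for this path.
    """
    # Normalize first so that LF is never expanded twice.
    normalized = blob.replace(b"\r\n", b"\n")
    if eol == "crlf":
        return normalized.replace(b"\n", b"\r\n")
    return normalized


# A hypothetical .bat file as written on Windows.
original = b"@echo off\r\necho hello\r\n"
stored = to_repository(original)
```

Under this model a file marked `eol=crlf` is held in history with LF yet comes back with CRLF on every checkout, whichever platform performs it.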
https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Thu Oct 17 06:49:03 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Thu, 17 Oct 2024 06:49:03 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <671115cf.050a0220.9bcf5.4054@mx.google.com> https://github.com/ldrumm updated https://github.com/llvm/llvm-project/pull/86318 >From dccebddb3b802c4c1fe287222e454b63f850f012 Mon Sep 17 00:00:00 2001 From: Luke Drummond Date: Fri, 22 Mar 2024 17:09:54 +0000 Subject: [PATCH 1/2] Finally formalise our defacto line-ending policy Historically, we've not automatically enforced how git tracks line endings, but there are many, many commits that "undo" unintended CRLFs getting into history. `git log --pretty=oneline --grep=CRLF` shows nearly 100 commits involving reverts of CRLF making its way into the index and then history. As far as I can tell, there are none the other way round except for specific cases like `.bat` files or tests for parsers that need to accept such sequences. Of note, one of the earliest of those listed in that output is: ``` commit 9795860250734e5c2a879546c534e35d9edd5944 Author: NAKAMURA Takumi Date: Thu Feb 3 11:41:27 2011 +0000 cmake/*: Add svn:eol-style=native and fix CRLF. llvm-svn: 124793 ``` ...which introduced such a defacto policy for subversion. With old versions of git, it's been a bit of a crap-shoot whether enforcing storing line endings in the history will upset checkouts on machines where such line endings are the norm. Indeed many users have enforced that git checks out the working copy according to a global or per-user config via core crlf, or core autocrlf. 
For ~8 years now[1], however, git has supported the ability to "do as the Romans do" on checkout, but internally store subsets of text files with line-endings specified via a system of patterns in the `.gitattributes` file.

Since we now have this ability, and we've been specifying attributes for various binary files, I think it makes sense to rid us of all that work converting things "back", and just let git handle the local checkout. Thus the new top-level policy here is

    * text=auto

In simple terms this means "unless otherwise specified, convert all files considered 'text' files to LF in the project history, but check them out as expected on the local machine". What is "expected on the local machine" depends on configuration and defaults.

For those files in the repository that *do* need CRLF endings, I've adopted a policy of `eol=crlf`, which means that git will store them in history with LF, but regardless of user config, they'll be checked out in tree with CRLF.

Finally, existing files have been "corrected" in history via `git add --renormalize .`

End users should *not* need to adjust their local git config or workflow.

[1]: git 2.10 was released with fixed support for fine-grained line-ending tracking that respects user-config *and* repo policy. This can be considered the point at which git will respect both the user's local working tree preference *and* the history as specified by the maintainers. See https://github.com/git/git/blob/master/Documentation/RelNotes/2.10.0.txt#L248 for the release note.
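As a rough illustration of how a catch-all `* text=auto` rule combines with per-extension `eol=crlf` rules, the sketch below resolves attributes for a path in Python. It assumes a deliberately simplified matcher: every pattern is compared against the file's basename only, and a later rule overrides an earlier one per attribute. The `RULES` table and the `resolve` function are hypothetical names for this example; `git check-attr` remains the authoritative way to query real attributes.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Simplified stand-in for a top-level .gitattributes file.
# "set" mirrors how a bare `text` attribute is reported.
RULES = [
    ("*", {"text": "auto"}),
    ("*.bat", {"text": "set", "eol": "crlf"}),
    ("*.rc", {"text": "set", "eol": "crlf"}),
    ("*.sln", {"text": "set", "eol": "crlf"}),
    ("*.natvis", {"text": "set", "eol": "crlf"}),
]


def resolve(path, rules=RULES):
    """Return the attributes that apply to `path` under this model."""
    name = PurePosixPath(path).name
    attrs = {}
    for pattern, values in rules:
        # Patterns here contain no slash, so match on the basename only.
        if fnmatchcase(name, pattern):
            attrs.update(values)  # later rules win per attribute
    return attrs
```

For example, `resolve("llvm/utils/release/build_llvm_release.bat")` picks up both the catch-all rule and the `*.bat` rule, with the latter taking precedence for `text`.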
--- .gitattributes | 7 +++++++ clang-tools-extra/clangd/test/.gitattributes | 3 +++ clang/test/.gitattributes | 4 ++++ llvm/docs/TestingGuide.rst | 6 ++++++ llvm/test/FileCheck/.gitattributes | 1 + llvm/test/tools/llvm-ar/Inputs/.gitattributes | 1 + llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes | 1 + 7 files changed, 23 insertions(+) create mode 100644 clang-tools-extra/clangd/test/.gitattributes create mode 100644 clang/test/.gitattributes create mode 100644 llvm/test/FileCheck/.gitattributes create mode 100644 llvm/test/tools/llvm-ar/Inputs/.gitattributes create mode 100644 llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes diff --git a/.gitattributes b/.gitattributes index 6b281f33f737db..aced01d485c181 100644 --- a/.gitattributes +++ b/.gitattributes @@ -1,3 +1,10 @@ +# Checkout as native, commit as LF except in specific circumstances +* text=auto +*.bat text eol=crlf +*.rc text eol=crlf +*.sln text eol=crlf +*.natvis text eol=crlf + libcxx/src/**/*.cpp merge=libcxx-reformat libcxx/include/**/*.h merge=libcxx-reformat diff --git a/clang-tools-extra/clangd/test/.gitattributes b/clang-tools-extra/clangd/test/.gitattributes new file mode 100644 index 00000000000000..20971adc2b5d03 --- /dev/null +++ b/clang-tools-extra/clangd/test/.gitattributes @@ -0,0 +1,3 @@ +input-mirror.test text eol=crlf +too_large.test text eol=crlf +protocol.test text eol=crlf diff --git a/clang/test/.gitattributes b/clang/test/.gitattributes new file mode 100644 index 00000000000000..160fc6cf561751 --- /dev/null +++ b/clang/test/.gitattributes @@ -0,0 +1,4 @@ +FixIt/fixit-newline-style.c text eol=crlf +Frontend/system-header-line-directive-ms-lineendings.c text eol=crlf +Frontend/rewrite-includes-mixed-eol-crlf.* text eol=crlf +clang/test/Frontend/rewrite-includes-mixed-eol-lf.h text eolf=lf diff --git a/llvm/docs/TestingGuide.rst b/llvm/docs/TestingGuide.rst index 08617933519fdb..344a295226f6ae 100644 --- a/llvm/docs/TestingGuide.rst +++ b/llvm/docs/TestingGuide.rst @@ 
-360,6 +360,12 @@ Best practices for regression tests - Try to give values (including variables, blocks and functions) meaningful names, and avoid retaining complex names generated by the optimization pipeline (such as ``%foo.0.0.0.0.0.0``). +- If your tests depend on specific input file encodings, beware of line-ending + issues across different platforms, and in the project's history. Before you + commit tests that depend on explicit encodings, consider adding filetype or + specific line-ending annotations to a `<.gitattributes + https://git-scm.com/docs/gitattributes#_effects>`_ file in the appropriate + directory in the repository. Extra files ----------- diff --git a/llvm/test/FileCheck/.gitattributes b/llvm/test/FileCheck/.gitattributes new file mode 100644 index 00000000000000..ba27d7fad76d50 --- /dev/null +++ b/llvm/test/FileCheck/.gitattributes @@ -0,0 +1 @@ +dos-style-eol.txt text eol=crlf diff --git a/llvm/test/tools/llvm-ar/Inputs/.gitattributes b/llvm/test/tools/llvm-ar/Inputs/.gitattributes new file mode 100644 index 00000000000000..6c8a26285daf7f --- /dev/null +++ b/llvm/test/tools/llvm-ar/Inputs/.gitattributes @@ -0,0 +1 @@ +mri-crlf.mri text eol=crlf diff --git a/llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes b/llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes new file mode 100644 index 00000000000000..2df17345df5b87 --- /dev/null +++ b/llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes @@ -0,0 +1 @@ +*.dos text eol=crlf >From 8999a0613a340dd02039ec6dde23057cb38b17f5 Mon Sep 17 00:00:00 2001 From: Luke Drummond Date: Thu, 17 Oct 2024 14:48:42 +0100 Subject: [PATCH 2/2] Renormalize line endings whitespace only after dccebddb3b80 Line ending policies were changed in the parent, dccebddb3b80. To make it easier to resolve downstream merge conflicts after line-ending policies are adjusted this is a separate whitespace-only commit. 
If you have merge conflicts as a result, you can simply `git add --renormalize -u && git merge --continue` or `git add --renormalize -u && git rebase --continue` - depending on your workflow. --- .../clangd/test/input-mirror.test | 34 +- clang-tools-extra/clangd/test/protocol.test | 226 +- clang-tools-extra/clangd/test/too_large.test | 14 +- clang/test/AST/HLSL/StructuredBuffer-AST.hlsl | 128 +- clang/test/C/C2y/n3262.c | 40 +- clang/test/C/C2y/n3274.c | 36 +- .../StructuredBuffer-annotations.hlsl | 44 +- .../StructuredBuffer-constructor.hlsl | 38 +- .../StructuredBuffer-elementtype.hlsl | 140 +- .../builtins/StructuredBuffer-subscript.hlsl | 34 +- clang/test/CodeGenHLSL/builtins/atan2.hlsl | 118 +- clang/test/CodeGenHLSL/builtins/cross.hlsl | 74 +- clang/test/CodeGenHLSL/builtins/length.hlsl | 146 +- .../test/CodeGenHLSL/builtins/normalize.hlsl | 170 +- clang/test/CodeGenHLSL/builtins/step.hlsl | 168 +- clang/test/Driver/flang/msvc-link.f90 | 10 +- clang/test/FixIt/fixit-newline-style.c | 22 +- .../rewrite-includes-mixed-eol-crlf.c | 16 +- .../rewrite-includes-mixed-eol-crlf.h | 22 +- ...tem-header-line-directive-ms-lineendings.c | 42 +- clang/test/ParserHLSL/bitfields.hlsl | 60 +- .../hlsl_annotations_on_struct_members.hlsl | 42 +- .../ParserHLSL/hlsl_contained_type_attr.hlsl | 50 +- .../hlsl_contained_type_attr_error.hlsl | 56 +- clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl | 44 +- .../ParserHLSL/hlsl_is_rov_attr_error.hlsl | 40 +- .../test/ParserHLSL/hlsl_raw_buffer_attr.hlsl | 44 +- .../hlsl_raw_buffer_attr_error.hlsl | 34 +- .../ParserHLSL/hlsl_resource_class_attr.hlsl | 74 +- .../hlsl_resource_class_attr_error.hlsl | 44 +- .../hlsl_resource_handle_attrs.hlsl | 42 +- clang/test/Sema/aarch64-sve-vector-trig-ops.c | 130 +- clang/test/Sema/riscv-rvv-vector-trig-ops.c | 134 +- .../avail-diag-default-compute.hlsl | 238 +- .../Availability/avail-diag-default-lib.hlsl | 360 +- .../avail-diag-relaxed-compute.hlsl | 238 +- .../Availability/avail-diag-relaxed-lib.hlsl | 
324 +- .../avail-diag-strict-compute.hlsl | 256 +- .../Availability/avail-diag-strict-lib.hlsl | 384 +- .../avail-lib-multiple-stages.hlsl | 114 +- .../SemaHLSL/BuiltIns/StructuredBuffers.hlsl | 38 +- .../test/SemaHLSL/BuiltIns/cross-errors.hlsl | 86 +- .../BuiltIns/half-float-only-errors2.hlsl | 26 +- .../test/SemaHLSL/BuiltIns/length-errors.hlsl | 64 +- .../SemaHLSL/BuiltIns/normalize-errors.hlsl | 62 +- clang/test/SemaHLSL/BuiltIns/step-errors.hlsl | 62 +- .../Types/Traits/IsIntangibleType.hlsl | 162 +- .../Types/Traits/IsIntangibleTypeErrors.hlsl | 24 +- .../resource_binding_attr_error_basic.hlsl | 84 +- .../resource_binding_attr_error_other.hlsl | 18 +- .../resource_binding_attr_error_resource.hlsl | 98 +- ...urce_binding_attr_error_silence_diags.hlsl | 54 +- .../resource_binding_attr_error_space.hlsl | 124 +- .../resource_binding_attr_error_udt.hlsl | 270 +- clang/tools/scan-build/bin/scan-build.bat | 2 +- .../tools/scan-build/libexec/c++-analyzer.bat | 2 +- .../tools/scan-build/libexec/ccc-analyzer.bat | 2 +- clang/utils/ClangVisualizers/clang.natvis | 2178 ++--- .../test/Driver/msvc-dependent-lib-flags.f90 | 72 +- .../ir-interpreter-phi-nodes/Makefile | 8 +- .../postmortem/minidump/fizzbuzz.syms | 4 +- .../target-new-solib-notifications/Makefile | 46 +- .../target-new-solib-notifications/a.cpp | 6 +- .../target-new-solib-notifications/b.cpp | 2 +- .../target-new-solib-notifications/c.cpp | 2 +- .../target-new-solib-notifications/d.cpp | 2 +- .../target-new-solib-notifications/main.cpp | 32 +- .../unwind/zeroth_frame/Makefile | 6 +- .../unwind/zeroth_frame/TestZerothFrame.py | 176 +- lldb/test/API/python_api/debugger/Makefile | 6 +- lldb/test/Shell/BuildScript/modes.test | 70 +- lldb/test/Shell/BuildScript/script-args.test | 64 +- .../Shell/BuildScript/toolchain-clang-cl.test | 98 +- .../Windows/Sigsegv/Inputs/sigsegv.cpp | 80 +- .../NativePDB/Inputs/inline_sites.s | 1244 +-- .../Inputs/inline_sites_live.lldbinit | 14 +- 
.../Inputs/local-variables-registers.lldbinit | 70 +- .../NativePDB/Inputs/lookup-by-types.lldbinit | 6 +- .../subfield_register_simple_type.lldbinit | 4 +- .../NativePDB/function-types-classes.cpp | 12 +- .../NativePDB/inline_sites_live.cpp | 68 +- .../SymbolFile/NativePDB/lookup-by-types.cpp | 92 +- lldb/unittests/Breakpoint/CMakeLists.txt | 20 +- llvm/benchmarks/FormatVariadicBM.cpp | 126 +- .../GetIntrinsicForClangBuiltin.cpp | 100 +- .../GetIntrinsicInfoTableEntriesBM.cpp | 60 +- llvm/docs/_static/LoopOptWG_invite.ics | 160 +- llvm/lib/Support/rpmalloc/CACHE.md | 38 +- llvm/lib/Support/rpmalloc/README.md | 440 +- llvm/lib/Support/rpmalloc/malloc.c | 1448 +-- llvm/lib/Support/rpmalloc/rpmalloc.c | 7984 ++++++++--------- llvm/lib/Support/rpmalloc/rpmalloc.h | 856 +- llvm/lib/Support/rpmalloc/rpnew.h | 226 +- .../Target/DirectX/DXILFinalizeLinkage.cpp | 130 +- .../DirectX/DirectXTargetTransformInfo.cpp | 76 +- llvm/test/CodeGen/DirectX/atan2.ll | 174 +- llvm/test/CodeGen/DirectX/atan2_error.ll | 22 +- llvm/test/CodeGen/DirectX/cross.ll | 112 +- llvm/test/CodeGen/DirectX/finalize_linkage.ll | 128 +- llvm/test/CodeGen/DirectX/normalize.ll | 224 +- llvm/test/CodeGen/DirectX/normalize_error.ll | 20 +- llvm/test/CodeGen/DirectX/step.ll | 156 +- .../CodeGen/SPIRV/hlsl-intrinsics/atan2.ll | 98 +- .../CodeGen/SPIRV/hlsl-intrinsics/cross.ll | 66 +- .../CodeGen/SPIRV/hlsl-intrinsics/length.ll | 58 +- .../SPIRV/hlsl-intrinsics/normalize.ll | 62 +- .../CodeGen/SPIRV/hlsl-intrinsics/step.ll | 66 +- .../Demangle/ms-placeholder-return-type.test | 36 +- llvm/test/FileCheck/dos-style-eol.txt | 20 +- llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri | 8 +- .../tools/llvm-cvtres/Inputs/languages.rc | 72 +- .../tools/llvm-cvtres/Inputs/test_resource.rc | 98 +- .../tools/llvm-rc/Inputs/dialog-with-menu.rc | 32 +- .../COFF/Inputs/resources/test_resource.rc | 88 +- llvm/unittests/Support/ModRefTest.cpp | 54 +- llvm/utils/LLVMVisualizers/llvm.natvis | 816 +- 
.../lit/tests/Inputs/shtest-shell/diff-in.dos | 6 +- llvm/utils/release/build_llvm_release.bat | 1030 +-- openmp/runtime/doc/doxygen/config | 3644 ++++---- pstl/CREDITS.txt | 42 +- 120 files changed, 14283 insertions(+), 14283 deletions(-) diff --git a/clang-tools-extra/clangd/test/input-mirror.test b/clang-tools-extra/clangd/test/input-mirror.test index a34a4a08cf60cf..bce3f9923a3b90 100644 --- a/clang-tools-extra/clangd/test/input-mirror.test +++ b/clang-tools-extra/clangd/test/input-mirror.test @@ -1,17 +1,17 @@ -# RUN: clangd -pretty -sync -input-mirror-file %t < %s -# Note that we have to use '-b' as -input-mirror-file does not have a newline at the end of file. -# RUN: diff -b %t %s -# It is absolutely vital that this file has CRLF line endings. -# -Content-Length: 125 - -{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":123,"rootPath":"clangd","capabilities":{},"trace":"off"}} -Content-Length: 172 - -{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"test:///main.cpp","languageId":"cpp","version":1,"text":"int main() {\nint a;\na;\n}\n"}}} -Content-Length: 44 - -{"jsonrpc":"2.0","id":3,"method":"shutdown"} -Content-Length: 33 - -{"jsonrpc":"2.0","method":"exit"} +# RUN: clangd -pretty -sync -input-mirror-file %t < %s +# Note that we have to use '-b' as -input-mirror-file does not have a newline at the end of file. +# RUN: diff -b %t %s +# It is absolutely vital that this file has CRLF line endings. 
+# +Content-Length: 125 + +{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":123,"rootPath":"clangd","capabilities":{},"trace":"off"}} +Content-Length: 172 + +{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"test:///main.cpp","languageId":"cpp","version":1,"text":"int main() {\nint a;\na;\n}\n"}}} +Content-Length: 44 + +{"jsonrpc":"2.0","id":3,"method":"shutdown"} +Content-Length: 33 + +{"jsonrpc":"2.0","method":"exit"} diff --git a/clang-tools-extra/clangd/test/protocol.test b/clang-tools-extra/clangd/test/protocol.test index 5e852d1d9deebc..64ccfaef189111 100644 --- a/clang-tools-extra/clangd/test/protocol.test +++ b/clang-tools-extra/clangd/test/protocol.test @@ -1,113 +1,113 @@ -# RUN: not clangd -pretty -sync -enable-test-uri-scheme < %s | FileCheck -strict-whitespace %s -# RUN: not clangd -pretty -sync -enable-test-uri-scheme < %s 2>&1 | FileCheck -check-prefix=STDERR %s -# vim: fileformat=dos -# It is absolutely vital that this file has CRLF line endings. -# -# Note that we invert the test because we intent to let clangd exit prematurely. 
-# -# Test protocol parsing -Content-Length: 125 -Content-Type: application/vscode-jsonrpc; charset-utf-8 - -{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":123,"rootPath":"clangd","capabilities":{},"trace":"off"}} -# Test message with Content-Type after Content-Length -# -# CHECK: "jsonrpc": "2.0", -# CHECK-NEXT: "result": { -# CHECK: } -Content-Length: 246 - -{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"test:///main.cpp","languageId":"cpp","version":1,"text":"struct fake { int a, bb, ccc; int f(int i, const float f) const; };\nint main() {\n fake f;\n f.\n}\n"}}} - -Content-Length: 104 - -{"jsonrpc":"2.0","method":"textDocument/didChange","params":{"textDocument":{"uri":"test:///main.cpp"}}} - -Content-Type: application/vscode-jsonrpc; charset-utf-8 -Content-Length: 146 - -{"jsonrpc":"2.0","id":1,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message with Content-Type before Content-Length -# -# CHECK: "id": 1, -# CHECK-NEXT: "jsonrpc": "2.0", -# CHECK-NEXT: "result": { -# CHECK-NEXT: "isIncomplete": false, -# CHECK-NEXT: "items": [ -# CHECK: "filterText": "a", -# CHECK-NEXT: "insertText": "a", -# CHECK-NEXT: "insertTextFormat": 1, -# CHECK-NEXT: "kind": 5, -# CHECK-NEXT: "label": " a", -# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, -# CHECK-NEXT: "sortText": "{{.*}}" -# CHECK: ] -# CHECK-NEXT: } - -X-Test: Testing -Content-Type: application/vscode-jsonrpc; charset-utf-8 -Content-Length: 146 -Content-Type: application/vscode-jsonrpc; charset-utf-8 -X-Testing: Test - -{"jsonrpc":"2.0","id":2,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} - -Content-Type: application/vscode-jsonrpc; charset-utf-8 -Content-Length: 10 -Content-Length: 146 - 
-{"jsonrpc":"2.0","id":3,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message with duplicate Content-Length headers -# -# CHECK: "id": 3, -# CHECK-NEXT: "jsonrpc": "2.0", -# CHECK-NEXT: "result": { -# CHECK-NEXT: "isIncomplete": false, -# CHECK-NEXT: "items": [ -# CHECK: "filterText": "a", -# CHECK-NEXT: "insertText": "a", -# CHECK-NEXT: "insertTextFormat": 1, -# CHECK-NEXT: "kind": 5, -# CHECK-NEXT: "label": " a", -# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, -# CHECK-NEXT: "sortText": "{{.*}}" -# CHECK: ] -# CHECK-NEXT: } -# STDERR: Warning: Duplicate Content-Length header received. The previous value for this message (10) was ignored. - -Content-Type: application/vscode-jsonrpc; charset-utf-8 -Content-Length: 10 - -{"jsonrpc":"2.0","id":4,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message with malformed Content-Length -# -# STDERR: JSON parse error -# Ensure we recover by sending another (valid) message - -Content-Length: 146 - -{"jsonrpc":"2.0","id":5,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message with Content-Type before Content-Length -# -# CHECK: "id": 5, -# CHECK-NEXT: "jsonrpc": "2.0", -# CHECK-NEXT: "result": { -# CHECK-NEXT: "isIncomplete": false, -# CHECK-NEXT: "items": [ -# CHECK: "filterText": "a", -# CHECK-NEXT: "insertText": "a", -# CHECK-NEXT: "insertTextFormat": 1, -# CHECK-NEXT: "kind": 5, -# CHECK-NEXT: "label": " a", -# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, -# CHECK-NEXT: "sortText": "{{.*}}" -# CHECK: ] -# CHECK-NEXT: } -Content-Length: 1024 - -{"jsonrpc":"2.0","id":5,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message which reads beyond the end of the stream. 
-# -# Ensure this is the last test in the file! -# STDERR: Input was aborted. Read only {{[0-9]+}} bytes of expected {{[0-9]+}}. - +# RUN: not clangd -pretty -sync -enable-test-uri-scheme < %s | FileCheck -strict-whitespace %s +# RUN: not clangd -pretty -sync -enable-test-uri-scheme < %s 2>&1 | FileCheck -check-prefix=STDERR %s +# vim: fileformat=dos +# It is absolutely vital that this file has CRLF line endings. +# +# Note that we invert the test because we intent to let clangd exit prematurely. +# +# Test protocol parsing +Content-Length: 125 +Content-Type: application/vscode-jsonrpc; charset-utf-8 + +{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":123,"rootPath":"clangd","capabilities":{},"trace":"off"}} +# Test message with Content-Type after Content-Length +# +# CHECK: "jsonrpc": "2.0", +# CHECK-NEXT: "result": { +# CHECK: } +Content-Length: 246 + +{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"test:///main.cpp","languageId":"cpp","version":1,"text":"struct fake { int a, bb, ccc; int f(int i, const float f) const; };\nint main() {\n fake f;\n f.\n}\n"}}} + +Content-Length: 104 + +{"jsonrpc":"2.0","method":"textDocument/didChange","params":{"textDocument":{"uri":"test:///main.cpp"}}} + +Content-Type: application/vscode-jsonrpc; charset-utf-8 +Content-Length: 146 + +{"jsonrpc":"2.0","id":1,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message with Content-Type before Content-Length +# +# CHECK: "id": 1, +# CHECK-NEXT: "jsonrpc": "2.0", +# CHECK-NEXT: "result": { +# CHECK-NEXT: "isIncomplete": false, +# CHECK-NEXT: "items": [ +# CHECK: "filterText": "a", +# CHECK-NEXT: "insertText": "a", +# CHECK-NEXT: "insertTextFormat": 1, +# CHECK-NEXT: "kind": 5, +# CHECK-NEXT: "label": " a", +# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, +# CHECK-NEXT: "sortText": "{{.*}}" +# CHECK: ] +# CHECK-NEXT: } + +X-Test: Testing +Content-Type: 
application/vscode-jsonrpc; charset-utf-8 +Content-Length: 146 +Content-Type: application/vscode-jsonrpc; charset-utf-8 +X-Testing: Test + +{"jsonrpc":"2.0","id":2,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} + +Content-Type: application/vscode-jsonrpc; charset-utf-8 +Content-Length: 10 +Content-Length: 146 + +{"jsonrpc":"2.0","id":3,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message with duplicate Content-Length headers +# +# CHECK: "id": 3, +# CHECK-NEXT: "jsonrpc": "2.0", +# CHECK-NEXT: "result": { +# CHECK-NEXT: "isIncomplete": false, +# CHECK-NEXT: "items": [ +# CHECK: "filterText": "a", +# CHECK-NEXT: "insertText": "a", +# CHECK-NEXT: "insertTextFormat": 1, +# CHECK-NEXT: "kind": 5, +# CHECK-NEXT: "label": " a", +# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, +# CHECK-NEXT: "sortText": "{{.*}}" +# CHECK: ] +# CHECK-NEXT: } +# STDERR: Warning: Duplicate Content-Length header received. The previous value for this message (10) was ignored. 
+ +Content-Type: application/vscode-jsonrpc; charset-utf-8 +Content-Length: 10 + +{"jsonrpc":"2.0","id":4,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message with malformed Content-Length +# +# STDERR: JSON parse error +# Ensure we recover by sending another (valid) message + +Content-Length: 146 + +{"jsonrpc":"2.0","id":5,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message with Content-Type before Content-Length +# +# CHECK: "id": 5, +# CHECK-NEXT: "jsonrpc": "2.0", +# CHECK-NEXT: "result": { +# CHECK-NEXT: "isIncomplete": false, +# CHECK-NEXT: "items": [ +# CHECK: "filterText": "a", +# CHECK-NEXT: "insertText": "a", +# CHECK-NEXT: "insertTextFormat": 1, +# CHECK-NEXT: "kind": 5, +# CHECK-NEXT: "label": " a", +# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, +# CHECK-NEXT: "sortText": "{{.*}}" +# CHECK: ] +# CHECK-NEXT: } +Content-Length: 1024 + +{"jsonrpc":"2.0","id":5,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message which reads beyond the end of the stream. +# +# Ensure this is the last test in the file! +# STDERR: Input was aborted. Read only {{[0-9]+}} bytes of expected {{[0-9]+}}. + diff --git a/clang-tools-extra/clangd/test/too_large.test b/clang-tools-extra/clangd/test/too_large.test index 7df981e7942073..6986bd5e258e87 100644 --- a/clang-tools-extra/clangd/test/too_large.test +++ b/clang-tools-extra/clangd/test/too_large.test @@ -1,7 +1,7 @@ -# RUN: not clangd -sync < %s 2>&1 | FileCheck -check-prefix=STDERR %s -# vim: fileformat=dos -# It is absolutely vital that this file has CRLF line endings. 
-# -Content-Length: 2147483648 - -# STDERR: Refusing to read message +# RUN: not clangd -sync < %s 2>&1 | FileCheck -check-prefix=STDERR %s +# vim: fileformat=dos +# It is absolutely vital that this file has CRLF line endings. +# +Content-Length: 2147483648 + +# STDERR: Refusing to read message diff --git a/clang/test/AST/HLSL/StructuredBuffer-AST.hlsl b/clang/test/AST/HLSL/StructuredBuffer-AST.hlsl index 030fcfc31691dc..9c1630f6f570aa 100644 --- a/clang/test/AST/HLSL/StructuredBuffer-AST.hlsl +++ b/clang/test/AST/HLSL/StructuredBuffer-AST.hlsl @@ -1,64 +1,64 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library -x hlsl -ast-dump -DEMPTY %s | FileCheck -check-prefix=EMPTY %s -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library -x hlsl -ast-dump %s | FileCheck %s - - -// This test tests two different AST generations. The "EMPTY" test mode verifies -// the AST generated by forward declaration of the HLSL types which happens on -// initializing the HLSL external AST with an AST Context. - -// The non-empty mode has a use that requires the StructuredBuffer type be complete, -// which results in the AST being populated by the external AST source. That -// case covers the full implementation of the template declaration and the -// instantiated specialization. 
- -// EMPTY: ClassTemplateDecl 0x{{[0-9A-Fa-f]+}} <> implicit StructuredBuffer -// EMPTY-NEXT: TemplateTypeParmDecl 0x{{[0-9A-Fa-f]+}} <> class depth 0 index 0 element_type -// EMPTY-NEXT: CXXRecordDecl 0x{{[0-9A-Fa-f]+}} <> implicit class StructuredBuffer -// EMPTY-NEXT: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final - -// There should be no more occurrances of StructuredBuffer -// EMPTY-NOT: StructuredBuffer - -#ifndef EMPTY - -StructuredBuffer Buffer; - -#endif - -// CHECK: ClassTemplateDecl 0x{{[0-9A-Fa-f]+}} <> implicit StructuredBuffer -// CHECK-NEXT: TemplateTypeParmDecl 0x{{[0-9A-Fa-f]+}} <> class depth 0 index 0 element_type -// CHECK-NEXT: CXXRecordDecl 0x{{[0-9A-Fa-f]+}} <> implicit class StructuredBuffer definition - -// CHECK: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final -// CHECK-NEXT: FieldDecl 0x{{[0-9A-Fa-f]+}} <> implicit h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(element_type)]] -// CHECK-NEXT: HLSLResourceAttr 0x{{[0-9A-Fa-f]+}} <> Implicit TypedBuffer - -// CHECK: CXXMethodDecl 0x{{[0-9A-Fa-f]+}} <> operator[] 'element_type &const (unsigned int) const' -// CHECK-NEXT: ParmVarDecl 0x{{[0-9A-Fa-f]+}} <> Idx 'unsigned int' -// CHECK-NEXT: CompoundStmt 0x{{[0-9A-Fa-f]+}} <> -// CHECK-NEXT: ReturnStmt 0x{{[0-9A-Fa-f]+}} <> -// CHECK-NEXT: MemberExpr 0x{{[0-9A-Fa-f]+}} <> 'element_type' lvalue .e 0x{{[0-9A-Fa-f]+}} -// CHECK-NEXT: CXXThisExpr 0x{{[0-9A-Fa-f]+}} <> 'const StructuredBuffer' lvalue implicit this -// CHECK-NEXT: AlwaysInlineAttr 0x{{[0-9A-Fa-f]+}} <> Implicit always_inline - -// CHECK-NEXT: CXXMethodDecl 0x{{[0-9A-Fa-f]+}} <> operator[] 'element_type &(unsigned int)' -// CHECK-NEXT: ParmVarDecl 0x{{[0-9A-Fa-f]+}} <> Idx 'unsigned int' -// CHECK-NEXT: CompoundStmt 0x{{[0-9A-Fa-f]+}} <> -// CHECK-NEXT: ReturnStmt 0x{{[0-9A-Fa-f]+}} <> -// CHECK-NEXT: MemberExpr 0x{{[0-9A-Fa-f]+}} <> 'element_type' lvalue .e 
0x{{[0-9A-Fa-f]+}} -// CHECK-NEXT: CXXThisExpr 0x{{[0-9A-Fa-f]+}} <> 'StructuredBuffer' lvalue implicit this -// CHECK-NEXT: AlwaysInlineAttr 0x{{[0-9A-Fa-f]+}} <> Implicit always_inline - -// CHECK: ClassTemplateSpecializationDecl 0x{{[0-9A-Fa-f]+}} <> class StructuredBuffer definition - -// CHECK: TemplateArgument type 'float' -// CHECK-NEXT: BuiltinType 0x{{[0-9A-Fa-f]+}} 'float' -// CHECK-NEXT: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final -// CHECK-NEXT: FieldDecl 0x{{[0-9A-Fa-f]+}} <> implicit h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float)]] -// CHECK-NEXT: HLSLResourceAttr 0x{{[0-9A-Fa-f]+}} <> Implicit TypedBuffer +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library -x hlsl -ast-dump -DEMPTY %s | FileCheck -check-prefix=EMPTY %s +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library -x hlsl -ast-dump %s | FileCheck %s + + +// This test tests two different AST generations. The "EMPTY" test mode verifies +// the AST generated by forward declaration of the HLSL types which happens on +// initializing the HLSL external AST with an AST Context. + +// The non-empty mode has a use that requires the StructuredBuffer type be complete, +// which results in the AST being populated by the external AST source. That +// case covers the full implementation of the template declaration and the +// instantiated specialization. 
+ +// EMPTY: ClassTemplateDecl 0x{{[0-9A-Fa-f]+}} <> implicit StructuredBuffer +// EMPTY-NEXT: TemplateTypeParmDecl 0x{{[0-9A-Fa-f]+}} <> class depth 0 index 0 element_type +// EMPTY-NEXT: CXXRecordDecl 0x{{[0-9A-Fa-f]+}} <> implicit class StructuredBuffer +// EMPTY-NEXT: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final + +// There should be no more occurrances of StructuredBuffer +// EMPTY-NOT: StructuredBuffer + +#ifndef EMPTY + +StructuredBuffer Buffer; + +#endif + +// CHECK: ClassTemplateDecl 0x{{[0-9A-Fa-f]+}} <> implicit StructuredBuffer +// CHECK-NEXT: TemplateTypeParmDecl 0x{{[0-9A-Fa-f]+}} <> class depth 0 index 0 element_type +// CHECK-NEXT: CXXRecordDecl 0x{{[0-9A-Fa-f]+}} <> implicit class StructuredBuffer definition + +// CHECK: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final +// CHECK-NEXT: FieldDecl 0x{{[0-9A-Fa-f]+}} <> implicit h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(element_type)]] +// CHECK-NEXT: HLSLResourceAttr 0x{{[0-9A-Fa-f]+}} <> Implicit TypedBuffer + +// CHECK: CXXMethodDecl 0x{{[0-9A-Fa-f]+}} <> operator[] 'element_type &const (unsigned int) const' +// CHECK-NEXT: ParmVarDecl 0x{{[0-9A-Fa-f]+}} <> Idx 'unsigned int' +// CHECK-NEXT: CompoundStmt 0x{{[0-9A-Fa-f]+}} <> +// CHECK-NEXT: ReturnStmt 0x{{[0-9A-Fa-f]+}} <> +// CHECK-NEXT: MemberExpr 0x{{[0-9A-Fa-f]+}} <> 'element_type' lvalue .e 0x{{[0-9A-Fa-f]+}} +// CHECK-NEXT: CXXThisExpr 0x{{[0-9A-Fa-f]+}} <> 'const StructuredBuffer' lvalue implicit this +// CHECK-NEXT: AlwaysInlineAttr 0x{{[0-9A-Fa-f]+}} <> Implicit always_inline + +// CHECK-NEXT: CXXMethodDecl 0x{{[0-9A-Fa-f]+}} <> operator[] 'element_type &(unsigned int)' +// CHECK-NEXT: ParmVarDecl 0x{{[0-9A-Fa-f]+}} <> Idx 'unsigned int' +// CHECK-NEXT: CompoundStmt 0x{{[0-9A-Fa-f]+}} <> +// CHECK-NEXT: ReturnStmt 0x{{[0-9A-Fa-f]+}} <> +// CHECK-NEXT: MemberExpr 0x{{[0-9A-Fa-f]+}} <> 'element_type' lvalue .e 
0x{{[0-9A-Fa-f]+}} +// CHECK-NEXT: CXXThisExpr 0x{{[0-9A-Fa-f]+}} <> 'StructuredBuffer' lvalue implicit this +// CHECK-NEXT: AlwaysInlineAttr 0x{{[0-9A-Fa-f]+}} <> Implicit always_inline + +// CHECK: ClassTemplateSpecializationDecl 0x{{[0-9A-Fa-f]+}} <> class StructuredBuffer definition + +// CHECK: TemplateArgument type 'float' +// CHECK-NEXT: BuiltinType 0x{{[0-9A-Fa-f]+}} 'float' +// CHECK-NEXT: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final +// CHECK-NEXT: FieldDecl 0x{{[0-9A-Fa-f]+}} <> implicit h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float)]] +// CHECK-NEXT: HLSLResourceAttr 0x{{[0-9A-Fa-f]+}} <> Implicit TypedBuffer diff --git a/clang/test/C/C2y/n3262.c b/clang/test/C/C2y/n3262.c index 3ff2062d88dde8..864ab351bdbc23 100644 --- a/clang/test/C/C2y/n3262.c +++ b/clang/test/C/C2y/n3262.c @@ -1,20 +1,20 @@ -// RUN: %clang_cc1 -verify -std=c2y -Wall -pedantic %s -// expected-no-diagnostics - -/* WG14 N3262: Yes - * Usability of a byte-wise copy of va_list - * - * NB: Clang explicitly documents this as being undefined behavior. A - * diagnostic is produced for some targets but not for others for assignment or - * initialization, but no diagnostic is possible to produce for use with memcpy - * in the general case, nor with a manual bytewise copy via a for loop. - * - * Therefore, nothing is tested in this file; it serves as a reminder that we - * validated our documentation against the paper. See - * clang/docs/LanguageExtensions.rst for more details. - * - * FIXME: it would be nice to add ubsan support for recognizing when an invalid - * copy is made and diagnosing on copy (or on use of the copied va_list). 
- */ - -int main() {} +// RUN: %clang_cc1 -verify -std=c2y -Wall -pedantic %s +// expected-no-diagnostics + +/* WG14 N3262: Yes + * Usability of a byte-wise copy of va_list + * + * NB: Clang explicitly documents this as being undefined behavior. A + * diagnostic is produced for some targets but not for others for assignment or + * initialization, but no diagnostic is possible to produce for use with memcpy + * in the general case, nor with a manual bytewise copy via a for loop. + * + * Therefore, nothing is tested in this file; it serves as a reminder that we + * validated our documentation against the paper. See + * clang/docs/LanguageExtensions.rst for more details. + * + * FIXME: it would be nice to add ubsan support for recognizing when an invalid + * copy is made and diagnosing on copy (or on use of the copied va_list). + */ + +int main() {} diff --git a/clang/test/C/C2y/n3274.c b/clang/test/C/C2y/n3274.c index ccdb89f4069ded..6bf8d72d0f3319 100644 --- a/clang/test/C/C2y/n3274.c +++ b/clang/test/C/C2y/n3274.c @@ -1,18 +1,18 @@ -// RUN: %clang_cc1 -verify -std=c23 -Wall -pedantic %s -// RUN: %clang_cc1 -verify -std=c2y -Wall -pedantic %s - -/* WG14 N3274: Yes - * Remove imaginary types - */ - -// Clang has never supported _Imaginary. -#ifdef __STDC_IEC_559_COMPLEX__ -#error "When did this happen?" -#endif - -_Imaginary float i; // expected-error {{imaginary types are not supported}} - -// _Imaginary is a keyword in older language modes, but doesn't need to be one -// in C2y or later. However, to improve diagnostic behavior, we retain it as a -// keyword in all language modes -- it is not available as an identifier. -static_assert(!__is_identifier(_Imaginary)); +// RUN: %clang_cc1 -verify -std=c23 -Wall -pedantic %s +// RUN: %clang_cc1 -verify -std=c2y -Wall -pedantic %s + +/* WG14 N3274: Yes + * Remove imaginary types + */ + +// Clang has never supported _Imaginary. +#ifdef __STDC_IEC_559_COMPLEX__ +#error "When did this happen?" 
+#endif + +_Imaginary float i; // expected-error {{imaginary types are not supported}} + +// _Imaginary is a keyword in older language modes, but doesn't need to be one +// in C2y or later. However, to improve diagnostic behavior, we retain it as a +// keyword in all language modes -- it is not available as an identifier. +static_assert(!__is_identifier(_Imaginary)); diff --git a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-annotations.hlsl b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-annotations.hlsl index 4d3d4908c396e6..81c5837d8f2077 100644 --- a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-annotations.hlsl +++ b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-annotations.hlsl @@ -1,22 +1,22 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s - -StructuredBuffer Buffer1; -StructuredBuffer > BufferArray[4]; - -StructuredBuffer Buffer2 : register(u3); -StructuredBuffer > BufferArray2[4] : register(u4); - -StructuredBuffer Buffer3 : register(u3, space1); -StructuredBuffer > BufferArray3[4] : register(u4, space1); - -[numthreads(1,1,1)] -void main() { -} - -// CHECK: !hlsl.uavs = !{![[Single:[0-9]+]], ![[Array:[0-9]+]], ![[SingleAllocated:[0-9]+]], ![[ArrayAllocated:[0-9]+]], ![[SingleSpace:[0-9]+]], ![[ArraySpace:[0-9]+]]} -// CHECK-DAG: ![[Single]] = !{ptr @Buffer1, i32 10, i32 9, i1 false, i32 -1, i32 0} -// CHECK-DAG: ![[Array]] = !{ptr @BufferArray, i32 10, i32 9, i1 false, i32 -1, i32 0} -// CHECK-DAG: ![[SingleAllocated]] = !{ptr @Buffer2, i32 10, i32 9, i1 false, i32 3, i32 0} -// CHECK-DAG: ![[ArrayAllocated]] = !{ptr @BufferArray2, i32 10, i32 9, i1 false, i32 4, i32 0} -// CHECK-DAG: ![[SingleSpace]] = !{ptr @Buffer3, i32 10, i32 9, i1 false, i32 3, i32 1} -// CHECK-DAG: ![[ArraySpace]] = !{ptr @BufferArray3, i32 10, i32 9, i1 false, i32 4, i32 1} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s 
+ +StructuredBuffer Buffer1; +StructuredBuffer > BufferArray[4]; + +StructuredBuffer Buffer2 : register(u3); +StructuredBuffer > BufferArray2[4] : register(u4); + +StructuredBuffer Buffer3 : register(u3, space1); +StructuredBuffer > BufferArray3[4] : register(u4, space1); + +[numthreads(1,1,1)] +void main() { +} + +// CHECK: !hlsl.uavs = !{![[Single:[0-9]+]], ![[Array:[0-9]+]], ![[SingleAllocated:[0-9]+]], ![[ArrayAllocated:[0-9]+]], ![[SingleSpace:[0-9]+]], ![[ArraySpace:[0-9]+]]} +// CHECK-DAG: ![[Single]] = !{ptr @Buffer1, i32 10, i32 9, i1 false, i32 -1, i32 0} +// CHECK-DAG: ![[Array]] = !{ptr @BufferArray, i32 10, i32 9, i1 false, i32 -1, i32 0} +// CHECK-DAG: ![[SingleAllocated]] = !{ptr @Buffer2, i32 10, i32 9, i1 false, i32 3, i32 0} +// CHECK-DAG: ![[ArrayAllocated]] = !{ptr @BufferArray2, i32 10, i32 9, i1 false, i32 4, i32 0} +// CHECK-DAG: ![[SingleSpace]] = !{ptr @Buffer3, i32 10, i32 9, i1 false, i32 3, i32 1} +// CHECK-DAG: ![[ArraySpace]] = !{ptr @BufferArray3, i32 10, i32 9, i1 false, i32 4, i32 1} diff --git a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-constructor.hlsl b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-constructor.hlsl index 178332d03e6404..f65090410ce66f 100644 --- a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-constructor.hlsl +++ b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-constructor.hlsl @@ -1,19 +1,19 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s -// RUN: %clang_cc1 -triple spirv-vulkan-library -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s --check-prefix=CHECK-SPIRV - -// XFAIL: * -// This expectedly fails because create.handle is no longer invoked -// from StructuredBuffer constructor and the replacement has not been -// implemented yet. This test should be updated to expect -// dx.create.handleFromBinding as part of issue #105076. 
- -StructuredBuffer Buf; - -// CHECK: define linkonce_odr noundef ptr @"??0?$StructuredBuffer@M@hlsl@@QAA@XZ" -// CHECK-NEXT: entry: - -// CHECK: %[[HandleRes:[0-9]+]] = call ptr @llvm.dx.create.handle(i8 1) -// CHECK: store ptr %[[HandleRes]], ptr %h, align 4 - -// CHECK-SPIRV: %[[HandleRes:[0-9]+]] = call ptr @llvm.spv.create.handle(i8 1) -// CHECK-SPIRV: store ptr %[[HandleRes]], ptr %h, align 8 +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple spirv-vulkan-library -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s --check-prefix=CHECK-SPIRV + +// XFAIL: * +// This expectedly fails because create.handle is no longer invoked +// from StructuredBuffer constructor and the replacement has not been +// implemented yet. This test should be updated to expect +// dx.create.handleFromBinding as part of issue #105076. + +StructuredBuffer Buf; + +// CHECK: define linkonce_odr noundef ptr @"??0?$StructuredBuffer@M@hlsl@@QAA@XZ" +// CHECK-NEXT: entry: + +// CHECK: %[[HandleRes:[0-9]+]] = call ptr @llvm.dx.create.handle(i8 1) +// CHECK: store ptr %[[HandleRes]], ptr %h, align 4 + +// CHECK-SPIRV: %[[HandleRes:[0-9]+]] = call ptr @llvm.spv.create.handle(i8 1) +// CHECK-SPIRV: store ptr %[[HandleRes]], ptr %h, align 8 diff --git a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-elementtype.hlsl b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-elementtype.hlsl index a99c7f98a1afb6..435a904327a26a 100644 --- a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-elementtype.hlsl +++ b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-elementtype.hlsl @@ -1,70 +1,70 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.2-compute -finclude-default-header -fnative-half-type -emit-llvm -o - %s | FileCheck %s - -// NOTE: The number in type name and whether the struct is packed or not will mostly -// likely change once subscript operators are properly implemented 
(llvm/llvm-project#95956) -// and theinterim field of the contained type is removed. - -// CHECK: %"class.hlsl::StructuredBuffer" = type <{ target("dx.RawBuffer", i16, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.0" = type <{ target("dx.RawBuffer", i16, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.2" = type { target("dx.RawBuffer", i32, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.3" = type { target("dx.RawBuffer", i32, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.4" = type { target("dx.RawBuffer", i64, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.5" = type { target("dx.RawBuffer", i64, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.6" = type <{ target("dx.RawBuffer", half, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.8" = type { target("dx.RawBuffer", float, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.9" = type { target("dx.RawBuffer", double, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.10" = type { target("dx.RawBuffer", <4 x i16>, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.11" = type { target("dx.RawBuffer", <3 x i32>, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.12" = type { target("dx.RawBuffer", <2 x half>, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.13" = type { target("dx.RawBuffer", <3 x float>, 1, 0) - -StructuredBuffer BufI16; -StructuredBuffer BufU16; -StructuredBuffer BufI32; -StructuredBuffer BufU32; -StructuredBuffer BufI64; -StructuredBuffer BufU64; -StructuredBuffer BufF16; -StructuredBuffer BufF32; -StructuredBuffer BufF64; -StructuredBuffer< vector > BufI16x4; -StructuredBuffer< vector > BufU32x3; -StructuredBuffer BufF16x2; -StructuredBuffer BufF32x3; -// TODO: StructuredBuffer BufSNormF16; -> 11 -// TODO: StructuredBuffer BufUNormF16; -> 12 -// TODO: StructuredBuffer BufSNormF32; -> 13 -// TODO: StructuredBuffer BufUNormF32; -> 14 -// TODO: StructuredBuffer BufSNormF64; -> 15 -// TODO: StructuredBuffer BufUNormF64; -> 16 - -[numthreads(1,1,1)] -void main(int GI : SV_GroupIndex) { - BufI16[GI] = 0; - 
BufU16[GI] = 0; - BufI32[GI] = 0; - BufU32[GI] = 0; - BufI64[GI] = 0; - BufU64[GI] = 0; - BufF16[GI] = 0; - BufF32[GI] = 0; - BufF64[GI] = 0; - BufI16x4[GI] = 0; - BufU32x3[GI] = 0; - BufF16x2[GI] = 0; - BufF32x3[GI] = 0; -} - -// CHECK: !{{[0-9]+}} = !{ptr @BufI16, i32 10, i32 2, -// CHECK: !{{[0-9]+}} = !{ptr @BufU16, i32 10, i32 3, -// CHECK: !{{[0-9]+}} = !{ptr @BufI32, i32 10, i32 4, -// CHECK: !{{[0-9]+}} = !{ptr @BufU32, i32 10, i32 5, -// CHECK: !{{[0-9]+}} = !{ptr @BufI64, i32 10, i32 6, -// CHECK: !{{[0-9]+}} = !{ptr @BufU64, i32 10, i32 7, -// CHECK: !{{[0-9]+}} = !{ptr @BufF16, i32 10, i32 8, -// CHECK: !{{[0-9]+}} = !{ptr @BufF32, i32 10, i32 9, -// CHECK: !{{[0-9]+}} = !{ptr @BufF64, i32 10, i32 10, -// CHECK: !{{[0-9]+}} = !{ptr @BufI16x4, i32 10, i32 2, -// CHECK: !{{[0-9]+}} = !{ptr @BufU32x3, i32 10, i32 5, -// CHECK: !{{[0-9]+}} = !{ptr @BufF16x2, i32 10, i32 8, -// CHECK: !{{[0-9]+}} = !{ptr @BufF32x3, i32 10, i32 9, +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.2-compute -finclude-default-header -fnative-half-type -emit-llvm -o - %s | FileCheck %s + +// NOTE: The number in type name and whether the struct is packed or not will mostly +// likely change once subscript operators are properly implemented (llvm/llvm-project#95956) +// and theinterim field of the contained type is removed. 
+ +// CHECK: %"class.hlsl::StructuredBuffer" = type <{ target("dx.RawBuffer", i16, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.0" = type <{ target("dx.RawBuffer", i16, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.2" = type { target("dx.RawBuffer", i32, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.3" = type { target("dx.RawBuffer", i32, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.4" = type { target("dx.RawBuffer", i64, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.5" = type { target("dx.RawBuffer", i64, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.6" = type <{ target("dx.RawBuffer", half, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.8" = type { target("dx.RawBuffer", float, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.9" = type { target("dx.RawBuffer", double, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.10" = type { target("dx.RawBuffer", <4 x i16>, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.11" = type { target("dx.RawBuffer", <3 x i32>, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.12" = type { target("dx.RawBuffer", <2 x half>, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.13" = type { target("dx.RawBuffer", <3 x float>, 1, 0) + +StructuredBuffer BufI16; +StructuredBuffer BufU16; +StructuredBuffer BufI32; +StructuredBuffer BufU32; +StructuredBuffer BufI64; +StructuredBuffer BufU64; +StructuredBuffer BufF16; +StructuredBuffer BufF32; +StructuredBuffer BufF64; +StructuredBuffer< vector > BufI16x4; +StructuredBuffer< vector > BufU32x3; +StructuredBuffer BufF16x2; +StructuredBuffer BufF32x3; +// TODO: StructuredBuffer BufSNormF16; -> 11 +// TODO: StructuredBuffer BufUNormF16; -> 12 +// TODO: StructuredBuffer BufSNormF32; -> 13 +// TODO: StructuredBuffer BufUNormF32; -> 14 +// TODO: StructuredBuffer BufSNormF64; -> 15 +// TODO: StructuredBuffer BufUNormF64; -> 16 + +[numthreads(1,1,1)] +void main(int GI : SV_GroupIndex) { + BufI16[GI] = 0; + BufU16[GI] = 0; + BufI32[GI] = 0; + BufU32[GI] = 0; + BufI64[GI] = 0; + BufU64[GI] = 0; + 
BufF16[GI] = 0; + BufF32[GI] = 0; + BufF64[GI] = 0; + BufI16x4[GI] = 0; + BufU32x3[GI] = 0; + BufF16x2[GI] = 0; + BufF32x3[GI] = 0; +} + +// CHECK: !{{[0-9]+}} = !{ptr @BufI16, i32 10, i32 2, +// CHECK: !{{[0-9]+}} = !{ptr @BufU16, i32 10, i32 3, +// CHECK: !{{[0-9]+}} = !{ptr @BufI32, i32 10, i32 4, +// CHECK: !{{[0-9]+}} = !{ptr @BufU32, i32 10, i32 5, +// CHECK: !{{[0-9]+}} = !{ptr @BufI64, i32 10, i32 6, +// CHECK: !{{[0-9]+}} = !{ptr @BufU64, i32 10, i32 7, +// CHECK: !{{[0-9]+}} = !{ptr @BufF16, i32 10, i32 8, +// CHECK: !{{[0-9]+}} = !{ptr @BufF32, i32 10, i32 9, +// CHECK: !{{[0-9]+}} = !{ptr @BufF64, i32 10, i32 10, +// CHECK: !{{[0-9]+}} = !{ptr @BufI16x4, i32 10, i32 2, +// CHECK: !{{[0-9]+}} = !{ptr @BufU32x3, i32 10, i32 5, +// CHECK: !{{[0-9]+}} = !{ptr @BufF16x2, i32 10, i32 8, +// CHECK: !{{[0-9]+}} = !{ptr @BufF32x3, i32 10, i32 9, diff --git a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-subscript.hlsl b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-subscript.hlsl index 155749ec4f94a9..89bde9236288fc 100644 --- a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-subscript.hlsl +++ b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-subscript.hlsl @@ -1,17 +1,17 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -emit-llvm -o - -O0 %s | FileCheck %s - -StructuredBuffer In; -StructuredBuffer Out; - -[numthreads(1,1,1)] -void main(unsigned GI : SV_GroupIndex) { - Out[GI] = In[GI]; -} - -// Even at -O0 the subscript operators get inlined. The -O0 IR is a bit messy -// and confusing to follow so the match here is pretty weak. - -// CHECK: define void @main() -// Verify inlining leaves only calls to "llvm." 
intrinsics -// CHECK-NOT: call {{[^@]*}} @{{[^l][^l][^v][^m][^\.]}} -// CHECK: ret void +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -emit-llvm -o - -O0 %s | FileCheck %s + +StructuredBuffer In; +StructuredBuffer Out; + +[numthreads(1,1,1)] +void main(unsigned GI : SV_GroupIndex) { + Out[GI] = In[GI]; +} + +// Even at -O0 the subscript operators get inlined. The -O0 IR is a bit messy +// and confusing to follow so the match here is pretty weak. + +// CHECK: define void @main() +// Verify inlining leaves only calls to "llvm." intrinsics +// CHECK-NOT: call {{[^@]*}} @{{[^l][^l][^v][^m][^\.]}} +// CHECK: ret void diff --git a/clang/test/CodeGenHLSL/builtins/atan2.hlsl b/clang/test/CodeGenHLSL/builtins/atan2.hlsl index 40796052e608fe..ada269db2f00d3 100644 --- a/clang/test/CodeGenHLSL/builtins/atan2.hlsl +++ b/clang/test/CodeGenHLSL/builtins/atan2.hlsl @@ -1,59 +1,59 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF - -// CHECK-LABEL: test_atan2_half -// NATIVE_HALF: call half @llvm.atan2.f16 -// NO_HALF: call float @llvm.atan2.f32 -half test_atan2_half (half p0, half p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_half2 -// NATIVE_HALF: call <2 x half> @llvm.atan2.v2f16 -// NO_HALF: call <2 x float> @llvm.atan2.v2f32 -half2 test_atan2_half2 (half2 p0, half2 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_half3 -// NATIVE_HALF: call <3 x half> @llvm.atan2.v3f16 -// NO_HALF: call <3 x float> @llvm.atan2.v3f32 -half3 test_atan2_half3 (half3 p0, half3 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_half4 -// NATIVE_HALF: call <4 x 
half> @llvm.atan2.v4f16 -// NO_HALF: call <4 x float> @llvm.atan2.v4f32 -half4 test_atan2_half4 (half4 p0, half4 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_float -// CHECK: call float @llvm.atan2.f32 -float test_atan2_float (float p0, float p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_float2 -// CHECK: call <2 x float> @llvm.atan2.v2f32 -float2 test_atan2_float2 (float2 p0, float2 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_float3 -// CHECK: call <3 x float> @llvm.atan2.v3f32 -float3 test_atan2_float3 (float3 p0, float3 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_float4 -// CHECK: call <4 x float> @llvm.atan2.v4f32 -float4 test_atan2_float4 (float4 p0, float4 p1) { - return atan2(p0, p1); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF + +// CHECK-LABEL: test_atan2_half +// NATIVE_HALF: call half @llvm.atan2.f16 +// NO_HALF: call float @llvm.atan2.f32 +half test_atan2_half (half p0, half p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_half2 +// NATIVE_HALF: call <2 x half> @llvm.atan2.v2f16 +// NO_HALF: call <2 x float> @llvm.atan2.v2f32 +half2 test_atan2_half2 (half2 p0, half2 p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_half3 +// NATIVE_HALF: call <3 x half> @llvm.atan2.v3f16 +// NO_HALF: call <3 x float> @llvm.atan2.v3f32 +half3 test_atan2_half3 (half3 p0, half3 p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_half4 +// NATIVE_HALF: call <4 x half> @llvm.atan2.v4f16 +// NO_HALF: call <4 x float> @llvm.atan2.v4f32 +half4 test_atan2_half4 (half4 p0, half4 p1) { + 
return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_float +// CHECK: call float @llvm.atan2.f32 +float test_atan2_float (float p0, float p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_float2 +// CHECK: call <2 x float> @llvm.atan2.v2f32 +float2 test_atan2_float2 (float2 p0, float2 p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_float3 +// CHECK: call <3 x float> @llvm.atan2.v3f32 +float3 test_atan2_float3 (float3 p0, float3 p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_float4 +// CHECK: call <4 x float> @llvm.atan2.v4f32 +float4 test_atan2_float4 (float4 p0, float4 p1) { + return atan2(p0, p1); +} diff --git a/clang/test/CodeGenHLSL/builtins/cross.hlsl b/clang/test/CodeGenHLSL/builtins/cross.hlsl index 514e57d36b2016..eba710c905bf46 100644 --- a/clang/test/CodeGenHLSL/builtins/cross.hlsl +++ b/clang/test/CodeGenHLSL/builtins/cross.hlsl @@ -1,37 +1,37 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv - -// NATIVE_HALF: define 
[[FNATTRS]] <3 x half> @ -// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].cross.v3f16(<3 x half> -// NATIVE_HALF: ret <3 x half> %hlsl.cross -// NO_HALF: define [[FNATTRS]] <3 x float> @ -// NO_HALF: call <3 x float> @llvm.[[TARGET]].cross.v3f32(<3 x float> -// NO_HALF: ret <3 x float> %hlsl.cross -half3 test_cross_half3(half3 p0, half3 p1) -{ - return cross(p0, p1); -} - -// CHECK: define [[FNATTRS]] <3 x float> @ -// CHECK: %hlsl.cross = call <3 x float> @llvm.[[TARGET]].cross.v3f32( -// CHECK: ret <3 x float> %hlsl.cross -float3 test_cross_float3(float3 p0, float3 p1) -{ - return cross(p0, p1); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv + +// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ +// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].cross.v3f16(<3 x half> +// NATIVE_HALF: ret <3 x half> %hlsl.cross +// NO_HALF: define [[FNATTRS]] <3 x float> @ +// NO_HALF: call <3 x float> @llvm.[[TARGET]].cross.v3f32(<3 x float> +// NO_HALF: ret <3 x float> %hlsl.cross +half3 
test_cross_half3(half3 p0, half3 p1) +{ + return cross(p0, p1); +} + +// CHECK: define [[FNATTRS]] <3 x float> @ +// CHECK: %hlsl.cross = call <3 x float> @llvm.[[TARGET]].cross.v3f32( +// CHECK: ret <3 x float> %hlsl.cross +float3 test_cross_float3(float3 p0, float3 p1) +{ + return cross(p0, p1); +} diff --git a/clang/test/CodeGenHLSL/builtins/length.hlsl b/clang/test/CodeGenHLSL/builtins/length.hlsl index 1c23b0df04df98..9b0293c218a5de 100644 --- a/clang/test/CodeGenHLSL/builtins/length.hlsl +++ b/clang/test/CodeGenHLSL/builtins/length.hlsl @@ -1,73 +1,73 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF - -// NATIVE_HALF: define noundef half @ -// NATIVE_HALF: call half @llvm.fabs.f16(half -// NO_HALF: call float @llvm.fabs.f32(float -// NATIVE_HALF: ret half -// NO_HALF: ret float -half test_length_half(half p0) -{ - return length(p0); -} -// NATIVE_HALF: define noundef half @ -// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v2f16 -// NO_HALF: %hlsl.length = call float @llvm.dx.length.v2f32( -// NATIVE_HALF: ret half %hlsl.length -// NO_HALF: ret float %hlsl.length -half test_length_half2(half2 p0) -{ - return length(p0); -} -// NATIVE_HALF: define noundef half @ -// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v3f16 -// NO_HALF: %hlsl.length = call float @llvm.dx.length.v3f32( -// NATIVE_HALF: ret half %hlsl.length -// NO_HALF: ret float %hlsl.length -half test_length_half3(half3 p0) -{ - return length(p0); -} -// NATIVE_HALF: define noundef half @ -// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v4f16 -// NO_HALF: %hlsl.length = call float 
@llvm.dx.length.v4f32( -// NATIVE_HALF: ret half %hlsl.length -// NO_HALF: ret float %hlsl.length -half test_length_half4(half4 p0) -{ - return length(p0); -} - -// CHECK: define noundef float @ -// CHECK: call float @llvm.fabs.f32(float -// CHECK: ret float -float test_length_float(float p0) -{ - return length(p0); -} -// CHECK: define noundef float @ -// CHECK: %hlsl.length = call float @llvm.dx.length.v2f32( -// CHECK: ret float %hlsl.length -float test_length_float2(float2 p0) -{ - return length(p0); -} -// CHECK: define noundef float @ -// CHECK: %hlsl.length = call float @llvm.dx.length.v3f32( -// CHECK: ret float %hlsl.length -float test_length_float3(float3 p0) -{ - return length(p0); -} -// CHECK: define noundef float @ -// CHECK: %hlsl.length = call float @llvm.dx.length.v4f32( -// CHECK: ret float %hlsl.length -float test_length_float4(float4 p0) -{ - return length(p0); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF + +// NATIVE_HALF: define noundef half @ +// NATIVE_HALF: call half @llvm.fabs.f16(half +// NO_HALF: call float @llvm.fabs.f32(float +// NATIVE_HALF: ret half +// NO_HALF: ret float +half test_length_half(half p0) +{ + return length(p0); +} +// NATIVE_HALF: define noundef half @ +// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v2f16 +// NO_HALF: %hlsl.length = call float @llvm.dx.length.v2f32( +// NATIVE_HALF: ret half %hlsl.length +// NO_HALF: ret float %hlsl.length +half test_length_half2(half2 p0) +{ + return length(p0); +} +// NATIVE_HALF: define noundef half @ +// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v3f16 +// NO_HALF: 
%hlsl.length = call float @llvm.dx.length.v3f32( +// NATIVE_HALF: ret half %hlsl.length +// NO_HALF: ret float %hlsl.length +half test_length_half3(half3 p0) +{ + return length(p0); +} +// NATIVE_HALF: define noundef half @ +// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v4f16 +// NO_HALF: %hlsl.length = call float @llvm.dx.length.v4f32( +// NATIVE_HALF: ret half %hlsl.length +// NO_HALF: ret float %hlsl.length +half test_length_half4(half4 p0) +{ + return length(p0); +} + +// CHECK: define noundef float @ +// CHECK: call float @llvm.fabs.f32(float +// CHECK: ret float +float test_length_float(float p0) +{ + return length(p0); +} +// CHECK: define noundef float @ +// CHECK: %hlsl.length = call float @llvm.dx.length.v2f32( +// CHECK: ret float %hlsl.length +float test_length_float2(float2 p0) +{ + return length(p0); +} +// CHECK: define noundef float @ +// CHECK: %hlsl.length = call float @llvm.dx.length.v3f32( +// CHECK: ret float %hlsl.length +float test_length_float3(float3 p0) +{ + return length(p0); +} +// CHECK: define noundef float @ +// CHECK: %hlsl.length = call float @llvm.dx.length.v4f32( +// CHECK: ret float %hlsl.length +float test_length_float4(float4 p0) +{ + return length(p0); +} diff --git a/clang/test/CodeGenHLSL/builtins/normalize.hlsl b/clang/test/CodeGenHLSL/builtins/normalize.hlsl index 83ad607c14a607..d14e7c70ce0653 100644 --- a/clang/test/CodeGenHLSL/builtins/normalize.hlsl +++ b/clang/test/CodeGenHLSL/builtins/normalize.hlsl @@ -1,85 +1,85 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: 
-DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv - -// NATIVE_HALF: define [[FNATTRS]] half @ -// NATIVE_HALF: call half @llvm.[[TARGET]].normalize.f16(half -// NO_HALF: call float @llvm.[[TARGET]].normalize.f32(float -// NATIVE_HALF: ret half -// NO_HALF: ret float -half test_normalize_half(half p0) -{ - return normalize(p0); -} -// NATIVE_HALF: define [[FNATTRS]] <2 x half> @ -// NATIVE_HALF: call <2 x half> @llvm.[[TARGET]].normalize.v2f16(<2 x half> -// NO_HALF: call <2 x float> @llvm.[[TARGET]].normalize.v2f32(<2 x float> -// NATIVE_HALF: ret <2 x half> %hlsl.normalize -// NO_HALF: ret <2 x float> %hlsl.normalize -half2 test_normalize_half2(half2 p0) -{ - return normalize(p0); -} -// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ -// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].normalize.v3f16(<3 x half> -// NO_HALF: call <3 x float> @llvm.[[TARGET]].normalize.v3f32(<3 x float> -// NATIVE_HALF: ret <3 x half> %hlsl.normalize -// NO_HALF: ret <3 x float> %hlsl.normalize -half3 test_normalize_half3(half3 p0) -{ - return normalize(p0); -} -// NATIVE_HALF: define [[FNATTRS]] <4 x half> @ -// NATIVE_HALF: call <4 x half> @llvm.[[TARGET]].normalize.v4f16(<4 x half> -// NO_HALF: call <4 x float> @llvm.[[TARGET]].normalize.v4f32(<4 x float> -// NATIVE_HALF: ret <4 x half> %hlsl.normalize -// NO_HALF: ret <4 x float> %hlsl.normalize -half4 test_normalize_half4(half4 p0) -{ - return normalize(p0); -} - -// CHECK: define [[FNATTRS]] float @ -// CHECK: call float 
@llvm.[[TARGET]].normalize.f32(float -// CHECK: ret float -float test_normalize_float(float p0) -{ - return normalize(p0); -} -// CHECK: define [[FNATTRS]] <2 x float> @ -// CHECK: %hlsl.normalize = call <2 x float> @llvm.[[TARGET]].normalize.v2f32(<2 x float> - -// CHECK: ret <2 x float> %hlsl.normalize -float2 test_normalize_float2(float2 p0) -{ - return normalize(p0); -} -// CHECK: define [[FNATTRS]] <3 x float> @ -// CHECK: %hlsl.normalize = call <3 x float> @llvm.[[TARGET]].normalize.v3f32( -// CHECK: ret <3 x float> %hlsl.normalize -float3 test_normalize_float3(float3 p0) -{ - return normalize(p0); -} -// CHECK: define [[FNATTRS]] <4 x float> @ -// CHECK: %hlsl.normalize = call <4 x float> @llvm.[[TARGET]].normalize.v4f32( -// CHECK: ret <4 x float> %hlsl.normalize -float4 test_length_float4(float4 p0) -{ - return normalize(p0); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv + +// NATIVE_HALF: define [[FNATTRS]] half @ +// NATIVE_HALF: call half 
@llvm.[[TARGET]].normalize.f16(half +// NO_HALF: call float @llvm.[[TARGET]].normalize.f32(float +// NATIVE_HALF: ret half +// NO_HALF: ret float +half test_normalize_half(half p0) +{ + return normalize(p0); +} +// NATIVE_HALF: define [[FNATTRS]] <2 x half> @ +// NATIVE_HALF: call <2 x half> @llvm.[[TARGET]].normalize.v2f16(<2 x half> +// NO_HALF: call <2 x float> @llvm.[[TARGET]].normalize.v2f32(<2 x float> +// NATIVE_HALF: ret <2 x half> %hlsl.normalize +// NO_HALF: ret <2 x float> %hlsl.normalize +half2 test_normalize_half2(half2 p0) +{ + return normalize(p0); +} +// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ +// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].normalize.v3f16(<3 x half> +// NO_HALF: call <3 x float> @llvm.[[TARGET]].normalize.v3f32(<3 x float> +// NATIVE_HALF: ret <3 x half> %hlsl.normalize +// NO_HALF: ret <3 x float> %hlsl.normalize +half3 test_normalize_half3(half3 p0) +{ + return normalize(p0); +} +// NATIVE_HALF: define [[FNATTRS]] <4 x half> @ +// NATIVE_HALF: call <4 x half> @llvm.[[TARGET]].normalize.v4f16(<4 x half> +// NO_HALF: call <4 x float> @llvm.[[TARGET]].normalize.v4f32(<4 x float> +// NATIVE_HALF: ret <4 x half> %hlsl.normalize +// NO_HALF: ret <4 x float> %hlsl.normalize +half4 test_normalize_half4(half4 p0) +{ + return normalize(p0); +} + +// CHECK: define [[FNATTRS]] float @ +// CHECK: call float @llvm.[[TARGET]].normalize.f32(float +// CHECK: ret float +float test_normalize_float(float p0) +{ + return normalize(p0); +} +// CHECK: define [[FNATTRS]] <2 x float> @ +// CHECK: %hlsl.normalize = call <2 x float> @llvm.[[TARGET]].normalize.v2f32(<2 x float> + +// CHECK: ret <2 x float> %hlsl.normalize +float2 test_normalize_float2(float2 p0) +{ + return normalize(p0); +} +// CHECK: define [[FNATTRS]] <3 x float> @ +// CHECK: %hlsl.normalize = call <3 x float> @llvm.[[TARGET]].normalize.v3f32( +// CHECK: ret <3 x float> %hlsl.normalize +float3 test_normalize_float3(float3 p0) +{ + return normalize(p0); +} +// CHECK: define 
[[FNATTRS]] <4 x float> @ +// CHECK: %hlsl.normalize = call <4 x float> @llvm.[[TARGET]].normalize.v4f32( +// CHECK: ret <4 x float> %hlsl.normalize +float4 test_length_float4(float4 p0) +{ + return normalize(p0); +} diff --git a/clang/test/CodeGenHLSL/builtins/step.hlsl b/clang/test/CodeGenHLSL/builtins/step.hlsl index 442f4930ca579c..8ef52794a3be5d 100644 --- a/clang/test/CodeGenHLSL/builtins/step.hlsl +++ b/clang/test/CodeGenHLSL/builtins/step.hlsl @@ -1,84 +1,84 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv - -// NATIVE_HALF: define [[FNATTRS]] half @ -// NATIVE_HALF: call half @llvm.[[TARGET]].step.f16(half -// NO_HALF: call float @llvm.[[TARGET]].step.f32(float -// NATIVE_HALF: ret half -// NO_HALF: ret float -half test_step_half(half p0, half p1) -{ - return step(p0, p1); -} -// NATIVE_HALF: define [[FNATTRS]] <2 x half> @ -// NATIVE_HALF: call <2 x half> @llvm.[[TARGET]].step.v2f16(<2 x half> -// NO_HALF: call <2 x float> @llvm.[[TARGET]].step.v2f32(<2 x float> -// 
NATIVE_HALF: ret <2 x half> %hlsl.step -// NO_HALF: ret <2 x float> %hlsl.step -half2 test_step_half2(half2 p0, half2 p1) -{ - return step(p0, p1); -} -// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ -// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].step.v3f16(<3 x half> -// NO_HALF: call <3 x float> @llvm.[[TARGET]].step.v3f32(<3 x float> -// NATIVE_HALF: ret <3 x half> %hlsl.step -// NO_HALF: ret <3 x float> %hlsl.step -half3 test_step_half3(half3 p0, half3 p1) -{ - return step(p0, p1); -} -// NATIVE_HALF: define [[FNATTRS]] <4 x half> @ -// NATIVE_HALF: call <4 x half> @llvm.[[TARGET]].step.v4f16(<4 x half> -// NO_HALF: call <4 x float> @llvm.[[TARGET]].step.v4f32(<4 x float> -// NATIVE_HALF: ret <4 x half> %hlsl.step -// NO_HALF: ret <4 x float> %hlsl.step -half4 test_step_half4(half4 p0, half4 p1) -{ - return step(p0, p1); -} - -// CHECK: define [[FNATTRS]] float @ -// CHECK: call float @llvm.[[TARGET]].step.f32(float -// CHECK: ret float -float test_step_float(float p0, float p1) -{ - return step(p0, p1); -} -// CHECK: define [[FNATTRS]] <2 x float> @ -// CHECK: %hlsl.step = call <2 x float> @llvm.[[TARGET]].step.v2f32( -// CHECK: ret <2 x float> %hlsl.step -float2 test_step_float2(float2 p0, float2 p1) -{ - return step(p0, p1); -} -// CHECK: define [[FNATTRS]] <3 x float> @ -// CHECK: %hlsl.step = call <3 x float> @llvm.[[TARGET]].step.v3f32( -// CHECK: ret <3 x float> %hlsl.step -float3 test_step_float3(float3 p0, float3 p1) -{ - return step(p0, p1); -} -// CHECK: define [[FNATTRS]] <4 x float> @ -// CHECK: %hlsl.step = call <4 x float> @llvm.[[TARGET]].step.v4f32( -// CHECK: ret <4 x float> %hlsl.step -float4 test_step_float4(float4 p0, float4 p1) -{ - return step(p0, p1); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: 
%clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv + +// NATIVE_HALF: define [[FNATTRS]] half @ +// NATIVE_HALF: call half @llvm.[[TARGET]].step.f16(half +// NO_HALF: call float @llvm.[[TARGET]].step.f32(float +// NATIVE_HALF: ret half +// NO_HALF: ret float +half test_step_half(half p0, half p1) +{ + return step(p0, p1); +} +// NATIVE_HALF: define [[FNATTRS]] <2 x half> @ +// NATIVE_HALF: call <2 x half> @llvm.[[TARGET]].step.v2f16(<2 x half> +// NO_HALF: call <2 x float> @llvm.[[TARGET]].step.v2f32(<2 x float> +// NATIVE_HALF: ret <2 x half> %hlsl.step +// NO_HALF: ret <2 x float> %hlsl.step +half2 test_step_half2(half2 p0, half2 p1) +{ + return step(p0, p1); +} +// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ +// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].step.v3f16(<3 x half> +// NO_HALF: call <3 x float> @llvm.[[TARGET]].step.v3f32(<3 x float> +// NATIVE_HALF: ret <3 x half> %hlsl.step +// NO_HALF: ret <3 x float> %hlsl.step +half3 test_step_half3(half3 p0, half3 p1) +{ + return step(p0, p1); +} +// NATIVE_HALF: define [[FNATTRS]] <4 x half> @ +// NATIVE_HALF: call <4 x half> @llvm.[[TARGET]].step.v4f16(<4 x half> +// NO_HALF: call <4 x float> @llvm.[[TARGET]].step.v4f32(<4 x float> +// NATIVE_HALF: ret <4 x half> %hlsl.step +// NO_HALF: ret <4 x 
float> %hlsl.step +half4 test_step_half4(half4 p0, half4 p1) +{ + return step(p0, p1); +} + +// CHECK: define [[FNATTRS]] float @ +// CHECK: call float @llvm.[[TARGET]].step.f32(float +// CHECK: ret float +float test_step_float(float p0, float p1) +{ + return step(p0, p1); +} +// CHECK: define [[FNATTRS]] <2 x float> @ +// CHECK: %hlsl.step = call <2 x float> @llvm.[[TARGET]].step.v2f32( +// CHECK: ret <2 x float> %hlsl.step +float2 test_step_float2(float2 p0, float2 p1) +{ + return step(p0, p1); +} +// CHECK: define [[FNATTRS]] <3 x float> @ +// CHECK: %hlsl.step = call <3 x float> @llvm.[[TARGET]].step.v3f32( +// CHECK: ret <3 x float> %hlsl.step +float3 test_step_float3(float3 p0, float3 p1) +{ + return step(p0, p1); +} +// CHECK: define [[FNATTRS]] <4 x float> @ +// CHECK: %hlsl.step = call <4 x float> @llvm.[[TARGET]].step.v4f32( +// CHECK: ret <4 x float> %hlsl.step +float4 test_step_float4(float4 p0, float4 p1) +{ + return step(p0, p1); +} diff --git a/clang/test/Driver/flang/msvc-link.f90 b/clang/test/Driver/flang/msvc-link.f90 index 463749510eb5f8..3f7e162a9a6116 100644 --- a/clang/test/Driver/flang/msvc-link.f90 +++ b/clang/test/Driver/flang/msvc-link.f90 @@ -1,5 +1,5 @@ -! RUN: %clang --driver-mode=flang --target=x86_64-pc-windows-msvc -### %s -Ltest 2>&1 | FileCheck %s -! -! Test that user provided paths come before the Flang runtimes -! CHECK: "-libpath:test" -! CHECK: "-libpath:{{.*(\\|/)}}lib" +! RUN: %clang --driver-mode=flang --target=x86_64-pc-windows-msvc -### %s -Ltest 2>&1 | FileCheck %s +! +! Test that user provided paths come before the Flang runtimes +! CHECK: "-libpath:test" +! 
CHECK: "-libpath:{{.*(\\|/)}}lib" diff --git a/clang/test/FixIt/fixit-newline-style.c b/clang/test/FixIt/fixit-newline-style.c index 61e4df67e85bac..2aac143d4d753e 100644 --- a/clang/test/FixIt/fixit-newline-style.c +++ b/clang/test/FixIt/fixit-newline-style.c @@ -1,11 +1,11 @@ -// RUN: %clang_cc1 -pedantic -Wunused-label -fno-diagnostics-show-line-numbers -x c %s 2>&1 | FileCheck %s -strict-whitespace - -// This file intentionally uses a CRLF newline style -// CHECK: warning: unused label 'ddd' -// CHECK-NEXT: {{^ ddd:}} -// CHECK-NEXT: {{^ \^~~~$}} -// CHECK-NOT: {{^ ;}} -void f(void) { - ddd: - ; -} +// RUN: %clang_cc1 -pedantic -Wunused-label -fno-diagnostics-show-line-numbers -x c %s 2>&1 | FileCheck %s -strict-whitespace + +// This file intentionally uses a CRLF newline style +// CHECK: warning: unused label 'ddd' +// CHECK-NEXT: {{^ ddd:}} +// CHECK-NEXT: {{^ \^~~~$}} +// CHECK-NOT: {{^ ;}} +void f(void) { + ddd: + ; +} diff --git a/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.c b/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.c index d6724444c06676..2faeaba3229218 100644 --- a/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.c +++ b/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.c @@ -1,8 +1,8 @@ -// RUN: %clang_cc1 -E -frewrite-includes %s | %clang_cc1 - -// expected-no-diagnostics -// Note: This source file has CRLF line endings. -// This test validates that -frewrite-includes translates the end of line (EOL) -// form used in header files to the EOL form used in the the primary source -// file when the files use different EOL forms. -#include "rewrite-includes-mixed-eol-crlf.h" -#include "rewrite-includes-mixed-eol-lf.h" +// RUN: %clang_cc1 -E -frewrite-includes %s | %clang_cc1 - +// expected-no-diagnostics +// Note: This source file has CRLF line endings. 
+// This test validates that -frewrite-includes translates the end of line (EOL) +// form used in header files to the EOL form used in the the primary source +// file when the files use different EOL forms. +#include "rewrite-includes-mixed-eol-crlf.h" +#include "rewrite-includes-mixed-eol-lf.h" diff --git a/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.h b/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.h index 0439b88b75e2cf..baedc282296bd7 100644 --- a/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.h +++ b/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.h @@ -1,11 +1,11 @@ -// Note: This header file has CRLF line endings. -// The indentation in some of the conditional inclusion directives below is -// intentional and is required for this test to function as a regression test -// for GH59736. -_Static_assert(__LINE__ == 5, ""); -#if 1 -_Static_assert(__LINE__ == 7, ""); - #if 1 - _Static_assert(__LINE__ == 9, ""); - #endif -#endif +// Note: This header file has CRLF line endings. +// The indentation in some of the conditional inclusion directives below is +// intentional and is required for this test to function as a regression test +// for GH59736. +_Static_assert(__LINE__ == 5, ""); +#if 1 +_Static_assert(__LINE__ == 7, ""); + #if 1 + _Static_assert(__LINE__ == 9, ""); + #endif +#endif diff --git a/clang/test/Frontend/system-header-line-directive-ms-lineendings.c b/clang/test/Frontend/system-header-line-directive-ms-lineendings.c index 92fc07f65e0d4d..dffdd5cf1959ae 100644 --- a/clang/test/Frontend/system-header-line-directive-ms-lineendings.c +++ b/clang/test/Frontend/system-header-line-directive-ms-lineendings.c @@ -1,21 +1,21 @@ -// RUN: %clang_cc1 %s -E -o - -I %S/Inputs -isystem %S/Inputs/SystemHeaderPrefix | FileCheck %s -#include -#include - -#include "line-directive.h" - -// This tests that the line numbers for the current file are correctly outputted -// for the include-file-completed test case. This file should be CRLF. 
- -// CHECK: # 1 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 -// CHECK: # 1 "{{.*}}noline.h" 1 3 -// CHECK: foo(void); -// CHECK: # 3 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 -// CHECK: # 1 "{{.*}}line-directive-in-system.h" 1 3 -// The "3" below indicates that "foo.h" is considered a system header. -// CHECK: # 1 "foo.h" 3 -// CHECK: foo(void); -// CHECK: # 4 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 -// CHECK: # 1 "{{.*}}line-directive.h" 1 -// CHECK: # 10 "foo.h"{{$}} -// CHECK: # 6 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 +// RUN: %clang_cc1 %s -E -o - -I %S/Inputs -isystem %S/Inputs/SystemHeaderPrefix | FileCheck %s +#include +#include + +#include "line-directive.h" + +// This tests that the line numbers for the current file are correctly outputted +// for the include-file-completed test case. This file should be CRLF. + +// CHECK: # 1 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 +// CHECK: # 1 "{{.*}}noline.h" 1 3 +// CHECK: foo(void); +// CHECK: # 3 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 +// CHECK: # 1 "{{.*}}line-directive-in-system.h" 1 3 +// The "3" below indicates that "foo.h" is considered a system header. 
+// CHECK: # 1 "foo.h" 3 +// CHECK: foo(void); +// CHECK: # 4 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 +// CHECK: # 1 "{{.*}}line-directive.h" 1 +// CHECK: # 10 "foo.h"{{$}} +// CHECK: # 6 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 diff --git a/clang/test/ParserHLSL/bitfields.hlsl b/clang/test/ParserHLSL/bitfields.hlsl index 307d1143a068e2..57b6705babdc12 100644 --- a/clang/test/ParserHLSL/bitfields.hlsl +++ b/clang/test/ParserHLSL/bitfields.hlsl @@ -1,31 +1,31 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -ast-dump -x hlsl -o - %s | FileCheck %s - - -struct MyBitFields { - // CHECK: FieldDecl 0x{{[0-9a-f]+}} col:16 referenced field1 'unsigned int' - // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' - // CHECK:-value: Int 3 - // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 3 - unsigned int field1 : 3; // 3 bits for field1 - - // CHECK:FieldDecl 0x{{[0-9a-f]+}} col:16 referenced field2 'unsigned int' - // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' - // CHECK:-value: Int 4 - // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 4 - unsigned int field2 : 4; // 4 bits for field2 - - // CHECK:FieldDecl 0x{{[0-9a-f]+}} col:7 field3 'int' - // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' - // CHECK:-value: Int 5 - // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 5 - int field3 : 5; // 5 bits for field3 (signed) -}; - - - -[numthreads(1,1,1)] -void main() { - MyBitFields m; - m.field1 = 4; - m.field2 = m.field1*2; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -ast-dump -x hlsl -o - %s | FileCheck %s + + +struct MyBitFields { + // CHECK: FieldDecl 0x{{[0-9a-f]+}} col:16 referenced field1 'unsigned int' + // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' + // CHECK:-value: Int 3 + // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 3 + unsigned int field1 : 3; // 3 bits for field1 + + // CHECK:FieldDecl 0x{{[0-9a-f]+}} col:16 referenced field2 'unsigned int' + // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' + // CHECK:-value: Int 4 + // 
CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 4 + unsigned int field2 : 4; // 4 bits for field2 + + // CHECK:FieldDecl 0x{{[0-9a-f]+}} col:7 field3 'int' + // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' + // CHECK:-value: Int 5 + // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 5 + int field3 : 5; // 5 bits for field3 (signed) +}; + + + +[numthreads(1,1,1)] +void main() { + MyBitFields m; + m.field1 = 4; + m.field2 = m.field1*2; } \ No newline at end of file diff --git a/clang/test/ParserHLSL/hlsl_annotations_on_struct_members.hlsl b/clang/test/ParserHLSL/hlsl_annotations_on_struct_members.hlsl index 2eebc920388b5b..5b228d039345e1 100644 --- a/clang/test/ParserHLSL/hlsl_annotations_on_struct_members.hlsl +++ b/clang/test/ParserHLSL/hlsl_annotations_on_struct_members.hlsl @@ -1,21 +1,21 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s - -// tests that hlsl annotations are properly parsed when applied on field decls, -// and that the annotation gets properly placed on the AST. - -struct Eg9{ - // CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:8 implicit struct Eg9 - // CHECK: FieldDecl 0x{{[0-9a-f]+}} col:16 referenced a 'unsigned int' - // CHECK: -HLSLSV_DispatchThreadIDAttr 0x{{[0-9a-f]+}} - unsigned int a : SV_DispatchThreadID; -}; -Eg9 e9; - - -RWBuffer In : register(u1); - - -[numthreads(1,1,1)] -void main() { - In[0] = e9.a; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s + +// tests that hlsl annotations are properly parsed when applied on field decls, +// and that the annotation gets properly placed on the AST. 
+ +struct Eg9{ + // CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:8 implicit struct Eg9 + // CHECK: FieldDecl 0x{{[0-9a-f]+}} col:16 referenced a 'unsigned int' + // CHECK: -HLSLSV_DispatchThreadIDAttr 0x{{[0-9a-f]+}} + unsigned int a : SV_DispatchThreadID; +}; +Eg9 e9; + + +RWBuffer In : register(u1); + + +[numthreads(1,1,1)] +void main() { + In[0] = e9.a; +} diff --git a/clang/test/ParserHLSL/hlsl_contained_type_attr.hlsl b/clang/test/ParserHLSL/hlsl_contained_type_attr.hlsl index 5a72aa242e581d..476ec39e14da98 100644 --- a/clang/test/ParserHLSL/hlsl_contained_type_attr.hlsl +++ b/clang/test/ParserHLSL/hlsl_contained_type_attr.hlsl @@ -1,25 +1,25 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -std=hlsl202x -x hlsl -ast-dump -o - %s | FileCheck %s - -typedef vector float4; - -// CHECK: -TypeAliasDecl 0x{{[0-9a-f]+}} -// CHECK: -HLSLAttributedResourceType 0x{{[0-9a-f]+}} '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(int)]] -using ResourceIntAliasT = __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(int)]]; -ResourceIntAliasT h1; - -// CHECK: -VarDecl 0x{{[0-9a-f]+}} col:82 h2 '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float4)]] -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float4)]] h2; - -// CHECK: ClassTemplateDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 6]]:30 S -// CHECK: TemplateTypeParmDecl 0x{{[0-9a-f]+}} col:20 referenced typename depth 0 index 0 T -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:30 struct S definition -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:79 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(T)]] -template struct S { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(T)]] h; -}; +// RUN: %clang_cc1 -triple 
dxil-pc-shadermodel6.0-compute -std=hlsl202x -x hlsl -ast-dump -o - %s | FileCheck %s + +typedef vector float4; + +// CHECK: -TypeAliasDecl 0x{{[0-9a-f]+}} +// CHECK: -HLSLAttributedResourceType 0x{{[0-9a-f]+}} '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(int)]] +using ResourceIntAliasT = __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(int)]]; +ResourceIntAliasT h1; + +// CHECK: -VarDecl 0x{{[0-9a-f]+}} col:82 h2 '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float4)]] +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float4)]] h2; + +// CHECK: ClassTemplateDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 6]]:30 S +// CHECK: TemplateTypeParmDecl 0x{{[0-9a-f]+}} col:20 referenced typename depth 0 index 0 T +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:30 struct S definition +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:79 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(T)]] +template struct S { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(T)]] h; +}; diff --git a/clang/test/ParserHLSL/hlsl_contained_type_attr_error.hlsl b/clang/test/ParserHLSL/hlsl_contained_type_attr_error.hlsl index b2d492d95945c1..673ff8693b83b8 100644 --- a/clang/test/ParserHLSL/hlsl_contained_type_attr_error.hlsl +++ b/clang/test/ParserHLSL/hlsl_contained_type_attr_error.hlsl @@ -1,28 +1,28 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -std=hlsl202x -x hlsl -o - %s -verify - -typedef vector float4; - -// expected-error@+1{{'contained_type' attribute cannot be applied to a declaration}} -[[hlsl::contained_type(float4)]] __hlsl_resource_t h1; - -// expected-error@+1{{'contained_type' attribute takes one argument}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] 
[[hlsl::contained_type()]] h3; - -// expected-error@+1{{expected a type}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(0)]] h4; - -// expected-error@+1{{unknown type name 'a'}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(a)]] h5; - -// expected-error@+1{{expected a type}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type("b", c)]] h6; - -// expected-warning@+1{{attribute 'contained_type' is already applied}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] [[hlsl::contained_type(float)]] h7; - -// expected-warning@+1{{attribute 'contained_type' is already applied with different arguments}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] [[hlsl::contained_type(int)]] h8; - -// expected-error@+2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} -// expected-error@+1{{attribute 'contained_type' can be used only on HLSL intangible type '__hlsl_resource_t'}} -float [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] res5; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -std=hlsl202x -x hlsl -o - %s -verify + +typedef vector float4; + +// expected-error@+1{{'contained_type' attribute cannot be applied to a declaration}} +[[hlsl::contained_type(float4)]] __hlsl_resource_t h1; + +// expected-error@+1{{'contained_type' attribute takes one argument}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type()]] h3; + +// expected-error@+1{{expected a type}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(0)]] h4; + +// expected-error@+1{{unknown type name 'a'}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(a)]] h5; + +// expected-error@+1{{expected a type}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type("b", c)]] h6; +
+// expected-warning@+1{{attribute 'contained_type' is already applied}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] [[hlsl::contained_type(float)]] h7; + +// expected-warning@+1{{attribute 'contained_type' is already applied with different arguments}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] [[hlsl::contained_type(int)]] h8; + +// expected-error@+2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} +// expected-error@+1{{attribute 'contained_type' can be used only on HLSL intangible type '__hlsl_resource_t'}} +float [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] res5; diff --git a/clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl b/clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl index 836d129c8d0002..487dc32413032d 100644 --- a/clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl +++ b/clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl @@ -1,22 +1,22 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s - -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:68 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] -struct MyBuffer { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] h; -}; - -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:66 res '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] -// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] -__hlsl_resource_t [[hlsl::is_rov]] [[hlsl::resource_class(SRV)]] res; - -// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:6 f 'void () -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:72 r '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(Sampler)]] -// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] -void f() { - __hlsl_resource_t [[hlsl::resource_class(Sampler)]] [[hlsl::is_rov]] r; -} +// RUN: %clang_cc1 -triple 
dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s + +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:68 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] +struct MyBuffer { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] h; +}; + +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:66 res '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] +// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] +__hlsl_resource_t [[hlsl::is_rov]] [[hlsl::resource_class(SRV)]] res; + +// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:6 f 'void () +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:72 r '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(Sampler)]] +// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] +void f() { + __hlsl_resource_t [[hlsl::resource_class(Sampler)]] [[hlsl::is_rov]] r; +} diff --git a/clang/test/ParserHLSL/hlsl_is_rov_attr_error.hlsl b/clang/test/ParserHLSL/hlsl_is_rov_attr_error.hlsl index 3b2c12e7a96c5c..9bb64ea990e284 100644 --- a/clang/test/ParserHLSL/hlsl_is_rov_attr_error.hlsl +++ b/clang/test/ParserHLSL/hlsl_is_rov_attr_error.hlsl @@ -1,20 +1,20 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify - -// expected-error at +1{{'is_rov' attribute cannot be applied to a declaration}} -[[hlsl::is_rov]] __hlsl_resource_t res0; - -// expected-error at +1{{HLSL resource needs to have [[hlsl::resource_class()]] attribute}} -__hlsl_resource_t [[hlsl::is_rov]] res1; - -// expected-error at +1{{'is_rov' attribute takes no arguments}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov(3)]] res2; - -// expected-error at +1{{use of undeclared identifier 'gibberish'}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov(gibberish)]] res3; - -// expected-warning at +1{{attribute 'is_rov' is already applied}} -__hlsl_resource_t 
[[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] [[hlsl::is_rov]] res4; - -// expected-error at +2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} -// expected-error at +1{{attribute 'is_rov' can be used only on HLSL intangible type '__hlsl_resource_t'}} -float [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] res5; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify + +// expected-error at +1{{'is_rov' attribute cannot be applied to a declaration}} +[[hlsl::is_rov]] __hlsl_resource_t res0; + +// expected-error at +1{{HLSL resource needs to have [[hlsl::resource_class()]] attribute}} +__hlsl_resource_t [[hlsl::is_rov]] res1; + +// expected-error at +1{{'is_rov' attribute takes no arguments}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov(3)]] res2; + +// expected-error at +1{{use of undeclared identifier 'gibberish'}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov(gibberish)]] res3; + +// expected-warning at +1{{attribute 'is_rov' is already applied}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] [[hlsl::is_rov]] res4; + +// expected-error at +2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} +// expected-error at +1{{attribute 'is_rov' can be used only on HLSL intangible type '__hlsl_resource_t'}} +float [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] res5; diff --git a/clang/test/ParserHLSL/hlsl_raw_buffer_attr.hlsl b/clang/test/ParserHLSL/hlsl_raw_buffer_attr.hlsl index 84c924eec24efc..e09ed5586c1025 100644 --- a/clang/test/ParserHLSL/hlsl_raw_buffer_attr.hlsl +++ b/clang/test/ParserHLSL/hlsl_raw_buffer_attr.hlsl @@ -1,22 +1,22 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s - -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:72 h1 '__hlsl_resource_t -// CHECK-SAME{LITERAL}: 
[[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -struct MyBuffer { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] h1; -}; - -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:70 h2 '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -__hlsl_resource_t [[hlsl::raw_buffer]] [[hlsl::resource_class(SRV)]] h2; - -// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:6 f 'void () -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:72 h3 '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -void f() { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] h3; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s + +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:72 h1 '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +struct MyBuffer { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] h1; +}; + +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:70 h2 '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +__hlsl_resource_t [[hlsl::raw_buffer]] [[hlsl::resource_class(SRV)]] h2; + +// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:6 f 'void () +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:72 h3 '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +void f() { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] h3; +} diff --git a/clang/test/ParserHLSL/hlsl_raw_buffer_attr_error.hlsl b/clang/test/ParserHLSL/hlsl_raw_buffer_attr_error.hlsl index 77530cbf9e4d92..a10aca4e96fc53 100644 --- a/clang/test/ParserHLSL/hlsl_raw_buffer_attr_error.hlsl +++ 
b/clang/test/ParserHLSL/hlsl_raw_buffer_attr_error.hlsl @@ -1,17 +1,17 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify - -// expected-error at +1{{'raw_buffer' attribute cannot be applied to a declaration}} -[[hlsl::raw_buffer]] __hlsl_resource_t res0; - -// expected-error at +1{{'raw_buffer' attribute takes no arguments}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer(3)]] res2; - -// expected-error at +1{{use of undeclared identifier 'gibberish'}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer(gibberish)]] res3; - -// expected-warning at +1{{attribute 'raw_buffer' is already applied}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] [[hlsl::raw_buffer]] res4; - -// expected-error at +2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} -// expected-error at +1{{attribute 'raw_buffer' can be used only on HLSL intangible type '__hlsl_resource_t'}} -float [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] res5; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify + +// expected-error at +1{{'raw_buffer' attribute cannot be applied to a declaration}} +[[hlsl::raw_buffer]] __hlsl_resource_t res0; + +// expected-error at +1{{'raw_buffer' attribute takes no arguments}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer(3)]] res2; + +// expected-error at +1{{use of undeclared identifier 'gibberish'}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer(gibberish)]] res3; + +// expected-warning at +1{{attribute 'raw_buffer' is already applied}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] [[hlsl::raw_buffer]] res4; + +// expected-error at +2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} +// expected-error at +1{{attribute 'raw_buffer' can be used only on HLSL intangible type '__hlsl_resource_t'}} +float 
[[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] res5; diff --git a/clang/test/ParserHLSL/hlsl_resource_class_attr.hlsl b/clang/test/ParserHLSL/hlsl_resource_class_attr.hlsl index fbada8b4b99f75..9fee9edddf619a 100644 --- a/clang/test/ParserHLSL/hlsl_resource_class_attr.hlsl +++ b/clang/test/ParserHLSL/hlsl_resource_class_attr.hlsl @@ -1,37 +1,37 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s - -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -struct MyBuffer { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] h; -}; - -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:49 res '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] -__hlsl_resource_t [[hlsl::resource_class(SRV)]] res; - -// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 3]]:6 f 'void () -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:55 r '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(Sampler)]] -void f() { - __hlsl_resource_t [[hlsl::resource_class(Sampler)]] r; -} - -// CHECK: ClassTemplateDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 6]]:29 MyBuffer2 -// CHECK: TemplateTypeParmDecl 0x{{[0-9a-f]+}} col:19 typename depth 0 index 0 T -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:29 struct MyBuffer2 definition -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:29 implicit struct MyBuffer2 -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -template struct MyBuffer2 { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] h; -}; - -// CHECK: ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} line:[[# @LINE - 4]]:29 struct MyBuffer2 definition implicit_instantiation -// CHECK: TemplateArgument type 'float' -// CHECK: BuiltinType 0x{{[0-9a-f]+}} 'float' -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:29 implicit struct 
MyBuffer2 -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -MyBuffer2 myBuffer2; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s + +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +struct MyBuffer { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] h; +}; + +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:49 res '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] +__hlsl_resource_t [[hlsl::resource_class(SRV)]] res; + +// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 3]]:6 f 'void () +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:55 r '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(Sampler)]] +void f() { + __hlsl_resource_t [[hlsl::resource_class(Sampler)]] r; +} + +// CHECK: ClassTemplateDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 6]]:29 MyBuffer2 +// CHECK: TemplateTypeParmDecl 0x{{[0-9a-f]+}} col:19 typename depth 0 index 0 T +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:29 struct MyBuffer2 definition +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:29 implicit struct MyBuffer2 +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +template struct MyBuffer2 { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] h; +}; + +// CHECK: ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} line:[[# @LINE - 4]]:29 struct MyBuffer2 definition implicit_instantiation +// CHECK: TemplateArgument type 'float' +// CHECK: BuiltinType 0x{{[0-9a-f]+}} 'float' +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:29 implicit struct MyBuffer2 +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +MyBuffer2 myBuffer2; diff --git 
a/clang/test/ParserHLSL/hlsl_resource_class_attr_error.hlsl b/clang/test/ParserHLSL/hlsl_resource_class_attr_error.hlsl index 63e39daff949b4..a0a4da1dc2bf44 100644 --- a/clang/test/ParserHLSL/hlsl_resource_class_attr_error.hlsl +++ b/clang/test/ParserHLSL/hlsl_resource_class_attr_error.hlsl @@ -1,22 +1,22 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify - -// expected-error at +1{{'resource_class' attribute cannot be applied to a declaration}} -[[hlsl::resource_class(UAV)]] __hlsl_resource_t e0; - -// expected-error at +1{{'resource_class' attribute takes one argument}} -__hlsl_resource_t [[hlsl::resource_class()]] e1; - -// expected-warning at +1{{ResourceClass attribute argument not supported: gibberish}} -__hlsl_resource_t [[hlsl::resource_class(gibberish)]] e2; - -// expected-warning at +1{{attribute 'resource_class' is already applied with different arguments}} -__hlsl_resource_t [[hlsl::resource_class(SRV)]] [[hlsl::resource_class(UAV)]] e3; - -// expected-warning at +1{{attribute 'resource_class' is already applied}} -__hlsl_resource_t [[hlsl::resource_class(SRV)]] [[hlsl::resource_class(SRV)]] e4; - -// expected-error at +1{{'resource_class' attribute takes one argument}} -__hlsl_resource_t [[hlsl::resource_class(SRV, "aa")]] e5; - -// expected-error at +1{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} -float [[hlsl::resource_class(UAV)]] e6; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify + +// expected-error at +1{{'resource_class' attribute cannot be applied to a declaration}} +[[hlsl::resource_class(UAV)]] __hlsl_resource_t e0; + +// expected-error at +1{{'resource_class' attribute takes one argument}} +__hlsl_resource_t [[hlsl::resource_class()]] e1; + +// expected-warning at +1{{ResourceClass attribute argument not supported: gibberish}} +__hlsl_resource_t [[hlsl::resource_class(gibberish)]] e2; + +// expected-warning at +1{{attribute 
'resource_class' is already applied with different arguments}} +__hlsl_resource_t [[hlsl::resource_class(SRV)]] [[hlsl::resource_class(UAV)]] e3; + +// expected-warning at +1{{attribute 'resource_class' is already applied}} +__hlsl_resource_t [[hlsl::resource_class(SRV)]] [[hlsl::resource_class(SRV)]] e4; + +// expected-error at +1{{'resource_class' attribute takes one argument}} +__hlsl_resource_t [[hlsl::resource_class(SRV, "aa")]] e5; + +// expected-error at +1{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} +float [[hlsl::resource_class(UAV)]] e6; diff --git a/clang/test/ParserHLSL/hlsl_resource_handle_attrs.hlsl b/clang/test/ParserHLSL/hlsl_resource_handle_attrs.hlsl index 38d27bc21e4aa8..8885e39237357d 100644 --- a/clang/test/ParserHLSL/hlsl_resource_handle_attrs.hlsl +++ b/clang/test/ParserHLSL/hlsl_resource_handle_attrs.hlsl @@ -1,21 +1,21 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -ast-dump -o - %s | FileCheck %s - -// CHECK: -ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} <> class RWBuffer definition implicit_instantiation -// CHECK: -TemplateArgument type 'float' -// CHECK: `-BuiltinType 0x{{[0-9a-f]+}} 'float' -// CHECK: -FieldDecl 0x{{[0-9a-f]+}} <> implicit h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float)]] -// CHECK: -HLSLResourceAttr 0x{{[0-9a-f]+}} <> Implicit TypedBuffer -RWBuffer Buffer1; - -// CHECK: -ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} <> class RasterizerOrderedBuffer definition implicit_instantiation -// CHECK: -TemplateArgument type 'vector' -// CHECK: `-ExtVectorType 0x{{[0-9a-f]+}} 'vector' 4 -// CHECK: `-BuiltinType 0x{{[0-9a-f]+}} 'float' -// CHECK: -FieldDecl 0x{{[0-9a-f]+}} <> implicit h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)] -// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(vector)]] -// CHECK: 
-HLSLResourceAttr 0x{{[0-9a-f]+}} <> Implicit TypedBuffer -RasterizerOrderedBuffer > BufferArray3[4]; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -ast-dump -o - %s | FileCheck %s + +// CHECK: -ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} <> class RWBuffer definition implicit_instantiation +// CHECK: -TemplateArgument type 'float' +// CHECK: `-BuiltinType 0x{{[0-9a-f]+}} 'float' +// CHECK: -FieldDecl 0x{{[0-9a-f]+}} <> implicit h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float)]] +// CHECK: -HLSLResourceAttr 0x{{[0-9a-f]+}} <> Implicit TypedBuffer +RWBuffer Buffer1; + +// CHECK: -ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} <> class RasterizerOrderedBuffer definition implicit_instantiation +// CHECK: -TemplateArgument type 'vector' +// CHECK: `-ExtVectorType 0x{{[0-9a-f]+}} 'vector' 4 +// CHECK: `-BuiltinType 0x{{[0-9a-f]+}} 'float' +// CHECK: -FieldDecl 0x{{[0-9a-f]+}} <> implicit h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)] +// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(vector)]] +// CHECK: -HLSLResourceAttr 0x{{[0-9a-f]+}} <> Implicit TypedBuffer +RasterizerOrderedBuffer > BufferArray3[4]; diff --git a/clang/test/Sema/aarch64-sve-vector-trig-ops.c b/clang/test/Sema/aarch64-sve-vector-trig-ops.c index 3fe6834be2e0b7..f853abcd3379fa 100644 --- a/clang/test/Sema/aarch64-sve-vector-trig-ops.c +++ b/clang/test/Sema/aarch64-sve-vector-trig-ops.c @@ -1,65 +1,65 @@ -// RUN: %clang_cc1 -triple aarch64 -target-feature +sve \ -// RUN: -disable-O0-optnone -o - -fsyntax-only %s -verify -// REQUIRES: aarch64-registered-target - -#include - -svfloat32_t test_asin_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_asin(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_acos_vv_i8mf8(svfloat32_t v) { - - return 
__builtin_elementwise_acos(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_atan_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_atan(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_atan2_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_atan2(v, v); - // expected-error at -1 {{1st argument must be a floating point type}} -} - -svfloat32_t test_sin_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_sin(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_cos_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_cos(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_tan_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_tan(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_sinh_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_sinh(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_cosh_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_cosh(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_tanh_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_tanh(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} +// RUN: %clang_cc1 -triple aarch64 -target-feature +sve \ +// RUN: -disable-O0-optnone -o - -fsyntax-only %s -verify +// REQUIRES: aarch64-registered-target + +#include + +svfloat32_t test_asin_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_asin(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t 
test_acos_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_acos(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_atan_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_atan(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_atan2_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_atan2(v, v); + // expected-error at -1 {{1st argument must be a floating point type}} +} + +svfloat32_t test_sin_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_sin(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_cos_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_cos(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_tan_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_tan(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_sinh_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_sinh(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_cosh_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_cosh(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_tanh_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_tanh(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} diff --git a/clang/test/Sema/riscv-rvv-vector-trig-ops.c b/clang/test/Sema/riscv-rvv-vector-trig-ops.c index 0aed1b2a099865..006c136f80332c 100644 --- a/clang/test/Sema/riscv-rvv-vector-trig-ops.c +++ b/clang/test/Sema/riscv-rvv-vector-trig-ops.c @@ -1,67 +1,67 @@ -// RUN: %clang_cc1 -triple riscv64 -target-feature +f -target-feature +d \ 
-// RUN: -target-feature +v -target-feature +zfh -target-feature +zvfh \ -// RUN: -disable-O0-optnone -o - -fsyntax-only %s -verify -// REQUIRES: riscv-registered-target - -#include - -vfloat32mf2_t test_asin_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_asin(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} - } - - vfloat32mf2_t test_acos_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_acos(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} - } - - vfloat32mf2_t test_atan_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_atan(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} - } - -vfloat32mf2_t test_atan2_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_atan2(v, v); - // expected-error at -1 {{1st argument must be a floating point type}} -} - -vfloat32mf2_t test_sin_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_sin(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -vfloat32mf2_t test_cos_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_cos(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -vfloat32mf2_t test_tan_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_tan(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -vfloat32mf2_t test_sinh_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_sinh(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} - } - - vfloat32mf2_t test_cosh_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_cosh(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} - } - - vfloat32mf2_t test_tanh_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_tanh(v); - 
// expected-error at -1 {{1st argument must be a vector, integer or floating point type}} - } - +// RUN: %clang_cc1 -triple riscv64 -target-feature +f -target-feature +d \ +// RUN: -target-feature +v -target-feature +zfh -target-feature +zvfh \ +// RUN: -disable-O0-optnone -o - -fsyntax-only %s -verify +// REQUIRES: riscv-registered-target + +#include + +vfloat32mf2_t test_asin_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_asin(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} + } + + vfloat32mf2_t test_acos_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_acos(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} + } + + vfloat32mf2_t test_atan_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_atan(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} + } + +vfloat32mf2_t test_atan2_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_atan2(v, v); + // expected-error at -1 {{1st argument must be a floating point type}} +} + +vfloat32mf2_t test_sin_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_sin(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +vfloat32mf2_t test_cos_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_cos(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +vfloat32mf2_t test_tan_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_tan(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +vfloat32mf2_t test_sinh_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_sinh(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} + } + + vfloat32mf2_t test_cosh_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_cosh(v); + // expected-error at -1 
{{1st argument must be a vector, integer or floating point type}} + } + + vfloat32mf2_t test_tanh_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_tanh(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} + } + diff --git a/clang/test/SemaHLSL/Availability/avail-diag-default-compute.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-default-compute.hlsl index 764b9e843f7f1c..b60fba62bdb000 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-default-compute.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-default-compute.hlsl @@ -1,119 +1,119 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ -// RUN: -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - // expected-error@#also_alive_fz_call {{'fz' is unavailable}} - 
// expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - return 0; -} - -float alive(float f) { - // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - // expected-error@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return 0; -} - -float dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute 
environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); -} - -float test(float x) { - return aliveTemp2(x); -} - -class MyClass -{ - float F; - float makeF() { - // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute 
environment}} - float C = fz(F); // #MyClass_makeF_fz_call - return 0; - } -}; - -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ +// RUN: -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + // expected-error@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + return 0; +} + +float alive(float f) { + // 
expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + // expected-error@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return 0; +} + +float dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + + return also_dead(f); +} + +template<typename T> +T aliveTemp(T f) { + // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 
in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template<typename T> T aliveTemp2(T f) { + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); +} + +float test(float x) { + return aliveTemp2(x); +} + +class MyClass +{ + float F; + float makeF() { + // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + return 0; + } +}; + +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + 
float e = test((half)1.0); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-diag-default-lib.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-default-lib.hlsl index 6bfc8577670cc7..35b7c384f26cdd 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-default-lib.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-default-lib.hlsl @@ -1,180 +1,180 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ -// RUN: -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - - // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - - // expected-error@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - - 
return 0; -} - -float alive(float f) { - // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - - // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - - // expected-error@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return 0; -} - -float dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return also_dead(f); -} - -template<typename T> -T aliveTemp(T f) { - // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been 
marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template<typename T> T aliveTemp2(T f) { - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); -} - -float test(float x) { - return aliveTemp2(x); -} - -class MyClass -{ - float F; - float makeF() { - // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - return 0; - } -}; - -// Exported function without body, not used -export void exportedFunctionUnused(float f); - -// Exported function with body, without export, not used 
-void exportedFunctionUnused(float f) { - // expected-error@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUnused_fx_call - - // API with shader-stage-specific availability in unused exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(f); - float C = fz(f); -} - -// Exported function with body - called from main() which is a compute shader entry point -export void exportedFunctionUsed(float f) { - // expected-error@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUsed_fx_call - - // expected-error@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #exportedFunctionUsed_fy_call - - // expected-error@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #exportedFunctionUsed_fz_call -} - -namespace A { - namespace B { - export { - void exportedFunctionInNS(float x) { - // expected-error@#exportedFunctionInNS_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment 
target is Shader Model 6.0}} - float A = fx(x); // #exportedFunctionInNS_fx_call - - // API with shader-stage-specific availability in exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(x); - float C = fz(x); - } - } - } -} - -// Shader entry point without body -[shader("compute")] -[numthreads(4,1,1)] -float main(); - -// Shader entry point with body -[shader("compute")] -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - exportedFunctionUsed(1.0f); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ +// RUN: -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + + // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is 
Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + + // expected-error@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + + return 0; +} + +float alive(float f) { + // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + + // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + + // expected-error@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return 0; +} + +float dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return also_dead(f); +} + +template<typename T> +T aliveTemp(T f) { + // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-error@#aliveTemp_fy_call {{'fy' is only available in 
compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template<typename T> T aliveTemp2(T f) { + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); +} + +float test(float x) { + return aliveTemp2(x); +} + +class MyClass +{ + float F; + float makeF() { + // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz 
{{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + return 0; + } +}; + +// Exported function without body, not used +export void exportedFunctionUnused(float f); + +// Exported function with body, without export, not used +void exportedFunctionUnused(float f) { + // expected-error@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUnused_fx_call + + // API with shader-stage-specific availability in unused exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(f); + float C = fz(f); +} + +// Exported function with body - called from main() which is a compute shader entry point +export void exportedFunctionUsed(float f) { + // expected-error@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUsed_fx_call + + // expected-error@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #exportedFunctionUsed_fy_call + + // expected-error@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 
compute environment}} + float C = fz(f); // #exportedFunctionUsed_fz_call +} + +namespace A { + namespace B { + export { + void exportedFunctionInNS(float x) { + // expected-error@#exportedFunctionInNS_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(x); // #exportedFunctionInNS_fx_call + + // API with shader-stage-specific availability in exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(x); + float C = fz(x); + } + } + } +} + +// Shader entry point without body +[shader("compute")] +[numthreads(4,1,1)] +float main(); + +// Shader entry point with body +[shader("compute")] +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + exportedFunctionUsed(1.0f); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-diag-relaxed-compute.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-relaxed-compute.hlsl index 65836c55821d77..40687983839303 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-relaxed-compute.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-relaxed-compute.hlsl @@ -1,119 +1,119 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ -// RUN: -fsyntax-only -Wno-error=hlsl-availability -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - 
-__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-warning@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - // expected-warning@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - // expected-warning@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - return 0; -} - -float alive(float f) { - // expected-warning@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - // expected-warning@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - // expected-warning@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 
6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return 0; -} - -float dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - - return also_dead(f); -} - -template<typename T> -T aliveTemp(T f) { - // expected-warning@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-warning@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-warning@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template<typename T> T aliveTemp2(T f) { - // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); -} - -float test(float x) { - return aliveTemp2(x); 
-} - -class MyClass -{ - float F; - float makeF() { - // expected-warning@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-warning@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-warning@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - return 0; - } -}; - -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ +// RUN: -fsyntax-only -Wno-error=hlsl-availability -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // 
expected-warning@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + // expected-warning@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + // expected-warning@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + return 0; +} + +float alive(float f) { + // expected-warning@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + // expected-warning@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + // expected-warning@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + 
return 0; +} + +float dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + + return also_dead(f); +} + +template<typename T> +T aliveTemp(T f) { + // expected-warning@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-warning@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-warning@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template<typename T> T aliveTemp2(T f) { + // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); +} + +float test(float x) { + return aliveTemp2(x); +} + +class MyClass +{ + float F; + float makeF() { + // expected-warning@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in 
Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-warning@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-warning@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + return 0; + } +}; + +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-diag-relaxed-lib.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-relaxed-lib.hlsl index 4c9783138f6701..a23e91a546b167 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-relaxed-lib.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-relaxed-lib.hlsl @@ -1,162 +1,162 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ -// RUN: -fsyntax-only -Wno-error=hlsl-availability -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, 
introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-warning@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - - // expected-warning@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - - // expected-warning@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - - return 0; -} - -float alive(float f) { - // expected-warning@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - - // expected-warning@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - - // expected-warning@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float 
f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return 0; -} - -float dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-warning@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-warning@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-warning@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); -} - -float test(float x) { - return aliveTemp2(x); -} - -class MyClass -{ - float F; - float makeF() { - // expected-warning@#MyClass_makeF_fx_call {{'fx' is only 
available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-warning@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-warning@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - return 0; - } -}; - -// Exported function without body, not used -export void exportedFunctionUnused(float f); - -// Exported function with body, without export, not used -void exportedFunctionUnused(float f) { - // expected-warning@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUnused_fx_call - - // API with shader-stage-specific availability in unused exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(f); - float C = fz(f); -} - -// Exported function with body - called from main() which is a compute shader entry point -export void exportedFunctionUsed(float f) { - // expected-warning@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is 
Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUsed_fx_call - - // expected-warning@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #exportedFunctionUsed_fy_call - - // expected-warning@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #exportedFunctionUsed_fz_call -} - -// Shader entry point without body -[shader("compute")] -[numthreads(4,1,1)] -float main(); - -// Shader entry point with body -[shader("compute")] -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - exportedFunctionUsed(1.0f); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ +// RUN: -fsyntax-only -Wno-error=hlsl-availability -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // expected-warning@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or 
newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + + // expected-warning@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + + // expected-warning@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + + return 0; +} + +float alive(float f) { + // expected-warning@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + + // expected-warning@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + + // expected-warning@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return 0; +} + +float dead(float f) { + // unreachable code - no errors 
expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return also_dead(f); +} + +template +T aliveTemp(T f) { + // expected-warning@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-warning@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-warning@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template T aliveTemp2(T f) { + // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); +} + +float test(float x) { + return aliveTemp2(x); +} + +class MyClass +{ + float F; + float makeF() { + // expected-warning@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + 
float A = fx(F); // #MyClass_makeF_fx_call + // expected-warning@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-warning@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + return 0; + } +}; + +// Exported function without body, not used +export void exportedFunctionUnused(float f); + +// Exported function with body, without export, not used +void exportedFunctionUnused(float f) { + // expected-warning@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUnused_fx_call + + // API with shader-stage-specific availability in unused exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(f); + float C = fz(f); +} + +// Exported function with body - called from main() which is a compute shader entry point +export void exportedFunctionUsed(float f) { + // expected-warning@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUsed_fx_call + + // expected-warning@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader 
Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #exportedFunctionUsed_fy_call + + // expected-warning@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #exportedFunctionUsed_fz_call +} + +// Shader entry point without body +[shader("compute")] +[numthreads(4,1,1)] +float main(); + +// Shader entry point with body +[shader("compute")] +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + exportedFunctionUsed(1.0f); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-diag-strict-compute.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-strict-compute.hlsl index b67e10c9a9017a..a8783c10cbabca 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-strict-compute.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-strict-compute.hlsl @@ -1,129 +1,129 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ -// RUN: -fhlsl-strict-availability -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment 
= mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - // expected-error@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - return 0; -} - -float alive(float f) { - // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - // expected-error@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // expected-error@#also_dead_fx_call {{'fx' 
is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_dead_fx_call - // expected-error@#also_dead_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_dead_fy_call - // expected-error@#also_dead_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_dead_fz_call - return 0; -} - -float dead(float f) { - // expected-error@#dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #dead_fx_call - // expected-error@#dead_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #dead_fy_call - // expected-error@#dead_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #dead_fz_call - - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#aliveTemp_inst {{in instantiation of function template 
specialization 'aliveTemp' requested here}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} -} - -float test(float x) { - return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} -} - -class MyClass -{ - float F; - float makeF() { - // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader 
Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - } -}; - -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - return a * b * c; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ +// RUN: -fhlsl-strict-availability -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + // expected-error@#also_alive_fy_call {{'fy' 
is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + // expected-error@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + return 0; +} + +float alive(float f) { + // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + // expected-error@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // expected-error@#also_dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_dead_fx_call + // expected-error@#also_dead_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been 
marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_dead_fy_call + // expected-error@#also_dead_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_dead_fz_call + return 0; +} + +float dead(float f) { + // expected-error@#dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #dead_fx_call + // expected-error@#dead_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #dead_fy_call + // expected-error@#dead_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #dead_fz_call + + return also_dead(f); +} + +template +T aliveTemp(T f) { + // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#aliveTemp_inst {{in instantiation of function template specialization 'aliveTemp' requested here}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked 
as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template T aliveTemp2(T f) { + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} +} + +float test(float x) { + return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} +} + +class MyClass +{ + float F; + float makeF() { + // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // 
#MyClass_makeF_fy_call + // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + } +}; + +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + return a * b * c; } \ No newline at end of file diff --git a/clang/test/SemaHLSL/Availability/avail-diag-strict-lib.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-strict-lib.hlsl index c7be5afbc2d22f..0fffbc96dac194 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-strict-lib.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-strict-lib.hlsl @@ -1,192 +1,192 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ -// RUN: -fhlsl-strict-availability -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -// FIXME: all diagnostics marked as FUTURE will come alive when HLSL default -// diagnostic mode is implemented in a future PR which will verify calls in -// all functions that are reachable from the shader library entry points - -float also_alive(float f) { - // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // 
expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - - // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - - // expected-error@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - - return 0; -} - -float alive(float f) { - // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - - // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - - // expected-error@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // expected-error@#also_dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the 
deployment target is Shader Model 6.0}} - float A = fx(f); // #also_dead_fx_call - - // Call to environment-specific function from an unreachable function - // in a shader library - no diagnostic expected. - float B = fy(f); // #also_dead_fy_call - - // Call to environment-specific function from an unreachable function - // in a shader library - no diagnostic expected. - float C = fz(f); // #also_dead_fz_call - return 0; -} - -float dead(float f) { - // expected-error@#dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #dead_fx_call - - // Call to environment-specific function from an unreachable function - // in a shader library - no diagnostic expected. - float B = fy(f); // #dead_fy_call - - // Call to environment-specific function from an unreachable function - // in a shader library - no diagnostic expected. 
- float C = fz(f); // #dead_fz_call - - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#aliveTemp_inst {{in instantiation of function template specialization 'aliveTemp' requested here}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} -} - -float test(float x) { - return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} -} - -class 
MyClass -{ - float F; - float makeF() { - // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - } -}; - -// Exported function without body, not used -export void exportedFunctionUnused(float f); - -// Exported function with body, without export, not used -void exportedFunctionUnused(float f) { - // expected-error@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUnused_fx_call - - // API with shader-stage-specific availability in unused exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(f); - float C = fz(f); -} - -// Exported function with body - called from main() which is a compute shader entry point -export void exportedFunctionUsed(float f) { - // expected-error@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been 
marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUsed_fx_call - - // expected-error@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #exportedFunctionUsed_fy_call - - // expected-error@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #exportedFunctionUsed_fz_call -} - -namespace A { - namespace B { - export { - void exportedFunctionInNS(float x) { - // expected-error@#exportedFunctionInNS_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(x); // #exportedFunctionInNS_fx_call - - // API with shader-stage-specific availability in exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(x); - float C = fz(x); - } - } - } -} - -[shader("compute")] -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f);float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - exportedFunctionUsed(1.0f); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ +// RUN: -fhlsl-strict-availability -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + 
+__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +// FIXME: all diagnostics marked as FUTURE will come alive when HLSL default +// diagnostic mode is implemented in a future PR which will verify calls in +// all functions that are reachable from the shader library entry points + +float also_alive(float f) { + // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + + // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + + // expected-error@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + + return 0; +} + +float alive(float f) { + // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + + // expected-error@#alive_fy_call 
{{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + + // expected-error@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // expected-error@#also_dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_dead_fx_call + + // Call to environment-specific function from an unreachable function + // in a shader library - no diagnostic expected. + float B = fy(f); // #also_dead_fy_call + + // Call to environment-specific function from an unreachable function + // in a shader library - no diagnostic expected. + float C = fz(f); // #also_dead_fz_call + return 0; +} + +float dead(float f) { + // expected-error@#dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #dead_fx_call + + // Call to environment-specific function from an unreachable function + // in a shader library - no diagnostic expected. + float B = fy(f); // #dead_fy_call + + // Call to environment-specific function from an unreachable function + // in a shader library - no diagnostic expected. 
+ float C = fz(f); // #dead_fz_call + + return also_dead(f); +} + +template +T aliveTemp(T f) { + // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#aliveTemp_inst {{in instantiation of function template specialization 'aliveTemp' requested here}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template T aliveTemp2(T f) { + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} +} + +float test(float x) { + return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} +} + +class 
MyClass +{ + float F; + float makeF() { + // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + } +}; + +// Exported function without body, not used +export void exportedFunctionUnused(float f); + +// Exported function with body, without export, not used +void exportedFunctionUnused(float f) { + // expected-error@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUnused_fx_call + + // API with shader-stage-specific availability in unused exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(f); + float C = fz(f); +} + +// Exported function with body - called from main() which is a compute shader entry point +export void exportedFunctionUsed(float f) { + // expected-error@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been 
marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUsed_fx_call + + // expected-error@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #exportedFunctionUsed_fy_call + + // expected-error@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #exportedFunctionUsed_fz_call +} + +namespace A { + namespace B { + export { + void exportedFunctionInNS(float x) { + // expected-error@#exportedFunctionInNS_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(x); // #exportedFunctionInNS_fx_call + + // API with shader-stage-specific availability in exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(x); + float C = fz(x); + } + } + } +} + +[shader("compute")] +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f);float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + exportedFunctionUsed(1.0f); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-lib-multiple-stages.hlsl b/clang/test/SemaHLSL/Availability/avail-lib-multiple-stages.hlsl index b56ab8fe4526ba..bfefc9b116a64f 100644 --- 
a/clang/test/SemaHLSL/Availability/avail-lib-multiple-stages.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-lib-multiple-stages.hlsl @@ -1,57 +1,57 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ -// RUN: -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = compute))) -float fz(float); // #fz - - -void F(float f) { - // Make sure we only get this error once, even though this function is scanned twice - once - // in compute shader context and once in pixel shader context. - // expected-error@#fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #fx_call - - // expected-error@#fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #fy_call - - // expected-error@#fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 5.0 in compute environment here, but the deployment target is Shader Model 6.0 pixel environment}} - float X = fz(f); // #fz_call -} - -void deadCode(float f) { - // no diagnostics expected under default diagnostic mode - float A = fx(f); - float B = fy(f); - float X = fz(f); -} - -// Pixel shader -[shader("pixel")] -void mainPixel() { - F(1.0); -} - -// First Compute shader -[shader("compute")] -[numthreads(4,1,1)] -void mainCompute1() { - F(2.0); -} 
- -// Second compute shader to make sure we do not get duplicate messages if F is called -// from multiple entry points. -[shader("compute")] -[numthreads(4,1,1)] -void mainCompute2() { - F(3.0); -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ +// RUN: -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = compute))) +float fz(float); // #fz + + +void F(float f) { + // Make sure we only get this error once, even though this function is scanned twice - once + // in compute shader context and once in pixel shader context. + // expected-error@#fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #fx_call + + // expected-error@#fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #fy_call + + // expected-error@#fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 5.0 in compute environment here, but the deployment target is Shader Model 6.0 pixel environment}} + float X = fz(f); // #fz_call +} + +void deadCode(float f) { + // no diagnostics expected under default diagnostic mode + float A = fx(f); + float B = fy(f); + float X = fz(f); +} + +// Pixel shader +[shader("pixel")] +void mainPixel() { + F(1.0); +} + +// First Compute shader +[shader("compute")] 
+[numthreads(4,1,1)] +void mainCompute1() { + F(2.0); +} + +// Second compute shader to make sure we do not get duplicate messages if F is called +// from multiple entry points. +[shader("compute")] +[numthreads(4,1,1)] +void mainCompute2() { + F(3.0); +} diff --git a/clang/test/SemaHLSL/BuiltIns/StructuredBuffers.hlsl b/clang/test/SemaHLSL/BuiltIns/StructuredBuffers.hlsl index a472d5519dc51f..1ec56542113d90 100644 --- a/clang/test/SemaHLSL/BuiltIns/StructuredBuffers.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/StructuredBuffers.hlsl @@ -1,19 +1,19 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -fsyntax-only -verify %s - -typedef vector float3; - -StructuredBuffer Buffer; - -// expected-error@+2 {{class template 'StructuredBuffer' requires template arguments}} -// expected-note@*:* {{template declaration from hidden source: template class StructuredBuffer}} -StructuredBuffer BufferErr1; - -// expected-error@+2 {{too few template arguments for class template 'StructuredBuffer'}} -// expected-note@*:* {{template declaration from hidden source: template class StructuredBuffer}} -StructuredBuffer<> BufferErr2; - -[numthreads(1,1,1)] -void main() { - (void)Buffer.h; // expected-error {{'h' is a private member of 'hlsl::StructuredBuffer>'}} - // expected-note@* {{implicitly declared private here}} -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -fsyntax-only -verify %s + +typedef vector float3; + +StructuredBuffer Buffer; + +// expected-error@+2 {{class template 'StructuredBuffer' requires template arguments}} +// expected-note@*:* {{template declaration from hidden source: template class StructuredBuffer}} +StructuredBuffer BufferErr1; + +// expected-error@+2 {{too few template arguments for class template 'StructuredBuffer'}} +// expected-note@*:* {{template declaration from hidden source: template class StructuredBuffer}} +StructuredBuffer<> BufferErr2; + +[numthreads(1,1,1)] +void main() { + (void)Buffer.h; // 
expected-error {{'h' is a private member of 'hlsl::StructuredBuffer>'}} + // expected-note@* {{implicitly declared private here}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/cross-errors.hlsl b/clang/test/SemaHLSL/BuiltIns/cross-errors.hlsl index 423f5bac9471f4..354e7abb8a31eb 100644 --- a/clang/test/SemaHLSL/BuiltIns/cross-errors.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/cross-errors.hlsl @@ -1,43 +1,43 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify - -void test_too_few_arg() -{ - return __builtin_hlsl_cross(); - // expected-error@-1 {{too few arguments to function call, expected 2, have 0}} -} - -void test_too_many_arg(float3 p0) -{ - return __builtin_hlsl_cross(p0, p0, p0); - // expected-error@-1 {{too many arguments to function call, expected 2, have 3}} -} - -bool builtin_bool_to_float_type_promotion(bool p1) -{ - return __builtin_hlsl_cross(p1, p1); - // expected-error@-1 {{passing 'bool' to parameter of incompatible type 'float'}} -} - -bool builtin_cross_int_to_float_promotion(int p1) -{ - return __builtin_hlsl_cross(p1, p1); - // expected-error@-1 {{passing 'int' to parameter of incompatible type 'float'}} -} - -bool2 builtin_cross_int2_to_float2_promotion(int2 p1) -{ - return __builtin_hlsl_cross(p1, p1); - // expected-error@-1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} - -float2 builtin_cross_float2(float2 p1, float2 p2) -{ - return __builtin_hlsl_cross(p1, p2); - // expected-error@-1 {{too many elements in vector operand (expected 3 elements, have 2)}} -} - -float3 builtin_cross_float3_int3(float3 p1, int3 p2) -{ - return __builtin_hlsl_cross(p1, p2); - // expected-error@-1 {{all arguments to '__builtin_hlsl_cross' must have the same type}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s 
-fnative-half-type -disable-llvm-passes -verify + +void test_too_few_arg() +{ + return __builtin_hlsl_cross(); + // expected-error@-1 {{too few arguments to function call, expected 2, have 0}} +} + +void test_too_many_arg(float3 p0) +{ + return __builtin_hlsl_cross(p0, p0, p0); + // expected-error@-1 {{too many arguments to function call, expected 2, have 3}} +} + +bool builtin_bool_to_float_type_promotion(bool p1) +{ + return __builtin_hlsl_cross(p1, p1); + // expected-error@-1 {{passing 'bool' to parameter of incompatible type 'float'}} +} + +bool builtin_cross_int_to_float_promotion(int p1) +{ + return __builtin_hlsl_cross(p1, p1); + // expected-error@-1 {{passing 'int' to parameter of incompatible type 'float'}} +} + +bool2 builtin_cross_int2_to_float2_promotion(int2 p1) +{ + return __builtin_hlsl_cross(p1, p1); + // expected-error@-1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} + +float2 builtin_cross_float2(float2 p1, float2 p2) +{ + return __builtin_hlsl_cross(p1, p2); + // expected-error@-1 {{too many elements in vector operand (expected 3 elements, have 2)}} +} + +float3 builtin_cross_float3_int3(float3 p1, int3 p2) +{ + return __builtin_hlsl_cross(p1, p2); + // expected-error@-1 {{all arguments to '__builtin_hlsl_cross' must have the same type}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/half-float-only-errors2.hlsl b/clang/test/SemaHLSL/BuiltIns/half-float-only-errors2.hlsl index bfbd8b28257a3b..b876a8e84cb3ac 100644 --- a/clang/test/SemaHLSL/BuiltIns/half-float-only-errors2.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/half-float-only-errors2.hlsl @@ -1,13 +1,13 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_atan2 -// RUN: %clang_cc1 -finclude-default-header -triple 
dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_fmod -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_pow - -double test_double_builtin(double p0, double p1) { - return TEST_FUNC(p0, p1); - // expected-error@-1 {{passing 'double' to parameter of incompatible type 'float'}} -} - -double2 test_vec_double_builtin(double2 p0, double2 p1) { - return TEST_FUNC(p0, p1); - // expected-error@-1 {{passing 'double2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_atan2 +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_fmod +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_pow + +double test_double_builtin(double p0, double p1) { + return TEST_FUNC(p0, p1); + // expected-error@-1 {{passing 'double' to parameter of incompatible type 'float'}} +} + +double2 test_vec_double_builtin(double2 p0, double2 p1) { + return TEST_FUNC(p0, p1); + // expected-error@-1 {{passing 'double2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/length-errors.hlsl b/clang/test/SemaHLSL/BuiltIns/length-errors.hlsl index 281faada6f5e94..c5e2ac0b502dc4 100644 --- 
a/clang/test/SemaHLSL/BuiltIns/length-errors.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/length-errors.hlsl @@ -1,32 +1,32 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected - - -void test_too_few_arg() -{ - return __builtin_hlsl_length(); - // expected-error@-1 {{too few arguments to function call, expected 1, have 0}} -} - -void test_too_many_arg(float2 p0) -{ - return __builtin_hlsl_length(p0, p0); - // expected-error@-1 {{too many arguments to function call, expected 1, have 2}} -} - -bool builtin_bool_to_float_type_promotion(bool p1) -{ - return __builtin_hlsl_length(p1); - // expected-error@-1 {passing 'bool' to parameter of incompatible type 'float'}} -} - -bool builtin_length_int_to_float_promotion(int p1) -{ - return __builtin_hlsl_length(p1); - // expected-error@-1 {{passing 'int' to parameter of incompatible type 'float'}} -} - -bool2 builtin_length_int2_to_float2_promotion(int2 p1) -{ - return __builtin_hlsl_length(p1); - // expected-error@-1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected + + +void test_too_few_arg() +{ + return __builtin_hlsl_length(); + // expected-error@-1 {{too few arguments to function call, expected 1, have 0}} +} + +void test_too_many_arg(float2 p0) +{ + return __builtin_hlsl_length(p0, p0); + // expected-error@-1 {{too many arguments to function call, expected 1, have 2}} +} + +bool builtin_bool_to_float_type_promotion(bool p1) +{ + return __builtin_hlsl_length(p1); + // expected-error@-1 {passing 'bool' to parameter of incompatible type 'float'}} +} + +bool builtin_length_int_to_float_promotion(int p1) +{ + return 
__builtin_hlsl_length(p1); + // expected-error at -1 {{passing 'int' to parameter of incompatible type 'float'}} +} + +bool2 builtin_length_int2_to_float2_promotion(int2 p1) +{ + return __builtin_hlsl_length(p1); + // expected-error at -1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/normalize-errors.hlsl b/clang/test/SemaHLSL/BuiltIns/normalize-errors.hlsl index fc48c9b2589f7e..3720dca9b88a12 100644 --- a/clang/test/SemaHLSL/BuiltIns/normalize-errors.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/normalize-errors.hlsl @@ -1,31 +1,31 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected - -void test_too_few_arg() -{ - return __builtin_hlsl_normalize(); - // expected-error at -1 {{too few arguments to function call, expected 1, have 0}} -} - -void test_too_many_arg(float2 p0) -{ - return __builtin_hlsl_normalize(p0, p0); - // expected-error at -1 {{too many arguments to function call, expected 1, have 2}} -} - -bool builtin_bool_to_float_type_promotion(bool p1) -{ - return __builtin_hlsl_normalize(p1); - // expected-error at -1 {passing 'bool' to parameter of incompatible type 'float'}} -} - -bool builtin_normalize_int_to_float_promotion(int p1) -{ - return __builtin_hlsl_normalize(p1); - // expected-error at -1 {{passing 'int' to parameter of incompatible type 'float'}} -} - -bool2 builtin_normalize_int2_to_float2_promotion(int2 p1) -{ - return __builtin_hlsl_normalize(p1); - // expected-error at -1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify 
-verify-ignore-unexpected + +void test_too_few_arg() +{ + return __builtin_hlsl_normalize(); + // expected-error at -1 {{too few arguments to function call, expected 1, have 0}} +} + +void test_too_many_arg(float2 p0) +{ + return __builtin_hlsl_normalize(p0, p0); + // expected-error at -1 {{too many arguments to function call, expected 1, have 2}} +} + +bool builtin_bool_to_float_type_promotion(bool p1) +{ + return __builtin_hlsl_normalize(p1); + // expected-error at -1 {passing 'bool' to parameter of incompatible type 'float'}} +} + +bool builtin_normalize_int_to_float_promotion(int p1) +{ + return __builtin_hlsl_normalize(p1); + // expected-error at -1 {{passing 'int' to parameter of incompatible type 'float'}} +} + +bool2 builtin_normalize_int2_to_float2_promotion(int2 p1) +{ + return __builtin_hlsl_normalize(p1); + // expected-error at -1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/step-errors.hlsl b/clang/test/SemaHLSL/BuiltIns/step-errors.hlsl index 823585201ca62d..a76c5ff5dbd2ba 100644 --- a/clang/test/SemaHLSL/BuiltIns/step-errors.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/step-errors.hlsl @@ -1,31 +1,31 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected - -void test_too_few_arg() -{ - return __builtin_hlsl_step(); - // expected-error at -1 {{too few arguments to function call, expected 2, have 0}} -} - -void test_too_many_arg(float2 p0) -{ - return __builtin_hlsl_step(p0, p0, p0); - // expected-error at -1 {{too many arguments to function call, expected 2, have 3}} -} - -bool builtin_bool_to_float_type_promotion(bool p1) -{ - return __builtin_hlsl_step(p1, p1); - // expected-error at -1 {passing 'bool' to parameter of incompatible type 'float'}} -} - -bool 
builtin_step_int_to_float_promotion(int p1) -{ - return __builtin_hlsl_step(p1, p1); - // expected-error at -1 {{passing 'int' to parameter of incompatible type 'float'}} -} - -bool2 builtin_step_int2_to_float2_promotion(int2 p1) -{ - return __builtin_hlsl_step(p1, p1); - // expected-error at -1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected + +void test_too_few_arg() +{ + return __builtin_hlsl_step(); + // expected-error at -1 {{too few arguments to function call, expected 2, have 0}} +} + +void test_too_many_arg(float2 p0) +{ + return __builtin_hlsl_step(p0, p0, p0); + // expected-error at -1 {{too many arguments to function call, expected 2, have 3}} +} + +bool builtin_bool_to_float_type_promotion(bool p1) +{ + return __builtin_hlsl_step(p1, p1); + // expected-error at -1 {passing 'bool' to parameter of incompatible type 'float'}} +} + +bool builtin_step_int_to_float_promotion(int p1) +{ + return __builtin_hlsl_step(p1, p1); + // expected-error at -1 {{passing 'int' to parameter of incompatible type 'float'}} +} + +bool2 builtin_step_int2_to_float2_promotion(int2 p1) +{ + return __builtin_hlsl_step(p1, p1); + // expected-error at -1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} diff --git a/clang/test/SemaHLSL/Types/Traits/IsIntangibleType.hlsl b/clang/test/SemaHLSL/Types/Traits/IsIntangibleType.hlsl index 8c0f8d6f271dbd..1223a131af35c4 100644 --- a/clang/test/SemaHLSL/Types/Traits/IsIntangibleType.hlsl +++ b/clang/test/SemaHLSL/Types/Traits/IsIntangibleType.hlsl @@ -1,81 +1,81 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -verify %s -// 
RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -fnative-half-type -verify %s -// expected-no-diagnostics - -_Static_assert(__builtin_hlsl_is_intangible(__hlsl_resource_t), ""); -// no need to check array of __hlsl_resource_t, arrays of sizeless types are not supported - -_Static_assert(!__builtin_hlsl_is_intangible(int), ""); -_Static_assert(!__builtin_hlsl_is_intangible(float3), ""); -_Static_assert(!__builtin_hlsl_is_intangible(half[4]), ""); - -typedef __hlsl_resource_t Res; -_Static_assert(__builtin_hlsl_is_intangible(const Res), ""); -// no need to check array of Res, arrays of sizeless types are not supported - -struct ABuffer { - const int i[10]; - __hlsl_resource_t h; -}; -_Static_assert(__builtin_hlsl_is_intangible(ABuffer), ""); -_Static_assert(__builtin_hlsl_is_intangible(ABuffer[10]), ""); - -struct MyStruct { - half2 h2; - int3 i3; -}; -_Static_assert(!__builtin_hlsl_is_intangible(MyStruct), ""); -_Static_assert(!__builtin_hlsl_is_intangible(MyStruct[10]), ""); - -class MyClass { - int3 ivec; - float farray[12]; - MyStruct ms; - ABuffer buf; -}; -_Static_assert(__builtin_hlsl_is_intangible(MyClass), ""); -_Static_assert(__builtin_hlsl_is_intangible(MyClass[2]), ""); - -union U { - double d[4]; - Res buf; -}; -_Static_assert(__builtin_hlsl_is_intangible(U), ""); -_Static_assert(__builtin_hlsl_is_intangible(U[100]), ""); - -class MyClass2 { - int3 ivec; - float farray[12]; - U u; -}; -_Static_assert(__builtin_hlsl_is_intangible(MyClass2), ""); -_Static_assert(__builtin_hlsl_is_intangible(MyClass2[5]), ""); - -class Simple { - int a; -}; - -template struct TemplatedBuffer { - T a; - __hlsl_resource_t h; -}; -_Static_assert(__builtin_hlsl_is_intangible(TemplatedBuffer), ""); - -struct MyStruct2 : TemplatedBuffer { - float x; -}; -_Static_assert(__builtin_hlsl_is_intangible(MyStruct2), ""); - -struct MyStruct3 { - const TemplatedBuffer TB[10]; -}; -_Static_assert(__builtin_hlsl_is_intangible(MyStruct3), ""); - -template 
struct SimpleTemplate { - T a; -}; -_Static_assert(__builtin_hlsl_is_intangible(SimpleTemplate<__hlsl_resource_t>), ""); -_Static_assert(!__builtin_hlsl_is_intangible(SimpleTemplate), ""); - -_Static_assert(__builtin_hlsl_is_intangible(RWBuffer), ""); -_Static_assert(__builtin_hlsl_is_intangible(StructuredBuffer), ""); +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -verify %s +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -fnative-half-type -verify %s +// expected-no-diagnostics + +_Static_assert(__builtin_hlsl_is_intangible(__hlsl_resource_t), ""); +// no need to check array of __hlsl_resource_t, arrays of sizeless types are not supported + +_Static_assert(!__builtin_hlsl_is_intangible(int), ""); +_Static_assert(!__builtin_hlsl_is_intangible(float3), ""); +_Static_assert(!__builtin_hlsl_is_intangible(half[4]), ""); + +typedef __hlsl_resource_t Res; +_Static_assert(__builtin_hlsl_is_intangible(const Res), ""); +// no need to check array of Res, arrays of sizeless types are not supported + +struct ABuffer { + const int i[10]; + __hlsl_resource_t h; +}; +_Static_assert(__builtin_hlsl_is_intangible(ABuffer), ""); +_Static_assert(__builtin_hlsl_is_intangible(ABuffer[10]), ""); + +struct MyStruct { + half2 h2; + int3 i3; +}; +_Static_assert(!__builtin_hlsl_is_intangible(MyStruct), ""); +_Static_assert(!__builtin_hlsl_is_intangible(MyStruct[10]), ""); + +class MyClass { + int3 ivec; + float farray[12]; + MyStruct ms; + ABuffer buf; +}; +_Static_assert(__builtin_hlsl_is_intangible(MyClass), ""); +_Static_assert(__builtin_hlsl_is_intangible(MyClass[2]), ""); + +union U { + double d[4]; + Res buf; +}; +_Static_assert(__builtin_hlsl_is_intangible(U), ""); +_Static_assert(__builtin_hlsl_is_intangible(U[100]), ""); + +class MyClass2 { + int3 ivec; + float farray[12]; + U u; +}; +_Static_assert(__builtin_hlsl_is_intangible(MyClass2), ""); +_Static_assert(__builtin_hlsl_is_intangible(MyClass2[5]), ""); + 
+class Simple { + int a; +}; + +template struct TemplatedBuffer { + T a; + __hlsl_resource_t h; +}; +_Static_assert(__builtin_hlsl_is_intangible(TemplatedBuffer), ""); + +struct MyStruct2 : TemplatedBuffer { + float x; +}; +_Static_assert(__builtin_hlsl_is_intangible(MyStruct2), ""); + +struct MyStruct3 { + const TemplatedBuffer TB[10]; +}; +_Static_assert(__builtin_hlsl_is_intangible(MyStruct3), ""); + +template struct SimpleTemplate { + T a; +}; +_Static_assert(__builtin_hlsl_is_intangible(SimpleTemplate<__hlsl_resource_t>), ""); +_Static_assert(!__builtin_hlsl_is_intangible(SimpleTemplate), ""); + +_Static_assert(__builtin_hlsl_is_intangible(RWBuffer), ""); +_Static_assert(__builtin_hlsl_is_intangible(StructuredBuffer), ""); diff --git a/clang/test/SemaHLSL/Types/Traits/IsIntangibleTypeErrors.hlsl b/clang/test/SemaHLSL/Types/Traits/IsIntangibleTypeErrors.hlsl index de9ac90b895fc6..33614e87640dad 100644 --- a/clang/test/SemaHLSL/Types/Traits/IsIntangibleTypeErrors.hlsl +++ b/clang/test/SemaHLSL/Types/Traits/IsIntangibleTypeErrors.hlsl @@ -1,12 +1,12 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -verify %s - -struct Undefined; // expected-note {{forward declaration of 'Undefined'}} -_Static_assert(!__builtin_hlsl_is_intangible(Undefined), ""); // expected-error{{incomplete type 'Undefined' used in type trait expression}} - -void fn(int X) { // expected-note {{declared here}} - // expected-error@#vla {{variable length arrays are not supported for the current target}} - // expected-error@#vla {{variable length arrays are not supported in '__builtin_hlsl_is_intangible'}} - // expected-warning@#vla {{variable length arrays in C++ are a Clang extension}} - // expected-note@#vla {{function parameter 'X' with unknown value cannot be used in a constant expression}} - _Static_assert(!__builtin_hlsl_is_intangible(int[X]), ""); // #vla -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -verify %s + 
+struct Undefined; // expected-note {{forward declaration of 'Undefined'}} +_Static_assert(!__builtin_hlsl_is_intangible(Undefined), ""); // expected-error{{incomplete type 'Undefined' used in type trait expression}} + +void fn(int X) { // expected-note {{declared here}} + // expected-error@#vla {{variable length arrays are not supported for the current target}} + // expected-error@#vla {{variable length arrays are not supported in '__builtin_hlsl_is_intangible'}} + // expected-warning@#vla {{variable length arrays in C++ are a Clang extension}} + // expected-note@#vla {{function parameter 'X' with unknown value cannot be used in a constant expression}} + _Static_assert(!__builtin_hlsl_is_intangible(int[X]), ""); // #vla +} diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_basic.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_basic.hlsl index 760c057630a7fa..4e50f70952ad13 100644 --- a/clang/test/SemaHLSL/resource_binding_attr_error_basic.hlsl +++ b/clang/test/SemaHLSL/resource_binding_attr_error_basic.hlsl @@ -1,42 +1,42 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify - -// expected-error at +1{{binding type 't' only applies to SRV resources}} -float f1 : register(t0); - -// expected-error at +1 {{binding type 'u' only applies to UAV resources}} -float f2 : register(u0); - -// expected-error at +1{{binding type 'b' only applies to constant buffers. The 'bool constant' binding type is no longer supported}} -float f3 : register(b9); - -// expected-error at +1 {{binding type 's' only applies to sampler state}} -float f4 : register(s0); - -// expected-error at +1{{binding type 'i' ignored. The 'integer constant' binding type is no longer supported}} -float f5 : register(i9); - -// expected-error at +1{{binding type 'x' is invalid}} -float f6 : register(x9); - -cbuffer g_cbuffer1 { -// expected-error at +1{{binding type 'c' ignored in buffer declaration. 
Did you mean 'packoffset'?}} - float f7 : register(c2); -}; - -tbuffer g_tbuffer1 { -// expected-error at +1{{binding type 'c' ignored in buffer declaration. Did you mean 'packoffset'?}} - float f8 : register(c2); -}; - -cbuffer g_cbuffer2 { -// expected-error at +1{{binding type 'b' only applies to constant buffer resources}} - float f9 : register(b2); -}; - -tbuffer g_tbuffer2 { -// expected-error at +1{{binding type 'i' ignored. The 'integer constant' binding type is no longer supported}} - float f10 : register(i2); -}; - -// expected-error at +1{{binding type 'c' only applies to numeric variables in the global scope}} -RWBuffer f11 : register(c3); +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify + +// expected-error at +1{{binding type 't' only applies to SRV resources}} +float f1 : register(t0); + +// expected-error at +1 {{binding type 'u' only applies to UAV resources}} +float f2 : register(u0); + +// expected-error at +1{{binding type 'b' only applies to constant buffers. The 'bool constant' binding type is no longer supported}} +float f3 : register(b9); + +// expected-error at +1 {{binding type 's' only applies to sampler state}} +float f4 : register(s0); + +// expected-error at +1{{binding type 'i' ignored. The 'integer constant' binding type is no longer supported}} +float f5 : register(i9); + +// expected-error at +1{{binding type 'x' is invalid}} +float f6 : register(x9); + +cbuffer g_cbuffer1 { +// expected-error at +1{{binding type 'c' ignored in buffer declaration. Did you mean 'packoffset'?}} + float f7 : register(c2); +}; + +tbuffer g_tbuffer1 { +// expected-error at +1{{binding type 'c' ignored in buffer declaration. Did you mean 'packoffset'?}} + float f8 : register(c2); +}; + +cbuffer g_cbuffer2 { +// expected-error at +1{{binding type 'b' only applies to constant buffer resources}} + float f9 : register(b2); +}; + +tbuffer g_tbuffer2 { +// expected-error at +1{{binding type 'i' ignored. 
The 'integer constant' binding type is no longer supported}} + float f10 : register(i2); +}; + +// expected-error at +1{{binding type 'c' only applies to numeric variables in the global scope}} +RWBuffer f11 : register(c3); diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_other.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_other.hlsl index 4c9e9a6b44c928..503c8469666f3b 100644 --- a/clang/test/SemaHLSL/resource_binding_attr_error_other.hlsl +++ b/clang/test/SemaHLSL/resource_binding_attr_error_other.hlsl @@ -1,9 +1,9 @@ -// RUN: not %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s | FileCheck %s - -// XFAIL: * -// This expectedly fails because RayQuery is an unsupported type. -// When it becomes supported, we should expect an error due to -// the variable type being classified as "other", and according -// to the spec, err_hlsl_unsupported_register_type_and_variable_type -// should be emitted. -RayQuery<0> r1: register(t0); +// RUN: not %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s | FileCheck %s + +// XFAIL: * +// This expectedly fails because RayQuery is an unsupported type. +// When it becomes supported, we should expect an error due to +// the variable type being classified as "other", and according +// to the spec, err_hlsl_unsupported_register_type_and_variable_type +// should be emitted. 
+RayQuery<0> r1: register(t0); diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_resource.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_resource.hlsl index 4b6af47c0ab725..ea43e27b5b5ac1 100644 --- a/clang/test/SemaHLSL/resource_binding_attr_error_resource.hlsl +++ b/clang/test/SemaHLSL/resource_binding_attr_error_resource.hlsl @@ -1,49 +1,49 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify - -// This test validates the diagnostics that are emitted when a variable with a "resource" type -// is bound to a register using the register annotation - - -template -struct MyTemplatedSRV { - __hlsl_resource_t [[hlsl::resource_class(SRV)]] x; -}; - -struct MySRV { - __hlsl_resource_t [[hlsl::resource_class(SRV)]] x; -}; - -struct MySampler { - __hlsl_resource_t [[hlsl::resource_class(Sampler)]] x; -}; - -struct MyUAV { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] x; -}; - -struct MyCBuffer { - __hlsl_resource_t [[hlsl::resource_class(CBuffer)]] x; -}; - - -// expected-error at +1 {{binding type 'i' ignored. 
The 'integer constant' binding type is no longer supported}} -MySRV invalid : register(i2); - -// expected-error at +1 {{binding type 't' only applies to SRV resources}} -MyUAV a : register(t2, space1); - -// expected-error at +1 {{binding type 'u' only applies to UAV resources}} -MySampler b : register(u2, space1); - -// expected-error at +1 {{binding type 'b' only applies to constant buffer resources}} -MyTemplatedSRV c : register(b2); - -// expected-error at +1 {{binding type 's' only applies to sampler state}} -MyUAV d : register(s2, space1); - -// empty binding prefix cases: -// expected-error at +1 {{expected identifier}} -MyTemplatedSRV e: register(); - -// expected-error at +1 {{expected identifier}} -MyTemplatedSRV f: register(""); +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify + +// This test validates the diagnostics that are emitted when a variable with a "resource" type +// is bound to a register using the register annotation + + +template +struct MyTemplatedSRV { + __hlsl_resource_t [[hlsl::resource_class(SRV)]] x; +}; + +struct MySRV { + __hlsl_resource_t [[hlsl::resource_class(SRV)]] x; +}; + +struct MySampler { + __hlsl_resource_t [[hlsl::resource_class(Sampler)]] x; +}; + +struct MyUAV { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] x; +}; + +struct MyCBuffer { + __hlsl_resource_t [[hlsl::resource_class(CBuffer)]] x; +}; + + +// expected-error at +1 {{binding type 'i' ignored. 
The 'integer constant' binding type is no longer supported}} +MySRV invalid : register(i2); + +// expected-error at +1 {{binding type 't' only applies to SRV resources}} +MyUAV a : register(t2, space1); + +// expected-error at +1 {{binding type 'u' only applies to UAV resources}} +MySampler b : register(u2, space1); + +// expected-error at +1 {{binding type 'b' only applies to constant buffer resources}} +MyTemplatedSRV c : register(b2); + +// expected-error at +1 {{binding type 's' only applies to sampler state}} +MyUAV d : register(s2, space1); + +// empty binding prefix cases: +// expected-error at +1 {{expected identifier}} +MyTemplatedSRV e: register(); + +// expected-error at +1 {{expected identifier}} +MyTemplatedSRV f: register(""); diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_silence_diags.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_silence_diags.hlsl index e63f264452da79..7f248e30c07096 100644 --- a/clang/test/SemaHLSL/resource_binding_attr_error_silence_diags.hlsl +++ b/clang/test/SemaHLSL/resource_binding_attr_error_silence_diags.hlsl @@ -1,27 +1,27 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only -Wno-legacy-constant-register-binding %s -verify - -// expected-no-diagnostics -float f2 : register(b9); - -float f3 : register(i9); - -cbuffer g_cbuffer1 { - float f4 : register(c2); -}; - - -struct Eg12{ - RWBuffer a; -}; - -Eg12 e12 : register(c9); - -Eg12 bar : register(i1); - -struct Eg7 { - struct Bar { - float f; - }; - Bar b; -}; -Eg7 e7 : register(t0); +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only -Wno-legacy-constant-register-binding %s -verify + +// expected-no-diagnostics +float f2 : register(b9); + +float f3 : register(i9); + +cbuffer g_cbuffer1 { + float f4 : register(c2); +}; + + +struct Eg12{ + RWBuffer a; +}; + +Eg12 e12 : register(c9); + +Eg12 bar : register(i1); + +struct Eg7 { + struct Bar { + float f; + }; + Bar b; +}; +Eg7 e7 : 
register(t0); diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_space.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_space.hlsl index 70e64e6ca75280..3001dbb1e3ec96 100644 --- a/clang/test/SemaHLSL/resource_binding_attr_error_space.hlsl +++ b/clang/test/SemaHLSL/resource_binding_attr_error_space.hlsl @@ -1,62 +1,62 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify - -// valid -cbuffer cbuf { - RWBuffer r : register(u0, space0); -} - -cbuffer cbuf2 { - struct x { - // this test validates that no diagnostic is emitted on the space parameter, because - // this register annotation is not in the global scope. - // expected-error at +1 {{'register' attribute only applies to cbuffer/tbuffer and external global variables}} - RWBuffer E : register(u2, space3); - }; -} - -struct MyStruct { - RWBuffer E; -}; - -cbuffer cbuf3 { - // valid - MyStruct E : register(u2, space3); -} - -// valid -MyStruct F : register(u3, space4); - -cbuffer cbuf4 { - // this test validates that no diagnostic is emitted on the space parameter, because - // this register annotation is not in the global scope. 
- // expected-error at +1 {{binding type 'u' only applies to UAV resources}} - float a : register(u2, space3); -} - -// expected-error at +1 {{invalid space specifier 's2' used; expected 'space' followed by an integer, like space1}} -cbuffer a : register(b0, s2) { - -} - -// expected-error at +1 {{invalid space specifier 'spaces' used; expected 'space' followed by an integer, like space1}} -cbuffer b : register(b2, spaces) { - -} - -// expected-error at +1 {{wrong argument format for hlsl attribute, use space3 instead}} -cbuffer c : register(b2, space 3) {} - -// expected-error at +1 {{register space cannot be specified on global constants}} -int d : register(c2, space3); - -// expected-error at +1 {{register space cannot be specified on global constants}} -int e : register(c2, space0); - -// expected-error at +1 {{register space cannot be specified on global constants}} -int f : register(c2, space00); - -// valid -RWBuffer g : register(u2, space0); - -// valid -RWBuffer h : register(u2, space0); +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify + +// valid +cbuffer cbuf { + RWBuffer r : register(u0, space0); +} + +cbuffer cbuf2 { + struct x { + // this test validates that no diagnostic is emitted on the space parameter, because + // this register annotation is not in the global scope. + // expected-error at +1 {{'register' attribute only applies to cbuffer/tbuffer and external global variables}} + RWBuffer E : register(u2, space3); + }; +} + +struct MyStruct { + RWBuffer E; +}; + +cbuffer cbuf3 { + // valid + MyStruct E : register(u2, space3); +} + +// valid +MyStruct F : register(u3, space4); + +cbuffer cbuf4 { + // this test validates that no diagnostic is emitted on the space parameter, because + // this register annotation is not in the global scope. 
+ // expected-error at +1 {{binding type 'u' only applies to UAV resources}} + float a : register(u2, space3); +} + +// expected-error at +1 {{invalid space specifier 's2' used; expected 'space' followed by an integer, like space1}} +cbuffer a : register(b0, s2) { + +} + +// expected-error at +1 {{invalid space specifier 'spaces' used; expected 'space' followed by an integer, like space1}} +cbuffer b : register(b2, spaces) { + +} + +// expected-error at +1 {{wrong argument format for hlsl attribute, use space3 instead}} +cbuffer c : register(b2, space 3) {} + +// expected-error at +1 {{register space cannot be specified on global constants}} +int d : register(c2, space3); + +// expected-error at +1 {{register space cannot be specified on global constants}} +int e : register(c2, space0); + +// expected-error at +1 {{register space cannot be specified on global constants}} +int f : register(c2, space00); + +// valid +RWBuffer g : register(u2, space0); + +// valid +RWBuffer h : register(u2, space0); diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_udt.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_udt.hlsl index 40517f393e1284..235004102a539b 100644 --- a/clang/test/SemaHLSL/resource_binding_attr_error_udt.hlsl +++ b/clang/test/SemaHLSL/resource_binding_attr_error_udt.hlsl @@ -1,135 +1,135 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify - -template -struct MyTemplatedUAV { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] x; -}; - -struct MySRV { - __hlsl_resource_t [[hlsl::resource_class(SRV)]] x; -}; - -struct MySampler { - __hlsl_resource_t [[hlsl::resource_class(Sampler)]] x; -}; - -struct MyUAV { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] x; -}; - -struct MyCBuffer { - __hlsl_resource_t [[hlsl::resource_class(CBuffer)]] x; -}; - -// Valid: f is skipped, SRVBuf is bound to t0, UAVBuf is bound to u0 -struct Eg1 { - float f; - MySRV SRVBuf; - MyUAV UAVBuf; - }; -Eg1 e1 : register(t0) : 
register(u0); - -// Valid: f is skipped, SRVBuf is bound to t0, UAVBuf is bound to u0. -// UAVBuf2 gets automatically assigned to u1 even though there is no explicit binding for u1. -struct Eg2 { - float f; - MySRV SRVBuf; - MyUAV UAVBuf; - MyUAV UAVBuf2; - }; -Eg2 e2 : register(t0) : register(u0); - -// Valid: Bar, the struct within Eg3, has a valid resource that can be bound to t0. -struct Eg3 { - struct Bar { - MyUAV a; - }; - Bar b; -}; -Eg3 e3 : register(u0); - -// Valid: the first sampler state object within 's' is bound to slot 5 -struct Eg4 { - MySampler s[3]; -}; - -Eg4 e4 : register(s5); - - -struct Eg5 { - float f; -}; -// expected-warning at +1{{binding type 't' only applies to types containing SRV resources}} -Eg5 e5 : register(t0); - -struct Eg6 { - float f; -}; -// expected-warning at +1{{binding type 'u' only applies to types containing UAV resources}} -Eg6 e6 : register(u0); - -struct Eg7 { - float f; -}; -// expected-warning at +1{{binding type 'b' only applies to types containing constant buffer resources}} -Eg7 e7 : register(b0); - -struct Eg8 { - float f; -}; -// expected-warning at +1{{binding type 's' only applies to types containing sampler state}} -Eg8 e8 : register(s0); - -struct Eg9 { - MySRV s; -}; -// expected-warning at +1{{binding type 'c' only applies to types containing numeric types}} -Eg9 e9 : register(c0); - -struct Eg10{ - // expected-error at +1{{'register' attribute only applies to cbuffer/tbuffer and external global variables}} - MyTemplatedUAV a : register(u9); -}; -Eg10 e10; - - -template -struct Eg11 { - R b; -}; -// expected-warning at +1{{binding type 'u' only applies to types containing UAV resources}} -Eg11 e11 : register(u0); -// invalid because after template expansion, there are no valid resources inside Eg11 to bind as a UAV, only an SRV - - -struct Eg12{ - MySRV s1; - MySRV s2; -}; -// expected-warning at +2{{binding type 'u' only applies to types containing UAV resources}} -// expected-error at +1{{binding type 
'u' cannot be applied more than once}} -Eg12 e12 : register(u9) : register(u10); - -struct Eg13{ - MySRV s1; - MySRV s2; -}; -// expected-warning at +3{{binding type 'u' only applies to types containing UAV resources}} -// expected-error at +2{{binding type 'u' cannot be applied more than once}} -// expected-error at +1{{binding type 'u' cannot be applied more than once}} -Eg13 e13 : register(u9) : register(u10) : register(u11); - -// expected-error at +1{{binding type 't' cannot be applied more than once}} -Eg13 e13_2 : register(t11) : register(t12); - -struct Eg14{ - MyTemplatedUAV r1; -}; -// expected-warning at +1{{binding type 't' only applies to types containing SRV resources}} -Eg14 e14 : register(t9); - -struct Eg15 { - float f[4]; -}; -// expected no error -Eg15 e15 : register(c0); +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify + +template +struct MyTemplatedUAV { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] x; +}; + +struct MySRV { + __hlsl_resource_t [[hlsl::resource_class(SRV)]] x; +}; + +struct MySampler { + __hlsl_resource_t [[hlsl::resource_class(Sampler)]] x; +}; + +struct MyUAV { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] x; +}; + +struct MyCBuffer { + __hlsl_resource_t [[hlsl::resource_class(CBuffer)]] x; +}; + +// Valid: f is skipped, SRVBuf is bound to t0, UAVBuf is bound to u0 +struct Eg1 { + float f; + MySRV SRVBuf; + MyUAV UAVBuf; + }; +Eg1 e1 : register(t0) : register(u0); + +// Valid: f is skipped, SRVBuf is bound to t0, UAVBuf is bound to u0. +// UAVBuf2 gets automatically assigned to u1 even though there is no explicit binding for u1. +struct Eg2 { + float f; + MySRV SRVBuf; + MyUAV UAVBuf; + MyUAV UAVBuf2; + }; +Eg2 e2 : register(t0) : register(u0); + +// Valid: Bar, the struct within Eg3, has a valid resource that can be bound to t0. 
+struct Eg3 { + struct Bar { + MyUAV a; + }; + Bar b; +}; +Eg3 e3 : register(u0); + +// Valid: the first sampler state object within 's' is bound to slot 5 +struct Eg4 { + MySampler s[3]; +}; + +Eg4 e4 : register(s5); + + +struct Eg5 { + float f; +}; +// expected-warning at +1{{binding type 't' only applies to types containing SRV resources}} +Eg5 e5 : register(t0); + +struct Eg6 { + float f; +}; +// expected-warning at +1{{binding type 'u' only applies to types containing UAV resources}} +Eg6 e6 : register(u0); + +struct Eg7 { + float f; +}; +// expected-warning at +1{{binding type 'b' only applies to types containing constant buffer resources}} +Eg7 e7 : register(b0); + +struct Eg8 { + float f; +}; +// expected-warning at +1{{binding type 's' only applies to types containing sampler state}} +Eg8 e8 : register(s0); + +struct Eg9 { + MySRV s; +}; +// expected-warning at +1{{binding type 'c' only applies to types containing numeric types}} +Eg9 e9 : register(c0); + +struct Eg10{ + // expected-error at +1{{'register' attribute only applies to cbuffer/tbuffer and external global variables}} + MyTemplatedUAV a : register(u9); +}; +Eg10 e10; + + +template +struct Eg11 { + R b; +}; +// expected-warning at +1{{binding type 'u' only applies to types containing UAV resources}} +Eg11 e11 : register(u0); +// invalid because after template expansion, there are no valid resources inside Eg11 to bind as a UAV, only an SRV + + +struct Eg12{ + MySRV s1; + MySRV s2; +}; +// expected-warning at +2{{binding type 'u' only applies to types containing UAV resources}} +// expected-error at +1{{binding type 'u' cannot be applied more than once}} +Eg12 e12 : register(u9) : register(u10); + +struct Eg13{ + MySRV s1; + MySRV s2; +}; +// expected-warning at +3{{binding type 'u' only applies to types containing UAV resources}} +// expected-error at +2{{binding type 'u' cannot be applied more than once}} +// expected-error at +1{{binding type 'u' cannot be applied more than once}} +Eg13 e13 : 
register(u9) : register(u10) : register(u11); + +// expected-error at +1{{binding type 't' cannot be applied more than once}} +Eg13 e13_2 : register(t11) : register(t12); + +struct Eg14{ + MyTemplatedUAV r1; +}; +// expected-warning at +1{{binding type 't' only applies to types containing SRV resources}} +Eg14 e14 : register(t9); + +struct Eg15 { + float f[4]; +}; +// expected no error +Eg15 e15 : register(c0); diff --git a/clang/tools/scan-build/bin/scan-build.bat b/clang/tools/scan-build/bin/scan-build.bat index 77be6746318f11..f765f205b8ec50 100644 --- a/clang/tools/scan-build/bin/scan-build.bat +++ b/clang/tools/scan-build/bin/scan-build.bat @@ -1 +1 @@ -perl -S scan-build %* +perl -S scan-build %* diff --git a/clang/tools/scan-build/libexec/c++-analyzer.bat b/clang/tools/scan-build/libexec/c++-analyzer.bat index 69f048a91671f0..83c7172456a51a 100644 --- a/clang/tools/scan-build/libexec/c++-analyzer.bat +++ b/clang/tools/scan-build/libexec/c++-analyzer.bat @@ -1 +1 @@ -perl -S c++-analyzer %* +perl -S c++-analyzer %* diff --git a/clang/tools/scan-build/libexec/ccc-analyzer.bat b/clang/tools/scan-build/libexec/ccc-analyzer.bat index 2a85376eb82b16..fdd36f3bdd0437 100644 --- a/clang/tools/scan-build/libexec/ccc-analyzer.bat +++ b/clang/tools/scan-build/libexec/ccc-analyzer.bat @@ -1 +1 @@ -perl -S ccc-analyzer %* +perl -S ccc-analyzer %* diff --git a/clang/utils/ClangVisualizers/clang.natvis b/clang/utils/ClangVisualizers/clang.natvis index a7c70186bc46de..611c20dacce176 100644 --- a/clang/utils/ClangVisualizers/clang.natvis +++ b/clang/utils/ClangVisualizers/clang.natvis @@ -1,1089 +1,1089 @@ - - - - - - - LocInfoType - {(clang::Type::TypeClass)TypeBits.TC, en}Type - - {*(clang::BuiltinType *)this} - {*(clang::PointerType *)this} - {*(clang::ParenType *)this} - {(clang::BitIntType *)this} - {*(clang::LValueReferenceType *)this} - {*(clang::RValueReferenceType *)this} - {(clang::ConstantArrayType *)this,na} - {(clang::ConstantArrayType *)this,view(left)na} - 
{(clang::ConstantArrayType *)this,view(right)na} - {(clang::VariableArrayType *)this,na} - {(clang::VariableArrayType *)this,view(left)na} - {(clang::VariableArrayType *)this,view(right)na} - {(clang::IncompleteArrayType *)this,na} - {(clang::IncompleteArrayType *)this,view(left)na} - {(clang::IncompleteArrayType *)this,view(right)na} - {(clang::TypedefType *)this,na} - {(clang::TypedefType *)this,view(cpp)na} - {*(clang::AttributedType *)this} - {(clang::DecayedType *)this,na} - {(clang::DecayedType *)this,view(left)na} - {(clang::DecayedType *)this,view(right)na} - {(clang::ElaboratedType *)this,na} - {(clang::ElaboratedType *)this,view(left)na} - {(clang::ElaboratedType *)this,view(right)na} - {*(clang::TemplateTypeParmType *)this} - {*(clang::TemplateTypeParmType *)this,view(cpp)} - {*(clang::SubstTemplateTypeParmType *)this} - {*(clang::RecordType *)this} - {*(clang::RecordType *)this,view(cpp)} - {(clang::FunctionProtoType *)this,na} - {(clang::FunctionProtoType *)this,view(left)na} - {(clang::FunctionProtoType *)this,view(right)na} - {*(clang::TemplateSpecializationType *)this} - {*(clang::DeducedTemplateSpecializationType *)this} - {*(clang::DeducedTemplateSpecializationType *)this,view(cpp)} - {*(clang::InjectedClassNameType *)this} - {*(clang::DependentNameType *)this} - {*(clang::PackExpansionType *)this} - {(clang::LocInfoType *)this,na} - {(clang::LocInfoType *)this,view(cpp)na} - {this,view(poly)na} - {*this,view(cpp)} - - No visualizer yet for {(clang::Type::TypeClass)TypeBits.TC,en}Type - Dependence{" ",en} - - CachedLinkage: {(clang::Linkage)TypeBits.CachedLinkage,en} CachedLocalOrUnnamed - CachedLinkage: {(clang::Linkage)TypeBits.CachedLinkage,en}{" ",sb} - - FromAST - - - No TypeBits set beyond TypeClass - - {*this, view(Dependence)}{*this, view(Cache)}{*this, view(FromAST)} - {*this,view(cmn)} {{{*this,view(poly)}}} - - (clang::Type::TypeClass)TypeBits.TC - this,view(flags)na - CanonicalType - *(clang::BuiltinType *)this - *(clang::PointerType 
*)this - *(clang::ParenType*)this - *(clang::BitIntType*)this - *(clang::LValueReferenceType *)this - *(clang::RValueReferenceType *)this - (clang::ConstantArrayType *)this - (clang::VariableArrayType *)this - (clang::IncompleteArrayType *)this - *(clang::AttributedType *)this - (clang::DecayedType *)this - (clang::ElaboratedType *)this - (clang::TemplateTypeParmType *)this - (clang::SubstTemplateTypeParmType *)this - (clang::RecordType *)this - (clang::FunctionProtoType *)this - (clang::TemplateSpecializationType *)this - (clang::DeducedTemplateSpecializationType *)this - (clang::InjectedClassNameType *)this - (clang::DependentNameType *)this - (clang::PackExpansionType *)this - (clang::LocInfoType *)this - - - - - ElementType - - - - {ElementType,view(cpp)} - [{Size}] - {ElementType,view(cpp)}[{Size}] - - Size - (clang::ArrayType *)this - - - - {ElementType,view(cpp)} - [] - {ElementType,view(cpp)}[] - - (clang::ArrayType *)this - - - - {ElementType,view(cpp)} - [*] - {ElementType,view(cpp)}[*] - - (clang::Expr *)SizeExpr - (clang::ArrayType *)this - - - - {Decl,view(name)nd} - {Decl} - - Decl - *(clang::Type *)this, view(cmn) - - - - {PointeeType, view(cpp)} * - - PointeeType - *(clang::Type *)this, view(cmn) - - - - {Inner, view(cpp)} - - Inner - *(clang::Type *)this, view(cmn) - - - - signed _BitInt({NumBits}) - unsigned _BitInt({NumBits})( - - NumBits - (clang::Type *)this, view(cmn) - - - - - {((clang::ReferenceType *)this)->PointeeType,view(cpp)} & - - *(clang::Type *)this, view(cmn) - PointeeType - - - - {((clang::ReferenceType *)this)->PointeeType,view(cpp)} && - - *(clang::Type *)this, view(cmn) - PointeeType - - - - {ModifiedType} Attribute={(clang::AttributedType::Kind)AttributedTypeBits.AttrKind} - - - - - {(clang::Decl::Kind)DeclContextBits.DeclKind,en}Decl - - (clang::Decl::Kind)DeclContextBits.DeclKind,en - - - - - FirstDecl - (clang::Decl *)(*(intptr_t *)NextInContextAndBits.Value.Data & ~3) - *this - - - - - - - Field {{{*(clang::DeclaratorDecl 
*)this,view(cpp)nd}}} - - - {*(clang::FunctionDecl *)this,nd} - Method {{{*this,view(cpp)}}} - - - Constructor {{{Name,view(cpp)}({*(clang::FunctionDecl *)this,view(parm0)nd})}} - - - Destructor {{~{Name,view(cpp)}()}} - - - typename - class - (not yet known if parameter pack) - ... - - {(TypeSourceInfo *)(*(uintptr_t *)DefaultArgument.ValueOrInherited.Val.Value.Data&~3LL),view(cpp)} - {{InheritedInitializer}} - = {this,view(DefaultArg)na} - - {*this,view(TorC)} {*this,view(MaybeEllipses)}{Name,view(cpp)} {this,view(Initializer)na} - - - {*TemplatedDecl,view(cpp)} - template{TemplateParams,na} {*TemplatedDecl}; - - TemplateParams,na - TemplatedDecl,na - - - - - {(clang::TypeSourceInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL),view(cpp)na} - {(clang::TypedefNameDecl::ModedTInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL),view(cpp)na} - {(TypeDecl *)this,view(cpp)nand} - typedef {this,view(type)na} {this,view(name)na}; - - "Not yet calculated",sb - (bool)(*(uintptr_t *)MaybeModedTInfo.Value.Data & 2) - (clang::TypeSourceInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL) - (clang::TypedefNameDecl::ModedTInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL) - (TypeDecl *)this,nd - - - - {(TypedefNameDecl *)this,view(name)nand} - using {(TypedefNameDecl *)this,view(name)nand} = {(TypedefNameDecl *)this,view(type)nand} - - - {Name} - - - Kind={(UncommonTemplateNameStorage::Kind)Kind,en}, Size={Size} - - (UncommonTemplateNameStorage::Kind)Kind - Size - - - - {Bits}, - {this,view(cmn)na},{(OverloadedTemplateStorage*)this,na} - {this,view(cmn)na},{(AssumedTemplateStorage*)this,na} - {this,view(cmn)na},{(SubstTemplateTemplateParmStorage*)this,na} - {this,view(cmn)na},{(SubstTemplateTemplateParmPackStorage*)this,na} - {this,view(cmn)na} - - Bits - (OverloadedTemplateStorage*)this - (AssumedTemplateStorage*)this - (SubstTemplateTemplateParmStorage*)this - (SubstTemplateTemplateParmPackStorage*)this - - - - - - - {(clang::TemplateDecl *)(Val.Value & 
~3LL),view(cpp)na} - - - {(clang::TemplateDecl *)(Val.Value & ~3LL),na} - - - {(clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL),view(cpp)na} - - - {(clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL),na} - - - {(clang::QualifiedTemplateName *)(Val.Value & ~3LL),view(cpp)na} - - - {(clang::QualifiedTemplateName *)(Val.Value & ~3LL),na} - - - {(clang::DependentTemplateName *)(Val.Value & ~3LL),view(cpp)na} - - - {(clang::DependentTemplateName *)(Val.Value & ~3LL),na} - - - "TemplateDecl",s8b - - (clang::TemplateDecl *)(Val.Value & ~3LL) - - "UncommonTemplateNameStorage",s8b - - (clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL) - - "QualifiedTemplateName",s8b - - (clang::QualifiedTemplateName *)(Val.Value & ~3LL) - - "DependentTemplateName",s8b - - (clang::DependentTemplateName *)(Val.Value & ~3LL) - - Val - - - - - {Storage,view(cpp)na} - {Storage,na} - - Storage - - - - {Name,view(cpp)} - {Name} - - - implicit{" ",sb} - - {*this,view(implicit)nd} - {*this,view(modifiers)}{Name,view(cpp)} - {*this,view(modifiers)nd}struct {Name,view(cpp)} - {*this,view(modifiers)nd}interface {Name,view(cpp)} - {*this,view(modifiers)nd}union {Name,view(cpp)} - {*this,view(modifiers)nd}class {Name,view(cpp)} - {*this,view(modifiers)nd}enum {Name,view(cpp)} - - (clang::DeclContext *)this - - - - {decl,view(cpp)na} - {*decl} - - *(clang::Type *)this, view(cmn) - decl - - - - {(clang::TagType *)this,view(cpp)na} - {(clang::TagType *)this,na} - - *(clang::TagType *)this - - - - {{{*Replaced,view(cpp)} <= {CanonicalType,view(cpp)}}} - - *(clang::Type *)this, view(cmn) - *Replaced - - - - - - {ResultType,view(cpp)} - - {*(clang::QualType *)(this+1),view(cpp)}{*this,view(parm1)} - - , {*((clang::QualType *)(this+1)+1),view(cpp)}{*this,view(parm2)} - - , {*((clang::QualType *)(this+1)+2),view(cpp)}{*this,view(parm3)} - - , {*((clang::QualType *)(this+1)+3),view(cpp)}{*this,view(parm4)} - - , {*((clang::QualType *)(this+1)+4),view(cpp)}{*this,view(parm5)} - - , /* expand for 
more params */ - ({*this,view(parm0)}) -> {ResultType,view(cpp)} - ({*this,view(parm0)}) - {this,view(left)na}{this,view(right)na} - - ResultType - - {*this,view(parm0)} - - - FunctionTypeBits.NumParams - (clang::QualType *)(this+1) - - - - *(clang::Type *)this, view(cmn) - - - - - {OriginalTy} adjusted to {AdjustedTy} - - OriginalTy - AdjustedTy - - - - {OriginalTy,view(left)} - {OriginalTy,view(right)} - {OriginalTy} - - (clang::AdjustedType *)this - - - - {NamedType,view(left)} - {NamedType,view(right)} - {NamedType} - - (clang::ElaboratedTypeKeyword)TypeWithKeywordBits.Keyword - NNS - NamedType,view(cmn) - - - - {TTPDecl->Name,view(cpp)} - Non-canonical: {*TTPDecl} - Canonical: {CanTTPTInfo} - - *(clang::Type *)this, view(cmn) - - - - {Decl,view(cpp)} - - Decl - InjectedType - *(clang::Type *)this, view(cmn) - - - - {NNS}{Name,view(cpp)na} - - NNS - Name - *(clang::Type *)this, view(cmn) - - - - - {(IdentifierInfo*)Specifier,view(cpp)na}:: - {(NamedDecl*)Specifier,view(cpp)na}:: - {(Type*)Specifier,view(cpp)na}:: - - (NestedNameSpecifier::StoredSpecifierKind)((*(uintptr_t *)Prefix.Value.Data>>1)&3) - - - - {Pattern} - - Pattern - NumExpansions - *(clang::Type *)this, view(cmn) - - - - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(poly)}{*this,view(fastQuals)} - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(cpp)}{*this,view(fastQuals)} - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(left)}{*this,view(fastQuals)} - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(right)}{*this,view(fastQuals)} - - - {" ",sb}const - {" ",sb}restrict - {" ",sb}const restrict - {" ",sb}volatile - {" 
",sb}const volatile - {" ",sb}volatile restrict - {" ",sb}const volatile restrict - Cannot visualize non-fast qualifiers - Null - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,na}{*this,view(fastQuals)} - - *this,view(fastQuals) - ((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType - - - - - {DeclInfo,view(cpp)na} - {DeclInfo,na} - - DeclInfo - *(clang::Type *)this, view(cmn) - - - - {Ty,view(cpp)} - {Ty} - - Ty - - - - {(QualType *)&Ty,na} - - (QualType *)&Ty - Data - - - - Not building anything - Building a {LastTy} - - - {Argument,view(cpp)} - {Argument} - - - {*(clang::QualType *)&TypeOrValue.V,view(cpp)} - {(clang::TemplateArgument::ArgKind)TypeOrValue.Kind,en} template argument: {*(clang::QualType *)&TypeOrValue.V} - - {Args.Args[0]}{*this,view(arg1)} - - , {Args.Args[1]}{*this,view(arg2)} - - , {Args.Args[2]}, ... - - {Args.Args[0],view(cpp)}{*this,view(arg1cpp)} - - , {Args.Args[1],view(cpp)}{*this,view(arg2cpp)} - - , {Args.Args[2],view(cpp)}, ... - {*this,view(arg0cpp)} - {*this,view(arg0)} - {(clang::Expr *)TypeOrValue.V,view(cpp)na} - {(clang::TemplateArgument::ArgKind)TypeOrValue.Kind,en} - - *(clang::QualType *)&TypeOrValue.V - (clang::Expr *)TypeOrValue.V - - Args.NumArgs - Args.Args - - - - - - - {((TemplateArgumentLoc*)Arguments.BeginX)[0],view(cpp)}{*this,view(elt1)} - - , {((TemplateArgumentLoc*)Arguments.BeginX)[1],view(cpp)}{*this,view(elt2)} - - , {((TemplateArgumentLoc*)Arguments.BeginX)[2],view(cpp)}{*this,view(elt3)} - - , {((TemplateArgumentLoc*)Arguments.BeginX)[3],view(cpp)}{*this,view(elt4)} - - , ... - empty - <{*this,view(elt0)}> - Uninitialized - - - - {Arguments[0],view(cpp)}{*this,view(arg1)} - - , {Arguments[1],view(cpp)}{*this,view(arg2)} - - , {Arguments[1],view(cpp)}, ... 
- <{*this,view(arg0)}> - - NumArguments - - NumArguments - Arguments - - - - - - {Data[0],view(cpp)}{*this,view(arg1)} - - , {Data[1],view(cpp)}{*this,view(arg2)} - - , {Data[2],view(cpp)}, ... - <{*this,view(arg0)}> - - Length - - - - Length - Data - - - - - - - - {((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[0],view(cpp)}{*this,view(level1)} - - ::{((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[1],view(cpp)}{*this,view(level2)} - - ::{((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[2],view(cpp)}, ... - {*this,view(level0)} - - TemplateArgumentLists - - - - {(clang::QualType *)Arg,view(cpp)na} - Type template argument: {*(clang::QualType *)Arg} - Non-type template argument: {*(clang::Expr *)Arg} - Template template argument: {*(clang::TemplateName *)Arg - - Kind,en - (clang::QualType *)Arg - (clang::Expr *)Arg - (clang::TemplateName *)Arg - - - - - void - bool - char - unsigned char - wchar_t - char16_t - char32_t - unsigned short - unsigned int - unsigned long - unsigned long long - __uint128_t - char - signed char - wchar_t - short - int - long - long long - __int128_t - __fp16 - float - double - long double - nullptr_t - {(clang::BuiltinType::Kind)BuiltinTypeBits.Kind, en} - - (clang::BuiltinType::Kind)BuiltinTypeBits.Kind - - - - - - {((clang::TemplateArgument *)(this+1))[0],view(cpp)}{*this,view(arg1)} - - , {((clang::TemplateArgument *)(this+1))[1],view(cpp)}{*this,view(arg2)} - - , {((clang::TemplateArgument *)(this+1))[2],view(cpp)}{*this,view(arg3)} - - {*((clang::TemplateDecl *)(Template.Storage.Val.Value))->TemplatedDecl,view(cpp)}<{*this,view(arg0)}> - - Can't visualize this TemplateSpecializationType - - Template.Storage - - TemplateSpecializationTypeBits.NumArgs - (clang::TemplateArgument *)(this+1) - - *(clang::Type *)this, view(cmn) - - - - - (CanonicalType.Value.Value != this) || TypeBits.Dependent - *(clang::Type *)this,view(cmn) - - - - {CanonicalType,view(cpp)} - 
{Template,view(cpp)} - {Template} - - Template - CanonicalType,view(cpp) - (clang::DeducedType *)this - Template - - - - {*(CXXRecordDecl *)this,nd}{*TemplateArgs} - - (CXXRecordDecl *)this,nd - TemplateArgs - - - - {((llvm::StringMapEntry<clang::IdentifierInfo *>*)Entry)+1,sb} - - ((llvm::StringMapEntry<clang::IdentifierInfo *>*)Entry)+1,s - (clang::tok::TokenKind)TokenID - - - - - Empty - {*(clang::IdentifierInfo *)(Ptr & ~PtrMask)} - {{Identifier ({*(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} - {{ObjC Zero Arg Selector (*{(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} - {{ObjC One Arg Selector (*{(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} - {(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),view(cpp)na} - C++ Constructor {{{(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),view(cpp)na}}} - C++ Destructor {{*(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask)}} - C++ Conversion function {{*(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask)}} - C++ Operator {{*(clang::detail::CXXOperatorIdName *)(Ptr & ~PtrMask)}} - {*(clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask),view(cpp)} - {{Extra ({*(clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask)})}} - - StoredNameKind(Ptr & PtrMask),en - *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na - *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na - *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na - *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na - *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na - *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na - *(clang::detail::CXXOperatorIdName *)(Ptr & ~PtrMask),na - (clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask),na - - - - - {(CXXDeductionGuideNameExtra *)this,view(cpp)nand} - - - {(CXXDeductionGuideNameExtra *)this,nand} - - C++ Literal operator - C++ Using directive - Objective-C MultiArg selector - {(clang::detail::DeclarationNameExtra::ExtraKind)ExtraKindOrNumArgs,en}{" ",sb}{*this,view(cpp)} - - (CXXDeductionGuideNameExtra 
*)this - ExtraKindOrNumArgs - - - - {Template->TemplatedDecl,view(cpp)} - C++ Deduction guide for {Template->TemplatedDecl,view(cpp)na} - - - {Type,view(cpp)} - {Type} - - - {Name} - - - - {(ParsedTemplateArgument *)(this+1),view(cpp)na}{this,view(arg1)na} - - , {((ParsedTemplateArgument *)(this+1))+1,view(cpp)na}{this,view(arg2)na} - - , ... - {Name,na}<{this,view(arg0)na}> - - Name - - {this,view(arg0)na} - - - NumArgs - (ParsedTemplateArgument *)(this+1) - - - - Operator - - - - {{annot_template_id ({(clang::TemplateIdAnnotation *)(PtrData),na})}} - {{Identifier ({(clang::IdentifierInfo *)(PtrData),na})}} - {(clang::tok::TokenKind)Kind,en} - - - {BufferPtr,nasb} - - - {TheLexer._Mypair._Myval2,na} - Expanding Macro: {TheTokenLexer._Mypair._Myval2,na} - - - - - [{(Token *)(CachedTokens.BeginX) + CachedLexPos,na}] {IncludeMacroStack._Mypair._Myval2._Mylast - 1,na} - - {IncludeMacroStack._Mypair._Myval2._Mylast - 1,na} - {CurLexer._Mypair._Myval2,na} - Expanding Macro: {CurTokenLexer._Mypair._Myval2,na} - - - {this,view(cached)} - - CLK_LexAfterModuleImport - - - [{Tok}] {PP,na} - - - this - *this - {Id} - &{Id} - No visualizer for {Kind} - - - - =, - &, - - {(LambdaCapture *)(Captures.BeginX),na}{this,view(capture1)na} - - ,{(LambdaCapture *)(Captures.BeginX)+1,na}{this,view(capture2)na} - - ,{(LambdaCapture *)(Captures.BeginX)+2,na}{this,view(capture3)na} - - ,... 
- [{this,view(default)na}{this,view(capture0)na}] - - - - , [{TypeRep}] - - - , [{ExprRep}] - - - , [{DeclRep}] - - - [{(clang::DeclSpec::SCS)StorageClassSpec,en}], [{(clang::TypeSpecifierType)TypeSpecType,en}]{this,view(extra)na} - - (clang::DeclSpec::SCS)StorageClassSpec - (clang::TypeSpecifierType)TypeSpecType - - TypeRep - - - ExprRep - - - DeclRep - - - - - - {Name,s} - - - {RealPathName,s} - - - {Name,s} - - - - (clang::StorageClass)SClass - (clang::ThreadStorageClassSpecifier)TSCSpec - (clang::VarDecl::InitializationStyle)InitStyle - - - - {DeclType,view(left)} {Name,view(cpp)}{DeclType,view(right)} - - Name - DeclType - - - - {(DeclaratorDecl*)this,nand} - - (DeclaratorDecl*)this,nd - Init - VarDeclBits - - - - {*(VarDecl*)this,nd} - - ParmVarDeclBits - *(VarDecl*)this,nd - - - - {"explicit ",sb} - - explicit({ExplicitSpec,view(ptr)na}) - {ExplicitSpec,view(int)en} - {ExplicitSpec,view(int)en} : {ExplicitSpec,view(ptr)na} - - - {ExplicitSpec,view(cpp)}{Name,view(cpp)nd}({(FunctionDecl*)this,view(parm0)nand}) -> {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)(((uintptr_t)DeclType.Value.Value) & ~15))->BaseType)->ResultType,view(cpp)} - - ExplicitSpec - (bool)FunctionDeclBits.IsCopyDeductionCandidate - (FunctionDecl*)this,nd - - - - {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType,view(cpp)} - - {ParamInfo[0],na}{*this,view(parm1)nd} - - , {ParamInfo[1],na}{*this,view(parm2)nd} - - , {ParamInfo[2],na}{*this,view(parm3)nd} - - , {ParamInfo[3],na}{*this,view(parm4)nd} - - , {ParamInfo[4],na}{*this,view(parm5)nd} - - , /* expand for more params */ - - auto {Name,view(cpp)nd}({*this,view(parm0)nd}) -> {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType,view(cpp)} - - {this,view(retType)nand} {Name,view(cpp)nd}({*this,view(parm0)nd}) - - (clang::DeclaratorDecl *)this,nd - 
((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType - - {*this,view(parm0)nd} - - - ((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->FunctionTypeBits.NumParams - ParamInfo - - - - TemplateOrSpecialization - - - - {*($T1*)&Ptr} - - ($T1*)&Ptr - - - - {($T1 *)Ptr} - - ($T1 *)Ptr - - - - - {*((NamedDecl **)(this+1))[0],view(cpp)}{*this,view(parm1)} - - , {*((NamedDecl **)(this+1))[1],view(cpp)}{*this,view(parm2)} - - , {*((NamedDecl **)(this+1))[2],view(cpp)}{*this,view(parm3)} - - , {*((NamedDecl **)(this+1))[3],view(cpp)}{*this,view(parm4)} - - , {*((NamedDecl **)(this+1))[4],view(cpp)}{*this,view(parm5)} - - , /* Expand for more params */ - <{*this,view(parm0)}> - - - NumParams - (NamedDecl **)(this+1) - - - - - {(clang::Stmt::StmtClass)StmtBits.sClass,en} - - (clang::Stmt::StmtClass)StmtBits.sClass,en - - - - {*(clang::StringLiteral *)this} - Expression of class {(clang::Stmt::StmtClass)StmtBits.sClass,en} and type {TR,view(cpp)} - - - - *(unsigned *)(((clang::StringLiteral *)this)+1) - (const char *)(((clang::StringLiteral *)this)+1)+4+4,[*(unsigned *)(((clang::StringLiteral *)this)+1)]s8 - - - - public - protected - private - - {*(clang::NamedDecl *)(Ptr&~Mask)} - {*this,view(access)} {*this,view(decl)} - - (clang::AccessSpecifier)(Ptr&Mask),en - *(clang::NamedDecl *)(Ptr&~Mask) - - - - [IK_Identifier] {*Identifier} - [IK_OperatorFunctionId] {OperatorFunctionId} - [IK_ConversionFunctionId] {ConversionFunctionId} - [IK_ConstructorName] {ConstructorName} - [IK_DestructorName] {DestructorName} - [IK_DeductionGuideName] {TemplateName} - [IK_TemplateId] {TemplateId} - [IK_ConstructorTemplateId] {TemplateId} - Kind - - Identifier - OperatorFunctionId - ConversionFunctionId - ConstructorName - DestructorName - TemplateName - TemplateId - TemplateId - - - - NumDecls={NumDecls} - - - NumDecls - (Decl **)(this+1) 
- - - - - {*D} - {*(DeclGroup *)((uintptr_t)D&~1)} - - D - (DeclGroup *)((uintptr_t)D&~1) - - - - {DS} {Name} - - - {Decls} - - Decls - - - - {Ambiguity,en}: {Decls} - {ResultKind,en}: {Decls} - - - Invalid - Unset - {Val} - - - Invalid - Unset - {($T1)(Value&~1)} - - (bool)(Value&1) - ($T1)(Value&~1) - - - + + + + + + + LocInfoType + {(clang::Type::TypeClass)TypeBits.TC, en}Type + + {*(clang::BuiltinType *)this} + {*(clang::PointerType *)this} + {*(clang::ParenType *)this} + {(clang::BitIntType *)this} + {*(clang::LValueReferenceType *)this} + {*(clang::RValueReferenceType *)this} + {(clang::ConstantArrayType *)this,na} + {(clang::ConstantArrayType *)this,view(left)na} + {(clang::ConstantArrayType *)this,view(right)na} + {(clang::VariableArrayType *)this,na} + {(clang::VariableArrayType *)this,view(left)na} + {(clang::VariableArrayType *)this,view(right)na} + {(clang::IncompleteArrayType *)this,na} + {(clang::IncompleteArrayType *)this,view(left)na} + {(clang::IncompleteArrayType *)this,view(right)na} + {(clang::TypedefType *)this,na} + {(clang::TypedefType *)this,view(cpp)na} + {*(clang::AttributedType *)this} + {(clang::DecayedType *)this,na} + {(clang::DecayedType *)this,view(left)na} + {(clang::DecayedType *)this,view(right)na} + {(clang::ElaboratedType *)this,na} + {(clang::ElaboratedType *)this,view(left)na} + {(clang::ElaboratedType *)this,view(right)na} + {*(clang::TemplateTypeParmType *)this} + {*(clang::TemplateTypeParmType *)this,view(cpp)} + {*(clang::SubstTemplateTypeParmType *)this} + {*(clang::RecordType *)this} + {*(clang::RecordType *)this,view(cpp)} + {(clang::FunctionProtoType *)this,na} + {(clang::FunctionProtoType *)this,view(left)na} + {(clang::FunctionProtoType *)this,view(right)na} + {*(clang::TemplateSpecializationType *)this} + {*(clang::DeducedTemplateSpecializationType *)this} + {*(clang::DeducedTemplateSpecializationType *)this,view(cpp)} + {*(clang::InjectedClassNameType *)this} + {*(clang::DependentNameType *)this} + 
{*(clang::PackExpansionType *)this} + {(clang::LocInfoType *)this,na} + {(clang::LocInfoType *)this,view(cpp)na} + {this,view(poly)na} + {*this,view(cpp)} + + No visualizer yet for {(clang::Type::TypeClass)TypeBits.TC,en}Type + Dependence{" ",en} + + CachedLinkage: {(clang::Linkage)TypeBits.CachedLinkage,en} CachedLocalOrUnnamed + CachedLinkage: {(clang::Linkage)TypeBits.CachedLinkage,en}{" ",sb} + + FromAST + + + No TypeBits set beyond TypeClass + + {*this, view(Dependence)}{*this, view(Cache)}{*this, view(FromAST)} + {*this,view(cmn)} {{{*this,view(poly)}}} + + (clang::Type::TypeClass)TypeBits.TC + this,view(flags)na + CanonicalType + *(clang::BuiltinType *)this + *(clang::PointerType *)this + *(clang::ParenType*)this + *(clang::BitIntType*)this + *(clang::LValueReferenceType *)this + *(clang::RValueReferenceType *)this + (clang::ConstantArrayType *)this + (clang::VariableArrayType *)this + (clang::IncompleteArrayType *)this + *(clang::AttributedType *)this + (clang::DecayedType *)this + (clang::ElaboratedType *)this + (clang::TemplateTypeParmType *)this + (clang::SubstTemplateTypeParmType *)this + (clang::RecordType *)this + (clang::FunctionProtoType *)this + (clang::TemplateSpecializationType *)this + (clang::DeducedTemplateSpecializationType *)this + (clang::InjectedClassNameType *)this + (clang::DependentNameType *)this + (clang::PackExpansionType *)this + (clang::LocInfoType *)this + + + + + ElementType + + + + {ElementType,view(cpp)} + [{Size}] + {ElementType,view(cpp)}[{Size}] + + Size + (clang::ArrayType *)this + + + + {ElementType,view(cpp)} + [] + {ElementType,view(cpp)}[] + + (clang::ArrayType *)this + + + + {ElementType,view(cpp)} + [*] + {ElementType,view(cpp)}[*] + + (clang::Expr *)SizeExpr + (clang::ArrayType *)this + + + + {Decl,view(name)nd} + {Decl} + + Decl + *(clang::Type *)this, view(cmn) + + + + {PointeeType, view(cpp)} * + + PointeeType + *(clang::Type *)this, view(cmn) + + + + {Inner, view(cpp)} + + Inner + *(clang::Type *)this, view(cmn) 
+ + + + signed _BitInt({NumBits}) + unsigned _BitInt({NumBits})( + + NumBits + (clang::Type *)this, view(cmn) + + + + + {((clang::ReferenceType *)this)->PointeeType,view(cpp)} & + + *(clang::Type *)this, view(cmn) + PointeeType + + + + {((clang::ReferenceType *)this)->PointeeType,view(cpp)} && + + *(clang::Type *)this, view(cmn) + PointeeType + + + + {ModifiedType} Attribute={(clang::AttributedType::Kind)AttributedTypeBits.AttrKind} + + + + + {(clang::Decl::Kind)DeclContextBits.DeclKind,en}Decl + + (clang::Decl::Kind)DeclContextBits.DeclKind,en + + + + + FirstDecl + (clang::Decl *)(*(intptr_t *)NextInContextAndBits.Value.Data & ~3) + *this + + + + + + + Field {{{*(clang::DeclaratorDecl *)this,view(cpp)nd}}} + + + {*(clang::FunctionDecl *)this,nd} + Method {{{*this,view(cpp)}}} + + + Constructor {{{Name,view(cpp)}({*(clang::FunctionDecl *)this,view(parm0)nd})}} + + + Destructor {{~{Name,view(cpp)}()}} + + + typename + class + (not yet known if parameter pack) + ... + + {(TypeSourceInfo *)(*(uintptr_t *)DefaultArgument.ValueOrInherited.Val.Value.Data&~3LL),view(cpp)} + {{InheritedInitializer}} + = {this,view(DefaultArg)na} + + {*this,view(TorC)} {*this,view(MaybeEllipses)}{Name,view(cpp)} {this,view(Initializer)na} + + + {*TemplatedDecl,view(cpp)} + template{TemplateParams,na} {*TemplatedDecl}; + + TemplateParams,na + TemplatedDecl,na + + + + + {(clang::TypeSourceInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL),view(cpp)na} + {(clang::TypedefNameDecl::ModedTInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL),view(cpp)na} + {(TypeDecl *)this,view(cpp)nand} + typedef {this,view(type)na} {this,view(name)na}; + + "Not yet calculated",sb + (bool)(*(uintptr_t *)MaybeModedTInfo.Value.Data & 2) + (clang::TypeSourceInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL) + (clang::TypedefNameDecl::ModedTInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL) + (TypeDecl *)this,nd + + + + {(TypedefNameDecl *)this,view(name)nand} + using {(TypedefNameDecl 
*)this,view(name)nand} = {(TypedefNameDecl *)this,view(type)nand} + + + {Name} + + + Kind={(UncommonTemplateNameStorage::Kind)Kind,en}, Size={Size} + + (UncommonTemplateNameStorage::Kind)Kind + Size + + + + {Bits}, + {this,view(cmn)na},{(OverloadedTemplateStorage*)this,na} + {this,view(cmn)na},{(AssumedTemplateStorage*)this,na} + {this,view(cmn)na},{(SubstTemplateTemplateParmStorage*)this,na} + {this,view(cmn)na},{(SubstTemplateTemplateParmPackStorage*)this,na} + {this,view(cmn)na} + + Bits + (OverloadedTemplateStorage*)this + (AssumedTemplateStorage*)this + (SubstTemplateTemplateParmStorage*)this + (SubstTemplateTemplateParmPackStorage*)this + + + + + + + {(clang::TemplateDecl *)(Val.Value & ~3LL),view(cpp)na} + + + {(clang::TemplateDecl *)(Val.Value & ~3LL),na} + + + {(clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL),view(cpp)na} + + + {(clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL),na} + + + {(clang::QualifiedTemplateName *)(Val.Value & ~3LL),view(cpp)na} + + + {(clang::QualifiedTemplateName *)(Val.Value & ~3LL),na} + + + {(clang::DependentTemplateName *)(Val.Value & ~3LL),view(cpp)na} + + + {(clang::DependentTemplateName *)(Val.Value & ~3LL),na} + + + "TemplateDecl",s8b + + (clang::TemplateDecl *)(Val.Value & ~3LL) + + "UncommonTemplateNameStorage",s8b + + (clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL) + + "QualifiedTemplateName",s8b + + (clang::QualifiedTemplateName *)(Val.Value & ~3LL) + + "DependentTemplateName",s8b + + (clang::DependentTemplateName *)(Val.Value & ~3LL) + + Val + + + + + {Storage,view(cpp)na} + {Storage,na} + + Storage + + + + {Name,view(cpp)} + {Name} + + + implicit{" ",sb} + + {*this,view(implicit)nd} + {*this,view(modifiers)}{Name,view(cpp)} + {*this,view(modifiers)nd}struct {Name,view(cpp)} + {*this,view(modifiers)nd}interface {Name,view(cpp)} + {*this,view(modifiers)nd}union {Name,view(cpp)} + {*this,view(modifiers)nd}class {Name,view(cpp)} + {*this,view(modifiers)nd}enum {Name,view(cpp)} + + 
(clang::DeclContext *)this + + + + {decl,view(cpp)na} + {*decl} + + *(clang::Type *)this, view(cmn) + decl + + + + {(clang::TagType *)this,view(cpp)na} + {(clang::TagType *)this,na} + + *(clang::TagType *)this + + + + {{{*Replaced,view(cpp)} <= {CanonicalType,view(cpp)}}} + + *(clang::Type *)this, view(cmn) + *Replaced + + + + + + {ResultType,view(cpp)} + + {*(clang::QualType *)(this+1),view(cpp)}{*this,view(parm1)} + + , {*((clang::QualType *)(this+1)+1),view(cpp)}{*this,view(parm2)} + + , {*((clang::QualType *)(this+1)+2),view(cpp)}{*this,view(parm3)} + + , {*((clang::QualType *)(this+1)+3),view(cpp)}{*this,view(parm4)} + + , {*((clang::QualType *)(this+1)+4),view(cpp)}{*this,view(parm5)} + + , /* expand for more params */ + ({*this,view(parm0)}) -> {ResultType,view(cpp)} + ({*this,view(parm0)}) + {this,view(left)na}{this,view(right)na} + + ResultType + + {*this,view(parm0)} + + + FunctionTypeBits.NumParams + (clang::QualType *)(this+1) + + + + *(clang::Type *)this, view(cmn) + + + + + {OriginalTy} adjusted to {AdjustedTy} + + OriginalTy + AdjustedTy + + + + {OriginalTy,view(left)} + {OriginalTy,view(right)} + {OriginalTy} + + (clang::AdjustedType *)this + + + + {NamedType,view(left)} + {NamedType,view(right)} + {NamedType} + + (clang::ElaboratedTypeKeyword)TypeWithKeywordBits.Keyword + NNS + NamedType,view(cmn) + + + + {TTPDecl->Name,view(cpp)} + Non-canonical: {*TTPDecl} + Canonical: {CanTTPTInfo} + + *(clang::Type *)this, view(cmn) + + + + {Decl,view(cpp)} + + Decl + InjectedType + *(clang::Type *)this, view(cmn) + + + + {NNS}{Name,view(cpp)na} + + NNS + Name + *(clang::Type *)this, view(cmn) + + + + + {(IdentifierInfo*)Specifier,view(cpp)na}:: + {(NamedDecl*)Specifier,view(cpp)na}:: + {(Type*)Specifier,view(cpp)na}:: + + (NestedNameSpecifier::StoredSpecifierKind)((*(uintptr_t *)Prefix.Value.Data>>1)&3) + + + + {Pattern} + + Pattern + NumExpansions + *(clang::Type *)this, view(cmn) + + + + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & 
~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(poly)}{*this,view(fastQuals)} + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(cpp)}{*this,view(fastQuals)} + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(left)}{*this,view(fastQuals)} + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(right)}{*this,view(fastQuals)} + + + {" ",sb}const + {" ",sb}restrict + {" ",sb}const restrict + {" ",sb}volatile + {" ",sb}const volatile + {" ",sb}volatile restrict + {" ",sb}const volatile restrict + Cannot visualize non-fast qualifiers + Null + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,na}{*this,view(fastQuals)} + + *this,view(fastQuals) + ((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType + + + + + {DeclInfo,view(cpp)na} + {DeclInfo,na} + + DeclInfo + *(clang::Type *)this, view(cmn) + + + + {Ty,view(cpp)} + {Ty} + + Ty + + + + {(QualType *)&Ty,na} + + (QualType *)&Ty + Data + + + + Not building anything + Building a {LastTy} + + + {Argument,view(cpp)} + {Argument} + + + {*(clang::QualType *)&TypeOrValue.V,view(cpp)} + {(clang::TemplateArgument::ArgKind)TypeOrValue.Kind,en} template argument: {*(clang::QualType *)&TypeOrValue.V} + + {Args.Args[0]}{*this,view(arg1)} + + , {Args.Args[1]}{*this,view(arg2)} + + , {Args.Args[2]}, ... + + {Args.Args[0],view(cpp)}{*this,view(arg1cpp)} + + , {Args.Args[1],view(cpp)}{*this,view(arg2cpp)} + + , {Args.Args[2],view(cpp)}, ... 
+ {*this,view(arg0cpp)} + {*this,view(arg0)} + {(clang::Expr *)TypeOrValue.V,view(cpp)na} + {(clang::TemplateArgument::ArgKind)TypeOrValue.Kind,en} + + *(clang::QualType *)&TypeOrValue.V + (clang::Expr *)TypeOrValue.V + + Args.NumArgs + Args.Args + + + + + + + {((TemplateArgumentLoc*)Arguments.BeginX)[0],view(cpp)}{*this,view(elt1)} + + , {((TemplateArgumentLoc*)Arguments.BeginX)[1],view(cpp)}{*this,view(elt2)} + + , {((TemplateArgumentLoc*)Arguments.BeginX)[2],view(cpp)}{*this,view(elt3)} + + , {((TemplateArgumentLoc*)Arguments.BeginX)[3],view(cpp)}{*this,view(elt4)} + + , ... + empty + <{*this,view(elt0)}> + Uninitialized + + + + {Arguments[0],view(cpp)}{*this,view(arg1)} + + , {Arguments[1],view(cpp)}{*this,view(arg2)} + + , {Arguments[1],view(cpp)}, ... + <{*this,view(arg0)}> + + NumArguments + + NumArguments + Arguments + + + + + + {Data[0],view(cpp)}{*this,view(arg1)} + + , {Data[1],view(cpp)}{*this,view(arg2)} + + , {Data[2],view(cpp)}, ... + <{*this,view(arg0)}> + + Length + + + + Length + Data + + + + + + + + {((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[0],view(cpp)}{*this,view(level1)} + + ::{((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[1],view(cpp)}{*this,view(level2)} + + ::{((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[2],view(cpp)}, ... 
+ {*this,view(level0)} + + TemplateArgumentLists + + + + {(clang::QualType *)Arg,view(cpp)na} + Type template argument: {*(clang::QualType *)Arg} + Non-type template argument: {*(clang::Expr *)Arg} + Template template argument: {*(clang::TemplateName *)Arg + + Kind,en + (clang::QualType *)Arg + (clang::Expr *)Arg + (clang::TemplateName *)Arg + + + + + void + bool + char + unsigned char + wchar_t + char16_t + char32_t + unsigned short + unsigned int + unsigned long + unsigned long long + __uint128_t + char + signed char + wchar_t + short + int + long + long long + __int128_t + __fp16 + float + double + long double + nullptr_t + {(clang::BuiltinType::Kind)BuiltinTypeBits.Kind, en} + + (clang::BuiltinType::Kind)BuiltinTypeBits.Kind + + + + + + {((clang::TemplateArgument *)(this+1))[0],view(cpp)}{*this,view(arg1)} + + , {((clang::TemplateArgument *)(this+1))[1],view(cpp)}{*this,view(arg2)} + + , {((clang::TemplateArgument *)(this+1))[2],view(cpp)}{*this,view(arg3)} + + {*((clang::TemplateDecl *)(Template.Storage.Val.Value))->TemplatedDecl,view(cpp)}<{*this,view(arg0)}> + + Can't visualize this TemplateSpecializationType + + Template.Storage + + TemplateSpecializationTypeBits.NumArgs + (clang::TemplateArgument *)(this+1) + + *(clang::Type *)this, view(cmn) + + + + + (CanonicalType.Value.Value != this) || TypeBits.Dependent + *(clang::Type *)this,view(cmn) + + + + {CanonicalType,view(cpp)} + {Template,view(cpp)} + {Template} + + Template + CanonicalType,view(cpp) + (clang::DeducedType *)this + Template + + + + {*(CXXRecordDecl *)this,nd}{*TemplateArgs} + + (CXXRecordDecl *)this,nd + TemplateArgs + + + + {((llvm::StringMapEntry<clang::IdentifierInfo *>*)Entry)+1,sb} + + ((llvm::StringMapEntry<clang::IdentifierInfo *>*)Entry)+1,s + (clang::tok::TokenKind)TokenID + + + + + Empty + {*(clang::IdentifierInfo *)(Ptr & ~PtrMask)} + {{Identifier ({*(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} + {{ObjC Zero Arg Selector (*{(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} + {{ObjC 
One Arg Selector (*{(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} + {(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),view(cpp)na} + C++ Constructor {{{(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),view(cpp)na}}} + C++ Destructor {{*(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask)}} + C++ Conversion function {{*(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask)}} + C++ Operator {{*(clang::detail::CXXOperatorIdName *)(Ptr & ~PtrMask)}} + {*(clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask),view(cpp)} + {{Extra ({*(clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask)})}} + + StoredNameKind(Ptr & PtrMask),en + *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na + *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na + *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na + *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na + *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na + *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na + *(clang::detail::CXXOperatorIdName *)(Ptr & ~PtrMask),na + (clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask),na + + + + + {(CXXDeductionGuideNameExtra *)this,view(cpp)nand} + + + {(CXXDeductionGuideNameExtra *)this,nand} + + C++ Literal operator + C++ Using directive + Objective-C MultiArg selector + {(clang::detail::DeclarationNameExtra::ExtraKind)ExtraKindOrNumArgs,en}{" ",sb}{*this,view(cpp)} + + (CXXDeductionGuideNameExtra *)this + ExtraKindOrNumArgs + + + + {Template->TemplatedDecl,view(cpp)} + C++ Deduction guide for {Template->TemplatedDecl,view(cpp)na} + + + {Type,view(cpp)} + {Type} + + + {Name} + + + + {(ParsedTemplateArgument *)(this+1),view(cpp)na}{this,view(arg1)na} + + , {((ParsedTemplateArgument *)(this+1))+1,view(cpp)na}{this,view(arg2)na} + + , ... 
+ {Name,na}<{this,view(arg0)na}> + + Name + + {this,view(arg0)na} + + + NumArgs + (ParsedTemplateArgument *)(this+1) + + + + Operator + + + + {{annot_template_id ({(clang::TemplateIdAnnotation *)(PtrData),na})}} + {{Identifier ({(clang::IdentifierInfo *)(PtrData),na})}} + {(clang::tok::TokenKind)Kind,en} + + + {BufferPtr,nasb} + + + {TheLexer._Mypair._Myval2,na} + Expanding Macro: {TheTokenLexer._Mypair._Myval2,na} + + + + + [{(Token *)(CachedTokens.BeginX) + CachedLexPos,na}] {IncludeMacroStack._Mypair._Myval2._Mylast - 1,na} + + {IncludeMacroStack._Mypair._Myval2._Mylast - 1,na} + {CurLexer._Mypair._Myval2,na} + Expanding Macro: {CurTokenLexer._Mypair._Myval2,na} + + + {this,view(cached)} + + CLK_LexAfterModuleImport + + + [{Tok}] {PP,na} + + + this + *this + {Id} + &{Id} + No visualizer for {Kind} + + + + =, + &, + + {(LambdaCapture *)(Captures.BeginX),na}{this,view(capture1)na} + + ,{(LambdaCapture *)(Captures.BeginX)+1,na}{this,view(capture2)na} + + ,{(LambdaCapture *)(Captures.BeginX)+2,na}{this,view(capture3)na} + + ,... 
+ [{this,view(default)na}{this,view(capture0)na}] + + + + , [{TypeRep}] + + + , [{ExprRep}] + + + , [{DeclRep}] + + + [{(clang::DeclSpec::SCS)StorageClassSpec,en}], [{(clang::TypeSpecifierType)TypeSpecType,en}]{this,view(extra)na} + + (clang::DeclSpec::SCS)StorageClassSpec + (clang::TypeSpecifierType)TypeSpecType + + TypeRep + + + ExprRep + + + DeclRep + + + + + + {Name,s} + + + {RealPathName,s} + + + {Name,s} + + + + (clang::StorageClass)SClass + (clang::ThreadStorageClassSpecifier)TSCSpec + (clang::VarDecl::InitializationStyle)InitStyle + + + + {DeclType,view(left)} {Name,view(cpp)}{DeclType,view(right)} + + Name + DeclType + + + + {(DeclaratorDecl*)this,nand} + + (DeclaratorDecl*)this,nd + Init + VarDeclBits + + + + {*(VarDecl*)this,nd} + + ParmVarDeclBits + *(VarDecl*)this,nd + + + + {"explicit ",sb} + + explicit({ExplicitSpec,view(ptr)na}) + {ExplicitSpec,view(int)en} + {ExplicitSpec,view(int)en} : {ExplicitSpec,view(ptr)na} + + + {ExplicitSpec,view(cpp)}{Name,view(cpp)nd}({(FunctionDecl*)this,view(parm0)nand}) -> {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)(((uintptr_t)DeclType.Value.Value) & ~15))->BaseType)->ResultType,view(cpp)} + + ExplicitSpec + (bool)FunctionDeclBits.IsCopyDeductionCandidate + (FunctionDecl*)this,nd + + + + {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType,view(cpp)} + + {ParamInfo[0],na}{*this,view(parm1)nd} + + , {ParamInfo[1],na}{*this,view(parm2)nd} + + , {ParamInfo[2],na}{*this,view(parm3)nd} + + , {ParamInfo[3],na}{*this,view(parm4)nd} + + , {ParamInfo[4],na}{*this,view(parm5)nd} + + , /* expand for more params */ + + auto {Name,view(cpp)nd}({*this,view(parm0)nd}) -> {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType,view(cpp)} + + {this,view(retType)nand} {Name,view(cpp)nd}({*this,view(parm0)nd}) + + (clang::DeclaratorDecl *)this,nd + 
((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType + + {*this,view(parm0)nd} + + + ((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->FunctionTypeBits.NumParams + ParamInfo + + + + TemplateOrSpecialization + + + + {*($T1*)&Ptr} + + ($T1*)&Ptr + + + + {($T1 *)Ptr} + + ($T1 *)Ptr + + + + + {*((NamedDecl **)(this+1))[0],view(cpp)}{*this,view(parm1)} + + , {*((NamedDecl **)(this+1))[1],view(cpp)}{*this,view(parm2)} + + , {*((NamedDecl **)(this+1))[2],view(cpp)}{*this,view(parm3)} + + , {*((NamedDecl **)(this+1))[3],view(cpp)}{*this,view(parm4)} + + , {*((NamedDecl **)(this+1))[4],view(cpp)}{*this,view(parm5)} + + , /* Expand for more params */ + <{*this,view(parm0)}> + + + NumParams + (NamedDecl **)(this+1) + + + + + {(clang::Stmt::StmtClass)StmtBits.sClass,en} + + (clang::Stmt::StmtClass)StmtBits.sClass,en + + + + {*(clang::StringLiteral *)this} + Expression of class {(clang::Stmt::StmtClass)StmtBits.sClass,en} and type {TR,view(cpp)} + + + + *(unsigned *)(((clang::StringLiteral *)this)+1) + (const char *)(((clang::StringLiteral *)this)+1)+4+4,[*(unsigned *)(((clang::StringLiteral *)this)+1)]s8 + + + + public + protected + private + + {*(clang::NamedDecl *)(Ptr&~Mask)} + {*this,view(access)} {*this,view(decl)} + + (clang::AccessSpecifier)(Ptr&Mask),en + *(clang::NamedDecl *)(Ptr&~Mask) + + + + [IK_Identifier] {*Identifier} + [IK_OperatorFunctionId] {OperatorFunctionId} + [IK_ConversionFunctionId] {ConversionFunctionId} + [IK_ConstructorName] {ConstructorName} + [IK_DestructorName] {DestructorName} + [IK_DeductionGuideName] {TemplateName} + [IK_TemplateId] {TemplateId} + [IK_ConstructorTemplateId] {TemplateId} + Kind + + Identifier + OperatorFunctionId + ConversionFunctionId + ConstructorName + DestructorName + TemplateName + TemplateId + TemplateId + + + + NumDecls={NumDecls} + + + NumDecls + (Decl **)(this+1) 
+ + + + + {*D} + {*(DeclGroup *)((uintptr_t)D&~1)} + + D + (DeclGroup *)((uintptr_t)D&~1) + + + + {DS} {Name} + + + {Decls} + + Decls + + + + {Ambiguity,en}: {Decls} + {ResultKind,en}: {Decls} + + + Invalid + Unset + {Val} + + + Invalid + Unset + {($T1)(Value&~1)} + + (bool)(Value&1) + ($T1)(Value&~1) + + +
diff --git a/flang/test/Driver/msvc-dependent-lib-flags.f90 b/flang/test/Driver/msvc-dependent-lib-flags.f90
index 765917f07d8e72..1b7ecb604ad67d 100644
--- a/flang/test/Driver/msvc-dependent-lib-flags.f90
+++ b/flang/test/Driver/msvc-dependent-lib-flags.f90
@@ -1,36 +1,36 @@
-! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC
-! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=static_dbg %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DEBUG
-! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=dll %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DLL
-! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=dll_dbg %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DLL-DEBUG
-
-! MSVC: -fc1
-! MSVC-SAME: --dependent-lib=clang_rt.builtins.lib
-! MSVC-SAME: -D_MT
-! MSVC-SAME: --dependent-lib=libcmt
-! MSVC-SAME: --dependent-lib=FortranRuntime.static.lib
-! MSVC-SAME: --dependent-lib=FortranDecimal.static.lib
-
-! MSVC-DEBUG: -fc1
-! MSVC-DEBUG-SAME: --dependent-lib=clang_rt.builtins.lib
-! MSVC-DEBUG-SAME: -D_MT
-! MSVC-DEBUG-SAME: -D_DEBUG
-! MSVC-DEBUG-SAME: --dependent-lib=libcmtd
-! MSVC-DEBUG-SAME: --dependent-lib=FortranRuntime.static_dbg.lib
-! MSVC-DEBUG-SAME: --dependent-lib=FortranDecimal.static_dbg.lib
-
-! MSVC-DLL: -fc1
-! MSVC-DLL-SAME: --dependent-lib=clang_rt.builtins.lib
-! MSVC-DLL-SAME: -D_MT
-! MSVC-DLL-SAME: -D_DLL
-! MSVC-DLL-SAME: --dependent-lib=msvcrt
-! MSVC-DLL-SAME: --dependent-lib=FortranRuntime.dynamic.lib
-! MSVC-DLL-SAME: --dependent-lib=FortranDecimal.dynamic.lib
-
-! MSVC-DLL-DEBUG: -fc1
-! MSVC-DLL-DEBUG-SAME: --dependent-lib=clang_rt.builtins.lib
-! MSVC-DLL-DEBUG-SAME: -D_MT
-! MSVC-DLL-DEBUG-SAME: -D_DEBUG
-! MSVC-DLL-DEBUG-SAME: -D_DLL
-! MSVC-DLL-DEBUG-SAME: --dependent-lib=msvcrtd
-! MSVC-DLL-DEBUG-SAME: --dependent-lib=FortranRuntime.dynamic_dbg.lib
-! MSVC-DLL-DEBUG-SAME: --dependent-lib=FortranDecimal.dynamic_dbg.lib
+! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC
+! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=static_dbg %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DEBUG
+! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=dll %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DLL
+! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=dll_dbg %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DLL-DEBUG
+
+! MSVC: -fc1
+! MSVC-SAME: --dependent-lib=clang_rt.builtins.lib
+! MSVC-SAME: -D_MT
+! MSVC-SAME: --dependent-lib=libcmt
+! MSVC-SAME: --dependent-lib=FortranRuntime.static.lib
+! MSVC-SAME: --dependent-lib=FortranDecimal.static.lib
+
+! MSVC-DEBUG: -fc1
+! MSVC-DEBUG-SAME: --dependent-lib=clang_rt.builtins.lib
+! MSVC-DEBUG-SAME: -D_MT
+! MSVC-DEBUG-SAME: -D_DEBUG
+! MSVC-DEBUG-SAME: --dependent-lib=libcmtd
+! MSVC-DEBUG-SAME: --dependent-lib=FortranRuntime.static_dbg.lib
+! MSVC-DEBUG-SAME: --dependent-lib=FortranDecimal.static_dbg.lib
+
+! MSVC-DLL: -fc1
+! MSVC-DLL-SAME: --dependent-lib=clang_rt.builtins.lib
+! MSVC-DLL-SAME: -D_MT
+! MSVC-DLL-SAME: -D_DLL
+! MSVC-DLL-SAME: --dependent-lib=msvcrt
+! MSVC-DLL-SAME: --dependent-lib=FortranRuntime.dynamic.lib
+! MSVC-DLL-SAME: --dependent-lib=FortranDecimal.dynamic.lib
+
+! MSVC-DLL-DEBUG: -fc1
+! MSVC-DLL-DEBUG-SAME: --dependent-lib=clang_rt.builtins.lib
+! MSVC-DLL-DEBUG-SAME: -D_MT
+! MSVC-DLL-DEBUG-SAME: -D_DEBUG
+! MSVC-DLL-DEBUG-SAME: -D_DLL
+! MSVC-DLL-DEBUG-SAME: --dependent-lib=msvcrtd
+! MSVC-DLL-DEBUG-SAME: --dependent-lib=FortranRuntime.dynamic_dbg.lib
+! MSVC-DLL-DEBUG-SAME: --dependent-lib=FortranDecimal.dynamic_dbg.lib
diff --git a/lldb/test/API/commands/expression/ir-interpreter-phi-nodes/Makefile b/lldb/test/API/commands/expression/ir-interpreter-phi-nodes/Makefile
index a1f689e07c77ff..d420a34c03e785 100644
--- a/lldb/test/API/commands/expression/ir-interpreter-phi-nodes/Makefile
+++ b/lldb/test/API/commands/expression/ir-interpreter-phi-nodes/Makefile
@@ -1,4 +1,4 @@
-
-CXX_SOURCES := main.cpp
-
-include Makefile.rules
+
+CXX_SOURCES := main.cpp
+
+include Makefile.rules
diff --git a/lldb/test/API/functionalities/postmortem/minidump/fizzbuzz.syms b/lldb/test/API/functionalities/postmortem/minidump/fizzbuzz.syms
index cab06c1c9d50b1..e817a491af5750 100644
--- a/lldb/test/API/functionalities/postmortem/minidump/fizzbuzz.syms
+++ b/lldb/test/API/functionalities/postmortem/minidump/fizzbuzz.syms
@@ -1,2 +1,2 @@
-MODULE windows x86 0F45B7919A9646F9BF8F2D6076EA421A11 fizzbuzz.pdb
-PUBLIC 1000 0 main
+MODULE windows x86 0F45B7919A9646F9BF8F2D6076EA421A11 fizzbuzz.pdb
+PUBLIC 1000 0 main
diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/Makefile b/lldb/test/API/functionalities/target-new-solib-notifications/Makefile
index e3b48697fd7837..745f6cc9d65ae3 100644
--- a/lldb/test/API/functionalities/target-new-solib-notifications/Makefile
+++ b/lldb/test/API/functionalities/target-new-solib-notifications/Makefile
@@ -1,23 +1,23 @@
-CXX_SOURCES := main.cpp
-LD_EXTRAS := -L. -l_d -l_c -l_a -l_b
-
-a.out: lib_b lib_a lib_c lib_d
-
-include Makefile.rules
-
-lib_a: lib_b
-	"$(MAKE)" -f $(MAKEFILE_RULES) \
-		DYLIB_ONLY=YES DYLIB_CXX_SOURCES=a.cpp DYLIB_NAME=_a \
-		LD_EXTRAS="-L. -l_b"
-
-lib_b:
-	"$(MAKE)" -f $(MAKEFILE_RULES) \
-		DYLIB_ONLY=YES DYLIB_CXX_SOURCES=b.cpp DYLIB_NAME=_b
-
-lib_c:
-	"$(MAKE)" -f $(MAKEFILE_RULES) \
-		DYLIB_ONLY=YES DYLIB_CXX_SOURCES=c.cpp DYLIB_NAME=_c
-
-lib_d:
-	"$(MAKE)" -f $(MAKEFILE_RULES) \
-		DYLIB_ONLY=YES DYLIB_CXX_SOURCES=d.cpp DYLIB_NAME=_d
+CXX_SOURCES := main.cpp
+LD_EXTRAS := -L. -l_d -l_c -l_a -l_b
+
+a.out: lib_b lib_a lib_c lib_d
+
+include Makefile.rules
+
+lib_a: lib_b
+	"$(MAKE)" -f $(MAKEFILE_RULES) \
+		DYLIB_ONLY=YES DYLIB_CXX_SOURCES=a.cpp DYLIB_NAME=_a \
+		LD_EXTRAS="-L. -l_b"
+
+lib_b:
+	"$(MAKE)" -f $(MAKEFILE_RULES) \
+		DYLIB_ONLY=YES DYLIB_CXX_SOURCES=b.cpp DYLIB_NAME=_b
+
+lib_c:
+	"$(MAKE)" -f $(MAKEFILE_RULES) \
+		DYLIB_ONLY=YES DYLIB_CXX_SOURCES=c.cpp DYLIB_NAME=_c
+
+lib_d:
+	"$(MAKE)" -f $(MAKEFILE_RULES) \
+		DYLIB_ONLY=YES DYLIB_CXX_SOURCES=d.cpp DYLIB_NAME=_d
diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/a.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/a.cpp
index 778b46ed5cef1a..66633b70ee1e50 100644
--- a/lldb/test/API/functionalities/target-new-solib-notifications/a.cpp
+++ b/lldb/test/API/functionalities/target-new-solib-notifications/a.cpp
@@ -1,3 +1,3 @@
-extern "C" int b_function();
-
-extern "C" int a_function() { return b_function(); }
+extern "C" int b_function();
+
+extern "C" int a_function() { return b_function(); }
diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/b.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/b.cpp
index 4f1a4032ee0eed..8b16fbdb5728cd 100644
--- a/lldb/test/API/functionalities/target-new-solib-notifications/b.cpp
+++ b/lldb/test/API/functionalities/target-new-solib-notifications/b.cpp
@@ -1 +1 @@
-extern "C" int b_function() { return 500; }
+extern "C" int b_function() { return 500; }
diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/c.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/c.cpp
index 8abd1b155a7590..120c88f2bb609a 100644
--- a/lldb/test/API/functionalities/target-new-solib-notifications/c.cpp
+++ b/lldb/test/API/functionalities/target-new-solib-notifications/c.cpp
@@ -1 +1 @@
-extern "C" int c_function() { return 600; }
+extern "C" int c_function() { return 600; }
diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/d.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/d.cpp
index 58888a29ba323a..d37ad2621ae4e9 100644
--- a/lldb/test/API/functionalities/target-new-solib-notifications/d.cpp
+++ b/lldb/test/API/functionalities/target-new-solib-notifications/d.cpp
@@ -1 +1 @@
-extern "C" int d_function() { return 700; }
+extern "C" int d_function() { return 700; }
diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/main.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/main.cpp
index 77b38c5ccdc698..bd2c79cdab9daa 100644
--- a/lldb/test/API/functionalities/target-new-solib-notifications/main.cpp
+++ b/lldb/test/API/functionalities/target-new-solib-notifications/main.cpp
@@ -1,16 +1,16 @@
-#include
-
-extern "C" int a_function();
-extern "C" int c_function();
-extern "C" int b_function();
-extern "C" int d_function();
-
-int main() {
-  a_function();
-  b_function();
-  c_function();
-  d_function();
-
-  puts("running"); // breakpoint here
-  return 0;
-}
+#include
+
+extern "C" int a_function();
+extern "C" int c_function();
+extern "C" int b_function();
+extern "C" int d_function();
+
+int main() {
+  a_function();
+  b_function();
+  c_function();
+  d_function();
+
+  puts("running"); // breakpoint here
+  return 0;
+}
diff --git a/lldb/test/API/functionalities/unwind/zeroth_frame/Makefile b/lldb/test/API/functionalities/unwind/zeroth_frame/Makefile
index 15a931850e17e5..10495940055b63 100644
--- a/lldb/test/API/functionalities/unwind/zeroth_frame/Makefile
+++ b/lldb/test/API/functionalities/unwind/zeroth_frame/Makefile
@@ -1,3 +1,3 @@
-C_SOURCES := main.c
-
-include Makefile.rules
+C_SOURCES := main.c
+
+include Makefile.rules
diff --git a/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py b/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py
index d660844405e137..70f72c72c8340e 100644
--- a/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py
+++ b/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py
@@ -1,88 +1,88 @@
-"""
-Test that line information is recalculated properly for a frame when it moves
-from the middle of the backtrace to a zero index.
-
-This is a regression test for a StackFrame bug, where whether frame is zero or
-not depends on an internal field. When LLDB was updating its frame list value
-of the field wasn't copied into existing StackFrame instances, so those
-StackFrame instances, would use an incorrect line entry evaluation logic in
-situations if it was in the middle of the stack frame list (not zeroth), and
-then moved to the top position. The difference in logic is that for zeroth
-frames line entry is returned for program counter, while for other frame
-(except for those that "behave like zeroth") it is for the instruction
-preceding PC, as PC points to the next instruction after function call. When
-the bug is present, when execution stops at the second breakpoint
-SBFrame.GetLineEntry() returns line entry for the previous line, rather than
-the one with a breakpoint. Note that this is specific to
-SBFrame.GetLineEntry(), SBFrame.GetPCAddress().GetLineEntry() would return
-correct entry.
-
-This bug doesn't reproduce through an LLDB interpretator, however it happens
-when using API directly, for example in LLDB-MI.
-"""
-
-import lldb
-from lldbsuite.test.decorators import *
-from lldbsuite.test.lldbtest import *
-from lldbsuite.test import lldbutil
-
-
-class ZerothFrame(TestBase):
-    def test(self):
-        """
-        Test that line information is recalculated properly for a frame when it moves
-        from the middle of the backtrace to a zero index.
-        """
-        self.build()
-        self.setTearDownCleanup()
-
-        exe = self.getBuildArtifact("a.out")
-        target = self.dbg.CreateTarget(exe)
-        self.assertTrue(target, VALID_TARGET)
-
-        main_dot_c = lldb.SBFileSpec("main.c")
-        bp1 = target.BreakpointCreateBySourceRegex(
-            "// Set breakpoint 1 here", main_dot_c
-        )
-        bp2 = target.BreakpointCreateBySourceRegex(
-            "// Set breakpoint 2 here", main_dot_c
-        )
-
-        process = target.LaunchSimple(None, None, self.get_process_working_directory())
-        self.assertTrue(process, VALID_PROCESS)
-
-        thread = self.thread()
-
-        if self.TraceOn():
-            print("Backtrace at the first breakpoint:")
-            for f in thread.frames:
-                print(f)
-
-        # Check that we have stopped at correct breakpoint.
-        self.assertEqual(
-            thread.frame[0].GetLineEntry().GetLine(),
-            bp1.GetLocationAtIndex(0).GetAddress().GetLineEntry().GetLine(),
-            "LLDB reported incorrect line number.",
-        )
-
-        # Important to use SBProcess::Continue() instead of
-        # self.runCmd('continue'), because the problem doesn't reproduce with
-        # 'continue' command.
-        process.Continue()
-
-        if self.TraceOn():
-            print("Backtrace at the second breakpoint:")
-            for f in thread.frames:
-                print(f)
-        # Check that we have stopped at the breakpoint
-        self.assertEqual(
-            thread.frame[0].GetLineEntry().GetLine(),
-            bp2.GetLocationAtIndex(0).GetAddress().GetLineEntry().GetLine(),
-            "LLDB reported incorrect line number.",
-        )
-        # Double-check with GetPCAddress()
-        self.assertEqual(
-            thread.frame[0].GetLineEntry().GetLine(),
-            thread.frame[0].GetPCAddress().GetLineEntry().GetLine(),
-            "LLDB reported incorrect line number.",
-        )
+"""
+Test that line information is recalculated properly for a frame when it moves
+from the middle of the backtrace to a zero index.
+
+This is a regression test for a StackFrame bug, where whether frame is zero or
+not depends on an internal field. When LLDB was updating its frame list value
+of the field wasn't copied into existing StackFrame instances, so those
+StackFrame instances, would use an incorrect line entry evaluation logic in
+situations if it was in the middle of the stack frame list (not zeroth), and
+then moved to the top position. The difference in logic is that for zeroth
+frames line entry is returned for program counter, while for other frame
+(except for those that "behave like zeroth") it is for the instruction
+preceding PC, as PC points to the next instruction after function call. When
+the bug is present, when execution stops at the second breakpoint
+SBFrame.GetLineEntry() returns line entry for the previous line, rather than
+the one with a breakpoint. Note that this is specific to
+SBFrame.GetLineEntry(), SBFrame.GetPCAddress().GetLineEntry() would return
+correct entry.
+
+This bug doesn't reproduce through an LLDB interpretator, however it happens
+when using API directly, for example in LLDB-MI.
+"""
+
+import lldb
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test import lldbutil
+
+
+class ZerothFrame(TestBase):
+    def test(self):
+        """
+        Test that line information is recalculated properly for a frame when it moves
+        from the middle of the backtrace to a zero index.
+        """
+        self.build()
+        self.setTearDownCleanup()
+
+        exe = self.getBuildArtifact("a.out")
+        target = self.dbg.CreateTarget(exe)
+        self.assertTrue(target, VALID_TARGET)
+
+        main_dot_c = lldb.SBFileSpec("main.c")
+        bp1 = target.BreakpointCreateBySourceRegex(
+            "// Set breakpoint 1 here", main_dot_c
+        )
+        bp2 = target.BreakpointCreateBySourceRegex(
+            "// Set breakpoint 2 here", main_dot_c
+        )
+
+        process = target.LaunchSimple(None, None, self.get_process_working_directory())
+        self.assertTrue(process, VALID_PROCESS)
+
+        thread = self.thread()
+
+        if self.TraceOn():
+            print("Backtrace at the first breakpoint:")
+            for f in thread.frames:
+                print(f)
+
+        # Check that we have stopped at correct breakpoint.
+        self.assertEqual(
+            thread.frame[0].GetLineEntry().GetLine(),
+            bp1.GetLocationAtIndex(0).GetAddress().GetLineEntry().GetLine(),
+            "LLDB reported incorrect line number.",
+        )
+
+        # Important to use SBProcess::Continue() instead of
+        # self.runCmd('continue'), because the problem doesn't reproduce with
+        # 'continue' command.
+        process.Continue()
+
+        if self.TraceOn():
+            print("Backtrace at the second breakpoint:")
+            for f in thread.frames:
+                print(f)
+        # Check that we have stopped at the breakpoint
+        self.assertEqual(
+            thread.frame[0].GetLineEntry().GetLine(),
+            bp2.GetLocationAtIndex(0).GetAddress().GetLineEntry().GetLine(),
+            "LLDB reported incorrect line number.",
+        )
+        # Double-check with GetPCAddress()
+        self.assertEqual(
+            thread.frame[0].GetLineEntry().GetLine(),
+            thread.frame[0].GetPCAddress().GetLineEntry().GetLine(),
+            "LLDB reported incorrect line number.",
+        )
diff --git a/lldb/test/API/python_api/debugger/Makefile b/lldb/test/API/python_api/debugger/Makefile
index bfad5f33e86753..99998b20bcb050 100644
--- a/lldb/test/API/python_api/debugger/Makefile
+++ b/lldb/test/API/python_api/debugger/Makefile
@@ -1,3 +1,3 @@
-CXX_SOURCES := main.cpp
-
-include Makefile.rules
+CXX_SOURCES := main.cpp
+
+include Makefile.rules
diff --git a/lldb/test/Shell/BuildScript/modes.test b/lldb/test/Shell/BuildScript/modes.test
index 02311f712d770f..1ce50104855f46 100644
--- a/lldb/test/Shell/BuildScript/modes.test
+++ b/lldb/test/Shell/BuildScript/modes.test
@@ -1,35 +1,35 @@
-RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any -o %t/foo.out foobar.c \
-RUN: | FileCheck --check-prefix=COMPILE %s
-
-RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any --outdir %t foo.c bar.c \
-RUN: | FileCheck --check-prefix=COMPILE-MULTI %s
-
-RUN: %build -n --verbose --arch=32 --mode=link --compiler=any -o %t/foo.exe foobar.obj \
-RUN: | FileCheck --check-prefix=LINK %s
-
-RUN: %build -n --verbose --arch=32 --mode=link --compiler=any -o %t/foobar.exe foo.obj bar.obj \
-RUN: | FileCheck --check-prefix=LINK-MULTI %s
-
-RUN: %build -n --verbose --arch=32 --mode=compile-and-link --compiler=any -o %t/foobar.exe foobar.c \
-RUN: | FileCheck --check-prefix=BOTH %s
-
-RUN: %build -n --verbose --arch=32 --mode=compile-and-link --compiler=any -o %t/foobar.exe foo.c bar.c \
-RUN: | FileCheck --check-prefix=BOTH-MULTI %s
-
-
-COMPILE: compiling foobar.c -> foo.out
-
-COMPILE-MULTI: compiling foo.c -> foo.o{{(bj)?}}
-COMPILE-MULTI: compiling bar.c -> bar.o{{(bj)?}}
-
-
-LINK: linking foobar.obj -> foo.exe
-
-LINK-MULTI: linking foo.obj+bar.obj -> foobar.exe
-
-BOTH: compiling foobar.c -> [[OBJFOO:foobar.exe-foobar.o(bj)?]]
-BOTH: linking [[OBJFOO]] -> foobar.exe
-
-BOTH-MULTI: compiling foo.c -> [[OBJFOO:foobar.exe-foo.o(bj)?]]
-BOTH-MULTI: compiling bar.c -> [[OBJBAR:foobar.exe-bar.o(bj)?]]
-BOTH-MULTI: linking [[OBJFOO]]+[[OBJBAR]] -> foobar.exe
+RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any -o %t/foo.out foobar.c \
+RUN: | FileCheck --check-prefix=COMPILE %s
+
+RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any --outdir %t foo.c bar.c \
+RUN: | FileCheck --check-prefix=COMPILE-MULTI %s
+
+RUN: %build -n --verbose --arch=32 --mode=link --compiler=any -o %t/foo.exe foobar.obj \
+RUN: | FileCheck --check-prefix=LINK %s
+
+RUN: %build -n --verbose --arch=32 --mode=link --compiler=any -o %t/foobar.exe foo.obj bar.obj \
+RUN: | FileCheck --check-prefix=LINK-MULTI %s
+
+RUN: %build -n --verbose --arch=32 --mode=compile-and-link --compiler=any -o %t/foobar.exe foobar.c \
+RUN: | FileCheck --check-prefix=BOTH %s
+
+RUN: %build -n --verbose --arch=32 --mode=compile-and-link --compiler=any -o %t/foobar.exe foo.c bar.c \
+RUN: | FileCheck --check-prefix=BOTH-MULTI %s
+
+
+COMPILE: compiling foobar.c -> foo.out
+
+COMPILE-MULTI: compiling foo.c -> foo.o{{(bj)?}}
+COMPILE-MULTI: compiling bar.c -> bar.o{{(bj)?}}
+
+
+LINK: linking foobar.obj -> foo.exe
+
+LINK-MULTI: linking foo.obj+bar.obj -> foobar.exe
+
+BOTH: compiling foobar.c -> [[OBJFOO:foobar.exe-foobar.o(bj)?]]
+BOTH: linking [[OBJFOO]] -> foobar.exe
+
+BOTH-MULTI: compiling foo.c -> [[OBJFOO:foobar.exe-foo.o(bj)?]]
+BOTH-MULTI: compiling bar.c -> [[OBJBAR:foobar.exe-bar.o(bj)?]]
+BOTH-MULTI: linking [[OBJFOO]]+[[OBJBAR]] -> foobar.exe
diff --git a/lldb/test/Shell/BuildScript/script-args.test b/lldb/test/Shell/BuildScript/script-args.test
index 13e8a516094267..647a48e4442b12 100644
--- a/lldb/test/Shell/BuildScript/script-args.test
+++ b/lldb/test/Shell/BuildScript/script-args.test
@@ -1,32 +1,32 @@
-RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any -o %t/foo.out foobar.c \
-RUN: | FileCheck %s
-RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any --outdir %t foo.c bar.c \
-RUN: | FileCheck --check-prefix=MULTI-INPUT %s
-
-
-CHECK: Script Arguments:
-CHECK-NEXT: Arch: 32
-CHECK: Compiler: any
-CHECK: Outdir: {{.*}}script-args.test.tmp
-CHECK: Output: {{.*}}script-args.test.tmp{{.}}foo.out
-CHECK: Nodefaultlib: False
-CHECK: Opt: none
-CHECK: Mode: compile
-CHECK: Clean: True
-CHECK: Verbose: True
-CHECK: Dryrun: True
-CHECK: Inputs: foobar.c
-
-MULTI-INPUT: Script Arguments:
-MULTI-INPUT-NEXT: Arch: 32
-MULTI-INPUT-NEXT: Compiler: any
-MULTI-INPUT-NEXT: Outdir: {{.*}}script-args.test.tmp
-MULTI-INPUT-NEXT: Output:
-MULTI-INPUT-NEXT: Nodefaultlib: False
-MULTI-INPUT-NEXT: Opt: none
-MULTI-INPUT-NEXT: Mode: compile
-MULTI-INPUT-NEXT: Clean: True
-MULTI-INPUT-NEXT: Verbose: True
-MULTI-INPUT-NEXT: Dryrun: True
-MULTI-INPUT-NEXT: Inputs: foo.c
-MULTI-INPUT-NEXT: bar.c
+RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any -o %t/foo.out foobar.c \
+RUN: | FileCheck %s
+RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any --outdir %t foo.c bar.c \
+RUN: | FileCheck --check-prefix=MULTI-INPUT %s
+
+
+CHECK: Script Arguments:
+CHECK-NEXT: Arch: 32
+CHECK: Compiler: any
+CHECK: Outdir: {{.*}}script-args.test.tmp
+CHECK: Output: {{.*}}script-args.test.tmp{{.}}foo.out
+CHECK: Nodefaultlib: False
+CHECK: Opt: none
+CHECK: Mode: compile
+CHECK: Clean: True
+CHECK: Verbose: True
+CHECK: Dryrun: True
+CHECK: Inputs: foobar.c
+
+MULTI-INPUT: Script Arguments:
+MULTI-INPUT-NEXT: Arch: 32
+MULTI-INPUT-NEXT: Compiler: any
+MULTI-INPUT-NEXT: Outdir: {{.*}}script-args.test.tmp
+MULTI-INPUT-NEXT: Output: +MULTI-INPUT-NEXT: Nodefaultlib: False +MULTI-INPUT-NEXT: Opt: none +MULTI-INPUT-NEXT: Mode: compile +MULTI-INPUT-NEXT: Clean: True +MULTI-INPUT-NEXT: Verbose: True +MULTI-INPUT-NEXT: Dryrun: True +MULTI-INPUT-NEXT: Inputs: foo.c +MULTI-INPUT-NEXT: bar.c diff --git a/lldb/test/Shell/BuildScript/toolchain-clang-cl.test b/lldb/test/Shell/BuildScript/toolchain-clang-cl.test index 8c9ea9fddb8a50..4f64859a02b607 100644 --- a/lldb/test/Shell/BuildScript/toolchain-clang-cl.test +++ b/lldb/test/Shell/BuildScript/toolchain-clang-cl.test @@ -1,49 +1,49 @@ -REQUIRES: lld, system-windows - -RUN: %build -n --verbose --arch=32 --compiler=clang-cl --mode=compile-and-link -o %t/foo.exe foobar.c \ -RUN: | FileCheck --check-prefix=CHECK-32 %s - -RUN: %build -n --verbose --arch=64 --compiler=clang-cl --mode=compile-and-link -o %t/foo.exe foobar.c \ -RUN: | FileCheck --check-prefix=CHECK-64 %s - -CHECK-32: Script Arguments: -CHECK-32: Arch: 32 -CHECK-32: Compiler: clang-cl -CHECK-32: Outdir: {{.*}} -CHECK-32: Output: {{.*}}toolchain-clang-cl.test.tmp\foo.exe -CHECK-32: Nodefaultlib: False -CHECK-32: Opt: none -CHECK-32: Mode: compile -CHECK-32: Clean: True -CHECK-32: Verbose: True -CHECK-32: Dryrun: True -CHECK-32: Inputs: foobar.c -CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foobar.ilk -CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe-foobar.obj -CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.pdb -CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe -CHECK-32: compiling foobar.c -> foo.exe-foobar.obj -CHECK-32: {{.*}}clang-cl{{(\.EXE)?}} -m32 -CHECK-32: linking foo.exe-foobar.obj -> foo.exe -CHECK-32: {{.*}}lld-link{{(\.EXE)?}} - -CHECK-64: Script Arguments: -CHECK-64: Arch: 64 -CHECK-64: Compiler: clang-cl -CHECK-64: Outdir: {{.*}} -CHECK-64: Output: {{.*}}toolchain-clang-cl.test.tmp\foo.exe -CHECK-64: Nodefaultlib: False -CHECK-64: Opt: none -CHECK-64: Mode: compile -CHECK-64: Clean: True -CHECK-64: 
Verbose: True -CHECK-64: Dryrun: True -CHECK-64: Inputs: foobar.c -CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foobar.ilk -CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe-foobar.obj -CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.pdb -CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe -CHECK-64: compiling foobar.c -> foo.exe-foobar.obj -CHECK-64: {{.*}}clang-cl{{(\.EXE)?}} -m64 -CHECK-64: linking foo.exe-foobar.obj -> foo.exe -CHECK-64: {{.*}}lld-link{{(\.EXE)?}} +REQUIRES: lld, system-windows + +RUN: %build -n --verbose --arch=32 --compiler=clang-cl --mode=compile-and-link -o %t/foo.exe foobar.c \ +RUN: | FileCheck --check-prefix=CHECK-32 %s + +RUN: %build -n --verbose --arch=64 --compiler=clang-cl --mode=compile-and-link -o %t/foo.exe foobar.c \ +RUN: | FileCheck --check-prefix=CHECK-64 %s + +CHECK-32: Script Arguments: +CHECK-32: Arch: 32 +CHECK-32: Compiler: clang-cl +CHECK-32: Outdir: {{.*}} +CHECK-32: Output: {{.*}}toolchain-clang-cl.test.tmp\foo.exe +CHECK-32: Nodefaultlib: False +CHECK-32: Opt: none +CHECK-32: Mode: compile +CHECK-32: Clean: True +CHECK-32: Verbose: True +CHECK-32: Dryrun: True +CHECK-32: Inputs: foobar.c +CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foobar.ilk +CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe-foobar.obj +CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.pdb +CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe +CHECK-32: compiling foobar.c -> foo.exe-foobar.obj +CHECK-32: {{.*}}clang-cl{{(\.EXE)?}} -m32 +CHECK-32: linking foo.exe-foobar.obj -> foo.exe +CHECK-32: {{.*}}lld-link{{(\.EXE)?}} + +CHECK-64: Script Arguments: +CHECK-64: Arch: 64 +CHECK-64: Compiler: clang-cl +CHECK-64: Outdir: {{.*}} +CHECK-64: Output: {{.*}}toolchain-clang-cl.test.tmp\foo.exe +CHECK-64: Nodefaultlib: False +CHECK-64: Opt: none +CHECK-64: Mode: compile +CHECK-64: Clean: True +CHECK-64: Verbose: True +CHECK-64: Dryrun: True +CHECK-64: 
Inputs: foobar.c +CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foobar.ilk +CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe-foobar.obj +CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.pdb +CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe +CHECK-64: compiling foobar.c -> foo.exe-foobar.obj +CHECK-64: {{.*}}clang-cl{{(\.EXE)?}} -m64 +CHECK-64: linking foo.exe-foobar.obj -> foo.exe +CHECK-64: {{.*}}lld-link{{(\.EXE)?}} diff --git a/lldb/test/Shell/Minidump/Windows/Sigsegv/Inputs/sigsegv.cpp b/lldb/test/Shell/Minidump/Windows/Sigsegv/Inputs/sigsegv.cpp index 6bf78b5dc43b29..d5b96472eb117f 100644 --- a/lldb/test/Shell/Minidump/Windows/Sigsegv/Inputs/sigsegv.cpp +++ b/lldb/test/Shell/Minidump/Windows/Sigsegv/Inputs/sigsegv.cpp @@ -1,40 +1,40 @@ - -// nodefaultlib build: cl -Zi sigsegv.cpp /link /nodefaultlib - -#ifdef USE_CRT -#include -#else -int main(); -extern "C" -{ - int _fltused; - void mainCRTStartup() { main(); } - void printf(const char*, ...) {} -} -#endif - -void crash(bool crash_self) -{ - printf("Before...\n"); - if(crash_self) - { - printf("Crashing in 3, 2, 1 ...\n"); - *(volatile int*)nullptr = 0; - } - printf("After...\n"); -} - -int foo(int x, float y, const char* msg) -{ - bool flag = x > y; - if(flag) - printf("x = %d, y = %f, msg = %s\n", x, y, msg); - crash(flag); - return x << 1; -} - -int main() -{ - foo(10, 3.14, "testing"); -} - + +// nodefaultlib build: cl -Zi sigsegv.cpp /link /nodefaultlib + +#ifdef USE_CRT +#include +#else +int main(); +extern "C" +{ + int _fltused; + void mainCRTStartup() { main(); } + void printf(const char*, ...) 
{} +} +#endif + +void crash(bool crash_self) +{ + printf("Before...\n"); + if(crash_self) + { + printf("Crashing in 3, 2, 1 ...\n"); + *(volatile int*)nullptr = 0; + } + printf("After...\n"); +} + +int foo(int x, float y, const char* msg) +{ + bool flag = x > y; + if(flag) + printf("x = %d, y = %f, msg = %s\n", x, y, msg); + crash(flag); + return x << 1; +} + +int main() +{ + foo(10, 3.14, "testing"); +} + diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites.s b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites.s index aac8f4c1698038..a9d248758bfcec 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites.s +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites.s @@ -1,622 +1,622 @@ -# Compiled from the following files, but replaced the call to abort with nop. -# clang-cl -fuse-ld=lld-link /Z7 /O1 /Faa.asm /winsysroot~/win_toolchain a.cpp -# a.cpp: -# #include "a.h" -# int main(int argc, char** argv) { -# volatile int main_local = Namespace1::foo(2); -# return 0; -# } -# a.h: -# #include -# #include "b.h" -# namespace Namespace1 { -# inline int foo(int x) { -# volatile int foo_local = x + 1; -# ++foo_local; -# if (!foo_local) -# abort(); -# return Class1::bar(foo_local); -# } -# } // namespace Namespace1 -# b.h: -# #include "c.h" -# class Class1 { -# public: -# inline static int bar(int x) { -# volatile int bar_local = x + 1; -# ++bar_local; -# return Namespace2::Class2::func(bar_local); -# } -# }; -# c.h: -# namespace Namespace2 { -# class Class2 { -# public: -# inline static int func(int x) { -# volatile int func_local = x + 1; -# func_local += x; -# return func_local; -# } -# }; -# } // namespace Namespace2 - - .text - .def @feat.00; - .scl 3; - .type 0; - .endef - .globl @feat.00 -.set @feat.00, 0 - .intel_syntax noprefix - .file "a.cpp" - .def main; - .scl 2; - .type 32; - .endef - .section .text,"xr",one_only,main - .globl main # -- Begin function main -main: # @main -.Lfunc_begin0: - .cv_func_id 0 - .cv_file 1 
"/tmp/a.cpp" "4FFB96E5DF1A95CE7DB9732CFFE001D7" 1 - .cv_loc 0 1 2 0 # a.cpp:2:0 -.seh_proc main -# %bb.0: - #DEBUG_VALUE: main:argv <- $rdx - #DEBUG_VALUE: main:argc <- $ecx - #DEBUG_VALUE: foo:x <- 2 - sub rsp, 56 - .seh_stackalloc 56 - .seh_endprologue -.Ltmp0: - .cv_file 2 "/tmp/./a.h" "BBFED90EF093E9C1D032CC9B05B5D167" 1 - .cv_inline_site_id 1 within 0 inlined_at 1 3 0 - .cv_loc 1 2 5 0 # ./a.h:5:0 - mov dword ptr [rsp + 44], 3 - .cv_loc 1 2 6 0 # ./a.h:6:0 - inc dword ptr [rsp + 44] - .cv_loc 1 2 7 0 # ./a.h:7:0 - mov eax, dword ptr [rsp + 44] - test eax, eax - je .LBB0_2 -.Ltmp1: -# %bb.1: - #DEBUG_VALUE: main:argv <- $rdx - #DEBUG_VALUE: main:argc <- $ecx - #DEBUG_VALUE: foo:x <- 2 - .cv_loc 1 2 9 0 # ./a.h:9:0 - mov eax, dword ptr [rsp + 44] -.Ltmp2: - #DEBUG_VALUE: bar:x <- $eax - .cv_file 3 "/tmp/./b.h" "A26CC743A260115F33AF91AB11F95877" 1 - .cv_inline_site_id 2 within 1 inlined_at 2 9 0 - .cv_loc 2 3 5 0 # ./b.h:5:0 - inc eax -.Ltmp3: - mov dword ptr [rsp + 52], eax - .cv_loc 2 3 6 0 # ./b.h:6:0 - inc dword ptr [rsp + 52] - .cv_loc 2 3 7 0 # ./b.h:7:0 - mov eax, dword ptr [rsp + 52] -.Ltmp4: - #DEBUG_VALUE: func:x <- $eax - .cv_file 4 "/tmp/./c.h" "8AF4613F78624BBE96D1C408ABA39B2D" 1 - .cv_inline_site_id 3 within 2 inlined_at 3 7 0 - .cv_loc 3 4 5 0 # ./c.h:5:0 - lea ecx, [rax + 1] -.Ltmp5: - #DEBUG_VALUE: main:argc <- [DW_OP_LLVM_entry_value 1] $ecx - mov dword ptr [rsp + 48], ecx - .cv_loc 3 4 6 0 # ./c.h:6:0 - add dword ptr [rsp + 48], eax - .cv_loc 3 4 7 0 # ./c.h:7:0 - mov eax, dword ptr [rsp + 48] -.Ltmp6: - .cv_loc 0 1 3 0 # a.cpp:3:0 - mov dword ptr [rsp + 48], eax - .cv_loc 0 1 4 0 # a.cpp:4:0 - xor eax, eax - # Use fake debug info to tests inline info. 
- .cv_loc 1 2 20 0 - add rsp, 56 - ret -.Ltmp7: -.LBB0_2: - #DEBUG_VALUE: main:argv <- $rdx - #DEBUG_VALUE: main:argc <- $ecx - #DEBUG_VALUE: foo:x <- 2 - .cv_loc 1 2 8 0 # ./a.h:8:0 - nop -.Ltmp8: - int3 -.Ltmp9: - #DEBUG_VALUE: main:argc <- [DW_OP_LLVM_entry_value 1] $ecx - #DEBUG_VALUE: main:argv <- [DW_OP_LLVM_entry_value 1] $rdx -.Lfunc_end0: - .seh_endproc - # -- End function - .section .drectve,"yn" - .ascii " /DEFAULTLIB:libcmt.lib" - .ascii " /DEFAULTLIB:oldnames.lib" - .section .debug$S,"dr" - .p2align 2 - .long 4 # Debug section magic - .long 241 - .long .Ltmp11-.Ltmp10 # Subsection size -.Ltmp10: - .short .Ltmp13-.Ltmp12 # Record length -.Ltmp12: - .short 4353 # Record kind: S_OBJNAME - .long 0 # Signature - .asciz "/tmp/a-2b2ba0.obj" # Object name - .p2align 2 -.Ltmp13: - .short .Ltmp15-.Ltmp14 # Record length -.Ltmp14: - .short 4412 # Record kind: S_COMPILE3 - .long 1 # Flags and language - .short 208 # CPUType - .short 15 # Frontend version - .short 0 - .short 0 - .short 0 - .short 15000 # Backend version - .short 0 - .short 0 - .short 0 - .asciz "clang version 15.0.0" # Null-terminated compiler version string - .p2align 2 -.Ltmp15: -.Ltmp11: - .p2align 2 - .long 246 # Inlinee lines subsection - .long .Ltmp17-.Ltmp16 # Subsection size -.Ltmp16: - .long 0 # Inlinee lines signature - - # Inlined function foo starts at ./a.h:4 - .long 4099 # Type index of inlined function - .cv_filechecksumoffset 2 # Offset into filechecksum table - .long 4 # Starting line number - - # Inlined function bar starts at ./b.h:4 - .long 4106 # Type index of inlined function - .cv_filechecksumoffset 3 # Offset into filechecksum table - .long 4 # Starting line number - - # Inlined function func starts at ./c.h:4 - .long 4113 # Type index of inlined function - .cv_filechecksumoffset 4 # Offset into filechecksum table - .long 4 # Starting line number -.Ltmp17: - .p2align 2 - .section .debug$S,"dr",associative,main - .p2align 2 - .long 4 # Debug section magic - .long 241 # Symbol 
subsection for main - .long .Ltmp19-.Ltmp18 # Subsection size -.Ltmp18: - .short .Ltmp21-.Ltmp20 # Record length -.Ltmp20: - .short 4423 # Record kind: S_GPROC32_ID - .long 0 # PtrParent - .long 0 # PtrEnd - .long 0 # PtrNext - .long .Lfunc_end0-main # Code size - .long 0 # Offset after prologue - .long 0 # Offset before epilogue - .long 4117 # Function type index - .secrel32 main # Function section relative address - .secidx main # Function section index - .byte 0 # Flags - .asciz "main" # Function name - .p2align 2 -.Ltmp21: - .short .Ltmp23-.Ltmp22 # Record length -.Ltmp22: - .short 4114 # Record kind: S_FRAMEPROC - .long 56 # FrameSize - .long 0 # Padding - .long 0 # Offset of padding - .long 0 # Bytes of callee saved registers - .long 0 # Exception handler offset - .short 0 # Exception handler section - .long 81920 # Flags (defines frame register) - .p2align 2 -.Ltmp23: - .short .Ltmp25-.Ltmp24 # Record length -.Ltmp24: - .short 4414 # Record kind: S_LOCAL - .long 116 # TypeIndex - .short 1 # Flags - .asciz "argc" - .p2align 2 -.Ltmp25: - .cv_def_range .Lfunc_begin0 .Ltmp5 .Ltmp7 .Ltmp8, reg, 18 - .short .Ltmp27-.Ltmp26 # Record length -.Ltmp26: - .short 4414 # Record kind: S_LOCAL - .long 4114 # TypeIndex - .short 1 # Flags - .asciz "argv" - .p2align 2 -.Ltmp27: - .cv_def_range .Lfunc_begin0 .Ltmp8, reg, 331 - .short .Ltmp29-.Ltmp28 # Record length -.Ltmp28: - .short 4414 # Record kind: S_LOCAL - .long 4118 # TypeIndex - .short 0 # Flags - .asciz "main_local" - .p2align 2 -.Ltmp29: - .cv_def_range .Ltmp0 .Ltmp9, frame_ptr_rel, 48 - .short .Ltmp31-.Ltmp30 # Record length -.Ltmp30: - .short 4429 # Record kind: S_INLINESITE - .long 0 # PtrParent - .long 0 # PtrEnd - .long 4099 # Inlinee type index - .cv_inline_linetable 1 2 4 .Lfunc_begin0 .Lfunc_end0 - .p2align 2 -.Ltmp31: - .short .Ltmp33-.Ltmp32 # Record length -.Ltmp32: - .short 4414 # Record kind: S_LOCAL - .long 116 # TypeIndex - .short 257 # Flags - .asciz "x" - .p2align 2 -.Ltmp33: - .short 
.Ltmp35-.Ltmp34 # Record length -.Ltmp34: - .short 4414 # Record kind: S_LOCAL - .long 4118 # TypeIndex - .short 0 # Flags - .asciz "foo_local" - .p2align 2 -.Ltmp35: - .cv_def_range .Ltmp0 .Ltmp6 .Ltmp7 .Ltmp9, frame_ptr_rel, 44 - .short .Ltmp37-.Ltmp36 # Record length -.Ltmp36: - .short 4429 # Record kind: S_INLINESITE - .long 0 # PtrParent - .long 0 # PtrEnd - .long 4106 # Inlinee type index - .cv_inline_linetable 2 3 4 .Lfunc_begin0 .Lfunc_end0 - .p2align 2 -.Ltmp37: - .short .Ltmp39-.Ltmp38 # Record length -.Ltmp38: - .short 4414 # Record kind: S_LOCAL - .long 116 # TypeIndex - .short 1 # Flags - .asciz "x" - .p2align 2 -.Ltmp39: - .cv_def_range .Ltmp2 .Ltmp3, reg, 17 - .short .Ltmp41-.Ltmp40 # Record length -.Ltmp40: - .short 4414 # Record kind: S_LOCAL - .long 4118 # TypeIndex - .short 0 # Flags - .asciz "bar_local" - .p2align 2 -.Ltmp41: - .cv_def_range .Ltmp2 .Ltmp6, frame_ptr_rel, 52 - .short .Ltmp43-.Ltmp42 # Record length -.Ltmp42: - .short 4429 # Record kind: S_INLINESITE - .long 0 # PtrParent - .long 0 # PtrEnd - .long 4113 # Inlinee type index - .cv_inline_linetable 3 4 4 .Lfunc_begin0 .Lfunc_end0 - .p2align 2 -.Ltmp43: - .short .Ltmp45-.Ltmp44 # Record length -.Ltmp44: - .short 4414 # Record kind: S_LOCAL - .long 116 # TypeIndex - .short 1 # Flags - .asciz "x" - .p2align 2 -.Ltmp45: - .cv_def_range .Ltmp4 .Ltmp6, reg, 17 - .short .Ltmp47-.Ltmp46 # Record length -.Ltmp46: - .short 4414 # Record kind: S_LOCAL - .long 4118 # TypeIndex - .short 0 # Flags - .asciz "func_local" - .p2align 2 -.Ltmp47: - .cv_def_range .Ltmp4 .Ltmp6, frame_ptr_rel, 48 - .short 2 # Record length - .short 4430 # Record kind: S_INLINESITE_END - .short 2 # Record length - .short 4430 # Record kind: S_INLINESITE_END - .short 2 # Record length - .short 4430 # Record kind: S_INLINESITE_END - .short 2 # Record length - .short 4431 # Record kind: S_PROC_ID_END -.Ltmp19: - .p2align 2 - .cv_linetable 0, main, .Lfunc_end0 - .section .debug$S,"dr" - .long 241 - .long .Ltmp49-.Ltmp48 # 
Subsection size -.Ltmp48: - .short .Ltmp51-.Ltmp50 # Record length -.Ltmp50: - .short 4360 # Record kind: S_UDT - .long 4103 # Type - .asciz "Class1" - .p2align 2 -.Ltmp51: - .short .Ltmp53-.Ltmp52 # Record length -.Ltmp52: - .short 4360 # Record kind: S_UDT - .long 4110 # Type - .asciz "Namespace2::Class2" - .p2align 2 -.Ltmp53: -.Ltmp49: - .p2align 2 - .cv_filechecksums # File index to string table offset subsection - .cv_stringtable # String table - .long 241 - .long .Ltmp55-.Ltmp54 # Subsection size -.Ltmp54: - .short .Ltmp57-.Ltmp56 # Record length -.Ltmp56: - .short 4428 # Record kind: S_BUILDINFO - .long 4124 # LF_BUILDINFO index - .p2align 2 -.Ltmp57: -.Ltmp55: - .p2align 2 - .section .debug$T,"dr" - .p2align 2 - .long 4 # Debug section magic - # StringId (0x1000) - .short 0x12 # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "Namespace1" # StringData - .byte 241 - # ArgList (0x1001) - .short 0xa # Record length - .short 0x1201 # Record kind: LF_ARGLIST - .long 0x1 # NumArgs - .long 0x74 # Argument: int - # Procedure (0x1002) - .short 0xe # Record length - .short 0x1008 # Record kind: LF_PROCEDURE - .long 0x74 # ReturnType: int - .byte 0x0 # CallingConvention: NearC - .byte 0x0 # FunctionOptions - .short 0x1 # NumParameters - .long 0x1001 # ArgListType: (int) - # FuncId (0x1003) - .short 0xe # Record length - .short 0x1601 # Record kind: LF_FUNC_ID - .long 0x1000 # ParentScope: Namespace1 - .long 0x1002 # FunctionType: int (int) - .asciz "foo" # Name - # Class (0x1004) - .short 0x2a # Record length - .short 0x1504 # Record kind: LF_CLASS - .short 0x0 # MemberCount - .short 0x280 # Properties ( ForwardReference (0x80) | HasUniqueName (0x200) ) - .long 0x0 # FieldList - .long 0x0 # DerivedFrom - .long 0x0 # VShape - .short 0x0 # SizeOf - .asciz "Class1" # Name - .asciz ".?AVClass1@@" # LinkageName - .byte 242 - .byte 241 - # MemberFunction (0x1005) - .short 0x1a # Record length - .short 0x1009 # Record kind: LF_MFUNCTION - 
.long 0x74 # ReturnType: int - .long 0x1004 # ClassType: Class1 - .long 0x0 # ThisType - .byte 0x0 # CallingConvention: NearC - .byte 0x0 # FunctionOptions - .short 0x1 # NumParameters - .long 0x1001 # ArgListType: (int) - .long 0x0 # ThisAdjustment - # FieldList (0x1006) - .short 0xe # Record length - .short 0x1203 # Record kind: LF_FIELDLIST - .short 0x1511 # Member kind: OneMethod ( LF_ONEMETHOD ) - .short 0xb # Attrs: Public, Static - .long 0x1005 # Type: int Class1::(int) - .asciz "bar" # Name - # Class (0x1007) - .short 0x2a # Record length - .short 0x1504 # Record kind: LF_CLASS - .short 0x1 # MemberCount - .short 0x200 # Properties ( HasUniqueName (0x200) ) - .long 0x1006 # FieldList: - .long 0x0 # DerivedFrom - .long 0x0 # VShape - .short 0x1 # SizeOf - .asciz "Class1" # Name - .asciz ".?AVClass1@@" # LinkageName - .byte 242 - .byte 241 - # StringId (0x1008) - .short 0x12 # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "/tmp/./b.h" # StringData - .byte 241 - # UdtSourceLine (0x1009) - .short 0xe # Record length - .short 0x1606 # Record kind: LF_UDT_SRC_LINE - .long 0x1007 # UDT: Class1 - .long 0x1008 # SourceFile: /tmp/./b.h - .long 0x2 # LineNumber - # MemberFuncId (0x100A) - .short 0xe # Record length - .short 0x1602 # Record kind: LF_MFUNC_ID - .long 0x1004 # ClassType: Class1 - .long 0x1005 # FunctionType: int Class1::(int) - .asciz "bar" # Name - # Class (0x100B) - .short 0x42 # Record length - .short 0x1504 # Record kind: LF_CLASS - .short 0x0 # MemberCount - .short 0x280 # Properties ( ForwardReference (0x80) | HasUniqueName (0x200) ) - .long 0x0 # FieldList - .long 0x0 # DerivedFrom - .long 0x0 # VShape - .short 0x0 # SizeOf - .asciz "Namespace2::Class2" # Name - .asciz ".?AVClass2@Namespace2@@" # LinkageName - .byte 243 - .byte 242 - .byte 241 - # MemberFunction (0x100C) - .short 0x1a # Record length - .short 0x1009 # Record kind: LF_MFUNCTION - .long 0x74 # ReturnType: int - .long 0x100b # ClassType: 
Namespace2::Class2 - .long 0x0 # ThisType - .byte 0x0 # CallingConvention: NearC - .byte 0x0 # FunctionOptions - .short 0x1 # NumParameters - .long 0x1001 # ArgListType: (int) - .long 0x0 # ThisAdjustment - # FieldList (0x100D) - .short 0x12 # Record length - .short 0x1203 # Record kind: LF_FIELDLIST - .short 0x1511 # Member kind: OneMethod ( LF_ONEMETHOD ) - .short 0xb # Attrs: Public, Static - .long 0x100c # Type: int Namespace2::Class2::(int) - .asciz "func" # Name - .byte 243 - .byte 242 - .byte 241 - # Class (0x100E) - .short 0x42 # Record length - .short 0x1504 # Record kind: LF_CLASS - .short 0x1 # MemberCount - .short 0x200 # Properties ( HasUniqueName (0x200) ) - .long 0x100d # FieldList: - .long 0x0 # DerivedFrom - .long 0x0 # VShape - .short 0x1 # SizeOf - .asciz "Namespace2::Class2" # Name - .asciz ".?AVClass2@Namespace2@@" # LinkageName - .byte 243 - .byte 242 - .byte 241 - # StringId (0x100F) - .short 0x12 # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "/tmp/./c.h" # StringData - .byte 241 - # UdtSourceLine (0x1010) - .short 0xe # Record length - .short 0x1606 # Record kind: LF_UDT_SRC_LINE - .long 0x100e # UDT: Namespace2::Class2 - .long 0x100f # SourceFile: /tmp/./c.h - .long 0x2 # LineNumber - # MemberFuncId (0x1011) - .short 0x12 # Record length - .short 0x1602 # Record kind: LF_MFUNC_ID - .long 0x100b # ClassType: Namespace2::Class2 - .long 0x100c # FunctionType: int Namespace2::Class2::(int) - .asciz "func" # Name - .byte 243 - .byte 242 - .byte 241 - # Pointer (0x1012) - .short 0xa # Record length - .short 0x1002 # Record kind: LF_POINTER - .long 0x670 # PointeeType: char* - .long 0x1000c # Attrs: [ Type: Near64, Mode: Pointer, SizeOf: 8 ] - # ArgList (0x1013) - .short 0xe # Record length - .short 0x1201 # Record kind: LF_ARGLIST - .long 0x2 # NumArgs - .long 0x74 # Argument: int - .long 0x1012 # Argument: char** - # Procedure (0x1014) - .short 0xe # Record length - .short 0x1008 # Record kind: 
LF_PROCEDURE - .long 0x74 # ReturnType: int - .byte 0x0 # CallingConvention: NearC - .byte 0x0 # FunctionOptions - .short 0x2 # NumParameters - .long 0x1013 # ArgListType: (int, char**) - # FuncId (0x1015) - .short 0x12 # Record length - .short 0x1601 # Record kind: LF_FUNC_ID - .long 0x0 # ParentScope - .long 0x1014 # FunctionType: int (int, char**) - .asciz "main" # Name - .byte 243 - .byte 242 - .byte 241 - # Modifier (0x1016) - .short 0xa # Record length - .short 0x1001 # Record kind: LF_MODIFIER - .long 0x74 # ModifiedType: int - .short 0x2 # Modifiers ( Volatile (0x2) ) - .byte 242 - .byte 241 - # StringId (0x1017) - .short 0xe # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "/tmp" # StringData - .byte 243 - .byte 242 - .byte 241 - # StringId (0x1018) - .short 0xe # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "a.cpp" # StringData - .byte 242 - .byte 241 - # StringId (0x1019) - .short 0xa # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .byte 0 # StringData - .byte 243 - .byte 242 - .byte 241 - # StringId (0x101A) - .short 0x4e # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "/usr/local/google/home/zequanwu/llvm-project/build/release/bin/clang" # StringData - .byte 243 - .byte 242 - .byte 241 - # StringId (0x101B) - .short 0x9f6 # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "\"-cc1\" \"-triple\" \"x86_64-pc-windows-msvc19.20.0\" \"-S\" \"-disable-free\" \"-clear-ast-before-backend\" \"-disable-llvm-verifier\" \"-discard-value-names\" \"-mrelocation-model\" \"pic\" \"-pic-level\" \"2\" \"-mframe-pointer=none\" \"-relaxed-aliasing\" \"-fmath-errno\" \"-ffp-contract=on\" \"-fno-rounding-math\" \"-mconstructor-aliases\" \"-funwind-tables=2\" \"-target-cpu\" \"x86-64\" \"-mllvm\" \"-x86-asm-syntax=intel\" \"-tune-cpu\" \"generic\" \"-mllvm\" 
\"-treat-scalable-fixed-error-as-warning\" \"-D_MT\" \"-flto-visibility-public-std\" \"--dependent-lib=libcmt\" \"--dependent-lib=oldnames\" \"-stack-protector\" \"2\" \"-fms-volatile\" \"-fdiagnostics-format\" \"msvc\" \"-gno-column-info\" \"-gcodeview\" \"-debug-info-kind=constructor\" \"-ffunction-sections\" \"-fcoverage-compilation-dir=/tmp\" \"-resource-dir\" \"/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/DIA SDK/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/atlmfc/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/ucrt\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/shared\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/um\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/winrt\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/cppwinrt\" \"-Os\" \"-fdeprecated-macro\" \"-fdebug-compilation-dir=/tmp\" \"-ferror-limit\" \"19\" 
\"-fno-use-cxa-atexit\" \"-fms-extensions\" \"-fms-compatibility\" \"-fms-compatibility-version=19.20\" \"-std=c++14\" \"-fdelayed-template-parsing\" \"-fcolor-diagnostics\" \"-vectorize-loops\" \"-vectorize-slp\" \"-faddrsig\" \"-x\" \"c++\"" # StringData - .byte 242 - .byte 241 - # BuildInfo (0x101C) - .short 0x1a # Record length - .short 0x1603 # Record kind: LF_BUILDINFO - .short 0x5 # NumArgs - .long 0x1017 # Argument: /tmp - .long 0x101a # Argument: /usr/local/google/home/zequanwu/llvm-project/build/release/bin/clang - .long 0x1018 # Argument: a.cpp - .long 0x1019 # Argument - .long 0x101b # Argument: "-cc1" "-triple" "x86_64-pc-windows-msvc19.20.0" "-S" "-disable-free" "-clear-ast-before-backend" "-disable-llvm-verifier" "-discard-value-names" "-mrelocation-model" "pic" "-pic-level" "2" "-mframe-pointer=none" "-relaxed-aliasing" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-mllvm" "-x86-asm-syntax=intel" "-tune-cpu" "generic" "-mllvm" "-treat-scalable-fixed-error-as-warning" "-D_MT" "-flto-visibility-public-std" "--dependent-lib=libcmt" "--dependent-lib=oldnames" "-stack-protector" "2" "-fms-volatile" "-fdiagnostics-format" "msvc" "-gno-column-info" "-gcodeview" "-debug-info-kind=constructor" "-ffunction-sections" "-fcoverage-compilation-dir=/tmp" "-resource-dir" "/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0" "-internal-isystem" "/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/DIA SDK/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/include" "-internal-isystem" 
"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/atlmfc/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/ucrt" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/shared" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/um" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/winrt" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/cppwinrt" "-Os" "-fdeprecated-macro" "-fdebug-compilation-dir=/tmp" "-ferror-limit" "19" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.20" "-std=c++14" "-fdelayed-template-parsing" "-fcolor-diagnostics" "-vectorize-loops" "-vectorize-slp" "-faddrsig" "-x" "c++" - .byte 242 - .byte 241 - .addrsig +# Compiled from the following files, but replaced the call to abort with nop. 
+# clang-cl -fuse-ld=lld-link /Z7 /O1 /Faa.asm /winsysroot~/win_toolchain a.cpp +# a.cpp: +# #include "a.h" +# int main(int argc, char** argv) { +# volatile int main_local = Namespace1::foo(2); +# return 0; +# } +# a.h: +# #include +# #include "b.h" +# namespace Namespace1 { +# inline int foo(int x) { +# volatile int foo_local = x + 1; +# ++foo_local; +# if (!foo_local) +# abort(); +# return Class1::bar(foo_local); +# } +# } // namespace Namespace1 +# b.h: +# #include "c.h" +# class Class1 { +# public: +# inline static int bar(int x) { +# volatile int bar_local = x + 1; +# ++bar_local; +# return Namespace2::Class2::func(bar_local); +# } +# }; +# c.h: +# namespace Namespace2 { +# class Class2 { +# public: +# inline static int func(int x) { +# volatile int func_local = x + 1; +# func_local += x; +# return func_local; +# } +# }; +# } // namespace Namespace2 + + .text + .def @feat.00; + .scl 3; + .type 0; + .endef + .globl @feat.00 +.set @feat.00, 0 + .intel_syntax noprefix + .file "a.cpp" + .def main; + .scl 2; + .type 32; + .endef + .section .text,"xr",one_only,main + .globl main # -- Begin function main +main: # @main +.Lfunc_begin0: + .cv_func_id 0 + .cv_file 1 "/tmp/a.cpp" "4FFB96E5DF1A95CE7DB9732CFFE001D7" 1 + .cv_loc 0 1 2 0 # a.cpp:2:0 +.seh_proc main +# %bb.0: + #DEBUG_VALUE: main:argv <- $rdx + #DEBUG_VALUE: main:argc <- $ecx + #DEBUG_VALUE: foo:x <- 2 + sub rsp, 56 + .seh_stackalloc 56 + .seh_endprologue +.Ltmp0: + .cv_file 2 "/tmp/./a.h" "BBFED90EF093E9C1D032CC9B05B5D167" 1 + .cv_inline_site_id 1 within 0 inlined_at 1 3 0 + .cv_loc 1 2 5 0 # ./a.h:5:0 + mov dword ptr [rsp + 44], 3 + .cv_loc 1 2 6 0 # ./a.h:6:0 + inc dword ptr [rsp + 44] + .cv_loc 1 2 7 0 # ./a.h:7:0 + mov eax, dword ptr [rsp + 44] + test eax, eax + je .LBB0_2 +.Ltmp1: +# %bb.1: + #DEBUG_VALUE: main:argv <- $rdx + #DEBUG_VALUE: main:argc <- $ecx + #DEBUG_VALUE: foo:x <- 2 + .cv_loc 1 2 9 0 # ./a.h:9:0 + mov eax, dword ptr [rsp + 44] +.Ltmp2: + #DEBUG_VALUE: bar:x <- $eax + .cv_file 3 
"/tmp/./b.h" "A26CC743A260115F33AF91AB11F95877" 1 + .cv_inline_site_id 2 within 1 inlined_at 2 9 0 + .cv_loc 2 3 5 0 # ./b.h:5:0 + inc eax +.Ltmp3: + mov dword ptr [rsp + 52], eax + .cv_loc 2 3 6 0 # ./b.h:6:0 + inc dword ptr [rsp + 52] + .cv_loc 2 3 7 0 # ./b.h:7:0 + mov eax, dword ptr [rsp + 52] +.Ltmp4: + #DEBUG_VALUE: func:x <- $eax + .cv_file 4 "/tmp/./c.h" "8AF4613F78624BBE96D1C408ABA39B2D" 1 + .cv_inline_site_id 3 within 2 inlined_at 3 7 0 + .cv_loc 3 4 5 0 # ./c.h:5:0 + lea ecx, [rax + 1] +.Ltmp5: + #DEBUG_VALUE: main:argc <- [DW_OP_LLVM_entry_value 1] $ecx + mov dword ptr [rsp + 48], ecx + .cv_loc 3 4 6 0 # ./c.h:6:0 + add dword ptr [rsp + 48], eax + .cv_loc 3 4 7 0 # ./c.h:7:0 + mov eax, dword ptr [rsp + 48] +.Ltmp6: + .cv_loc 0 1 3 0 # a.cpp:3:0 + mov dword ptr [rsp + 48], eax + .cv_loc 0 1 4 0 # a.cpp:4:0 + xor eax, eax + # Use fake debug info to tests inline info. + .cv_loc 1 2 20 0 + add rsp, 56 + ret +.Ltmp7: +.LBB0_2: + #DEBUG_VALUE: main:argv <- $rdx + #DEBUG_VALUE: main:argc <- $ecx + #DEBUG_VALUE: foo:x <- 2 + .cv_loc 1 2 8 0 # ./a.h:8:0 + nop +.Ltmp8: + int3 +.Ltmp9: + #DEBUG_VALUE: main:argc <- [DW_OP_LLVM_entry_value 1] $ecx + #DEBUG_VALUE: main:argv <- [DW_OP_LLVM_entry_value 1] $rdx +.Lfunc_end0: + .seh_endproc + # -- End function + .section .drectve,"yn" + .ascii " /DEFAULTLIB:libcmt.lib" + .ascii " /DEFAULTLIB:oldnames.lib" + .section .debug$S,"dr" + .p2align 2 + .long 4 # Debug section magic + .long 241 + .long .Ltmp11-.Ltmp10 # Subsection size +.Ltmp10: + .short .Ltmp13-.Ltmp12 # Record length +.Ltmp12: + .short 4353 # Record kind: S_OBJNAME + .long 0 # Signature + .asciz "/tmp/a-2b2ba0.obj" # Object name + .p2align 2 +.Ltmp13: + .short .Ltmp15-.Ltmp14 # Record length +.Ltmp14: + .short 4412 # Record kind: S_COMPILE3 + .long 1 # Flags and language + .short 208 # CPUType + .short 15 # Frontend version + .short 0 + .short 0 + .short 0 + .short 15000 # Backend version + .short 0 + .short 0 + .short 0 + .asciz "clang version 15.0.0" # 
Null-terminated compiler version string + .p2align 2 +.Ltmp15: +.Ltmp11: + .p2align 2 + .long 246 # Inlinee lines subsection + .long .Ltmp17-.Ltmp16 # Subsection size +.Ltmp16: + .long 0 # Inlinee lines signature + + # Inlined function foo starts at ./a.h:4 + .long 4099 # Type index of inlined function + .cv_filechecksumoffset 2 # Offset into filechecksum table + .long 4 # Starting line number + + # Inlined function bar starts at ./b.h:4 + .long 4106 # Type index of inlined function + .cv_filechecksumoffset 3 # Offset into filechecksum table + .long 4 # Starting line number + + # Inlined function func starts at ./c.h:4 + .long 4113 # Type index of inlined function + .cv_filechecksumoffset 4 # Offset into filechecksum table + .long 4 # Starting line number +.Ltmp17: + .p2align 2 + .section .debug$S,"dr",associative,main + .p2align 2 + .long 4 # Debug section magic + .long 241 # Symbol subsection for main + .long .Ltmp19-.Ltmp18 # Subsection size +.Ltmp18: + .short .Ltmp21-.Ltmp20 # Record length +.Ltmp20: + .short 4423 # Record kind: S_GPROC32_ID + .long 0 # PtrParent + .long 0 # PtrEnd + .long 0 # PtrNext + .long .Lfunc_end0-main # Code size + .long 0 # Offset after prologue + .long 0 # Offset before epilogue + .long 4117 # Function type index + .secrel32 main # Function section relative address + .secidx main # Function section index + .byte 0 # Flags + .asciz "main" # Function name + .p2align 2 +.Ltmp21: + .short .Ltmp23-.Ltmp22 # Record length +.Ltmp22: + .short 4114 # Record kind: S_FRAMEPROC + .long 56 # FrameSize + .long 0 # Padding + .long 0 # Offset of padding + .long 0 # Bytes of callee saved registers + .long 0 # Exception handler offset + .short 0 # Exception handler section + .long 81920 # Flags (defines frame register) + .p2align 2 +.Ltmp23: + .short .Ltmp25-.Ltmp24 # Record length +.Ltmp24: + .short 4414 # Record kind: S_LOCAL + .long 116 # TypeIndex + .short 1 # Flags + .asciz "argc" + .p2align 2 +.Ltmp25: + .cv_def_range .Lfunc_begin0 .Ltmp5 .Ltmp7 
.Ltmp8, reg, 18 + .short .Ltmp27-.Ltmp26 # Record length +.Ltmp26: + .short 4414 # Record kind: S_LOCAL + .long 4114 # TypeIndex + .short 1 # Flags + .asciz "argv" + .p2align 2 +.Ltmp27: + .cv_def_range .Lfunc_begin0 .Ltmp8, reg, 331 + .short .Ltmp29-.Ltmp28 # Record length +.Ltmp28: + .short 4414 # Record kind: S_LOCAL + .long 4118 # TypeIndex + .short 0 # Flags + .asciz "main_local" + .p2align 2 +.Ltmp29: + .cv_def_range .Ltmp0 .Ltmp9, frame_ptr_rel, 48 + .short .Ltmp31-.Ltmp30 # Record length +.Ltmp30: + .short 4429 # Record kind: S_INLINESITE + .long 0 # PtrParent + .long 0 # PtrEnd + .long 4099 # Inlinee type index + .cv_inline_linetable 1 2 4 .Lfunc_begin0 .Lfunc_end0 + .p2align 2 +.Ltmp31: + .short .Ltmp33-.Ltmp32 # Record length +.Ltmp32: + .short 4414 # Record kind: S_LOCAL + .long 116 # TypeIndex + .short 257 # Flags + .asciz "x" + .p2align 2 +.Ltmp33: + .short .Ltmp35-.Ltmp34 # Record length +.Ltmp34: + .short 4414 # Record kind: S_LOCAL + .long 4118 # TypeIndex + .short 0 # Flags + .asciz "foo_local" + .p2align 2 +.Ltmp35: + .cv_def_range .Ltmp0 .Ltmp6 .Ltmp7 .Ltmp9, frame_ptr_rel, 44 + .short .Ltmp37-.Ltmp36 # Record length +.Ltmp36: + .short 4429 # Record kind: S_INLINESITE + .long 0 # PtrParent + .long 0 # PtrEnd + .long 4106 # Inlinee type index + .cv_inline_linetable 2 3 4 .Lfunc_begin0 .Lfunc_end0 + .p2align 2 +.Ltmp37: + .short .Ltmp39-.Ltmp38 # Record length +.Ltmp38: + .short 4414 # Record kind: S_LOCAL + .long 116 # TypeIndex + .short 1 # Flags + .asciz "x" + .p2align 2 +.Ltmp39: + .cv_def_range .Ltmp2 .Ltmp3, reg, 17 + .short .Ltmp41-.Ltmp40 # Record length +.Ltmp40: + .short 4414 # Record kind: S_LOCAL + .long 4118 # TypeIndex + .short 0 # Flags + .asciz "bar_local" + .p2align 2 +.Ltmp41: + .cv_def_range .Ltmp2 .Ltmp6, frame_ptr_rel, 52 + .short .Ltmp43-.Ltmp42 # Record length +.Ltmp42: + .short 4429 # Record kind: S_INLINESITE + .long 0 # PtrParent + .long 0 # PtrEnd + .long 4113 # Inlinee type index + .cv_inline_linetable 3 4 4 
.Lfunc_begin0 .Lfunc_end0 + .p2align 2 +.Ltmp43: + .short .Ltmp45-.Ltmp44 # Record length +.Ltmp44: + .short 4414 # Record kind: S_LOCAL + .long 116 # TypeIndex + .short 1 # Flags + .asciz "x" + .p2align 2 +.Ltmp45: + .cv_def_range .Ltmp4 .Ltmp6, reg, 17 + .short .Ltmp47-.Ltmp46 # Record length +.Ltmp46: + .short 4414 # Record kind: S_LOCAL + .long 4118 # TypeIndex + .short 0 # Flags + .asciz "func_local" + .p2align 2 +.Ltmp47: + .cv_def_range .Ltmp4 .Ltmp6, frame_ptr_rel, 48 + .short 2 # Record length + .short 4430 # Record kind: S_INLINESITE_END + .short 2 # Record length + .short 4430 # Record kind: S_INLINESITE_END + .short 2 # Record length + .short 4430 # Record kind: S_INLINESITE_END + .short 2 # Record length + .short 4431 # Record kind: S_PROC_ID_END +.Ltmp19: + .p2align 2 + .cv_linetable 0, main, .Lfunc_end0 + .section .debug$S,"dr" + .long 241 + .long .Ltmp49-.Ltmp48 # Subsection size +.Ltmp48: + .short .Ltmp51-.Ltmp50 # Record length +.Ltmp50: + .short 4360 # Record kind: S_UDT + .long 4103 # Type + .asciz "Class1" + .p2align 2 +.Ltmp51: + .short .Ltmp53-.Ltmp52 # Record length +.Ltmp52: + .short 4360 # Record kind: S_UDT + .long 4110 # Type + .asciz "Namespace2::Class2" + .p2align 2 +.Ltmp53: +.Ltmp49: + .p2align 2 + .cv_filechecksums # File index to string table offset subsection + .cv_stringtable # String table + .long 241 + .long .Ltmp55-.Ltmp54 # Subsection size +.Ltmp54: + .short .Ltmp57-.Ltmp56 # Record length +.Ltmp56: + .short 4428 # Record kind: S_BUILDINFO + .long 4124 # LF_BUILDINFO index + .p2align 2 +.Ltmp57: +.Ltmp55: + .p2align 2 + .section .debug$T,"dr" + .p2align 2 + .long 4 # Debug section magic + # StringId (0x1000) + .short 0x12 # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "Namespace1" # StringData + .byte 241 + # ArgList (0x1001) + .short 0xa # Record length + .short 0x1201 # Record kind: LF_ARGLIST + .long 0x1 # NumArgs + .long 0x74 # Argument: int + # Procedure (0x1002) + .short 0xe # 
Record length + .short 0x1008 # Record kind: LF_PROCEDURE + .long 0x74 # ReturnType: int + .byte 0x0 # CallingConvention: NearC + .byte 0x0 # FunctionOptions + .short 0x1 # NumParameters + .long 0x1001 # ArgListType: (int) + # FuncId (0x1003) + .short 0xe # Record length + .short 0x1601 # Record kind: LF_FUNC_ID + .long 0x1000 # ParentScope: Namespace1 + .long 0x1002 # FunctionType: int (int) + .asciz "foo" # Name + # Class (0x1004) + .short 0x2a # Record length + .short 0x1504 # Record kind: LF_CLASS + .short 0x0 # MemberCount + .short 0x280 # Properties ( ForwardReference (0x80) | HasUniqueName (0x200) ) + .long 0x0 # FieldList + .long 0x0 # DerivedFrom + .long 0x0 # VShape + .short 0x0 # SizeOf + .asciz "Class1" # Name + .asciz ".?AVClass1@@" # LinkageName + .byte 242 + .byte 241 + # MemberFunction (0x1005) + .short 0x1a # Record length + .short 0x1009 # Record kind: LF_MFUNCTION + .long 0x74 # ReturnType: int + .long 0x1004 # ClassType: Class1 + .long 0x0 # ThisType + .byte 0x0 # CallingConvention: NearC + .byte 0x0 # FunctionOptions + .short 0x1 # NumParameters + .long 0x1001 # ArgListType: (int) + .long 0x0 # ThisAdjustment + # FieldList (0x1006) + .short 0xe # Record length + .short 0x1203 # Record kind: LF_FIELDLIST + .short 0x1511 # Member kind: OneMethod ( LF_ONEMETHOD ) + .short 0xb # Attrs: Public, Static + .long 0x1005 # Type: int Class1::(int) + .asciz "bar" # Name + # Class (0x1007) + .short 0x2a # Record length + .short 0x1504 # Record kind: LF_CLASS + .short 0x1 # MemberCount + .short 0x200 # Properties ( HasUniqueName (0x200) ) + .long 0x1006 # FieldList: + .long 0x0 # DerivedFrom + .long 0x0 # VShape + .short 0x1 # SizeOf + .asciz "Class1" # Name + .asciz ".?AVClass1@@" # LinkageName + .byte 242 + .byte 241 + # StringId (0x1008) + .short 0x12 # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "/tmp/./b.h" # StringData + .byte 241 + # UdtSourceLine (0x1009) + .short 0xe # Record length + .short 0x1606 # Record 
kind: LF_UDT_SRC_LINE + .long 0x1007 # UDT: Class1 + .long 0x1008 # SourceFile: /tmp/./b.h + .long 0x2 # LineNumber + # MemberFuncId (0x100A) + .short 0xe # Record length + .short 0x1602 # Record kind: LF_MFUNC_ID + .long 0x1004 # ClassType: Class1 + .long 0x1005 # FunctionType: int Class1::(int) + .asciz "bar" # Name + # Class (0x100B) + .short 0x42 # Record length + .short 0x1504 # Record kind: LF_CLASS + .short 0x0 # MemberCount + .short 0x280 # Properties ( ForwardReference (0x80) | HasUniqueName (0x200) ) + .long 0x0 # FieldList + .long 0x0 # DerivedFrom + .long 0x0 # VShape + .short 0x0 # SizeOf + .asciz "Namespace2::Class2" # Name + .asciz ".?AVClass2@Namespace2@@" # LinkageName + .byte 243 + .byte 242 + .byte 241 + # MemberFunction (0x100C) + .short 0x1a # Record length + .short 0x1009 # Record kind: LF_MFUNCTION + .long 0x74 # ReturnType: int + .long 0x100b # ClassType: Namespace2::Class2 + .long 0x0 # ThisType + .byte 0x0 # CallingConvention: NearC + .byte 0x0 # FunctionOptions + .short 0x1 # NumParameters + .long 0x1001 # ArgListType: (int) + .long 0x0 # ThisAdjustment + # FieldList (0x100D) + .short 0x12 # Record length + .short 0x1203 # Record kind: LF_FIELDLIST + .short 0x1511 # Member kind: OneMethod ( LF_ONEMETHOD ) + .short 0xb # Attrs: Public, Static + .long 0x100c # Type: int Namespace2::Class2::(int) + .asciz "func" # Name + .byte 243 + .byte 242 + .byte 241 + # Class (0x100E) + .short 0x42 # Record length + .short 0x1504 # Record kind: LF_CLASS + .short 0x1 # MemberCount + .short 0x200 # Properties ( HasUniqueName (0x200) ) + .long 0x100d # FieldList: + .long 0x0 # DerivedFrom + .long 0x0 # VShape + .short 0x1 # SizeOf + .asciz "Namespace2::Class2" # Name + .asciz ".?AVClass2@Namespace2@@" # LinkageName + .byte 243 + .byte 242 + .byte 241 + # StringId (0x100F) + .short 0x12 # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "/tmp/./c.h" # StringData + .byte 241 + # UdtSourceLine (0x1010) + .short 0xe #
Record length + .short 0x1606 # Record kind: LF_UDT_SRC_LINE + .long 0x100e # UDT: Namespace2::Class2 + .long 0x100f # SourceFile: /tmp/./c.h + .long 0x2 # LineNumber + # MemberFuncId (0x1011) + .short 0x12 # Record length + .short 0x1602 # Record kind: LF_MFUNC_ID + .long 0x100b # ClassType: Namespace2::Class2 + .long 0x100c # FunctionType: int Namespace2::Class2::(int) + .asciz "func" # Name + .byte 243 + .byte 242 + .byte 241 + # Pointer (0x1012) + .short 0xa # Record length + .short 0x1002 # Record kind: LF_POINTER + .long 0x670 # PointeeType: char* + .long 0x1000c # Attrs: [ Type: Near64, Mode: Pointer, SizeOf: 8 ] + # ArgList (0x1013) + .short 0xe # Record length + .short 0x1201 # Record kind: LF_ARGLIST + .long 0x2 # NumArgs + .long 0x74 # Argument: int + .long 0x1012 # Argument: char** + # Procedure (0x1014) + .short 0xe # Record length + .short 0x1008 # Record kind: LF_PROCEDURE + .long 0x74 # ReturnType: int + .byte 0x0 # CallingConvention: NearC + .byte 0x0 # FunctionOptions + .short 0x2 # NumParameters + .long 0x1013 # ArgListType: (int, char**) + # FuncId (0x1015) + .short 0x12 # Record length + .short 0x1601 # Record kind: LF_FUNC_ID + .long 0x0 # ParentScope + .long 0x1014 # FunctionType: int (int, char**) + .asciz "main" # Name + .byte 243 + .byte 242 + .byte 241 + # Modifier (0x1016) + .short 0xa # Record length + .short 0x1001 # Record kind: LF_MODIFIER + .long 0x74 # ModifiedType: int + .short 0x2 # Modifiers ( Volatile (0x2) ) + .byte 242 + .byte 241 + # StringId (0x1017) + .short 0xe # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "/tmp" # StringData + .byte 243 + .byte 242 + .byte 241 + # StringId (0x1018) + .short 0xe # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "a.cpp" # StringData + .byte 242 + .byte 241 + # StringId (0x1019) + .short 0xa # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .byte 0 # StringData + .byte 243 + .byte 242 + 
.byte 241 + # StringId (0x101A) + .short 0x4e # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "/usr/local/google/home/zequanwu/llvm-project/build/release/bin/clang" # StringData + .byte 243 + .byte 242 + .byte 241 + # StringId (0x101B) + .short 0x9f6 # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "\"-cc1\" \"-triple\" \"x86_64-pc-windows-msvc19.20.0\" \"-S\" \"-disable-free\" \"-clear-ast-before-backend\" \"-disable-llvm-verifier\" \"-discard-value-names\" \"-mrelocation-model\" \"pic\" \"-pic-level\" \"2\" \"-mframe-pointer=none\" \"-relaxed-aliasing\" \"-fmath-errno\" \"-ffp-contract=on\" \"-fno-rounding-math\" \"-mconstructor-aliases\" \"-funwind-tables=2\" \"-target-cpu\" \"x86-64\" \"-mllvm\" \"-x86-asm-syntax=intel\" \"-tune-cpu\" \"generic\" \"-mllvm\" \"-treat-scalable-fixed-error-as-warning\" \"-D_MT\" \"-flto-visibility-public-std\" \"--dependent-lib=libcmt\" \"--dependent-lib=oldnames\" \"-stack-protector\" \"2\" \"-fms-volatile\" \"-fdiagnostics-format\" \"msvc\" \"-gno-column-info\" \"-gcodeview\" \"-debug-info-kind=constructor\" \"-ffunction-sections\" \"-fcoverage-compilation-dir=/tmp\" \"-resource-dir\" \"/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/DIA SDK/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/atlmfc/include\" \"-internal-isystem\" 
\"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/ucrt\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/shared\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/um\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/winrt\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/cppwinrt\" \"-Os\" \"-fdeprecated-macro\" \"-fdebug-compilation-dir=/tmp\" \"-ferror-limit\" \"19\" \"-fno-use-cxa-atexit\" \"-fms-extensions\" \"-fms-compatibility\" \"-fms-compatibility-version=19.20\" \"-std=c++14\" \"-fdelayed-template-parsing\" \"-fcolor-diagnostics\" \"-vectorize-loops\" \"-vectorize-slp\" \"-faddrsig\" \"-x\" \"c++\"" # StringData + .byte 242 + .byte 241 + # BuildInfo (0x101C) + .short 0x1a # Record length + .short 0x1603 # Record kind: LF_BUILDINFO + .short 0x5 # NumArgs + .long 0x1017 # Argument: /tmp + .long 0x101a # Argument: /usr/local/google/home/zequanwu/llvm-project/build/release/bin/clang + .long 0x1018 # Argument: a.cpp + .long 0x1019 # Argument + .long 0x101b # Argument: "-cc1" "-triple" "x86_64-pc-windows-msvc19.20.0" "-S" "-disable-free" "-clear-ast-before-backend" "-disable-llvm-verifier" "-discard-value-names" "-mrelocation-model" "pic" "-pic-level" "2" "-mframe-pointer=none" "-relaxed-aliasing" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-mllvm" "-x86-asm-syntax=intel" "-tune-cpu" "generic" "-mllvm" 
"-treat-scalable-fixed-error-as-warning" "-D_MT" "-flto-visibility-public-std" "--dependent-lib=libcmt" "--dependent-lib=oldnames" "-stack-protector" "2" "-fms-volatile" "-fdiagnostics-format" "msvc" "-gno-column-info" "-gcodeview" "-debug-info-kind=constructor" "-ffunction-sections" "-fcoverage-compilation-dir=/tmp" "-resource-dir" "/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0" "-internal-isystem" "/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/DIA SDK/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/atlmfc/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/ucrt" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/shared" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/um" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/winrt" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/cppwinrt" "-Os" "-fdeprecated-macro" "-fdebug-compilation-dir=/tmp" "-ferror-limit" "19" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.20" 
"-std=c++14" "-fdelayed-template-parsing" "-fcolor-diagnostics" "-vectorize-loops" "-vectorize-slp" "-faddrsig" "-x" "c++" + .byte 242 + .byte 241 + .addrsig diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites_live.lldbinit b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites_live.lldbinit index 2291c7c4527175..eab5061dafbdcd 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites_live.lldbinit +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites_live.lldbinit @@ -1,7 +1,7 @@ -br set -p BP_bar -f inline_sites_live.cpp -br set -p BP_foo -f inline_sites_live.cpp -run -expression param -continue -expression param -expression local +br set -p BP_bar -f inline_sites_live.cpp +br set -p BP_foo -f inline_sites_live.cpp +run +expression param +continue +expression param +expression local diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/local-variables-registers.lldbinit b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/local-variables-registers.lldbinit index ad080da24dab71..feda7485675792 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/local-variables-registers.lldbinit +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/local-variables-registers.lldbinit @@ -1,35 +1,35 @@ -image lookup -a 0x140001000 -v -image lookup -a 0x140001003 -v -image lookup -a 0x140001006 -v - -image lookup -a 0x140001011 -v -image lookup -a 0x140001017 -v -image lookup -a 0x140001019 -v -image lookup -a 0x14000101e -v -image lookup -a 0x14000102c -v - -image lookup -a 0x140001031 -v -image lookup -a 0x140001032 -v -image lookup -a 0x140001033 -v -image lookup -a 0x140001034 -v -image lookup -a 0x140001035 -v -image lookup -a 0x140001036 -v -image lookup -a 0x140001037 -v -image lookup -a 0x14000103b -v -image lookup -a 0x14000103d -v -image lookup -a 0x14000103f -v -image lookup -a 0x140001041 -v -image lookup -a 0x140001043 -v -image lookup -a 0x140001045 -v -image lookup -a 0x140001046 -v -image lookup -a 0x140001047 -v -image lookup 
-a 0x140001048 -v -image lookup -a 0x140001049 -v -image lookup -a 0x14000104a -v -image lookup -a 0x14000104b -v -image lookup -a 0x14000104c -v -image lookup -a 0x14000104e -v -image lookup -a 0x14000104f -v -image lookup -a 0x140001050 -v -image lookup -a 0x140001051 -v -exit +image lookup -a 0x140001000 -v +image lookup -a 0x140001003 -v +image lookup -a 0x140001006 -v + +image lookup -a 0x140001011 -v +image lookup -a 0x140001017 -v +image lookup -a 0x140001019 -v +image lookup -a 0x14000101e -v +image lookup -a 0x14000102c -v + +image lookup -a 0x140001031 -v +image lookup -a 0x140001032 -v +image lookup -a 0x140001033 -v +image lookup -a 0x140001034 -v +image lookup -a 0x140001035 -v +image lookup -a 0x140001036 -v +image lookup -a 0x140001037 -v +image lookup -a 0x14000103b -v +image lookup -a 0x14000103d -v +image lookup -a 0x14000103f -v +image lookup -a 0x140001041 -v +image lookup -a 0x140001043 -v +image lookup -a 0x140001045 -v +image lookup -a 0x140001046 -v +image lookup -a 0x140001047 -v +image lookup -a 0x140001048 -v +image lookup -a 0x140001049 -v +image lookup -a 0x14000104a -v +image lookup -a 0x14000104b -v +image lookup -a 0x14000104c -v +image lookup -a 0x14000104e -v +image lookup -a 0x14000104f -v +image lookup -a 0x140001050 -v +image lookup -a 0x140001051 -v +exit diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/lookup-by-types.lldbinit b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/lookup-by-types.lldbinit index afe3f2c8b943e3..3f639eb2e539bc 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/lookup-by-types.lldbinit +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/lookup-by-types.lldbinit @@ -1,4 +1,4 @@ -image lookup -type A -image lookup -type B - +image lookup -type A +image lookup -type B + quit \ No newline at end of file diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/subfield_register_simple_type.lldbinit b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/subfield_register_simple_type.lldbinit index 
3dc33fd789dac0..32758f1fbc51f3 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/subfield_register_simple_type.lldbinit +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/subfield_register_simple_type.lldbinit @@ -1,2 +1,2 @@ -image lookup -a 0x40102f -v -quit +image lookup -a 0x40102f -v +quit diff --git a/lldb/test/Shell/SymbolFile/NativePDB/function-types-classes.cpp b/lldb/test/Shell/SymbolFile/NativePDB/function-types-classes.cpp index ca2a84de7698a4..f0fac90e5065a1 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/function-types-classes.cpp +++ b/lldb/test/Shell/SymbolFile/NativePDB/function-types-classes.cpp @@ -113,9 +113,9 @@ auto incomplete = &three; // CHECK: |-CXXRecordDecl {{.*}} union U // CHECK: |-EnumDecl {{.*}} E // CHECK: |-CXXRecordDecl {{.*}} struct S -// CHECK: |-VarDecl {{.*}} a 'S (*)(C *, U &, E &&)' -// CHECK: |-VarDecl {{.*}} b 'E (*)(const S *, const C &, const U &&)' -// CHECK: |-VarDecl {{.*}} c 'U (*)(volatile E *, volatile S &, volatile C &&)' +// CHECK: |-VarDecl {{.*}} a 'S (*)(C *, U &, E &&)' +// CHECK: |-VarDecl {{.*}} b 'E (*)(const S *, const C &, const U &&)' +// CHECK: |-VarDecl {{.*}} c 'U (*)(volatile E *, volatile S &, volatile C &&)' // CHECK: |-VarDecl {{.*}} d 'C (*)(const volatile U *, const volatile E &, const volatile S &&)' // CHECK: |-CXXRecordDecl {{.*}} struct B // CHECK: | `-CXXRecordDecl {{.*}} struct A @@ -125,14 +125,14 @@ auto incomplete = &three; // CHECK: | | `-CXXRecordDecl {{.*}} struct S // CHECK: | `-NamespaceDecl {{.*}} B // CHECK: | `-CXXRecordDecl {{.*}} struct S -// CHECK: |-VarDecl {{.*}} e 'A::B::S *(*)(B::A::S *, A::C::S &)' -// CHECK: |-VarDecl {{.*}} f 'A::C::S &(*)(A::B::S *, B::A::S *)' +// CHECK: |-VarDecl {{.*}} e 'A::B::S *(*)(B::A::S *, A::C::S &)' +// CHECK: |-VarDecl {{.*}} f 'A::C::S &(*)(A::B::S *, B::A::S *)' // CHECK: |-VarDecl {{.*}} g 'B::A::S *(*)(A::C::S &, A::B::S *)' // CHECK: |-CXXRecordDecl {{.*}} struct TC // CHECK: |-CXXRecordDecl {{.*}} struct TC> // CHECK: 
|-CXXRecordDecl {{.*}} struct TC // CHECK: |-CXXRecordDecl {{.*}} struct TC -// CHECK: |-VarDecl {{.*}} h 'TC (*)(TC, TC>, TC)' +// CHECK: |-VarDecl {{.*}} h 'TC (*)(TC, TC>, TC)' // CHECK: |-VarDecl {{.*}} i 'A::B::S (*)()' // CHECK: |-CXXRecordDecl {{.*}} struct Incomplete // CHECK: `-VarDecl {{.*}} incomplete 'Incomplete *(*)(Incomplete **, const Incomplete *)' diff --git a/lldb/test/Shell/SymbolFile/NativePDB/inline_sites_live.cpp b/lldb/test/Shell/SymbolFile/NativePDB/inline_sites_live.cpp index 767149ea18c468..40298272696580 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/inline_sites_live.cpp +++ b/lldb/test/Shell/SymbolFile/NativePDB/inline_sites_live.cpp @@ -1,34 +1,34 @@ -// clang-format off -// REQUIRES: system-windows - -// RUN: %build -o %t.exe -- %s -// RUN: env LLDB_USE_NATIVE_PDB_READER=1 %lldb -f %t.exe -s \ -// RUN: %p/Inputs/inline_sites_live.lldbinit 2>&1 | FileCheck %s - -void use(int) {} - -void __attribute__((always_inline)) bar(int param) { - use(param); // BP_bar -} - -void __attribute__((always_inline)) foo(int param) { - int local = param+1; - bar(local); - use(param); - use(local); // BP_foo -} - -int main(int argc, char** argv) { - foo(argc); -} - -// CHECK: * thread #1, stop reason = breakpoint 1 -// CHECK-NEXT: frame #0: {{.*}}`main [inlined] bar(param=2) -// CHECK: (lldb) expression param -// CHECK-NEXT: (int) $0 = 2 -// CHECK: * thread #1, stop reason = breakpoint 2 -// CHECK-NEXT: frame #0: {{.*}}`main [inlined] foo(param=1) -// CHECK: (lldb) expression param -// CHECK-NEXT: (int) $1 = 1 -// CHECK-NEXT: (lldb) expression local -// CHECK-NEXT: (int) $2 = 2 +// clang-format off +// REQUIRES: system-windows + +// RUN: %build -o %t.exe -- %s +// RUN: env LLDB_USE_NATIVE_PDB_READER=1 %lldb -f %t.exe -s \ +// RUN: %p/Inputs/inline_sites_live.lldbinit 2>&1 | FileCheck %s + +void use(int) {} + +void __attribute__((always_inline)) bar(int param) { + use(param); // BP_bar +} + +void __attribute__((always_inline)) foo(int param) { + int 
local = param+1; + bar(local); + use(param); + use(local); // BP_foo +} + +int main(int argc, char** argv) { + foo(argc); +} + +// CHECK: * thread #1, stop reason = breakpoint 1 +// CHECK-NEXT: frame #0: {{.*}}`main [inlined] bar(param=2) +// CHECK: (lldb) expression param +// CHECK-NEXT: (int) $0 = 2 +// CHECK: * thread #1, stop reason = breakpoint 2 +// CHECK-NEXT: frame #0: {{.*}}`main [inlined] foo(param=1) +// CHECK: (lldb) expression param +// CHECK-NEXT: (int) $1 = 1 +// CHECK-NEXT: (lldb) expression local +// CHECK-NEXT: (int) $2 = 2 diff --git a/lldb/test/Shell/SymbolFile/NativePDB/lookup-by-types.cpp b/lldb/test/Shell/SymbolFile/NativePDB/lookup-by-types.cpp index f3aea8115f3858..cd5bbfc30fa0e1 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/lookup-by-types.cpp +++ b/lldb/test/Shell/SymbolFile/NativePDB/lookup-by-types.cpp @@ -1,46 +1,46 @@ -// clang-format off - -// RUN: %build -o %t.exe -- %s -// RUN: env LLDB_USE_NATIVE_PDB_READER=1 %lldb -f %t.exe -s \ -// RUN: %p/Inputs/lookup-by-types.lldbinit 2>&1 | FileCheck %s - -class B; -class A { -public: - static const A constA; - static A a; - static B b; - int val = 1; -}; -class B { -public: - static A a; - int val = 2; -}; -A varA; -B varB; -const A A::constA = varA; -A A::a = varA; -B A::b = varB; -A B::a = varA; - -int main(int argc, char **argv) { - return varA.val + varB.val; -} - -// CHECK: image lookup -type A -// CHECK-NEXT: 1 match found in {{.*}}.exe -// CHECK-NEXT: compiler_type = "class A { -// CHECK-NEXT: static const A constA; -// CHECK-NEXT: static A a; -// CHECK-NEXT: static B b; -// CHECK-NEXT: public: -// CHECK-NEXT: int val; -// CHECK-NEXT: }" -// CHECK: image lookup -type B -// CHECK-NEXT: 1 match found in {{.*}}.exe -// CHECK-NEXT: compiler_type = "class B { -// CHECK-NEXT: static A a; -// CHECK-NEXT: public: -// CHECK-NEXT: int val; -// CHECK-NEXT: }" +// clang-format off + +// RUN: %build -o %t.exe -- %s +// RUN: env LLDB_USE_NATIVE_PDB_READER=1 %lldb -f %t.exe -s \ +// RUN: 
%p/Inputs/lookup-by-types.lldbinit 2>&1 | FileCheck %s + +class B; +class A { +public: + static const A constA; + static A a; + static B b; + int val = 1; +}; +class B { +public: + static A a; + int val = 2; +}; +A varA; +B varB; +const A A::constA = varA; +A A::a = varA; +B A::b = varB; +A B::a = varA; + +int main(int argc, char **argv) { + return varA.val + varB.val; +} + +// CHECK: image lookup -type A +// CHECK-NEXT: 1 match found in {{.*}}.exe +// CHECK-NEXT: compiler_type = "class A { +// CHECK-NEXT: static const A constA; +// CHECK-NEXT: static A a; +// CHECK-NEXT: static B b; +// CHECK-NEXT: public: +// CHECK-NEXT: int val; +// CHECK-NEXT: }" +// CHECK: image lookup -type B +// CHECK-NEXT: 1 match found in {{.*}}.exe +// CHECK-NEXT: compiler_type = "class B { +// CHECK-NEXT: static A a; +// CHECK-NEXT: public: +// CHECK-NEXT: int val; +// CHECK-NEXT: }" diff --git a/lldb/unittests/Breakpoint/CMakeLists.txt b/lldb/unittests/Breakpoint/CMakeLists.txt index 757c2da1a4d9de..db985bc82dc5e2 100644 --- a/lldb/unittests/Breakpoint/CMakeLists.txt +++ b/lldb/unittests/Breakpoint/CMakeLists.txt @@ -1,10 +1,10 @@ -add_lldb_unittest(LLDBBreakpointTests - BreakpointIDTest.cpp - WatchpointAlgorithmsTests.cpp - - LINK_LIBS - lldbBreakpoint - lldbCore - LINK_COMPONENTS - Support - ) +add_lldb_unittest(LLDBBreakpointTests + BreakpointIDTest.cpp + WatchpointAlgorithmsTests.cpp + + LINK_LIBS + lldbBreakpoint + lldbCore + LINK_COMPONENTS + Support + ) diff --git a/llvm/benchmarks/FormatVariadicBM.cpp b/llvm/benchmarks/FormatVariadicBM.cpp index c03ead400d0d5c..e351db338730e9 100644 --- a/llvm/benchmarks/FormatVariadicBM.cpp +++ b/llvm/benchmarks/FormatVariadicBM.cpp @@ -1,63 +1,63 @@ -//===- FormatVariadicBM.cpp - formatv() benchmark ---------- --------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. 
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -#include "benchmark/benchmark.h" -#include "llvm/Support/FormatVariadic.h" -#include -#include -#include - -using namespace llvm; -using namespace std; - -// Generate a list of format strings that have `NumReplacements` replacements -// by permuting the replacements and some literal text. -static vector getFormatStrings(int NumReplacements) { - vector Components; - for (int I = 0; I < NumReplacements; I++) - Components.push_back("{" + to_string(I) + "}"); - // Intersperse these with some other literal text (_). - const string_view Literal = "____"; - for (char C : Literal) - Components.push_back(string(1, C)); - - vector Formats; - do { - string Concat; - for (const string &C : Components) - Concat += C; - Formats.emplace_back(Concat); - } while (next_permutation(Components.begin(), Components.end())); - return Formats; -} - -// Generate the set of formats to exercise outside the benchmark code. -static const vector> Formats = { - getFormatStrings(1), getFormatStrings(2), getFormatStrings(3), - getFormatStrings(4), getFormatStrings(5), -}; - -// Benchmark formatv() for a variety of format strings and 1-5 replacements. -static void BM_FormatVariadic(benchmark::State &state) { - for (auto _ : state) { - for (const string &Fmt : Formats[0]) - formatv(Fmt.c_str(), 1).str(); - for (const string &Fmt : Formats[1]) - formatv(Fmt.c_str(), 1, 2).str(); - for (const string &Fmt : Formats[2]) - formatv(Fmt.c_str(), 1, 2, 3).str(); - for (const string &Fmt : Formats[3]) - formatv(Fmt.c_str(), 1, 2, 3, 4).str(); - for (const string &Fmt : Formats[4]) - formatv(Fmt.c_str(), 1, 2, 3, 4, 5).str(); - } -} - -BENCHMARK(BM_FormatVariadic); - -BENCHMARK_MAIN(); +//===- FormatVariadicBM.cpp - formatv() benchmark ---------- --------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. 
+// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "benchmark/benchmark.h" +#include "llvm/Support/FormatVariadic.h" +#include <algorithm> +#include <string> +#include <vector> + +using namespace llvm; +using namespace std; + +// Generate a list of format strings that have `NumReplacements` replacements +// by permuting the replacements and some literal text. +static vector<string> getFormatStrings(int NumReplacements) { + vector<string> Components; + for (int I = 0; I < NumReplacements; I++) + Components.push_back("{" + to_string(I) + "}"); + // Intersperse these with some other literal text (_). + const string_view Literal = "____"; + for (char C : Literal) + Components.push_back(string(1, C)); + + vector<string> Formats; + do { + string Concat; + for (const string &C : Components) + Concat += C; + Formats.emplace_back(Concat); + } while (next_permutation(Components.begin(), Components.end())); + return Formats; +} + +// Generate the set of formats to exercise outside the benchmark code. +static const vector<vector<string>> Formats = { + getFormatStrings(1), getFormatStrings(2), getFormatStrings(3), + getFormatStrings(4), getFormatStrings(5), +}; + +// Benchmark formatv() for a variety of format strings and 1-5 replacements.
+static void BM_FormatVariadic(benchmark::State &state) { + for (auto _ : state) { + for (const string &Fmt : Formats[0]) + formatv(Fmt.c_str(), 1).str(); + for (const string &Fmt : Formats[1]) + formatv(Fmt.c_str(), 1, 2).str(); + for (const string &Fmt : Formats[2]) + formatv(Fmt.c_str(), 1, 2, 3).str(); + for (const string &Fmt : Formats[3]) + formatv(Fmt.c_str(), 1, 2, 3, 4).str(); + for (const string &Fmt : Formats[4]) + formatv(Fmt.c_str(), 1, 2, 3, 4, 5).str(); + } +} + +BENCHMARK(BM_FormatVariadic); + +BENCHMARK_MAIN(); diff --git a/llvm/benchmarks/GetIntrinsicForClangBuiltin.cpp b/llvm/benchmarks/GetIntrinsicForClangBuiltin.cpp index fa9c528424c95f..953d9125e11ee2 100644 --- a/llvm/benchmarks/GetIntrinsicForClangBuiltin.cpp +++ b/llvm/benchmarks/GetIntrinsicForClangBuiltin.cpp @@ -1,50 +1,50 @@ -#include "benchmark/benchmark.h" -#include "llvm/IR/Intrinsics.h" - -using namespace llvm; -using namespace Intrinsic; - -// Benchmark intrinsic lookup from a variety of targets. -static void BM_GetIntrinsicForClangBuiltin(benchmark::State &state) { - static const char *Builtins[] = { - "__builtin_adjust_trampoline", - "__builtin_trap", - "__builtin_arm_ttest", - "__builtin_amdgcn_cubetc", - "__builtin_amdgcn_udot2", - "__builtin_arm_stc", - "__builtin_bpf_compare", - "__builtin_HEXAGON_A2_max", - "__builtin_lasx_xvabsd_b", - "__builtin_mips_dlsa", - "__nvvm_floor_f", - "__builtin_altivec_vslb", - "__builtin_r600_read_tgid_x", - "__builtin_riscv_aes64im", - "__builtin_s390_vcksm", - "__builtin_ve_vl_pvfmksge_Mvl", - "__builtin_ia32_axor64", - "__builtin_bitrev", - }; - static const char *Targets[] = {"", "aarch64", "amdgcn", "mips", - "nvvm", "r600", "riscv"}; - - for (auto _ : state) { - for (auto Builtin : Builtins) - for (auto Target : Targets) - getIntrinsicForClangBuiltin(Target, Builtin); - } -} - -static void -BM_GetIntrinsicForClangBuiltinHexagonFirst(benchmark::State &state) { - // Exercise the worst case by looking for the first builtin for a target - // 
that has a lot of builtins. - for (auto _ : state) - getIntrinsicForClangBuiltin("hexagon", "__builtin_HEXAGON_A2_abs"); -} - -BENCHMARK(BM_GetIntrinsicForClangBuiltin); -BENCHMARK(BM_GetIntrinsicForClangBuiltinHexagonFirst); - -BENCHMARK_MAIN(); +#include "benchmark/benchmark.h" +#include "llvm/IR/Intrinsics.h" + +using namespace llvm; +using namespace Intrinsic; + +// Benchmark intrinsic lookup from a variety of targets. +static void BM_GetIntrinsicForClangBuiltin(benchmark::State &state) { + static const char *Builtins[] = { + "__builtin_adjust_trampoline", + "__builtin_trap", + "__builtin_arm_ttest", + "__builtin_amdgcn_cubetc", + "__builtin_amdgcn_udot2", + "__builtin_arm_stc", + "__builtin_bpf_compare", + "__builtin_HEXAGON_A2_max", + "__builtin_lasx_xvabsd_b", + "__builtin_mips_dlsa", + "__nvvm_floor_f", + "__builtin_altivec_vslb", + "__builtin_r600_read_tgid_x", + "__builtin_riscv_aes64im", + "__builtin_s390_vcksm", + "__builtin_ve_vl_pvfmksge_Mvl", + "__builtin_ia32_axor64", + "__builtin_bitrev", + }; + static const char *Targets[] = {"", "aarch64", "amdgcn", "mips", + "nvvm", "r600", "riscv"}; + + for (auto _ : state) { + for (auto Builtin : Builtins) + for (auto Target : Targets) + getIntrinsicForClangBuiltin(Target, Builtin); + } +} + +static void +BM_GetIntrinsicForClangBuiltinHexagonFirst(benchmark::State &state) { + // Exercise the worst case by looking for the first builtin for a target + // that has a lot of builtins. 
+ for (auto _ : state) + getIntrinsicForClangBuiltin("hexagon", "__builtin_HEXAGON_A2_abs"); +} + +BENCHMARK(BM_GetIntrinsicForClangBuiltin); +BENCHMARK(BM_GetIntrinsicForClangBuiltinHexagonFirst); + +BENCHMARK_MAIN(); diff --git a/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp b/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp index 7f3bd3bc9eb6b3..758291274675d6 100644 --- a/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp +++ b/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp @@ -1,30 +1,30 @@ -//===- GetIntrinsicInfoTableEntries.cpp - IIT signature benchmark ---------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -#include "benchmark/benchmark.h" -#include "llvm/ADT/SmallVector.h" -#include "llvm/IR/Intrinsics.h" - -using namespace llvm; -using namespace Intrinsic; - -static void BM_GetIntrinsicInfoTableEntries(benchmark::State &state) { - SmallVector<IITDescriptor> Table; - for (auto _ : state) { - for (ID ID = 1; ID < num_intrinsics; ++ID) { - // This makes sure the vector does not keep growing, as well as after the - // first iteration does not result in additional allocations. - Table.clear(); - getIntrinsicInfoTableEntries(ID, Table); - } - } -} - -BENCHMARK(BM_GetIntrinsicInfoTableEntries); - -BENCHMARK_MAIN(); +//===- GetIntrinsicInfoTableEntries.cpp - IIT signature benchmark ---------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "benchmark/benchmark.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/IR/Intrinsics.h" + +using namespace llvm; +using namespace Intrinsic; + +static void BM_GetIntrinsicInfoTableEntries(benchmark::State &state) { + SmallVector<IITDescriptor> Table; + for (auto _ : state) { + for (ID ID = 1; ID < num_intrinsics; ++ID) { + // This makes sure the vector does not keep growing, as well as after the + // first iteration does not result in additional allocations. + Table.clear(); + getIntrinsicInfoTableEntries(ID, Table); + } + } +} + +BENCHMARK(BM_GetIntrinsicInfoTableEntries); + +BENCHMARK_MAIN(); diff --git a/llvm/docs/_static/LoopOptWG_invite.ics b/llvm/docs/_static/LoopOptWG_invite.ics index 65597d90a9c852..7c92e4048cc3d1 100644 --- a/llvm/docs/_static/LoopOptWG_invite.ics +++ b/llvm/docs/_static/LoopOptWG_invite.ics @@ -1,80 +1,80 @@ -BEGIN:VCALENDAR -PRODID:-//Google Inc//Google Calendar 70.9054//EN -VERSION:2.0 -CALSCALE:GREGORIAN -METHOD:PUBLISH -X-WR-CALNAME:LLVM Loop Optimization Discussion -X-WR-TIMEZONE:Europe/Berlin -BEGIN:VTIMEZONE -TZID:America/New_York -X-LIC-LOCATION:America/New_York -BEGIN:DAYLIGHT -TZOFFSETFROM:-0500 -TZOFFSETTO:-0400 -TZNAME:EDT -DTSTART:19700308T020000 -RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU -END:DAYLIGHT -BEGIN:STANDARD -TZOFFSETFROM:-0400 -TZOFFSETTO:-0500 -TZNAME:EST -DTSTART:19701101T020000 -RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU -END:STANDARD -END:VTIMEZONE -BEGIN:VEVENT -DTSTART;TZID=America/New_York:20240904T110000 -DTEND;TZID=America/New_York:20240904T120000 -RRULE:FREQ=MONTHLY;BYDAY=1WE -DTSTAMP:20240821T160951Z -UID:58h3f0kd3aooohmeii0johh23c at google.com -X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg -CREATED:20240821T151507Z -DESCRIPTION:LLVM Loop Optimization Discussion
Video call link: https://meet.google.c - om/fmz-gspu-odg
Agenda/Minutes/Discussion: https://docs.google.com/document/d/1sdzoyB - 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing\n\n-::~:~::~:~:~ - :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\ - nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE) - +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm - z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp - ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n - -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~ - :~:~::~:~::- -LAST-MODIFIED:20240821T160941Z -SEQUENCE:0 -STATUS:CONFIRMED -SUMMARY:LLVM Loop Optimization Discussion -TRANSP:OPAQUE -END:VEVENT -BEGIN:VEVENT -DTSTART;TZID=America/New_York:20240904T110000 -DTEND;TZID=America/New_York:20240904T120000 -DTSTAMP:20240821T160951Z -UID:58h3f0kd3aooohmeii0johh23c at google.com -X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg -RECURRENCE-ID;TZID=America/New_York:20240904T110000 -CREATED:20240821T151507Z -DESCRIPTION:LLVM Loop Optimization Discussion
Video call link: https://meet.google.c - om/fmz-gspu-odg
Agenda/Minutes/Discussion: https://docs.google.com/document/d/1sdzoyB - 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing\n\n-::~:~::~:~:~ - :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\ - nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE) - +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm - z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp - ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n - -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~ - :~:~::~:~::- -LAST-MODIFIED:20240821T160941Z -SEQUENCE:0 -STATUS:CONFIRMED -SUMMARY:LLVM Loop Optimization Discussion -TRANSP:OPAQUE -END:VEVENT -END:VCALENDAR +BEGIN:VCALENDAR +PRODID:-//Google Inc//Google Calendar 70.9054//EN +VERSION:2.0 +CALSCALE:GREGORIAN +METHOD:PUBLISH +X-WR-CALNAME:LLVM Loop Optimization Discussion +X-WR-TIMEZONE:Europe/Berlin +BEGIN:VTIMEZONE +TZID:America/New_York +X-LIC-LOCATION:America/New_York +BEGIN:DAYLIGHT +TZOFFSETFROM:-0500 +TZOFFSETTO:-0400 +TZNAME:EDT +DTSTART:19700308T020000 +RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU +END:DAYLIGHT +BEGIN:STANDARD +TZOFFSETFROM:-0400 +TZOFFSETTO:-0500 +TZNAME:EST +DTSTART:19701101T020000 +RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU +END:STANDARD +END:VTIMEZONE +BEGIN:VEVENT +DTSTART;TZID=America/New_York:20240904T110000 +DTEND;TZID=America/New_York:20240904T120000 +RRULE:FREQ=MONTHLY;BYDAY=1WE +DTSTAMP:20240821T160951Z +UID:58h3f0kd3aooohmeii0johh23c at google.com +X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg +CREATED:20240821T151507Z +DESCRIPTION:LLVM Loop Optimization Discussion
Video call link: https://meet.google.c + om/fmz-gspu-odg
Agenda/Minutes/Discussion: https://docs.google.com/document/d/1sdzoyB + 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing\n\n-::~:~::~:~:~ + :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\ + nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE) + +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm + z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp + ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n + -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~ + :~:~::~:~::- +LAST-MODIFIED:20240821T160941Z +SEQUENCE:0 +STATUS:CONFIRMED +SUMMARY:LLVM Loop Optimization Discussion +TRANSP:OPAQUE +END:VEVENT +BEGIN:VEVENT +DTSTART;TZID=America/New_York:20240904T110000 +DTEND;TZID=America/New_York:20240904T120000 +DTSTAMP:20240821T160951Z +UID:58h3f0kd3aooohmeii0johh23c at google.com +X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg +RECURRENCE-ID;TZID=America/New_York:20240904T110000 +CREATED:20240821T151507Z +DESCRIPTION:LLVM Loop Optimization Discussion
Video call link: https://meet.google.c + om/fmz-gspu-odg
Agenda/Minutes/Discussion: https://docs.google.com/document/d/1sdzoyB + 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing\n\n-::~:~::~:~:~ + :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\ + nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE) + +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm + z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp + ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n + -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~ + :~:~::~:~::- +LAST-MODIFIED:20240821T160941Z +SEQUENCE:0 +STATUS:CONFIRMED +SUMMARY:LLVM Loop Optimization Discussion +TRANSP:OPAQUE +END:VEVENT +END:VCALENDAR diff --git a/llvm/lib/Support/rpmalloc/CACHE.md b/llvm/lib/Support/rpmalloc/CACHE.md index 052320baf53275..645093026debf1 100644 --- a/llvm/lib/Support/rpmalloc/CACHE.md +++ b/llvm/lib/Support/rpmalloc/CACHE.md @@ -1,19 +1,19 @@ -# Thread caches -rpmalloc has a thread cache of free memory blocks which can be used in allocations without interfering with other threads or going to system to map more memory, as well as a global cache shared by all threads to let spans of memory pages flow between threads. Configuring the size of these caches can be crucial to obtaining good performance while minimizing memory overhead blowup. Below is a simple case study using the benchmark tool to compare different thread cache configurations for rpmalloc. - -The rpmalloc thread cache is configured to be unlimited, performance oriented as meaning default values, size oriented where both thread cache and global cache is reduced significantly, or disabled where both thread and global caches are disabled and completely free pages are directly unmapped. - -The benchmark is configured to run threads allocating 150000 blocks distributed in the `[16, 16000]` bytes range with a linear falloff probability. 
It runs 1000 loops, and every iteration 75000 blocks (50%) are freed and allocated in a scattered pattern. There are no cross thread allocations/deallocations. Parameters: `benchmark n 0 0 0 1000 150000 75000 16 16000`. The benchmarks are run on an Ubuntu 16.10 machine with 8 cores (4 physical, HT) and 12GiB RAM. - -The benchmark also includes results for the standard library malloc implementation as a reference for comparison with the nocache setting. - -![Ubuntu 16.10 random [16, 16000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=387883204&format=image) -![Ubuntu 16.10 random [16, 16000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=1644710241&format=image) - -For single threaded case the unlimited cache and performance oriented cache settings have identical performance and memory overhead, indicating that the memory pages fit in the combined thread and global cache. As number of threads increase to 2-4 threads, the performance settings have slightly higher performance which can seem odd at first, but can be explained by low contention on the global cache where some memory pages can flow between threads without stalling, reducing the overall number of calls to map new memory pages (also indicated by the slightly lower memory overhead). - -As threads increase even more to 5-10 threads, the increased contention and eventual limit of global cache cause the unlimited setting to gain a slight advantage in performance. As expected the memory overhead remains constant for unlimited caches, while going down for performance setting when number of threads increases. - -The size oriented setting maintain good performance compared to the standard library while reducing the memory overhead compared to the performance setting with a decent amount. 
- -The nocache setting still outperforms the reference standard library allocator for workloads up to 6 threads while maintaining a near zero memory overhead, which is even slightly lower than the standard library. For use case scenarios where number of allocation of each size class is lower the overhead in rpmalloc from the 64KiB span size will of course increase. +# Thread caches +rpmalloc has a thread cache of free memory blocks which can be used in allocations without interfering with other threads or going to system to map more memory, as well as a global cache shared by all threads to let spans of memory pages flow between threads. Configuring the size of these caches can be crucial to obtaining good performance while minimizing memory overhead blowup. Below is a simple case study using the benchmark tool to compare different thread cache configurations for rpmalloc. + +The rpmalloc thread cache is configured to be unlimited, performance oriented as meaning default values, size oriented where both thread cache and global cache is reduced significantly, or disabled where both thread and global caches are disabled and completely free pages are directly unmapped. + +The benchmark is configured to run threads allocating 150000 blocks distributed in the `[16, 16000]` bytes range with a linear falloff probability. It runs 1000 loops, and every iteration 75000 blocks (50%) are freed and allocated in a scattered pattern. There are no cross thread allocations/deallocations. Parameters: `benchmark n 0 0 0 1000 150000 75000 16 16000`. The benchmarks are run on an Ubuntu 16.10 machine with 8 cores (4 physical, HT) and 12GiB RAM. + +The benchmark also includes results for the standard library malloc implementation as a reference for comparison with the nocache setting. 
+ +![Ubuntu 16.10 random [16, 16000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=387883204&format=image) +![Ubuntu 16.10 random [16, 16000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=1644710241&format=image) + +For single threaded case the unlimited cache and performance oriented cache settings have identical performance and memory overhead, indicating that the memory pages fit in the combined thread and global cache. As number of threads increase to 2-4 threads, the performance settings have slightly higher performance which can seem odd at first, but can be explained by low contention on the global cache where some memory pages can flow between threads without stalling, reducing the overall number of calls to map new memory pages (also indicated by the slightly lower memory overhead). + +As threads increase even more to 5-10 threads, the increased contention and eventual limit of global cache cause the unlimited setting to gain a slight advantage in performance. As expected the memory overhead remains constant for unlimited caches, while going down for performance setting when number of threads increases. + +The size oriented setting maintain good performance compared to the standard library while reducing the memory overhead compared to the performance setting with a decent amount. + +The nocache setting still outperforms the reference standard library allocator for workloads up to 6 threads while maintaining a near zero memory overhead, which is even slightly lower than the standard library. For use case scenarios where number of allocation of each size class is lower the overhead in rpmalloc from the 64KiB span size will of course increase. 
diff --git a/llvm/lib/Support/rpmalloc/README.md b/llvm/lib/Support/rpmalloc/README.md index 916bca0118d868..2233df9da42d52 100644 --- a/llvm/lib/Support/rpmalloc/README.md +++ b/llvm/lib/Support/rpmalloc/README.md @@ -1,220 +1,220 @@ -# rpmalloc - General Purpose Memory Allocator -This library provides a cross platform lock free thread caching 16-byte aligned memory allocator implemented in C. -This is a fork of rpmalloc 1.4.5. - -Platforms currently supported: - -- Windows -- MacOS -- iOS -- Linux -- Android -- Haiku - -The code should be easily portable to any platform with atomic operations and an mmap-style virtual memory management API. The API used to map/unmap memory pages can be configured in runtime to a custom implementation and mapping granularity/size. - -This library is put in the public domain; you can redistribute it and/or modify it without any restrictions. Or, if you choose, you can use it under the MIT license. - -# Performance -We believe rpmalloc is faster than most popular memory allocators like tcmalloc, hoard, ptmalloc3 and others without causing extra allocated memory overhead in the thread caches compared to these allocators. We also believe the implementation to be easier to read and modify compared to these allocators, as it is a single source file of ~3000 lines of C code. All allocations have a natural 16-byte alignment. - -Contained in a parallel repository is a benchmark utility that performs interleaved unaligned allocations and deallocations (both in-thread and cross-thread) in multiple threads. It measures number of memory operations performed per CPU second, as well as memory overhead by comparing the virtual memory mapped with the number of bytes requested in allocation calls. The setup of number of thread, cross-thread deallocation rate and allocation size limits is configured by command line arguments. 
- -https://github.com/mjansson/rpmalloc-benchmark - -Below is an example performance comparison chart of rpmalloc and other popular allocator implementations, with default configurations used. - -![Ubuntu 16.10, random [16, 8000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=301017877&format=image) - -The benchmark producing these numbers were run on an Ubuntu 16.10 machine with 8 logical cores (4 physical, HT). The actual numbers are not to be interpreted as absolute performance figures, but rather as relative comparisons between the different allocators. For additional benchmark results, see the [BENCHMARKS](BENCHMARKS.md) file. - -Configuration of the thread and global caches can be important depending on your use pattern. See [CACHE](CACHE.md) for a case study and some comments/guidelines. - -# Required functions - -Before calling any other function in the API, you __MUST__ call the initialization function, either __rpmalloc_initialize__ or __rpmalloc_initialize_config__, or you will get undefined behaviour when calling other rpmalloc entry point. - -Before terminating your use of the allocator, you __SHOULD__ call __rpmalloc_finalize__ in order to release caches and unmap virtual memory, as well as prepare the allocator for global scope cleanup at process exit or dynamic library unload depending on your use case. - -# Using -The easiest way to use the library is simply adding __rpmalloc.[h|c]__ to your project and compile them along with your sources. This contains only the rpmalloc specific entry points and does not provide internal hooks to process and/or thread creation at the moment. 
You are required to call these functions from your own code in order to initialize and finalize the allocator in your process and threads: - -__rpmalloc_initialize__ : Call at process start to initialize the allocator - -__rpmalloc_initialize_config__ : Optional entry point to call at process start to initialize the allocator with a custom memory mapping backend, memory page size and mapping granularity. - -__rpmalloc_finalize__: Call at process exit to finalize the allocator - -__rpmalloc_thread_initialize__: Call at each thread start to initialize the thread local data for the allocator - -__rpmalloc_thread_finalize__: Call at each thread exit to finalize and release thread cache back to global cache - -__rpmalloc_config__: Get the current runtime configuration of the allocator - -Then simply use the __rpmalloc__/__rpfree__ and the other malloc style replacement functions. Remember all allocations are 16-byte aligned, so no need to call the explicit rpmemalign/rpaligned_alloc/rpposix_memalign functions unless you need greater alignment, they are simply wrappers to make it easier to replace in existing code. - -If you wish to override the standard library malloc family of functions and have automatic initialization/finalization of process and threads, define __ENABLE_OVERRIDE__ to non-zero which will include the `malloc.c` file in compilation of __rpmalloc.c__, and then rebuild the library or your project where you added the rpmalloc source. If you compile rpmalloc as a separate library you must make the linker use the override symbols from the library by referencing at least one symbol. The easiest way is to simply include `rpmalloc.h` in at least one source file and call `rpmalloc_linker_reference` somewhere - it's a dummy empty function. On Windows platforms and C++ overrides you have to `#include <rpnew.h>` in at least one source file and also manually handle the initialize/finalize of the process and all threads.
The list of libc entry points replaced may not be complete, use libc/stdc++ replacement only as a convenience for testing the library on an existing code base, not a final solution. - -For explicit first class heaps, see the __rpmalloc_heap_*__ API under [first class heaps](#first-class-heaps) section, requiring __RPMALLOC_FIRST_CLASS_HEAPS__ tp be defined to 1. - -# Building -To compile as a static library run the configure python script which generates a Ninja build script, then build using ninja. The ninja build produces two static libraries, one named `rpmalloc` and one named `rpmallocwrap`, where the latter includes the libc entry point overrides. - -The configure + ninja build also produces two shared object/dynamic libraries. The `rpmallocwrap` shared library can be used with LD_PRELOAD/DYLD_INSERT_LIBRARIES to inject in a preexisting binary, replacing any malloc/free family of function calls. This is only implemented for Linux and macOS targets. The list of libc entry points replaced may not be complete, use preloading as a convenience for testing the library on an existing binary, not a final solution. The dynamic library also provides automatic init/fini of process and threads for all platforms. - -The latest stable release is available in the master branch. For latest development code, use the develop branch. - -# Cache configuration options -Free memory pages are cached both per thread and in a global cache for all threads. The size of the thread caches is determined by an adaptive scheme where each cache is limited by a percentage of the maximum allocation count of the corresponding size class. The size of the global caches is determined by a multiple of the maximum of all thread caches. The factors controlling the cache sizes can be set by editing the individual defines in the `rpmalloc.c` source file for fine tuned control. 
- -__ENABLE_UNLIMITED_CACHE__: By default defined to 0, set to 1 to make all caches infinite, i.e never release spans to global cache unless thread finishes and never unmap memory pages back to the OS. Highest performance but largest memory overhead. - -__ENABLE_UNLIMITED_GLOBAL_CACHE__: By default defined to 0, set to 1 to make global caches infinite, i.e never unmap memory pages back to the OS. - -__ENABLE_UNLIMITED_THREAD_CACHE__: By default defined to 0, set to 1 to make thread caches infinite, i.e never release spans to global cache unless thread finishes. - -__ENABLE_GLOBAL_CACHE__: By default defined to 1, enables the global cache shared between all threads. Set to 0 to disable the global cache and directly unmap pages evicted from the thread cache. - -__ENABLE_THREAD_CACHE__: By default defined to 1, enables the per-thread cache. Set to 0 to disable the thread cache and directly unmap pages no longer in use (also disables the global cache). - -__ENABLE_ADAPTIVE_THREAD_CACHE__: Introduces a simple heuristics in the thread cache size, keeping 25% of the high water mark for each span count class. - -# Other configuration options -Detailed statistics are available if __ENABLE_STATISTICS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. This will cause a slight overhead in runtime to collect statistics for each memory operation, and will also add 4 bytes overhead per allocation to track sizes. - -Integer safety checks on all calls are enabled if __ENABLE_VALIDATE_ARGS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. If enabled, size arguments to the global entry points are verified not to cause integer overflows in calculations. - -Asserts are enabled if __ENABLE_ASSERTS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. 
- -To include __malloc.c__ in compilation and provide overrides of standard library malloc entry points define __ENABLE_OVERRIDE__ to 1. To enable automatic initialization of finalization of process and threads in order to preload the library into executables using standard library malloc, define __ENABLE_PRELOAD__ to 1. - -To enable the runtime configurable memory page and span sizes, define __RPMALLOC_CONFIGURABLE__ to 1. By default, memory page size is determined by system APIs and memory span size is set to 64KiB. - -To enable support for first class heaps, define __RPMALLOC_FIRST_CLASS_HEAPS__ to 1. By default, the first class heap API is disabled. - -# Huge pages -The allocator has support for huge/large pages on Windows, Linux and MacOS. To enable it, pass a non-zero value in the config value `enable_huge_pages` when initializing the allocator with `rpmalloc_initialize_config`. If the system does not support huge pages it will be automatically disabled. You can query the status by looking at `enable_huge_pages` in the config returned from a call to `rpmalloc_config` after initialization is done. - -# Quick overview -The allocator is similar in spirit to tcmalloc from the [Google Performance Toolkit](https://github.com/gperftools/gperftools). It uses separate heaps for each thread and partitions memory blocks according to a preconfigured set of size classes, up to 2MiB. Larger blocks are mapped and unmapped directly. Allocations for different size classes will be served from different set of memory pages, each "span" of pages is dedicated to one size class. Spans of pages can flow between threads when the thread cache overflows and are released to a global cache, or when the thread ends. Unlike tcmalloc, single blocks do not flow between threads, only entire spans of pages. 
- -# Implementation details -The allocator is based on a fixed but configurable page alignment (defaults to 64KiB) and 16 byte block alignment, where all runs of memory pages (spans) are mapped to this alignment boundary. On Windows this is automatically guaranteed up to 64KiB by the VirtualAlloc granularity, and on mmap systems it is achieved by oversizing the mapping and aligning the returned virtual memory address to the required boundaries. By aligning to a fixed size the free operation can locate the header of the memory span without having to do a table lookup (as tcmalloc does) by simply masking out the low bits of the address (for 64KiB this would be the low 16 bits). - -Memory blocks are divided into three categories. For 64KiB span size/alignment the small blocks are [16, 1024] bytes, medium blocks (1024, 32256] bytes, and large blocks (32256, 2097120] bytes. The three categories are further divided in size classes. If the span size is changed, the small block classes remain but medium blocks go from (1024, span size] bytes. - -Small blocks have a size class granularity of 16 bytes each in 64 buckets. Medium blocks have a granularity of 512 bytes, 61 buckets (default). Large blocks have the same granularity as the configured span size (default 64KiB). All allocations are fitted to these size class boundaries (an allocation of 36 bytes will allocate a block of 48 bytes). Each small and medium size class has an associated span (meaning a contiguous set of memory pages) configuration describing how many pages the size class will allocate each time the cache is empty and a new allocation is requested. - -Spans for small and medium blocks are cached in four levels to avoid calls to map/unmap memory pages. The first level is a per thread single active span for each size class. The second level is a per thread list of partially free spans for each size class. The third level is a per thread list of free spans. The fourth level is a global list of free spans. 
- -Each span for a small and medium size class keeps track of how many blocks are allocated/free, as well as a list of which blocks that are free for allocation. To avoid locks, each span is completely owned by the allocating thread, and all cross-thread deallocations will be deferred to the owner thread through a separate free list per span. - -Large blocks, or super spans, are cached in two levels. The first level is a per thread list of free super spans. The second level is a global list of free super spans. - -# Memory mapping -By default the allocator uses OS APIs to map virtual memory pages as needed, either `VirtualAlloc` on Windows or `mmap` on POSIX systems. If you want to use your own custom memory mapping provider you can use __rpmalloc_initialize_config__ and pass function pointers to map and unmap virtual memory. These function should reserve and free the requested number of bytes. - -The returned memory address from the memory map function MUST be aligned to the memory page size and the memory span size (which ever is larger), both of which is configurable. Either provide the page and span sizes during initialization using __rpmalloc_initialize_config__, or use __rpmalloc_config__ to find the required alignment which is equal to the maximum of page and span size. The span size MUST be a power of two in [4096, 262144] range, and be a multiple or divisor of the memory page size. - -Memory mapping requests are always done in multiples of the memory page size. You can specify a custom page size when initializing rpmalloc with __rpmalloc_initialize_config__, or pass 0 to let rpmalloc determine the system memory page size using OS APIs. The page size MUST be a power of two. - -To reduce system call overhead, memory spans are mapped in batches controlled by the `span_map_count` configuration variable (which defaults to the `DEFAULT_SPAN_MAP_COUNT` value if 0, which in turn is sized according to the cache configuration define, defaulting to 64). 
If the memory page size is larger than the span size, the number of spans to map in a single call will be adjusted to guarantee a multiple of the page size, and the spans will be kept mapped until the entire span range can be unmapped in one call (to avoid trying to unmap partial pages). - -On macOS and iOS mmap requests are tagged with tag 240 for easy identification with the vmmap tool. - -# Span breaking -Super spans (spans a multiple > 1 of the span size) can be subdivided into smaller spans to fulfill a need to map a new span of memory. By default the allocator will greedily grab and break any larger span from the available caches before mapping new virtual memory. However, spans can currently not be glued together to form larger super spans again. Subspans can traverse the cache and be used by different threads individually. - -A span that is a subspan of a larger super span can be individually decommitted to reduce physical memory pressure when the span is evicted from caches and scheduled to be unmapped. The entire original super span will keep track of the subspans it is broken up into, and when the entire range is decommitted the super span will be unmapped. This allows platforms like Windows that require the entire virtual memory range that was mapped in a call to VirtualAlloc to be unmapped in one call to VirtualFree, while still decommitting individual pages in subspans (if the page size is smaller than the span size). - -If you use a custom memory map/unmap function you need to take this into account by looking at the `release` parameter given to the `memory_unmap` function. It is set to 0 for decommitting individual pages and the total super span byte size for finally releasing the entire super span memory range. - -# Memory fragmentation -There is no memory fragmentation by the allocator in the sense that it will not leave unallocated and unusable "holes" in the memory pages by calls to allocate and free blocks of different sizes. 
This is due to the fact that the memory pages allocated for each size class is split up in perfectly aligned blocks which are not reused for a request of a different size. The block freed by a call to `rpfree` will always be immediately available for an allocation request within the same size class. - -However, there is memory fragmentation in the meaning that a request for x bytes followed by a request of y bytes where x and y are at least one size class different in size will return blocks that are at least one memory page apart in virtual address space. Only blocks of the same size will potentially be within the same memory page span. - -rpmalloc keeps an "active span" and free list for each size class. This leads to back-to-back allocations will most likely be served from within the same span of memory pages (unless the span runs out of free blocks). The rpmalloc implementation will also use any "holes" in memory pages in semi-filled spans before using a completely free span. - -# First class heaps -rpmalloc provides a first class heap type with explicit heap control API. Heaps are maintained with calls to __rpmalloc_heap_acquire__ and __rpmalloc_heap_release__ and allocations/frees are done with __rpmalloc_heap_alloc__ and __rpmalloc_heap_free__. See the `rpmalloc.h` documentation for the full list of functions in the heap API. The main use case of explicit heap control is to scope allocations in a heap and release everything with a single call to __rpmalloc_heap_free_all__ without having to maintain ownership of memory blocks. Note that the heap API is not thread-safe, the caller must make sure that each heap is only used in a single thread at any given time. - -# Producer-consumer scenario -Compared to the some other allocators, rpmalloc does not suffer as much from a producer-consumer thread scenario where one thread allocates memory blocks and another thread frees the blocks. 
In some allocators the free blocks need to traverse both the thread cache of the thread doing the free operations as well as the global cache before being reused in the allocating thread. In rpmalloc the freed blocks will be reused as soon as the allocating thread needs to get new spans from the thread cache. This enables faster release of completely freed memory pages as blocks in a memory page will not be aliased between different owning threads. - -# Best case scenarios -Threads that keep ownership of allocated memory blocks within the thread and free the blocks from the same thread will have optimal performance. - -Threads that have allocation patterns where the difference in memory usage high and low water marks fit within the thread cache thresholds in the allocator will never touch the global cache except during thread init/fini and have optimal performance. Tweaking the cache limits can be done on a per-size-class basis. - -# Worst case scenarios -Since each thread cache maps spans of memory pages per size class, a thread that allocates just a few blocks of each size class (16, 32, ...) for many size classes will never fill each bucket, and thus map a lot of memory pages while only using a small fraction of the mapped memory. However, the wasted memory will always be less than 4KiB (or the configured memory page size) per size class as each span is initialized one memory page at a time. The cache for free spans will be reused by all size classes. - -Threads that perform a lot of allocations and deallocations in a pattern that have a large difference in high and low water marks, and that difference is larger than the thread cache size, will put a lot of contention on the global cache. What will happen is the thread cache will overflow on each low water mark causing pages to be released to the global cache, then underflow on high water mark causing pages to be re-acquired from the global cache. 
This can be mitigated by changing the __MAX_SPAN_CACHE_DIVISOR__ define in the source code (at the cost of higher average memory overhead). - -# Caveats -VirtualAlloc has an internal granularity of 64KiB. However, mmap lacks this granularity control, and the implementation instead oversizes the memory mapping with configured span size to be able to always return a memory area with the required alignment. Since the extra memory pages are never touched this will not result in extra committed physical memory pages, but rather only increase virtual memory address space. - -All entry points assume the passed values are valid, for example passing an invalid pointer to free would most likely result in a segmentation fault. __The library does not try to guard against errors!__. - -To support global scope data doing dynamic allocation/deallocation such as C++ objects with custom constructors and destructors, the call to __rpmalloc_finalize__ will not completely terminate the allocator but rather empty all caches and put the allocator in finalization mode. Once this call has been made, the allocator is no longer thread safe and expects all remaining calls to originate from global data destruction on main thread. Any spans or heaps becoming free during this phase will be immediately unmapped to allow correct teardown of the process or dynamic library without any leaks. - -# Other languages - -[Johan Andersson](https://github.com/repi) at Embark has created a Rust wrapper available at [rpmalloc-rs](https://github.com/EmbarkStudios/rpmalloc-rs) - -[Stas Denisov](https://github.com/nxrighthere) has created a C# wrapper available at [Rpmalloc-CSharp](https://github.com/nxrighthere/Rpmalloc-CSharp) - -# License - -This is free and unencumbered software released into the public domain. 
-
-Anyone is free to copy, modify, publish, use, compile, sell, or
-distribute this software, either in source code form or as a compiled
-binary, for any purpose, commercial or non-commercial, and by any
-means.
-
-In jurisdictions that recognize copyright laws, the author or authors
-of this software dedicate any and all copyright interest in the
-software to the public domain. We make this dedication for the benefit
-of the public at large and to the detriment of our heirs and
-successors. We intend this dedication to be an overt act of
-relinquishment in perpetuity of all present and future rights to this
-software under copyright law.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
-IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
-OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
-ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
-OTHER DEALINGS IN THE SOFTWARE.
-
-For more information, please refer to
-
-
-You can also use this software under the MIT license if public domain is
-not recognized in your country
-
-
-The MIT License (MIT)
-
-Copyright (c) 2017 Mattias Jansson
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.
+# rpmalloc - General Purpose Memory Allocator
+This library provides a cross platform lock free thread caching 16-byte aligned memory allocator implemented in C.
+This is a fork of rpmalloc 1.4.5.
+
+Platforms currently supported:
+
+- Windows
+- MacOS
+- iOS
+- Linux
+- Android
+- Haiku
+
+The code should be easily portable to any platform with atomic operations and an mmap-style virtual memory management API. The API used to map/unmap memory pages can be configured in runtime to a custom implementation and mapping granularity/size.
+
+This library is put in the public domain; you can redistribute it and/or modify it without any restrictions. Or, if you choose, you can use it under the MIT license.
+
+# Performance
+We believe rpmalloc is faster than most popular memory allocators like tcmalloc, hoard, ptmalloc3 and others without causing extra allocated memory overhead in the thread caches compared to these allocators. We also believe the implementation to be easier to read and modify compared to these allocators, as it is a single source file of ~3000 lines of C code. All allocations have a natural 16-byte alignment.
+
+Contained in a parallel repository is a benchmark utility that performs interleaved unaligned allocations and deallocations (both in-thread and cross-thread) in multiple threads. It measures number of memory operations performed per CPU second, as well as memory overhead by comparing the virtual memory mapped with the number of bytes requested in allocation calls.
The setup of number of thread, cross-thread deallocation rate and allocation size limits is configured by command line arguments. + +https://github.com/mjansson/rpmalloc-benchmark + +Below is an example performance comparison chart of rpmalloc and other popular allocator implementations, with default configurations used. + +![Ubuntu 16.10, random [16, 8000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=301017877&format=image) + +The benchmark producing these numbers were run on an Ubuntu 16.10 machine with 8 logical cores (4 physical, HT). The actual numbers are not to be interpreted as absolute performance figures, but rather as relative comparisons between the different allocators. For additional benchmark results, see the [BENCHMARKS](BENCHMARKS.md) file. + +Configuration of the thread and global caches can be important depending on your use pattern. See [CACHE](CACHE.md) for a case study and some comments/guidelines. + +# Required functions + +Before calling any other function in the API, you __MUST__ call the initialization function, either __rpmalloc_initialize__ or __rpmalloc_initialize_config__, or you will get undefined behaviour when calling other rpmalloc entry point. + +Before terminating your use of the allocator, you __SHOULD__ call __rpmalloc_finalize__ in order to release caches and unmap virtual memory, as well as prepare the allocator for global scope cleanup at process exit or dynamic library unload depending on your use case. + +# Using +The easiest way to use the library is simply adding __rpmalloc.[h|c]__ to your project and compile them along with your sources. This contains only the rpmalloc specific entry points and does not provide internal hooks to process and/or thread creation at the moment. 
You are required to call these functions from your own code in order to initialize and finalize the allocator in your process and threads: + +__rpmalloc_initialize__ : Call at process start to initialize the allocator + +__rpmalloc_initialize_config__ : Optional entry point to call at process start to initialize the allocator with a custom memory mapping backend, memory page size and mapping granularity. + +__rpmalloc_finalize__: Call at process exit to finalize the allocator + +__rpmalloc_thread_initialize__: Call at each thread start to initialize the thread local data for the allocator + +__rpmalloc_thread_finalize__: Call at each thread exit to finalize and release thread cache back to global cache + +__rpmalloc_config__: Get the current runtime configuration of the allocator + +Then simply use the __rpmalloc__/__rpfree__ and the other malloc style replacement functions. Remember all allocations are 16-byte aligned, so no need to call the explicit rpmemalign/rpaligned_alloc/rpposix_memalign functions unless you need greater alignment, they are simply wrappers to make it easier to replace in existing code. + +If you wish to override the standard library malloc family of functions and have automatic initialization/finalization of process and threads, define __ENABLE_OVERRIDE__ to non-zero which will include the `malloc.c` file in compilation of __rpmalloc.c__, and then rebuild the library or your project where you added the rpmalloc source. If you compile rpmalloc as a separate library you must make the linker use the override symbols from the library by referencing at least one symbol. The easiest way is to simply include `rpmalloc.h` in at least one source file and call `rpmalloc_linker_reference` somewhere - it's a dummy empty function. On Windows platforms and C++ overrides you have to `#include ` in at least one source file and also manually handle the initialize/finalize of the process and all threads. 
The list of libc entry points replaced may not be complete, use libc/stdc++ replacement only as a convenience for testing the library on an existing code base, not a final solution.
+
+For explicit first class heaps, see the __rpmalloc_heap_*__ API under [first class heaps](#first-class-heaps) section, requiring __RPMALLOC_FIRST_CLASS_HEAPS__ to be defined to 1.
+
+# Building
+To compile as a static library run the configure python script which generates a Ninja build script, then build using ninja. The ninja build produces two static libraries, one named `rpmalloc` and one named `rpmallocwrap`, where the latter includes the libc entry point overrides.
+
+The configure + ninja build also produces two shared object/dynamic libraries. The `rpmallocwrap` shared library can be used with LD_PRELOAD/DYLD_INSERT_LIBRARIES to inject in a preexisting binary, replacing any malloc/free family of function calls. This is only implemented for Linux and macOS targets. The list of libc entry points replaced may not be complete, use preloading as a convenience for testing the library on an existing binary, not a final solution. The dynamic library also provides automatic init/fini of process and threads for all platforms.
+
+The latest stable release is available in the master branch. For latest development code, use the develop branch.
+
+# Cache configuration options
+Free memory pages are cached both per thread and in a global cache for all threads. The size of the thread caches is determined by an adaptive scheme where each cache is limited by a percentage of the maximum allocation count of the corresponding size class. The size of the global caches is determined by a multiple of the maximum of all thread caches. The factors controlling the cache sizes can be set by editing the individual defines in the `rpmalloc.c` source file for fine tuned control.
+
+__ENABLE_UNLIMITED_CACHE__: By default defined to 0, set to 1 to make all caches infinite, i.e never release spans to global cache unless thread finishes and never unmap memory pages back to the OS. Highest performance but largest memory overhead.
+
+__ENABLE_UNLIMITED_GLOBAL_CACHE__: By default defined to 0, set to 1 to make global caches infinite, i.e never unmap memory pages back to the OS.
+
+__ENABLE_UNLIMITED_THREAD_CACHE__: By default defined to 0, set to 1 to make thread caches infinite, i.e never release spans to global cache unless thread finishes.
+
+__ENABLE_GLOBAL_CACHE__: By default defined to 1, enables the global cache shared between all threads. Set to 0 to disable the global cache and directly unmap pages evicted from the thread cache.
+
+__ENABLE_THREAD_CACHE__: By default defined to 1, enables the per-thread cache. Set to 0 to disable the thread cache and directly unmap pages no longer in use (also disables the global cache).
+
+__ENABLE_ADAPTIVE_THREAD_CACHE__: Introduces a simple heuristics in the thread cache size, keeping 25% of the high water mark for each span count class.
+
+# Other configuration options
+Detailed statistics are available if __ENABLE_STATISTICS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. This will cause a slight overhead in runtime to collect statistics for each memory operation, and will also add 4 bytes overhead per allocation to track sizes.
+
+Integer safety checks on all calls are enabled if __ENABLE_VALIDATE_ARGS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. If enabled, size arguments to the global entry points are verified not to cause integer overflows in calculations.
+
+Asserts are enabled if __ENABLE_ASSERTS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`.
+ +To include __malloc.c__ in compilation and provide overrides of standard library malloc entry points define __ENABLE_OVERRIDE__ to 1. To enable automatic initialization of finalization of process and threads in order to preload the library into executables using standard library malloc, define __ENABLE_PRELOAD__ to 1. + +To enable the runtime configurable memory page and span sizes, define __RPMALLOC_CONFIGURABLE__ to 1. By default, memory page size is determined by system APIs and memory span size is set to 64KiB. + +To enable support for first class heaps, define __RPMALLOC_FIRST_CLASS_HEAPS__ to 1. By default, the first class heap API is disabled. + +# Huge pages +The allocator has support for huge/large pages on Windows, Linux and MacOS. To enable it, pass a non-zero value in the config value `enable_huge_pages` when initializing the allocator with `rpmalloc_initialize_config`. If the system does not support huge pages it will be automatically disabled. You can query the status by looking at `enable_huge_pages` in the config returned from a call to `rpmalloc_config` after initialization is done. + +# Quick overview +The allocator is similar in spirit to tcmalloc from the [Google Performance Toolkit](https://github.com/gperftools/gperftools). It uses separate heaps for each thread and partitions memory blocks according to a preconfigured set of size classes, up to 2MiB. Larger blocks are mapped and unmapped directly. Allocations for different size classes will be served from different set of memory pages, each "span" of pages is dedicated to one size class. Spans of pages can flow between threads when the thread cache overflows and are released to a global cache, or when the thread ends. Unlike tcmalloc, single blocks do not flow between threads, only entire spans of pages. 
+ +# Implementation details +The allocator is based on a fixed but configurable page alignment (defaults to 64KiB) and 16 byte block alignment, where all runs of memory pages (spans) are mapped to this alignment boundary. On Windows this is automatically guaranteed up to 64KiB by the VirtualAlloc granularity, and on mmap systems it is achieved by oversizing the mapping and aligning the returned virtual memory address to the required boundaries. By aligning to a fixed size the free operation can locate the header of the memory span without having to do a table lookup (as tcmalloc does) by simply masking out the low bits of the address (for 64KiB this would be the low 16 bits). + +Memory blocks are divided into three categories. For 64KiB span size/alignment the small blocks are [16, 1024] bytes, medium blocks (1024, 32256] bytes, and large blocks (32256, 2097120] bytes. The three categories are further divided in size classes. If the span size is changed, the small block classes remain but medium blocks go from (1024, span size] bytes. + +Small blocks have a size class granularity of 16 bytes each in 64 buckets. Medium blocks have a granularity of 512 bytes, 61 buckets (default). Large blocks have the same granularity as the configured span size (default 64KiB). All allocations are fitted to these size class boundaries (an allocation of 36 bytes will allocate a block of 48 bytes). Each small and medium size class has an associated span (meaning a contiguous set of memory pages) configuration describing how many pages the size class will allocate each time the cache is empty and a new allocation is requested. + +Spans for small and medium blocks are cached in four levels to avoid calls to map/unmap memory pages. The first level is a per thread single active span for each size class. The second level is a per thread list of partially free spans for each size class. The third level is a per thread list of free spans. The fourth level is a global list of free spans. 
+ +Each span for a small and medium size class keeps track of how many blocks are allocated/free, as well as a list of which blocks that are free for allocation. To avoid locks, each span is completely owned by the allocating thread, and all cross-thread deallocations will be deferred to the owner thread through a separate free list per span. + +Large blocks, or super spans, are cached in two levels. The first level is a per thread list of free super spans. The second level is a global list of free super spans. + +# Memory mapping +By default the allocator uses OS APIs to map virtual memory pages as needed, either `VirtualAlloc` on Windows or `mmap` on POSIX systems. If you want to use your own custom memory mapping provider you can use __rpmalloc_initialize_config__ and pass function pointers to map and unmap virtual memory. These function should reserve and free the requested number of bytes. + +The returned memory address from the memory map function MUST be aligned to the memory page size and the memory span size (which ever is larger), both of which is configurable. Either provide the page and span sizes during initialization using __rpmalloc_initialize_config__, or use __rpmalloc_config__ to find the required alignment which is equal to the maximum of page and span size. The span size MUST be a power of two in [4096, 262144] range, and be a multiple or divisor of the memory page size. + +Memory mapping requests are always done in multiples of the memory page size. You can specify a custom page size when initializing rpmalloc with __rpmalloc_initialize_config__, or pass 0 to let rpmalloc determine the system memory page size using OS APIs. The page size MUST be a power of two. + +To reduce system call overhead, memory spans are mapped in batches controlled by the `span_map_count` configuration variable (which defaults to the `DEFAULT_SPAN_MAP_COUNT` value if 0, which in turn is sized according to the cache configuration define, defaulting to 64). 
If the memory page size is larger than the span size, the number of spans to map in a single call will be adjusted to guarantee a multiple of the page size, and the spans will be kept mapped until the entire span range can be unmapped in one call (to avoid trying to unmap partial pages). + +On macOS and iOS mmap requests are tagged with tag 240 for easy identification with the vmmap tool. + +# Span breaking +Super spans (spans a multiple > 1 of the span size) can be subdivided into smaller spans to fulfill a need to map a new span of memory. By default the allocator will greedily grab and break any larger span from the available caches before mapping new virtual memory. However, spans can currently not be glued together to form larger super spans again. Subspans can traverse the cache and be used by different threads individually. + +A span that is a subspan of a larger super span can be individually decommitted to reduce physical memory pressure when the span is evicted from caches and scheduled to be unmapped. The entire original super span will keep track of the subspans it is broken up into, and when the entire range is decommitted the super span will be unmapped. This allows platforms like Windows that require the entire virtual memory range that was mapped in a call to VirtualAlloc to be unmapped in one call to VirtualFree, while still decommitting individual pages in subspans (if the page size is smaller than the span size). + +If you use a custom memory map/unmap function you need to take this into account by looking at the `release` parameter given to the `memory_unmap` function. It is set to 0 for decommitting individual pages and the total super span byte size for finally releasing the entire super span memory range. + +# Memory fragmentation +There is no memory fragmentation by the allocator in the sense that it will not leave unallocated and unusable "holes" in the memory pages by calls to allocate and free blocks of different sizes. 
This is due to the fact that the memory pages allocated for each size class is split up in perfectly aligned blocks which are not reused for a request of a different size. The block freed by a call to `rpfree` will always be immediately available for an allocation request within the same size class. + +However, there is memory fragmentation in the meaning that a request for x bytes followed by a request of y bytes where x and y are at least one size class different in size will return blocks that are at least one memory page apart in virtual address space. Only blocks of the same size will potentially be within the same memory page span. + +rpmalloc keeps an "active span" and free list for each size class. This leads to back-to-back allocations will most likely be served from within the same span of memory pages (unless the span runs out of free blocks). The rpmalloc implementation will also use any "holes" in memory pages in semi-filled spans before using a completely free span. + +# First class heaps +rpmalloc provides a first class heap type with explicit heap control API. Heaps are maintained with calls to __rpmalloc_heap_acquire__ and __rpmalloc_heap_release__ and allocations/frees are done with __rpmalloc_heap_alloc__ and __rpmalloc_heap_free__. See the `rpmalloc.h` documentation for the full list of functions in the heap API. The main use case of explicit heap control is to scope allocations in a heap and release everything with a single call to __rpmalloc_heap_free_all__ without having to maintain ownership of memory blocks. Note that the heap API is not thread-safe, the caller must make sure that each heap is only used in a single thread at any given time. + +# Producer-consumer scenario +Compared to the some other allocators, rpmalloc does not suffer as much from a producer-consumer thread scenario where one thread allocates memory blocks and another thread frees the blocks. 
In some allocators the free blocks need to traverse both the thread cache of the thread doing the free operations as well as the global cache before being reused in the allocating thread. In rpmalloc the freed blocks will be reused as soon as the allocating thread needs to get new spans from the thread cache. This enables faster release of completely freed memory pages as blocks in a memory page will not be aliased between different owning threads. + +# Best case scenarios +Threads that keep ownership of allocated memory blocks within the thread and free the blocks from the same thread will have optimal performance. + +Threads that have allocation patterns where the difference in memory usage high and low water marks fit within the thread cache thresholds in the allocator will never touch the global cache except during thread init/fini and have optimal performance. Tweaking the cache limits can be done on a per-size-class basis. + +# Worst case scenarios +Since each thread cache maps spans of memory pages per size class, a thread that allocates just a few blocks of each size class (16, 32, ...) for many size classes will never fill each bucket, and thus map a lot of memory pages while only using a small fraction of the mapped memory. However, the wasted memory will always be less than 4KiB (or the configured memory page size) per size class as each span is initialized one memory page at a time. The cache for free spans will be reused by all size classes. + +Threads that perform a lot of allocations and deallocations in a pattern that have a large difference in high and low water marks, and that difference is larger than the thread cache size, will put a lot of contention on the global cache. What will happen is the thread cache will overflow on each low water mark causing pages to be released to the global cache, then underflow on high water mark causing pages to be re-acquired from the global cache. 
This can be mitigated by changing the __MAX_SPAN_CACHE_DIVISOR__ define in the source code (at the cost of higher average memory overhead).
+
+# Caveats
+VirtualAlloc has an internal granularity of 64KiB. However, mmap lacks this granularity control, and the implementation instead oversizes the memory mapping with the configured span size to be able to always return a memory area with the required alignment. Since the extra memory pages are never touched, this will not result in extra committed physical memory pages, but will only increase the virtual memory address space used.
+
+All entry points assume the passed values are valid; for example, passing an invalid pointer to free would most likely result in a segmentation fault. __The library does not try to guard against errors!__
+
+To support global scope data doing dynamic allocation/deallocation, such as C++ objects with custom constructors and destructors, the call to __rpmalloc_finalize__ will not completely terminate the allocator but rather empty all caches and put the allocator in finalization mode. Once this call has been made, the allocator is no longer thread safe and expects all remaining calls to originate from global data destruction on the main thread. Any spans or heaps becoming free during this phase will be immediately unmapped to allow correct teardown of the process or dynamic library without any leaks.
+
+# Other languages
+
+[Johan Andersson](https://github.com/repi) at Embark has created a Rust wrapper available at [rpmalloc-rs](https://github.com/EmbarkStudios/rpmalloc-rs)
+
+[Stas Denisov](https://github.com/nxrighthere) has created a C# wrapper available at [Rpmalloc-CSharp](https://github.com/nxrighthere/Rpmalloc-CSharp)
+
+# License
+
+This is free and unencumbered software released into the public domain.
+ +Anyone is free to copy, modify, publish, use, compile, sell, or +distribute this software, either in source code form or as a compiled +binary, for any purpose, commercial or non-commercial, and by any +means. + +In jurisdictions that recognize copyright laws, the author or authors +of this software dedicate any and all copyright interest in the +software to the public domain. We make this dedication for the benefit +of the public at large and to the detriment of our heirs and +successors. We intend this dedication to be an overt act of +relinquishment in perpetuity of all present and future rights to this +software under copyright law. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR +OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, +ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR +OTHER DEALINGS IN THE SOFTWARE. + +For more information, please refer to + + +You can also use this software under the MIT license if public domain is +not recognized in your country + + +The MIT License (MIT) + +Copyright (c) 2017 Mattias Jansson + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. 
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. diff --git a/llvm/lib/Support/rpmalloc/malloc.c b/llvm/lib/Support/rpmalloc/malloc.c index 3fcfe848250c6b..59e13aab3ef7ed 100644 --- a/llvm/lib/Support/rpmalloc/malloc.c +++ b/llvm/lib/Support/rpmalloc/malloc.c @@ -1,724 +1,724 @@ -//===------------------------ malloc.c ------------------*- C -*-=============// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -// -// This library provides a cross-platform lock free thread caching malloc -// implementation in C11. 
-// -// -// This file provides overrides for the standard library malloc entry points for -// C and new/delete operators for C++ It also provides automatic -// initialization/finalization of process and threads -// -//===----------------------------------------------------------------------===// - -#if defined(__TINYC__) -#include -#endif - -#ifndef ARCH_64BIT -#if defined(__LLP64__) || defined(__LP64__) || defined(_WIN64) -#define ARCH_64BIT 1 -_Static_assert(sizeof(size_t) == 8, "Data type size mismatch"); -_Static_assert(sizeof(void *) == 8, "Data type size mismatch"); -#else -#define ARCH_64BIT 0 -_Static_assert(sizeof(size_t) == 4, "Data type size mismatch"); -_Static_assert(sizeof(void *) == 4, "Data type size mismatch"); -#endif -#endif - -#if (defined(__GNUC__) || defined(__clang__)) -#pragma GCC visibility push(default) -#endif - -#define USE_IMPLEMENT 1 -#define USE_INTERPOSE 0 -#define USE_ALIAS 0 - -#if defined(__APPLE__) -#undef USE_INTERPOSE -#define USE_INTERPOSE 1 - -typedef struct interpose_t { - void *new_func; - void *orig_func; -} interpose_t; - -#define MAC_INTERPOSE_PAIR(newf, oldf) {(void *)newf, (void *)oldf} -#define MAC_INTERPOSE_SINGLE(newf, oldf) \ - __attribute__((used)) static const interpose_t macinterpose##newf##oldf \ - __attribute__((section("__DATA, __interpose"))) = \ - MAC_INTERPOSE_PAIR(newf, oldf) - -#endif - -#if !defined(_WIN32) && !defined(__APPLE__) -#undef USE_IMPLEMENT -#undef USE_ALIAS -#define USE_IMPLEMENT 0 -#define USE_ALIAS 1 -#endif - -#ifdef _MSC_VER -#pragma warning(disable : 4100) -#undef malloc -#undef free -#undef calloc -#define RPMALLOC_RESTRICT __declspec(restrict) -#else -#define RPMALLOC_RESTRICT -#endif - -#if ENABLE_OVERRIDE - -typedef struct rp_nothrow_t { - int __dummy; -} rp_nothrow_t; - -#if USE_IMPLEMENT - -extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL malloc(size_t size) { - return rpmalloc(size); -} -extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL calloc(size_t count, - size_t size) 
{ - return rpcalloc(count, size); -} -extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL realloc(void *ptr, - size_t size) { - return rprealloc(ptr, size); -} -extern inline void *RPMALLOC_CDECL reallocf(void *ptr, size_t size) { - return rprealloc(ptr, size); -} -extern inline void *RPMALLOC_CDECL aligned_alloc(size_t alignment, - size_t size) { - return rpaligned_alloc(alignment, size); -} -extern inline void *RPMALLOC_CDECL memalign(size_t alignment, size_t size) { - return rpmemalign(alignment, size); -} -extern inline int RPMALLOC_CDECL posix_memalign(void **memptr, size_t alignment, - size_t size) { - return rpposix_memalign(memptr, alignment, size); -} -extern inline void RPMALLOC_CDECL free(void *ptr) { rpfree(ptr); } -extern inline void RPMALLOC_CDECL cfree(void *ptr) { rpfree(ptr); } -extern inline size_t RPMALLOC_CDECL malloc_usable_size(void *ptr) { - return rpmalloc_usable_size(ptr); -} -extern inline size_t RPMALLOC_CDECL malloc_size(void *ptr) { - return rpmalloc_usable_size(ptr); -} - -#ifdef _WIN32 -extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL _malloc_base(size_t size) { - return rpmalloc(size); -} -extern inline void RPMALLOC_CDECL _free_base(void *ptr) { rpfree(ptr); } -extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL _calloc_base(size_t count, - size_t size) { - return rpcalloc(count, size); -} -extern inline size_t RPMALLOC_CDECL _msize(void *ptr) { - return rpmalloc_usable_size(ptr); -} -extern inline size_t RPMALLOC_CDECL _msize_base(void *ptr) { - return rpmalloc_usable_size(ptr); -} -extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL -_realloc_base(void *ptr, size_t size) { - return rprealloc(ptr, size); -} -#endif - -#ifdef _WIN32 -// For Windows, #include in one source file to get the C++ operator -// overrides implemented in your module -#else -// Overload the C++ operators using the mangled names -// (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling) operators -// delete and delete[] -#define RPDEFVIS 
__attribute__((visibility("default"))) -extern void _ZdlPv(void *p); -void RPDEFVIS _ZdlPv(void *p) { rpfree(p); } -extern void _ZdaPv(void *p); -void RPDEFVIS _ZdaPv(void *p) { rpfree(p); } -#if ARCH_64BIT -// 64-bit operators new and new[], normal and aligned -extern void *_Znwm(uint64_t size); -void *RPDEFVIS _Znwm(uint64_t size) { return rpmalloc(size); } -extern void *_Znam(uint64_t size); -void *RPDEFVIS _Znam(uint64_t size) { return rpmalloc(size); } -extern void *_Znwmm(uint64_t size, uint64_t align); -void *RPDEFVIS _Znwmm(uint64_t size, uint64_t align) { - return rpaligned_alloc(align, size); -} -extern void *_Znamm(uint64_t size, uint64_t align); -void *RPDEFVIS _Znamm(uint64_t size, uint64_t align) { - return rpaligned_alloc(align, size); -} -extern void *_ZnwmSt11align_val_t(uint64_t size, uint64_t align); -void *RPDEFVIS _ZnwmSt11align_val_t(uint64_t size, uint64_t align) { - return rpaligned_alloc(align, size); -} -extern void *_ZnamSt11align_val_t(uint64_t size, uint64_t align); -void *RPDEFVIS _ZnamSt11align_val_t(uint64_t size, uint64_t align) { - return rpaligned_alloc(align, size); -} -extern void *_ZnwmRKSt9nothrow_t(uint64_t size, rp_nothrow_t t); -void *RPDEFVIS _ZnwmRKSt9nothrow_t(uint64_t size, rp_nothrow_t t) { - (void)sizeof(t); - return rpmalloc(size); -} -extern void *_ZnamRKSt9nothrow_t(uint64_t size, rp_nothrow_t t); -void *RPDEFVIS _ZnamRKSt9nothrow_t(uint64_t size, rp_nothrow_t t) { - (void)sizeof(t); - return rpmalloc(size); -} -extern void *_ZnwmSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, - rp_nothrow_t t); -void *RPDEFVIS _ZnwmSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, - rp_nothrow_t t) { - (void)sizeof(t); - return rpaligned_alloc(align, size); -} -extern void *_ZnamSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, - rp_nothrow_t t); -void *RPDEFVIS _ZnamSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, - rp_nothrow_t t) { - (void)sizeof(t); - return rpaligned_alloc(align, 
size); -} -// 64-bit operators sized delete and delete[], normal and aligned -extern void _ZdlPvm(void *p, uint64_t size); -void RPDEFVIS _ZdlPvm(void *p, uint64_t size) { - rpfree(p); - (void)sizeof(size); -} -extern void _ZdaPvm(void *p, uint64_t size); -void RPDEFVIS _ZdaPvm(void *p, uint64_t size) { - rpfree(p); - (void)sizeof(size); -} -extern void _ZdlPvSt11align_val_t(void *p, uint64_t align); -void RPDEFVIS _ZdlPvSt11align_val_t(void *p, uint64_t align) { - rpfree(p); - (void)sizeof(align); -} -extern void _ZdaPvSt11align_val_t(void *p, uint64_t align); -void RPDEFVIS _ZdaPvSt11align_val_t(void *p, uint64_t align) { - rpfree(p); - (void)sizeof(align); -} -extern void _ZdlPvmSt11align_val_t(void *p, uint64_t size, uint64_t align); -void RPDEFVIS _ZdlPvmSt11align_val_t(void *p, uint64_t size, uint64_t align) { - rpfree(p); - (void)sizeof(size); - (void)sizeof(align); -} -extern void _ZdaPvmSt11align_val_t(void *p, uint64_t size, uint64_t align); -void RPDEFVIS _ZdaPvmSt11align_val_t(void *p, uint64_t size, uint64_t align) { - rpfree(p); - (void)sizeof(size); - (void)sizeof(align); -} -#else -// 32-bit operators new and new[], normal and aligned -extern void *_Znwj(uint32_t size); -void *RPDEFVIS _Znwj(uint32_t size) { return rpmalloc(size); } -extern void *_Znaj(uint32_t size); -void *RPDEFVIS _Znaj(uint32_t size) { return rpmalloc(size); } -extern void *_Znwjj(uint32_t size, uint32_t align); -void *RPDEFVIS _Znwjj(uint32_t size, uint32_t align) { - return rpaligned_alloc(align, size); -} -extern void *_Znajj(uint32_t size, uint32_t align); -void *RPDEFVIS _Znajj(uint32_t size, uint32_t align) { - return rpaligned_alloc(align, size); -} -extern void *_ZnwjSt11align_val_t(size_t size, size_t align); -void *RPDEFVIS _ZnwjSt11align_val_t(size_t size, size_t align) { - return rpaligned_alloc(align, size); -} -extern void *_ZnajSt11align_val_t(size_t size, size_t align); -void *RPDEFVIS _ZnajSt11align_val_t(size_t size, size_t align) { - return 
rpaligned_alloc(align, size); -} -extern void *_ZnwjRKSt9nothrow_t(size_t size, rp_nothrow_t t); -void *RPDEFVIS _ZnwjRKSt9nothrow_t(size_t size, rp_nothrow_t t) { - (void)sizeof(t); - return rpmalloc(size); -} -extern void *_ZnajRKSt9nothrow_t(size_t size, rp_nothrow_t t); -void *RPDEFVIS _ZnajRKSt9nothrow_t(size_t size, rp_nothrow_t t) { - (void)sizeof(t); - return rpmalloc(size); -} -extern void *_ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, - rp_nothrow_t t); -void *RPDEFVIS _ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, - rp_nothrow_t t) { - (void)sizeof(t); - return rpaligned_alloc(align, size); -} -extern void *_ZnajSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, - rp_nothrow_t t); -void *RPDEFVIS _ZnajSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, - rp_nothrow_t t) { - (void)sizeof(t); - return rpaligned_alloc(align, size); -} -// 32-bit operators sized delete and delete[], normal and aligned -extern void _ZdlPvj(void *p, uint64_t size); -void RPDEFVIS _ZdlPvj(void *p, uint64_t size) { - rpfree(p); - (void)sizeof(size); -} -extern void _ZdaPvj(void *p, uint64_t size); -void RPDEFVIS _ZdaPvj(void *p, uint64_t size) { - rpfree(p); - (void)sizeof(size); -} -extern void _ZdlPvSt11align_val_t(void *p, uint32_t align); -void RPDEFVIS _ZdlPvSt11align_val_t(void *p, uint64_t a) { - rpfree(p); - (void)sizeof(align); -} -extern void _ZdaPvSt11align_val_t(void *p, uint32_t align); -void RPDEFVIS _ZdaPvSt11align_val_t(void *p, uint64_t a) { - rpfree(p); - (void)sizeof(align); -} -extern void _ZdlPvjSt11align_val_t(void *p, uint32_t size, uint32_t align); -void RPDEFVIS _ZdlPvjSt11align_val_t(void *p, uint64_t size, uint64_t align) { - rpfree(p); - (void)sizeof(size); - (void)sizeof(a); -} -extern void _ZdaPvjSt11align_val_t(void *p, uint32_t size, uint32_t align); -void RPDEFVIS _ZdaPvjSt11align_val_t(void *p, uint64_t size, uint64_t align) { - rpfree(p); - (void)sizeof(size); - (void)sizeof(a); -} -#endif -#endif 
-#endif - -#if USE_INTERPOSE || USE_ALIAS - -static void *rpmalloc_nothrow(size_t size, rp_nothrow_t t) { - (void)sizeof(t); - return rpmalloc(size); -} -static void *rpaligned_alloc_reverse(size_t size, size_t align) { - return rpaligned_alloc(align, size); -} -static void *rpaligned_alloc_reverse_nothrow(size_t size, size_t align, - rp_nothrow_t t) { - (void)sizeof(t); - return rpaligned_alloc(align, size); -} -static void rpfree_size(void *p, size_t size) { - (void)sizeof(size); - rpfree(p); -} -static void rpfree_aligned(void *p, size_t align) { - (void)sizeof(align); - rpfree(p); -} -static void rpfree_size_aligned(void *p, size_t size, size_t align) { - (void)sizeof(size); - (void)sizeof(align); - rpfree(p); -} - -#endif - -#if USE_INTERPOSE - -__attribute__((used)) static const interpose_t macinterpose_malloc[] - __attribute__((section("__DATA, __interpose"))) = { - // new and new[] - MAC_INTERPOSE_PAIR(rpmalloc, _Znwm), - MAC_INTERPOSE_PAIR(rpmalloc, _Znam), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _Znwmm), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _Znamm), - MAC_INTERPOSE_PAIR(rpmalloc_nothrow, _ZnwmRKSt9nothrow_t), - MAC_INTERPOSE_PAIR(rpmalloc_nothrow, _ZnamRKSt9nothrow_t), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _ZnwmSt11align_val_t), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _ZnamSt11align_val_t), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse_nothrow, - _ZnwmSt11align_val_tRKSt9nothrow_t), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse_nothrow, - _ZnamSt11align_val_tRKSt9nothrow_t), - // delete and delete[] - MAC_INTERPOSE_PAIR(rpfree, _ZdlPv), MAC_INTERPOSE_PAIR(rpfree, _ZdaPv), - MAC_INTERPOSE_PAIR(rpfree_size, _ZdlPvm), - MAC_INTERPOSE_PAIR(rpfree_size, _ZdaPvm), - MAC_INTERPOSE_PAIR(rpfree_aligned, _ZdlPvSt11align_val_t), - MAC_INTERPOSE_PAIR(rpfree_aligned, _ZdaPvSt11align_val_t), - MAC_INTERPOSE_PAIR(rpfree_size_aligned, _ZdlPvmSt11align_val_t), - MAC_INTERPOSE_PAIR(rpfree_size_aligned, _ZdaPvmSt11align_val_t), - // libc entry 
points - MAC_INTERPOSE_PAIR(rpmalloc, malloc), - MAC_INTERPOSE_PAIR(rpmalloc, calloc), - MAC_INTERPOSE_PAIR(rprealloc, realloc), - MAC_INTERPOSE_PAIR(rprealloc, reallocf), -#if defined(__MAC_10_15) && __MAC_OS_X_VERSION_MIN_REQUIRED >= __MAC_10_15 - MAC_INTERPOSE_PAIR(rpaligned_alloc, aligned_alloc), -#endif - MAC_INTERPOSE_PAIR(rpmemalign, memalign), - MAC_INTERPOSE_PAIR(rpposix_memalign, posix_memalign), - MAC_INTERPOSE_PAIR(rpfree, free), MAC_INTERPOSE_PAIR(rpfree, cfree), - MAC_INTERPOSE_PAIR(rpmalloc_usable_size, malloc_usable_size), - MAC_INTERPOSE_PAIR(rpmalloc_usable_size, malloc_size)}; - -#endif - -#if USE_ALIAS - -#define RPALIAS(fn) __attribute__((alias(#fn), used, visibility("default"))); - -// Alias the C++ operators using the mangled names -// (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling) - -// operators delete and delete[] -void _ZdlPv(void *p) RPALIAS(rpfree) void _ZdaPv(void *p) RPALIAS(rpfree) - -#if ARCH_64BIT - // 64-bit operators new and new[], normal and aligned - void *_Znwm(uint64_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) - RPALIAS(rpmalloc) void *_Znam(uint64_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) RPALIAS(rpmalloc) void *_Znwmm(uint64_t size, - uint64_t align) - RPALIAS(rpaligned_alloc_reverse) void *_Znamm(uint64_t size, - uint64_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnwmSt11align_val_t( - size_t size, size_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnamSt11align_val_t( - size_t size, size_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnwmRKSt9nothrow_t( - size_t size, rp_nothrow_t t) - RPALIAS(rpmalloc_nothrow) void *_ZnamRKSt9nothrow_t( - size_t size, - rp_nothrow_t t) RPALIAS(rpmalloc_nothrow) void - *_ZnwmSt11align_val_tRKSt9nothrow_t(size_t size, - size_t align, - rp_nothrow_t t) - RPALIAS(rpaligned_alloc_reverse_nothrow) void - *_ZnamSt11align_val_tRKSt9nothrow_t( - size_t size, size_t align, - rp_nothrow_t t) - 
RPALIAS(rpaligned_alloc_reverse_nothrow) - // 64-bit operators delete and delete[], sized and aligned - void _ZdlPvm(void *p, size_t n) RPALIAS(rpfree_size) void _ZdaPvm(void *p, - size_t n) - RPALIAS(rpfree_size) void _ZdlPvSt11align_val_t(void *p, size_t a) - RPALIAS(rpfree_aligned) void _ZdaPvSt11align_val_t(void *p, - size_t a) - RPALIAS(rpfree_aligned) void _ZdlPvmSt11align_val_t(void *p, - size_t n, - size_t a) - RPALIAS(rpfree_size_aligned) void _ZdaPvmSt11align_val_t( - void *p, size_t n, size_t a) - RPALIAS(rpfree_size_aligned) -#else - // 32-bit operators new and new[], normal and aligned - void *_Znwj(uint32_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) - RPALIAS(rpmalloc) void *_Znaj(uint32_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) RPALIAS(rpmalloc) void *_Znwjj(uint32_t size, - uint32_t align) - RPALIAS(rpaligned_alloc_reverse) void *_Znajj(uint32_t size, - uint32_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnwjSt11align_val_t( - size_t size, size_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnajSt11align_val_t( - size_t size, size_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnwjRKSt9nothrow_t( - size_t size, rp_nothrow_t t) - RPALIAS(rpmalloc_nothrow) void *_ZnajRKSt9nothrow_t( - size_t size, - rp_nothrow_t t) RPALIAS(rpmalloc_nothrow) void - *_ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, - size_t align, - rp_nothrow_t t) - RPALIAS(rpaligned_alloc_reverse_nothrow) void - *_ZnajSt11align_val_tRKSt9nothrow_t( - size_t size, size_t align, - rp_nothrow_t t) - RPALIAS(rpaligned_alloc_reverse_nothrow) - // 32-bit operators delete and delete[], sized and aligned - void _ZdlPvj(void *p, size_t n) RPALIAS(rpfree_size) void _ZdaPvj(void *p, - size_t n) - RPALIAS(rpfree_size) void _ZdlPvSt11align_val_t(void *p, size_t a) - RPALIAS(rpfree_aligned) void _ZdaPvSt11align_val_t(void *p, - size_t a) - RPALIAS(rpfree_aligned) void _ZdlPvjSt11align_val_t(void *p, - size_t n, - size_t a) - 
RPALIAS(rpfree_size_aligned) void _ZdaPvjSt11align_val_t( - void *p, size_t n, size_t a) - RPALIAS(rpfree_size_aligned) -#endif - - void *malloc(size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) - RPALIAS(rpmalloc) void *calloc(size_t count, size_t size) - RPALIAS(rpcalloc) void *realloc(void *ptr, size_t size) - RPALIAS(rprealloc) void *reallocf(void *ptr, size_t size) - RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2) - RPALIAS(rprealloc) void *aligned_alloc(size_t alignment, size_t size) - RPALIAS(rpaligned_alloc) void *memalign( - size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2) - RPALIAS(rpmemalign) int posix_memalign(void **memptr, size_t alignment, - size_t size) - RPALIAS(rpposix_memalign) void free(void *ptr) - RPALIAS(rpfree) void cfree(void *ptr) RPALIAS(rpfree) -#if defined(__ANDROID__) || defined(__FreeBSD__) - size_t - malloc_usable_size(const void *ptr) RPALIAS(rpmalloc_usable_size) -#else - size_t - malloc_usable_size(void *ptr) RPALIAS(rpmalloc_usable_size) -#endif - size_t malloc_size(void *ptr) RPALIAS(rpmalloc_usable_size) - -#endif - - static inline size_t _rpmalloc_page_size(void) { - return _memory_page_size; -} - -extern void *RPMALLOC_CDECL reallocarray(void *ptr, size_t count, size_t size); - -extern void *RPMALLOC_CDECL reallocarray(void *ptr, size_t count, size_t size) { - size_t total; -#if ENABLE_VALIDATE_ARGS -#ifdef _MSC_VER - int err = SizeTMult(count, size, &total); - if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#else - int err = __builtin_umull_overflow(count, size, &total); - if (err || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#endif -#else - total = count * size; -#endif - return realloc(ptr, total); -} - -extern inline void *RPMALLOC_CDECL valloc(size_t size) { - get_thread_heap(); - return rpaligned_alloc(_rpmalloc_page_size(), size); -} - -extern inline void *RPMALLOC_CDECL pvalloc(size_t size) { - 
get_thread_heap(); - const size_t page_size = _rpmalloc_page_size(); - const size_t aligned_size = ((size + page_size - 1) / page_size) * page_size; -#if ENABLE_VALIDATE_ARGS - if (aligned_size < size) { - errno = EINVAL; - return 0; - } -#endif - return rpaligned_alloc(_rpmalloc_page_size(), aligned_size); -} - -#endif // ENABLE_OVERRIDE - -#if ENABLE_PRELOAD - -#ifdef _WIN32 - -#if defined(BUILD_DYNAMIC_LINK) && BUILD_DYNAMIC_LINK - -extern __declspec(dllexport) BOOL WINAPI DllMain(HINSTANCE instance, - DWORD reason, LPVOID reserved); - -extern __declspec(dllexport) BOOL WINAPI DllMain(HINSTANCE instance, - DWORD reason, - LPVOID reserved) { - (void)sizeof(reserved); - (void)sizeof(instance); - if (reason == DLL_PROCESS_ATTACH) - rpmalloc_initialize(); - else if (reason == DLL_PROCESS_DETACH) - rpmalloc_finalize(); - else if (reason == DLL_THREAD_ATTACH) - rpmalloc_thread_initialize(); - else if (reason == DLL_THREAD_DETACH) - rpmalloc_thread_finalize(1); - return TRUE; -} - -// end BUILD_DYNAMIC_LINK -#else - -extern void _global_rpmalloc_init(void) { - rpmalloc_set_main_thread(); - rpmalloc_initialize(); -} - -#if defined(__clang__) || defined(__GNUC__) - -static void __attribute__((constructor)) initializer(void) { - _global_rpmalloc_init(); -} - -#elif defined(_MSC_VER) - -static int _global_rpmalloc_xib(void) { - _global_rpmalloc_init(); - return 0; -} - -#pragma section(".CRT$XIB", read) -__declspec(allocate(".CRT$XIB")) void (*_rpmalloc_module_init)(void) = - _global_rpmalloc_xib; -#if defined(_M_IX86) || defined(__i386__) -#pragma comment(linker, "/include:" \ - "__rpmalloc_module_init") -#else -#pragma comment(linker, "/include:" \ - "_rpmalloc_module_init") -#endif - -#endif - -// end !BUILD_DYNAMIC_LINK -#endif - -#else - -#include -#include -#include -#include - -extern void rpmalloc_set_main_thread(void); - -static pthread_key_t destructor_key; - -static void thread_destructor(void *); - -static void __attribute__((constructor)) initializer(void) { - 
rpmalloc_set_main_thread(); - rpmalloc_initialize(); - pthread_key_create(&destructor_key, thread_destructor); -} - -static void __attribute__((destructor)) finalizer(void) { rpmalloc_finalize(); } - -typedef struct { - void *(*real_start)(void *); - void *real_arg; -} thread_starter_arg; - -static void *thread_starter(void *argptr) { - thread_starter_arg *arg = argptr; - void *(*real_start)(void *) = arg->real_start; - void *real_arg = arg->real_arg; - rpmalloc_thread_initialize(); - rpfree(argptr); - pthread_setspecific(destructor_key, (void *)1); - return (*real_start)(real_arg); -} - -static void thread_destructor(void *value) { - (void)sizeof(value); - rpmalloc_thread_finalize(1); -} - -#ifdef __APPLE__ - -static int pthread_create_proxy(pthread_t *thread, const pthread_attr_t *attr, - void *(*start_routine)(void *), void *arg) { - rpmalloc_initialize(); - thread_starter_arg *starter_arg = rpmalloc(sizeof(thread_starter_arg)); - starter_arg->real_start = start_routine; - starter_arg->real_arg = arg; - return pthread_create(thread, attr, thread_starter, starter_arg); -} - -MAC_INTERPOSE_SINGLE(pthread_create_proxy, pthread_create); - -#else - -#include - -int pthread_create(pthread_t *thread, const pthread_attr_t *attr, - void *(*start_routine)(void *), void *arg) { -#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__) || \ - defined(__NetBSD__) || defined(__DragonFly__) || defined(__APPLE__) || \ - defined(__HAIKU__) - char fname[] = "pthread_create"; -#else - char fname[] = "_pthread_create"; -#endif - void *real_pthread_create = dlsym(RTLD_NEXT, fname); - rpmalloc_thread_initialize(); - thread_starter_arg *starter_arg = rpmalloc(sizeof(thread_starter_arg)); - starter_arg->real_start = start_routine; - starter_arg->real_arg = arg; - return (*(int (*)(pthread_t *, const pthread_attr_t *, void *(*)(void *), - void *))real_pthread_create)(thread, attr, thread_starter, - starter_arg); -} - -#endif - -#endif - -#endif - -#if ENABLE_OVERRIDE - 
-#if defined(__GLIBC__) && defined(__linux__) - -void *__libc_malloc(size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) - RPALIAS(rpmalloc) void *__libc_calloc(size_t count, size_t size) - RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE2(1, 2) - RPALIAS(rpcalloc) void *__libc_realloc(void *p, size_t size) - RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2) RPALIAS(rprealloc) void __libc_free(void *p) - RPALIAS(rpfree) void __libc_cfree(void *p) - RPALIAS(rpfree) void *__libc_memalign(size_t align, size_t size) - RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2) - RPALIAS(rpmemalign) int __posix_memalign(void **p, size_t align, - size_t size) - RPALIAS(rpposix_memalign) - - extern void *__libc_valloc(size_t size); -extern void *__libc_pvalloc(size_t size); - -void *__libc_valloc(size_t size) { return valloc(size); } - -void *__libc_pvalloc(size_t size) { return pvalloc(size); } - -#endif - -#endif - -#if (defined(__GNUC__) || defined(__clang__)) -#pragma GCC visibility pop -#endif +//===------------------------ malloc.c ------------------*- C -*-=============// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This library provides a cross-platform lock free thread caching malloc +// implementation in C11. 
+// +// +// This file provides overrides for the standard library malloc entry points for +// C and new/delete operators for C++ It also provides automatic +// initialization/finalization of process and threads +// +//===----------------------------------------------------------------------===// + +#if defined(__TINYC__) +#include +#endif + +#ifndef ARCH_64BIT +#if defined(__LLP64__) || defined(__LP64__) || defined(_WIN64) +#define ARCH_64BIT 1 +_Static_assert(sizeof(size_t) == 8, "Data type size mismatch"); +_Static_assert(sizeof(void *) == 8, "Data type size mismatch"); +#else +#define ARCH_64BIT 0 +_Static_assert(sizeof(size_t) == 4, "Data type size mismatch"); +_Static_assert(sizeof(void *) == 4, "Data type size mismatch"); +#endif +#endif + +#if (defined(__GNUC__) || defined(__clang__)) +#pragma GCC visibility push(default) +#endif + +#define USE_IMPLEMENT 1 +#define USE_INTERPOSE 0 +#define USE_ALIAS 0 + +#if defined(__APPLE__) +#undef USE_INTERPOSE +#define USE_INTERPOSE 1 + +typedef struct interpose_t { + void *new_func; + void *orig_func; +} interpose_t; + +#define MAC_INTERPOSE_PAIR(newf, oldf) {(void *)newf, (void *)oldf} +#define MAC_INTERPOSE_SINGLE(newf, oldf) \ + __attribute__((used)) static const interpose_t macinterpose##newf##oldf \ + __attribute__((section("__DATA, __interpose"))) = \ + MAC_INTERPOSE_PAIR(newf, oldf) + +#endif + +#if !defined(_WIN32) && !defined(__APPLE__) +#undef USE_IMPLEMENT +#undef USE_ALIAS +#define USE_IMPLEMENT 0 +#define USE_ALIAS 1 +#endif + +#ifdef _MSC_VER +#pragma warning(disable : 4100) +#undef malloc +#undef free +#undef calloc +#define RPMALLOC_RESTRICT __declspec(restrict) +#else +#define RPMALLOC_RESTRICT +#endif + +#if ENABLE_OVERRIDE + +typedef struct rp_nothrow_t { + int __dummy; +} rp_nothrow_t; + +#if USE_IMPLEMENT + +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL malloc(size_t size) { + return rpmalloc(size); +} +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL calloc(size_t count, + size_t size) 
{ + return rpcalloc(count, size); +} +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL realloc(void *ptr, + size_t size) { + return rprealloc(ptr, size); +} +extern inline void *RPMALLOC_CDECL reallocf(void *ptr, size_t size) { + return rprealloc(ptr, size); +} +extern inline void *RPMALLOC_CDECL aligned_alloc(size_t alignment, + size_t size) { + return rpaligned_alloc(alignment, size); +} +extern inline void *RPMALLOC_CDECL memalign(size_t alignment, size_t size) { + return rpmemalign(alignment, size); +} +extern inline int RPMALLOC_CDECL posix_memalign(void **memptr, size_t alignment, + size_t size) { + return rpposix_memalign(memptr, alignment, size); +} +extern inline void RPMALLOC_CDECL free(void *ptr) { rpfree(ptr); } +extern inline void RPMALLOC_CDECL cfree(void *ptr) { rpfree(ptr); } +extern inline size_t RPMALLOC_CDECL malloc_usable_size(void *ptr) { + return rpmalloc_usable_size(ptr); +} +extern inline size_t RPMALLOC_CDECL malloc_size(void *ptr) { + return rpmalloc_usable_size(ptr); +} + +#ifdef _WIN32 +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL _malloc_base(size_t size) { + return rpmalloc(size); +} +extern inline void RPMALLOC_CDECL _free_base(void *ptr) { rpfree(ptr); } +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL _calloc_base(size_t count, + size_t size) { + return rpcalloc(count, size); +} +extern inline size_t RPMALLOC_CDECL _msize(void *ptr) { + return rpmalloc_usable_size(ptr); +} +extern inline size_t RPMALLOC_CDECL _msize_base(void *ptr) { + return rpmalloc_usable_size(ptr); +} +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL +_realloc_base(void *ptr, size_t size) { + return rprealloc(ptr, size); +} +#endif + +#ifdef _WIN32 +// For Windows, #include in one source file to get the C++ operator +// overrides implemented in your module +#else +// Overload the C++ operators using the mangled names +// (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling) operators +// delete and delete[] +#define RPDEFVIS 
__attribute__((visibility("default"))) +extern void _ZdlPv(void *p); +void RPDEFVIS _ZdlPv(void *p) { rpfree(p); } +extern void _ZdaPv(void *p); +void RPDEFVIS _ZdaPv(void *p) { rpfree(p); } +#if ARCH_64BIT +// 64-bit operators new and new[], normal and aligned +extern void *_Znwm(uint64_t size); +void *RPDEFVIS _Znwm(uint64_t size) { return rpmalloc(size); } +extern void *_Znam(uint64_t size); +void *RPDEFVIS _Znam(uint64_t size) { return rpmalloc(size); } +extern void *_Znwmm(uint64_t size, uint64_t align); +void *RPDEFVIS _Znwmm(uint64_t size, uint64_t align) { + return rpaligned_alloc(align, size); +} +extern void *_Znamm(uint64_t size, uint64_t align); +void *RPDEFVIS _Znamm(uint64_t size, uint64_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnwmSt11align_val_t(uint64_t size, uint64_t align); +void *RPDEFVIS _ZnwmSt11align_val_t(uint64_t size, uint64_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnamSt11align_val_t(uint64_t size, uint64_t align); +void *RPDEFVIS _ZnamSt11align_val_t(uint64_t size, uint64_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnwmRKSt9nothrow_t(uint64_t size, rp_nothrow_t t); +void *RPDEFVIS _ZnwmRKSt9nothrow_t(uint64_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +extern void *_ZnamRKSt9nothrow_t(uint64_t size, rp_nothrow_t t); +void *RPDEFVIS _ZnamRKSt9nothrow_t(uint64_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +extern void *_ZnwmSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, + rp_nothrow_t t); +void *RPDEFVIS _ZnwmSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, size); +} +extern void *_ZnamSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, + rp_nothrow_t t); +void *RPDEFVIS _ZnamSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, 
size); +} +// 64-bit operators sized delete and delete[], normal and aligned +extern void _ZdlPvm(void *p, uint64_t size); +void RPDEFVIS _ZdlPvm(void *p, uint64_t size) { + rpfree(p); + (void)sizeof(size); +} +extern void _ZdaPvm(void *p, uint64_t size); +void RPDEFVIS _ZdaPvm(void *p, uint64_t size) { + rpfree(p); + (void)sizeof(size); +} +extern void _ZdlPvSt11align_val_t(void *p, uint64_t align); +void RPDEFVIS _ZdlPvSt11align_val_t(void *p, uint64_t align) { + rpfree(p); + (void)sizeof(align); +} +extern void _ZdaPvSt11align_val_t(void *p, uint64_t align); +void RPDEFVIS _ZdaPvSt11align_val_t(void *p, uint64_t align) { + rpfree(p); + (void)sizeof(align); +} +extern void _ZdlPvmSt11align_val_t(void *p, uint64_t size, uint64_t align); +void RPDEFVIS _ZdlPvmSt11align_val_t(void *p, uint64_t size, uint64_t align) { + rpfree(p); + (void)sizeof(size); + (void)sizeof(align); +} +extern void _ZdaPvmSt11align_val_t(void *p, uint64_t size, uint64_t align); +void RPDEFVIS _ZdaPvmSt11align_val_t(void *p, uint64_t size, uint64_t align) { + rpfree(p); + (void)sizeof(size); + (void)sizeof(align); +} +#else +// 32-bit operators new and new[], normal and aligned +extern void *_Znwj(uint32_t size); +void *RPDEFVIS _Znwj(uint32_t size) { return rpmalloc(size); } +extern void *_Znaj(uint32_t size); +void *RPDEFVIS _Znaj(uint32_t size) { return rpmalloc(size); } +extern void *_Znwjj(uint32_t size, uint32_t align); +void *RPDEFVIS _Znwjj(uint32_t size, uint32_t align) { + return rpaligned_alloc(align, size); +} +extern void *_Znajj(uint32_t size, uint32_t align); +void *RPDEFVIS _Znajj(uint32_t size, uint32_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnwjSt11align_val_t(size_t size, size_t align); +void *RPDEFVIS _ZnwjSt11align_val_t(size_t size, size_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnajSt11align_val_t(size_t size, size_t align); +void *RPDEFVIS _ZnajSt11align_val_t(size_t size, size_t align) { + return 
rpaligned_alloc(align, size); +} +extern void *_ZnwjRKSt9nothrow_t(size_t size, rp_nothrow_t t); +void *RPDEFVIS _ZnwjRKSt9nothrow_t(size_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +extern void *_ZnajRKSt9nothrow_t(size_t size, rp_nothrow_t t); +void *RPDEFVIS _ZnajRKSt9nothrow_t(size_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +extern void *_ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, + rp_nothrow_t t); +void *RPDEFVIS _ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, size); +} +extern void *_ZnajSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, + rp_nothrow_t t); +void *RPDEFVIS _ZnajSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, size); +} +// 32-bit operators sized delete and delete[], normal and aligned +extern void _ZdlPvj(void *p, uint64_t size); +void RPDEFVIS _ZdlPvj(void *p, uint64_t size) { + rpfree(p); + (void)sizeof(size); +} +extern void _ZdaPvj(void *p, uint64_t size); +void RPDEFVIS _ZdaPvj(void *p, uint64_t size) { + rpfree(p); + (void)sizeof(size); +} +extern void _ZdlPvSt11align_val_t(void *p, uint32_t align); +void RPDEFVIS _ZdlPvSt11align_val_t(void *p, uint64_t a) { + rpfree(p); + (void)sizeof(align); +} +extern void _ZdaPvSt11align_val_t(void *p, uint32_t align); +void RPDEFVIS _ZdaPvSt11align_val_t(void *p, uint64_t a) { + rpfree(p); + (void)sizeof(align); +} +extern void _ZdlPvjSt11align_val_t(void *p, uint32_t size, uint32_t align); +void RPDEFVIS _ZdlPvjSt11align_val_t(void *p, uint64_t size, uint64_t align) { + rpfree(p); + (void)sizeof(size); + (void)sizeof(a); +} +extern void _ZdaPvjSt11align_val_t(void *p, uint32_t size, uint32_t align); +void RPDEFVIS _ZdaPvjSt11align_val_t(void *p, uint64_t size, uint64_t align) { + rpfree(p); + (void)sizeof(size); + (void)sizeof(a); +} +#endif +#endif 
+#endif + +#if USE_INTERPOSE || USE_ALIAS + +static void *rpmalloc_nothrow(size_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +static void *rpaligned_alloc_reverse(size_t size, size_t align) { + return rpaligned_alloc(align, size); +} +static void *rpaligned_alloc_reverse_nothrow(size_t size, size_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, size); +} +static void rpfree_size(void *p, size_t size) { + (void)sizeof(size); + rpfree(p); +} +static void rpfree_aligned(void *p, size_t align) { + (void)sizeof(align); + rpfree(p); +} +static void rpfree_size_aligned(void *p, size_t size, size_t align) { + (void)sizeof(size); + (void)sizeof(align); + rpfree(p); +} + +#endif + +#if USE_INTERPOSE + +__attribute__((used)) static const interpose_t macinterpose_malloc[] + __attribute__((section("__DATA, __interpose"))) = { + // new and new[] + MAC_INTERPOSE_PAIR(rpmalloc, _Znwm), + MAC_INTERPOSE_PAIR(rpmalloc, _Znam), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _Znwmm), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _Znamm), + MAC_INTERPOSE_PAIR(rpmalloc_nothrow, _ZnwmRKSt9nothrow_t), + MAC_INTERPOSE_PAIR(rpmalloc_nothrow, _ZnamRKSt9nothrow_t), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _ZnwmSt11align_val_t), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _ZnamSt11align_val_t), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse_nothrow, + _ZnwmSt11align_val_tRKSt9nothrow_t), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse_nothrow, + _ZnamSt11align_val_tRKSt9nothrow_t), + // delete and delete[] + MAC_INTERPOSE_PAIR(rpfree, _ZdlPv), MAC_INTERPOSE_PAIR(rpfree, _ZdaPv), + MAC_INTERPOSE_PAIR(rpfree_size, _ZdlPvm), + MAC_INTERPOSE_PAIR(rpfree_size, _ZdaPvm), + MAC_INTERPOSE_PAIR(rpfree_aligned, _ZdlPvSt11align_val_t), + MAC_INTERPOSE_PAIR(rpfree_aligned, _ZdaPvSt11align_val_t), + MAC_INTERPOSE_PAIR(rpfree_size_aligned, _ZdlPvmSt11align_val_t), + MAC_INTERPOSE_PAIR(rpfree_size_aligned, _ZdaPvmSt11align_val_t), + // libc entry 
points + MAC_INTERPOSE_PAIR(rpmalloc, malloc), + MAC_INTERPOSE_PAIR(rpmalloc, calloc), + MAC_INTERPOSE_PAIR(rprealloc, realloc), + MAC_INTERPOSE_PAIR(rprealloc, reallocf), +#if defined(__MAC_10_15) && __MAC_OS_X_VERSION_MIN_REQUIRED >= __MAC_10_15 + MAC_INTERPOSE_PAIR(rpaligned_alloc, aligned_alloc), +#endif + MAC_INTERPOSE_PAIR(rpmemalign, memalign), + MAC_INTERPOSE_PAIR(rpposix_memalign, posix_memalign), + MAC_INTERPOSE_PAIR(rpfree, free), MAC_INTERPOSE_PAIR(rpfree, cfree), + MAC_INTERPOSE_PAIR(rpmalloc_usable_size, malloc_usable_size), + MAC_INTERPOSE_PAIR(rpmalloc_usable_size, malloc_size)}; + +#endif + +#if USE_ALIAS + +#define RPALIAS(fn) __attribute__((alias(#fn), used, visibility("default"))); + +// Alias the C++ operators using the mangled names +// (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling) + +// operators delete and delete[] +void _ZdlPv(void *p) RPALIAS(rpfree) void _ZdaPv(void *p) RPALIAS(rpfree) + +#if ARCH_64BIT + // 64-bit operators new and new[], normal and aligned + void *_Znwm(uint64_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) + RPALIAS(rpmalloc) void *_Znam(uint64_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) RPALIAS(rpmalloc) void *_Znwmm(uint64_t size, + uint64_t align) + RPALIAS(rpaligned_alloc_reverse) void *_Znamm(uint64_t size, + uint64_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnwmSt11align_val_t( + size_t size, size_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnamSt11align_val_t( + size_t size, size_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnwmRKSt9nothrow_t( + size_t size, rp_nothrow_t t) + RPALIAS(rpmalloc_nothrow) void *_ZnamRKSt9nothrow_t( + size_t size, + rp_nothrow_t t) RPALIAS(rpmalloc_nothrow) void + *_ZnwmSt11align_val_tRKSt9nothrow_t(size_t size, + size_t align, + rp_nothrow_t t) + RPALIAS(rpaligned_alloc_reverse_nothrow) void + *_ZnamSt11align_val_tRKSt9nothrow_t( + size_t size, size_t align, + rp_nothrow_t t) + 
RPALIAS(rpaligned_alloc_reverse_nothrow) + // 64-bit operators delete and delete[], sized and aligned + void _ZdlPvm(void *p, size_t n) RPALIAS(rpfree_size) void _ZdaPvm(void *p, + size_t n) + RPALIAS(rpfree_size) void _ZdlPvSt11align_val_t(void *p, size_t a) + RPALIAS(rpfree_aligned) void _ZdaPvSt11align_val_t(void *p, + size_t a) + RPALIAS(rpfree_aligned) void _ZdlPvmSt11align_val_t(void *p, + size_t n, + size_t a) + RPALIAS(rpfree_size_aligned) void _ZdaPvmSt11align_val_t( + void *p, size_t n, size_t a) + RPALIAS(rpfree_size_aligned) +#else + // 32-bit operators new and new[], normal and aligned + void *_Znwj(uint32_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) + RPALIAS(rpmalloc) void *_Znaj(uint32_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) RPALIAS(rpmalloc) void *_Znwjj(uint32_t size, + uint32_t align) + RPALIAS(rpaligned_alloc_reverse) void *_Znajj(uint32_t size, + uint32_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnwjSt11align_val_t( + size_t size, size_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnajSt11align_val_t( + size_t size, size_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnwjRKSt9nothrow_t( + size_t size, rp_nothrow_t t) + RPALIAS(rpmalloc_nothrow) void *_ZnajRKSt9nothrow_t( + size_t size, + rp_nothrow_t t) RPALIAS(rpmalloc_nothrow) void + *_ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, + size_t align, + rp_nothrow_t t) + RPALIAS(rpaligned_alloc_reverse_nothrow) void + *_ZnajSt11align_val_tRKSt9nothrow_t( + size_t size, size_t align, + rp_nothrow_t t) + RPALIAS(rpaligned_alloc_reverse_nothrow) + // 32-bit operators delete and delete[], sized and aligned + void _ZdlPvj(void *p, size_t n) RPALIAS(rpfree_size) void _ZdaPvj(void *p, + size_t n) + RPALIAS(rpfree_size) void _ZdlPvSt11align_val_t(void *p, size_t a) + RPALIAS(rpfree_aligned) void _ZdaPvSt11align_val_t(void *p, + size_t a) + RPALIAS(rpfree_aligned) void _ZdlPvjSt11align_val_t(void *p, + size_t n, + size_t a) + 
RPALIAS(rpfree_size_aligned) void _ZdaPvjSt11align_val_t( + void *p, size_t n, size_t a) + RPALIAS(rpfree_size_aligned) +#endif + + void *malloc(size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) + RPALIAS(rpmalloc) void *calloc(size_t count, size_t size) + RPALIAS(rpcalloc) void *realloc(void *ptr, size_t size) + RPALIAS(rprealloc) void *reallocf(void *ptr, size_t size) + RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2) + RPALIAS(rprealloc) void *aligned_alloc(size_t alignment, size_t size) + RPALIAS(rpaligned_alloc) void *memalign( + size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2) + RPALIAS(rpmemalign) int posix_memalign(void **memptr, size_t alignment, + size_t size) + RPALIAS(rpposix_memalign) void free(void *ptr) + RPALIAS(rpfree) void cfree(void *ptr) RPALIAS(rpfree) +#if defined(__ANDROID__) || defined(__FreeBSD__) + size_t + malloc_usable_size(const void *ptr) RPALIAS(rpmalloc_usable_size) +#else + size_t + malloc_usable_size(void *ptr) RPALIAS(rpmalloc_usable_size) +#endif + size_t malloc_size(void *ptr) RPALIAS(rpmalloc_usable_size) + +#endif + + static inline size_t _rpmalloc_page_size(void) { + return _memory_page_size; +} + +extern void *RPMALLOC_CDECL reallocarray(void *ptr, size_t count, size_t size); + +extern void *RPMALLOC_CDECL reallocarray(void *ptr, size_t count, size_t size) { + size_t total; +#if ENABLE_VALIDATE_ARGS +#ifdef _MSC_VER + int err = SizeTMult(count, size, &total); + if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#else + int err = __builtin_umull_overflow(count, size, &total); + if (err || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#endif +#else + total = count * size; +#endif + return realloc(ptr, total); +} + +extern inline void *RPMALLOC_CDECL valloc(size_t size) { + get_thread_heap(); + return rpaligned_alloc(_rpmalloc_page_size(), size); +} + +extern inline void *RPMALLOC_CDECL pvalloc(size_t size) { + 
get_thread_heap(); + const size_t page_size = _rpmalloc_page_size(); + const size_t aligned_size = ((size + page_size - 1) / page_size) * page_size; +#if ENABLE_VALIDATE_ARGS + if (aligned_size < size) { + errno = EINVAL; + return 0; + } +#endif + return rpaligned_alloc(_rpmalloc_page_size(), aligned_size); +} + +#endif // ENABLE_OVERRIDE + +#if ENABLE_PRELOAD + +#ifdef _WIN32 + +#if defined(BUILD_DYNAMIC_LINK) && BUILD_DYNAMIC_LINK + +extern __declspec(dllexport) BOOL WINAPI DllMain(HINSTANCE instance, + DWORD reason, LPVOID reserved); + +extern __declspec(dllexport) BOOL WINAPI DllMain(HINSTANCE instance, + DWORD reason, + LPVOID reserved) { + (void)sizeof(reserved); + (void)sizeof(instance); + if (reason == DLL_PROCESS_ATTACH) + rpmalloc_initialize(); + else if (reason == DLL_PROCESS_DETACH) + rpmalloc_finalize(); + else if (reason == DLL_THREAD_ATTACH) + rpmalloc_thread_initialize(); + else if (reason == DLL_THREAD_DETACH) + rpmalloc_thread_finalize(1); + return TRUE; +} + +// end BUILD_DYNAMIC_LINK +#else + +extern void _global_rpmalloc_init(void) { + rpmalloc_set_main_thread(); + rpmalloc_initialize(); +} + +#if defined(__clang__) || defined(__GNUC__) + +static void __attribute__((constructor)) initializer(void) { + _global_rpmalloc_init(); +} + +#elif defined(_MSC_VER) + +static int _global_rpmalloc_xib(void) { + _global_rpmalloc_init(); + return 0; +} + +#pragma section(".CRT$XIB", read) +__declspec(allocate(".CRT$XIB")) void (*_rpmalloc_module_init)(void) = + _global_rpmalloc_xib; +#if defined(_M_IX86) || defined(__i386__) +#pragma comment(linker, "/include:" \ + "__rpmalloc_module_init") +#else +#pragma comment(linker, "/include:" \ + "_rpmalloc_module_init") +#endif + +#endif + +// end !BUILD_DYNAMIC_LINK +#endif + +#else + +#include +#include +#include +#include + +extern void rpmalloc_set_main_thread(void); + +static pthread_key_t destructor_key; + +static void thread_destructor(void *); + +static void __attribute__((constructor)) initializer(void) { + 
rpmalloc_set_main_thread(); + rpmalloc_initialize(); + pthread_key_create(&destructor_key, thread_destructor); +} + +static void __attribute__((destructor)) finalizer(void) { rpmalloc_finalize(); } + +typedef struct { + void *(*real_start)(void *); + void *real_arg; +} thread_starter_arg; + +static void *thread_starter(void *argptr) { + thread_starter_arg *arg = argptr; + void *(*real_start)(void *) = arg->real_start; + void *real_arg = arg->real_arg; + rpmalloc_thread_initialize(); + rpfree(argptr); + pthread_setspecific(destructor_key, (void *)1); + return (*real_start)(real_arg); +} + +static void thread_destructor(void *value) { + (void)sizeof(value); + rpmalloc_thread_finalize(1); +} + +#ifdef __APPLE__ + +static int pthread_create_proxy(pthread_t *thread, const pthread_attr_t *attr, + void *(*start_routine)(void *), void *arg) { + rpmalloc_initialize(); + thread_starter_arg *starter_arg = rpmalloc(sizeof(thread_starter_arg)); + starter_arg->real_start = start_routine; + starter_arg->real_arg = arg; + return pthread_create(thread, attr, thread_starter, starter_arg); +} + +MAC_INTERPOSE_SINGLE(pthread_create_proxy, pthread_create); + +#else + +#include <dlfcn.h> + +int pthread_create(pthread_t *thread, const pthread_attr_t *attr, + void *(*start_routine)(void *), void *arg) { +#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__) || \ + defined(__NetBSD__) || defined(__DragonFly__) || defined(__APPLE__) || \ + defined(__HAIKU__) + char fname[] = "pthread_create"; +#else + char fname[] = "_pthread_create"; +#endif + void *real_pthread_create = dlsym(RTLD_NEXT, fname); + rpmalloc_thread_initialize(); + thread_starter_arg *starter_arg = rpmalloc(sizeof(thread_starter_arg)); + starter_arg->real_start = start_routine; + starter_arg->real_arg = arg; + return (*(int (*)(pthread_t *, const pthread_attr_t *, void *(*)(void *), + void *))real_pthread_create)(thread, attr, thread_starter, + starter_arg); +} + +#endif + +#endif + +#endif + +#if ENABLE_OVERRIDE +
+#if defined(__GLIBC__) && defined(__linux__) + +void *__libc_malloc(size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) + RPALIAS(rpmalloc) void *__libc_calloc(size_t count, size_t size) + RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE2(1, 2) + RPALIAS(rpcalloc) void *__libc_realloc(void *p, size_t size) + RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2) RPALIAS(rprealloc) void __libc_free(void *p) + RPALIAS(rpfree) void __libc_cfree(void *p) + RPALIAS(rpfree) void *__libc_memalign(size_t align, size_t size) + RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2) + RPALIAS(rpmemalign) int __posix_memalign(void **p, size_t align, + size_t size) + RPALIAS(rpposix_memalign) + + extern void *__libc_valloc(size_t size); +extern void *__libc_pvalloc(size_t size); + +void *__libc_valloc(size_t size) { return valloc(size); } + +void *__libc_pvalloc(size_t size) { return pvalloc(size); } + +#endif + +#endif + +#if (defined(__GNUC__) || defined(__clang__)) +#pragma GCC visibility pop +#endif diff --git a/llvm/lib/Support/rpmalloc/rpmalloc.c b/llvm/lib/Support/rpmalloc/rpmalloc.c index a06d3cdb5b52ef..0976ec8ae6af4e 100644 --- a/llvm/lib/Support/rpmalloc/rpmalloc.c +++ b/llvm/lib/Support/rpmalloc/rpmalloc.c @@ -1,3992 +1,3992 @@ -//===---------------------- rpmalloc.c ------------------*- C -*-=============// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -// -// This library provides a cross-platform lock free thread caching malloc -// implementation in C11. 
-// -//===----------------------------------------------------------------------===// - -#include "rpmalloc.h" - -//////////// -/// -/// Build time configurable limits -/// -////// - -#if defined(__clang__) -#pragma clang diagnostic ignored "-Wunused-macros" -#pragma clang diagnostic ignored "-Wunused-function" -#if __has_warning("-Wreserved-identifier") -#pragma clang diagnostic ignored "-Wreserved-identifier" -#endif -#if __has_warning("-Wstatic-in-inline") -#pragma clang diagnostic ignored "-Wstatic-in-inline" -#endif -#elif defined(__GNUC__) -#pragma GCC diagnostic ignored "-Wunused-macros" -#pragma GCC diagnostic ignored "-Wunused-function" -#endif - -#if !defined(__has_builtin) -#define __has_builtin(b) 0 -#endif - -#if defined(__GNUC__) || defined(__clang__) - -#if __has_builtin(__builtin_memcpy_inline) -#define _rpmalloc_memcpy_const(x, y, s) __builtin_memcpy_inline(x, y, s) -#else -#define _rpmalloc_memcpy_const(x, y, s) \ - do { \ - _Static_assert(__builtin_choose_expr(__builtin_constant_p(s), 1, 0), \ - "len must be a constant integer"); \ - memcpy(x, y, s); \ - } while (0) -#endif - -#if __has_builtin(__builtin_memset_inline) -#define _rpmalloc_memset_const(x, y, s) __builtin_memset_inline(x, y, s) -#else -#define _rpmalloc_memset_const(x, y, s) \ - do { \ - _Static_assert(__builtin_choose_expr(__builtin_constant_p(s), 1, 0), \ - "len must be a constant integer"); \ - memset(x, y, s); \ - } while (0) -#endif -#else -#define _rpmalloc_memcpy_const(x, y, s) memcpy(x, y, s) -#define _rpmalloc_memset_const(x, y, s) memset(x, y, s) -#endif - -#if __has_builtin(__builtin_assume) -#define rpmalloc_assume(cond) __builtin_assume(cond) -#elif defined(__GNUC__) -#define rpmalloc_assume(cond) \ - do { \ - if (!__builtin_expect(cond, 0)) \ - __builtin_unreachable(); \ - } while (0) -#elif defined(_MSC_VER) -#define rpmalloc_assume(cond) __assume(cond) -#else -#define rpmalloc_assume(cond) 0 -#endif - -#ifndef HEAP_ARRAY_SIZE -//! 
Size of heap hashmap -#define HEAP_ARRAY_SIZE 47 -#endif -#ifndef ENABLE_THREAD_CACHE -//! Enable per-thread cache -#define ENABLE_THREAD_CACHE 1 -#endif -#ifndef ENABLE_GLOBAL_CACHE -//! Enable global cache shared between all threads, requires thread cache -#define ENABLE_GLOBAL_CACHE 1 -#endif -#ifndef ENABLE_VALIDATE_ARGS -//! Enable validation of args to public entry points -#define ENABLE_VALIDATE_ARGS 0 -#endif -#ifndef ENABLE_STATISTICS -//! Enable statistics collection -#define ENABLE_STATISTICS 0 -#endif -#ifndef ENABLE_ASSERTS -//! Enable asserts -#define ENABLE_ASSERTS 0 -#endif -#ifndef ENABLE_OVERRIDE -//! Override standard library malloc/free and new/delete entry points -#define ENABLE_OVERRIDE 0 -#endif -#ifndef ENABLE_PRELOAD -//! Support preloading -#define ENABLE_PRELOAD 0 -#endif -#ifndef DISABLE_UNMAP -//! Disable unmapping memory pages (also enables unlimited cache) -#define DISABLE_UNMAP 0 -#endif -#ifndef ENABLE_UNLIMITED_CACHE -//! Enable unlimited global cache (no unmapping until finalization) -#define ENABLE_UNLIMITED_CACHE 0 -#endif -#ifndef ENABLE_ADAPTIVE_THREAD_CACHE -//! Enable adaptive thread cache size based on use heuristics -#define ENABLE_ADAPTIVE_THREAD_CACHE 0 -#endif -#ifndef DEFAULT_SPAN_MAP_COUNT -//! Default number of spans to map in call to map more virtual memory (default -//! values yield 4MiB here) -#define DEFAULT_SPAN_MAP_COUNT 64 -#endif -#ifndef GLOBAL_CACHE_MULTIPLIER -//! 
Multiplier for global cache -#define GLOBAL_CACHE_MULTIPLIER 8 -#endif - -#if DISABLE_UNMAP && !ENABLE_GLOBAL_CACHE -#error Must use global cache if unmap is disabled -#endif - -#if DISABLE_UNMAP -#undef ENABLE_UNLIMITED_CACHE -#define ENABLE_UNLIMITED_CACHE 1 -#endif - -#if !ENABLE_GLOBAL_CACHE -#undef ENABLE_UNLIMITED_CACHE -#define ENABLE_UNLIMITED_CACHE 0 -#endif - -#if !ENABLE_THREAD_CACHE -#undef ENABLE_ADAPTIVE_THREAD_CACHE -#define ENABLE_ADAPTIVE_THREAD_CACHE 0 -#endif - -#if defined(_WIN32) || defined(__WIN32__) || defined(_WIN64) -#define PLATFORM_WINDOWS 1 -#define PLATFORM_POSIX 0 -#else -#define PLATFORM_WINDOWS 0 -#define PLATFORM_POSIX 1 -#endif - -/// Platform and arch specifics -#if defined(_MSC_VER) && !defined(__clang__) -#pragma warning(disable : 5105) -#ifndef FORCEINLINE -#define FORCEINLINE inline __forceinline -#endif -#define _Static_assert static_assert -#else -#ifndef FORCEINLINE -#define FORCEINLINE inline __attribute__((__always_inline__)) -#endif -#endif -#if PLATFORM_WINDOWS -#ifndef WIN32_LEAN_AND_MEAN -#define WIN32_LEAN_AND_MEAN -#endif -#include -#if ENABLE_VALIDATE_ARGS -#include -#endif -#else -#include -#include -#include -#include -#if defined(__linux__) || defined(__ANDROID__) -#include -#if !defined(PR_SET_VMA) -#define PR_SET_VMA 0x53564d41 -#define PR_SET_VMA_ANON_NAME 0 -#endif -#endif -#if defined(__APPLE__) -#include -#if !TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR -#include -#include -#endif -#include -#endif -#if defined(__HAIKU__) || defined(__TINYC__) -#include -#endif -#endif - -#include -#include -#include - -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) -#include -static DWORD fls_key; -#endif - -#if PLATFORM_POSIX -#include -#include -#ifdef __FreeBSD__ -#include -#define MAP_HUGETLB MAP_ALIGNED_SUPER -#ifndef PROT_MAX -#define PROT_MAX(f) 0 -#endif -#else -#define PROT_MAX(f) 0 -#endif -#ifdef __sun -extern int madvise(caddr_t, size_t, int); -#endif -#ifndef MAP_UNINITIALIZED 
-#define MAP_UNINITIALIZED 0 -#endif -#endif -#include - -#if ENABLE_ASSERTS -#undef NDEBUG -#if defined(_MSC_VER) && !defined(_DEBUG) -#define _DEBUG -#endif -#include -#define RPMALLOC_TOSTRING_M(x) #x -#define RPMALLOC_TOSTRING(x) RPMALLOC_TOSTRING_M(x) -#define rpmalloc_assert(truth, message) \ - do { \ - if (!(truth)) { \ - if (_memory_config.error_callback) { \ - _memory_config.error_callback(message " (" RPMALLOC_TOSTRING( \ - truth) ") at " __FILE__ ":" RPMALLOC_TOSTRING(__LINE__)); \ - } else { \ - assert((truth) && message); \ - } \ - } \ - } while (0) -#else -#define rpmalloc_assert(truth, message) \ - do { \ - } while (0) -#endif -#if ENABLE_STATISTICS -#include -#endif - -////// -/// -/// Atomic access abstraction (since MSVC does not do C11 yet) -/// -////// - -#if defined(_MSC_VER) && !defined(__clang__) - -typedef volatile long atomic32_t; -typedef volatile long long atomic64_t; -typedef volatile void *atomicptr_t; - -static FORCEINLINE int32_t atomic_load32(atomic32_t *src) { return *src; } -static FORCEINLINE void atomic_store32(atomic32_t *dst, int32_t val) { - *dst = val; -} -static FORCEINLINE int32_t atomic_incr32(atomic32_t *val) { - return (int32_t)InterlockedIncrement(val); -} -static FORCEINLINE int32_t atomic_decr32(atomic32_t *val) { - return (int32_t)InterlockedDecrement(val); -} -static FORCEINLINE int32_t atomic_add32(atomic32_t *val, int32_t add) { - return (int32_t)InterlockedExchangeAdd(val, add) + add; -} -static FORCEINLINE int atomic_cas32_acquire(atomic32_t *dst, int32_t val, - int32_t ref) { - return (InterlockedCompareExchange(dst, val, ref) == ref) ? 
1 : 0; -} -static FORCEINLINE void atomic_store32_release(atomic32_t *dst, int32_t val) { - *dst = val; -} -static FORCEINLINE int64_t atomic_load64(atomic64_t *src) { return *src; } -static FORCEINLINE int64_t atomic_add64(atomic64_t *val, int64_t add) { - return (int64_t)InterlockedExchangeAdd64(val, add) + add; -} -static FORCEINLINE void *atomic_load_ptr(atomicptr_t *src) { - return (void *)*src; -} -static FORCEINLINE void atomic_store_ptr(atomicptr_t *dst, void *val) { - *dst = val; -} -static FORCEINLINE void atomic_store_ptr_release(atomicptr_t *dst, void *val) { - *dst = val; -} -static FORCEINLINE void *atomic_exchange_ptr_acquire(atomicptr_t *dst, - void *val) { - return (void *)InterlockedExchangePointer((void *volatile *)dst, val); -} -static FORCEINLINE int atomic_cas_ptr(atomicptr_t *dst, void *val, void *ref) { - return (InterlockedCompareExchangePointer((void *volatile *)dst, val, ref) == - ref) - ? 1 - : 0; -} - -#define EXPECTED(x) (x) -#define UNEXPECTED(x) (x) - -#else - -#include - -typedef volatile _Atomic(int32_t) atomic32_t; -typedef volatile _Atomic(int64_t) atomic64_t; -typedef volatile _Atomic(void *) atomicptr_t; - -static FORCEINLINE int32_t atomic_load32(atomic32_t *src) { - return atomic_load_explicit(src, memory_order_relaxed); -} -static FORCEINLINE void atomic_store32(atomic32_t *dst, int32_t val) { - atomic_store_explicit(dst, val, memory_order_relaxed); -} -static FORCEINLINE int32_t atomic_incr32(atomic32_t *val) { - return atomic_fetch_add_explicit(val, 1, memory_order_relaxed) + 1; -} -static FORCEINLINE int32_t atomic_decr32(atomic32_t *val) { - return atomic_fetch_add_explicit(val, -1, memory_order_relaxed) - 1; -} -static FORCEINLINE int32_t atomic_add32(atomic32_t *val, int32_t add) { - return atomic_fetch_add_explicit(val, add, memory_order_relaxed) + add; -} -static FORCEINLINE int atomic_cas32_acquire(atomic32_t *dst, int32_t val, - int32_t ref) { - return atomic_compare_exchange_weak_explicit( - dst, &ref, val, 
memory_order_acquire, memory_order_relaxed); -} -static FORCEINLINE void atomic_store32_release(atomic32_t *dst, int32_t val) { - atomic_store_explicit(dst, val, memory_order_release); -} -static FORCEINLINE int64_t atomic_load64(atomic64_t *val) { - return atomic_load_explicit(val, memory_order_relaxed); -} -static FORCEINLINE int64_t atomic_add64(atomic64_t *val, int64_t add) { - return atomic_fetch_add_explicit(val, add, memory_order_relaxed) + add; -} -static FORCEINLINE void *atomic_load_ptr(atomicptr_t *src) { - return atomic_load_explicit(src, memory_order_relaxed); -} -static FORCEINLINE void atomic_store_ptr(atomicptr_t *dst, void *val) { - atomic_store_explicit(dst, val, memory_order_relaxed); -} -static FORCEINLINE void atomic_store_ptr_release(atomicptr_t *dst, void *val) { - atomic_store_explicit(dst, val, memory_order_release); -} -static FORCEINLINE void *atomic_exchange_ptr_acquire(atomicptr_t *dst, - void *val) { - return atomic_exchange_explicit(dst, val, memory_order_acquire); -} -static FORCEINLINE int atomic_cas_ptr(atomicptr_t *dst, void *val, void *ref) { - return atomic_compare_exchange_weak_explicit( - dst, &ref, val, memory_order_relaxed, memory_order_relaxed); -} - -#define EXPECTED(x) __builtin_expect((x), 1) -#define UNEXPECTED(x) __builtin_expect((x), 0) - -#endif - -//////////// -/// -/// Statistics related functions (evaluate to nothing when statistics not -/// enabled) -/// -////// - -#if ENABLE_STATISTICS -#define _rpmalloc_stat_inc(counter) atomic_incr32(counter) -#define _rpmalloc_stat_dec(counter) atomic_decr32(counter) -#define _rpmalloc_stat_add(counter, value) \ - atomic_add32(counter, (int32_t)(value)) -#define _rpmalloc_stat_add64(counter, value) \ - atomic_add64(counter, (int64_t)(value)) -#define _rpmalloc_stat_add_peak(counter, value, peak) \ - do { \ - int32_t _cur_count = atomic_add32(counter, (int32_t)(value)); \ - if (_cur_count > (peak)) \ - peak = _cur_count; \ - } while (0) -#define _rpmalloc_stat_sub(counter, 
value) \ - atomic_add32(counter, -(int32_t)(value)) -#define _rpmalloc_stat_inc_alloc(heap, class_idx) \ - do { \ - int32_t alloc_current = \ - atomic_incr32(&heap->size_class_use[class_idx].alloc_current); \ - if (alloc_current > heap->size_class_use[class_idx].alloc_peak) \ - heap->size_class_use[class_idx].alloc_peak = alloc_current; \ - atomic_incr32(&heap->size_class_use[class_idx].alloc_total); \ - } while (0) -#define _rpmalloc_stat_inc_free(heap, class_idx) \ - do { \ - atomic_decr32(&heap->size_class_use[class_idx].alloc_current); \ - atomic_incr32(&heap->size_class_use[class_idx].free_total); \ - } while (0) -#else -#define _rpmalloc_stat_inc(counter) \ - do { \ - } while (0) -#define _rpmalloc_stat_dec(counter) \ - do { \ - } while (0) -#define _rpmalloc_stat_add(counter, value) \ - do { \ - } while (0) -#define _rpmalloc_stat_add64(counter, value) \ - do { \ - } while (0) -#define _rpmalloc_stat_add_peak(counter, value, peak) \ - do { \ - } while (0) -#define _rpmalloc_stat_sub(counter, value) \ - do { \ - } while (0) -#define _rpmalloc_stat_inc_alloc(heap, class_idx) \ - do { \ - } while (0) -#define _rpmalloc_stat_inc_free(heap, class_idx) \ - do { \ - } while (0) -#endif - -/// -/// Preconfigured limits and sizes -/// - -//! Granularity of a small allocation block (must be power of two) -#define SMALL_GRANULARITY 16 -//! Small granularity shift count -#define SMALL_GRANULARITY_SHIFT 4 -//! Number of small block size classes -#define SMALL_CLASS_COUNT 65 -//! Maximum size of a small block -#define SMALL_SIZE_LIMIT (SMALL_GRANULARITY * (SMALL_CLASS_COUNT - 1)) -//! Granularity of a medium allocation block -#define MEDIUM_GRANULARITY 512 -//! Medium granularity shift count -#define MEDIUM_GRANULARITY_SHIFT 9 -//! Number of medium block size classes -#define MEDIUM_CLASS_COUNT 61 -//! Total number of small + medium size classes -#define SIZE_CLASS_COUNT (SMALL_CLASS_COUNT + MEDIUM_CLASS_COUNT) -//! 
Number of large block size classes -#define LARGE_CLASS_COUNT 63 -//! Maximum size of a medium block -#define MEDIUM_SIZE_LIMIT \ - (SMALL_SIZE_LIMIT + (MEDIUM_GRANULARITY * MEDIUM_CLASS_COUNT)) -//! Maximum size of a large block -#define LARGE_SIZE_LIMIT \ - ((LARGE_CLASS_COUNT * _memory_span_size) - SPAN_HEADER_SIZE) -//! Size of a span header (must be a multiple of SMALL_GRANULARITY and a power -//! of two) -#define SPAN_HEADER_SIZE 128 -//! Number of spans in thread cache -#define MAX_THREAD_SPAN_CACHE 400 -//! Number of spans to transfer between thread and global cache -#define THREAD_SPAN_CACHE_TRANSFER 64 -//! Number of spans in thread cache for large spans (must be greater than -//! LARGE_CLASS_COUNT / 2) -#define MAX_THREAD_SPAN_LARGE_CACHE 100 -//! Number of spans to transfer between thread and global cache for large spans -#define THREAD_SPAN_LARGE_CACHE_TRANSFER 6 - -_Static_assert((SMALL_GRANULARITY & (SMALL_GRANULARITY - 1)) == 0, - "Small granularity must be power of two"); -_Static_assert((SPAN_HEADER_SIZE & (SPAN_HEADER_SIZE - 1)) == 0, - "Span header size must be power of two"); - -#if ENABLE_VALIDATE_ARGS -//! Maximum allocation size to avoid integer overflow -#undef MAX_ALLOC_SIZE -#define MAX_ALLOC_SIZE (((size_t) - 1) - _memory_span_size) -#endif - -#define pointer_offset(ptr, ofs) (void *)((char *)(ptr) + (ptrdiff_t)(ofs)) -#define pointer_diff(first, second) \ - (ptrdiff_t)((const char *)(first) - (const char *)(second)) - -#define INVALID_POINTER ((void *)((uintptr_t) - 1)) - -#define SIZE_CLASS_LARGE SIZE_CLASS_COUNT -#define SIZE_CLASS_HUGE ((uint32_t) - 1) - -//////////// -/// -/// Data types -/// -////// - -//! A memory heap, per thread -typedef struct heap_t heap_t; -//! Span of memory pages -typedef struct span_t span_t; -//! Span list -typedef struct span_list_t span_list_t; -//! Span active data -typedef struct span_active_t span_active_t; -//! Size class definition -typedef struct size_class_t size_class_t; -//! 
Global cache -typedef struct global_cache_t global_cache_t; - -//! Flag indicating span is the first (master) span of a split superspan -#define SPAN_FLAG_MASTER 1U -//! Flag indicating span is a secondary (sub) span of a split superspan -#define SPAN_FLAG_SUBSPAN 2U -//! Flag indicating span has blocks with increased alignment -#define SPAN_FLAG_ALIGNED_BLOCKS 4U -//! Flag indicating an unmapped master span -#define SPAN_FLAG_UNMAPPED_MASTER 8U - -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS -struct span_use_t { - //! Current number of spans used (actually used, not in cache) - atomic32_t current; - //! High water mark of spans used - atomic32_t high; -#if ENABLE_STATISTICS - //! Number of spans in deferred list - atomic32_t spans_deferred; - //! Number of spans transitioned to global cache - atomic32_t spans_to_global; - //! Number of spans transitioned from global cache - atomic32_t spans_from_global; - //! Number of spans transitioned to thread cache - atomic32_t spans_to_cache; - //! Number of spans transitioned from thread cache - atomic32_t spans_from_cache; - //! Number of spans transitioned to reserved state - atomic32_t spans_to_reserved; - //! Number of spans transitioned from reserved state - atomic32_t spans_from_reserved; - //! Number of raw memory map calls - atomic32_t spans_map_calls; -#endif -}; -typedef struct span_use_t span_use_t; -#endif - -#if ENABLE_STATISTICS -struct size_class_use_t { - //! Current number of allocations - atomic32_t alloc_current; - //! Peak number of allocations - int32_t alloc_peak; - //! Total number of allocations - atomic32_t alloc_total; - //! Total number of frees - atomic32_t free_total; - //! Number of spans in use - atomic32_t spans_current; - //! Number of spans transitioned to cache - int32_t spans_peak; - //! Number of spans transitioned to cache - atomic32_t spans_to_cache; - //! Number of spans transitioned from cache - atomic32_t spans_from_cache; - //! 
Number of spans transitioned from reserved state - atomic32_t spans_from_reserved; - //! Number of spans mapped - atomic32_t spans_map_calls; - int32_t unused; -}; -typedef struct size_class_use_t size_class_use_t; -#endif - -// A span can either represent a single span of memory pages with size declared -// by span_map_count configuration variable, or a set of spans in a continuous -// region, a super span. Any reference to the term "span" usually refers to both -// a single span or a super span. A super span can further be divided into -// multiple spans (or this, super spans), where the first (super)span is the -// master and subsequent (super)spans are subspans. The master span keeps track -// of how many subspans that are still alive and mapped in virtual memory, and -// once all subspans and master have been unmapped the entire superspan region -// is released and unmapped (on Windows for example, the entire superspan range -// has to be released in the same call to release the virtual memory range, but -// individual subranges can be decommitted individually to reduce physical -// memory use). -struct span_t { - //! Free list - void *free_list; - //! Total block count of size class - uint32_t block_count; - //! Size class - uint32_t size_class; - //! Index of last block initialized in free list - uint32_t free_list_limit; - //! Number of used blocks remaining when in partial state - uint32_t used_count; - //! Deferred free list - atomicptr_t free_list_deferred; - //! Size of deferred free list, or list of spans when part of a cache list - uint32_t list_size; - //! Size of a block - uint32_t block_size; - //! Flags and counters - uint32_t flags; - //! Number of spans - uint32_t span_count; - //! Total span counter for master spans - uint32_t total_spans; - //! Offset from master span for subspans - uint32_t offset_from_master; - //! Remaining span counter, for master spans - atomic32_t remaining_spans; - //! Alignment offset - uint32_t align_offset; - //! 
Owning heap - heap_t *heap; - //! Next span - span_t *next; - //! Previous span - span_t *prev; -}; -_Static_assert(sizeof(span_t) <= SPAN_HEADER_SIZE, "span size mismatch"); - -struct span_cache_t { - size_t count; - span_t *span[MAX_THREAD_SPAN_CACHE]; -}; -typedef struct span_cache_t span_cache_t; - -struct span_large_cache_t { - size_t count; - span_t *span[MAX_THREAD_SPAN_LARGE_CACHE]; -}; -typedef struct span_large_cache_t span_large_cache_t; - -struct heap_size_class_t { - //! Free list of active span - void *free_list; - //! Double linked list of partially used spans with free blocks. - // Previous span pointer in head points to tail span of list. - span_t *partial_span; - //! Early level cache of fully free spans - span_t *cache; -}; -typedef struct heap_size_class_t heap_size_class_t; - -// Control structure for a heap, either a thread heap or a first class heap if -// enabled -struct heap_t { - //! Owning thread ID - uintptr_t owner_thread; - //! Free lists for each size class - heap_size_class_t size_class[SIZE_CLASS_COUNT]; -#if ENABLE_THREAD_CACHE - //! Arrays of fully freed spans, single span - span_cache_t span_cache; -#endif - //! List of deferred free spans (single linked list) - atomicptr_t span_free_deferred; - //! Number of full spans - size_t full_span_count; - //! Mapped but unused spans - span_t *span_reserve; - //! Master span for mapped but unused spans - span_t *span_reserve_master; - //! Number of mapped but unused spans - uint32_t spans_reserved; - //! Child count - atomic32_t child_count; - //! Next heap in id list - heap_t *next_heap; - //! Next heap in orphan list - heap_t *next_orphan; - //! Heap ID - int32_t id; - //! Finalization state flag - int finalize; - //! Master heap owning the memory pages - heap_t *master_heap; -#if ENABLE_THREAD_CACHE - //! Arrays of fully freed spans, large spans with > 1 span count - span_large_cache_t span_large_cache[LARGE_CLASS_COUNT - 1]; -#endif -#if RPMALLOC_FIRST_CLASS_HEAPS - //! 
Double linked list of fully utilized spans with free blocks for each size - //! class. - // Previous span pointer in head points to tail span of list. - span_t *full_span[SIZE_CLASS_COUNT]; - //! Double linked list of large and huge spans allocated by this heap - span_t *large_huge_span; -#endif -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS - //! Current and high water mark of spans used per span count - span_use_t span_use[LARGE_CLASS_COUNT]; -#endif -#if ENABLE_STATISTICS - //! Allocation stats per size class - size_class_use_t size_class_use[SIZE_CLASS_COUNT + 1]; - //! Number of bytes transitioned thread -> global - atomic64_t thread_to_global; - //! Number of bytes transitioned global -> thread - atomic64_t global_to_thread; -#endif -}; - -// Size class for defining a block size bucket -struct size_class_t { - //! Size of blocks in this class - uint32_t block_size; - //! Number of blocks in each chunk - uint16_t block_count; - //! Class index this class is merged with - uint16_t class_idx; -}; -_Static_assert(sizeof(size_class_t) == 8, "Size class size mismatch"); - -struct global_cache_t { - //! Cache lock - atomic32_t lock; - //! Cache count - uint32_t count; -#if ENABLE_STATISTICS - //! Insert count - size_t insert_count; - //! Extract count - size_t extract_count; -#endif - //! Cached spans - span_t *span[GLOBAL_CACHE_MULTIPLIER * MAX_THREAD_SPAN_CACHE]; - //! Unlimited cache overflow - span_t *overflow; -}; - -//////////// -/// -/// Global data -/// -////// - -//! Default span size (64KiB) -#define _memory_default_span_size (64 * 1024) -#define _memory_default_span_size_shift 16 -#define _memory_default_span_mask (~((uintptr_t)(_memory_span_size - 1))) - -//! Initialized flag -static int _rpmalloc_initialized; -//! Main thread ID -static uintptr_t _rpmalloc_main_thread_id; -//! Configuration -static rpmalloc_config_t _memory_config; -//! Memory page size -static size_t _memory_page_size; -//! 
Shift to divide by page size -static size_t _memory_page_size_shift; -//! Granularity at which memory pages are mapped by OS -static size_t _memory_map_granularity; -#if RPMALLOC_CONFIGURABLE -//! Size of a span of memory pages -static size_t _memory_span_size; -//! Shift to divide by span size -static size_t _memory_span_size_shift; -//! Mask to get to start of a memory span -static uintptr_t _memory_span_mask; -#else -//! Hardwired span size -#define _memory_span_size _memory_default_span_size -#define _memory_span_size_shift _memory_default_span_size_shift -#define _memory_span_mask _memory_default_span_mask -#endif -//! Number of spans to map in each map call -static size_t _memory_span_map_count; -//! Number of spans to keep reserved in each heap -static size_t _memory_heap_reserve_count; -//! Global size classes -static size_class_t _memory_size_class[SIZE_CLASS_COUNT]; -//! Run-time size limit of medium blocks -static size_t _memory_medium_size_limit; -//! Heap ID counter -static atomic32_t _memory_heap_id; -//! Huge page support -static int _memory_huge_pages; -#if ENABLE_GLOBAL_CACHE -//! Global span cache -static global_cache_t _memory_span_cache[LARGE_CLASS_COUNT]; -#endif -//! Global reserved spans -static span_t *_memory_global_reserve; -//! Global reserved count -static size_t _memory_global_reserve_count; -//! Global reserved master -static span_t *_memory_global_reserve_master; -//! All heaps -static heap_t *_memory_heaps[HEAP_ARRAY_SIZE]; -//! Used to restrict access to mapping memory for huge pages -static atomic32_t _memory_global_lock; -//! Orphaned heaps -static heap_t *_memory_orphan_heaps; -#if RPMALLOC_FIRST_CLASS_HEAPS -//! Orphaned heaps (first class heaps) -static heap_t *_memory_first_class_orphan_heaps; -#endif -#if ENABLE_STATISTICS -//! Allocations counter -static atomic64_t _allocation_counter; -//! Deallocations counter -static atomic64_t _deallocation_counter; -//! Active heap count -static atomic32_t _memory_active_heaps; -//! 
Number of currently mapped memory pages -static atomic32_t _mapped_pages; -//! Peak number of concurrently mapped memory pages -static int32_t _mapped_pages_peak; -//! Number of mapped master spans -static atomic32_t _master_spans; -//! Number of unmapped dangling master spans -static atomic32_t _unmapped_master_spans; -//! Running counter of total number of mapped memory pages since start -static atomic32_t _mapped_total; -//! Running counter of total number of unmapped memory pages since start -static atomic32_t _unmapped_total; -//! Number of currently mapped memory pages in OS calls -static atomic32_t _mapped_pages_os; -//! Number of currently allocated pages in huge allocations -static atomic32_t _huge_pages_current; -//! Peak number of currently allocated pages in huge allocations -static int32_t _huge_pages_peak; -#endif - -//////////// -/// -/// Thread local heap and ID -/// -////// - -//! Current thread heap -#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ - defined(__TINYC__) -static pthread_key_t _memory_thread_heap; -#else -#ifdef _MSC_VER -#define _Thread_local __declspec(thread) -#define TLS_MODEL -#else -#ifndef __HAIKU__ -#define TLS_MODEL __attribute__((tls_model("initial-exec"))) -#else -#define TLS_MODEL -#endif -#if !defined(__clang__) && defined(__GNUC__) -#define _Thread_local __thread -#endif -#endif -static _Thread_local heap_t *_memory_thread_heap TLS_MODEL; -#endif - -static inline heap_t *get_thread_heap_raw(void) { -#if (defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD - return pthread_getspecific(_memory_thread_heap); -#else - return _memory_thread_heap; -#endif -} - -//! Get the current thread heap -static inline heap_t *get_thread_heap(void) { - heap_t *heap = get_thread_heap_raw(); -#if ENABLE_PRELOAD - if (EXPECTED(heap != 0)) - return heap; - rpmalloc_initialize(); - return get_thread_heap_raw(); -#else - return heap; -#endif -} - -//! 
Fast thread ID -static inline uintptr_t get_thread_id(void) { -#if defined(_WIN32) - return (uintptr_t)((void *)NtCurrentTeb()); -#elif (defined(__GNUC__) || defined(__clang__)) && !defined(__CYGWIN__) - uintptr_t tid; -#if defined(__i386__) - __asm__("movl %%gs:0, %0" : "=r"(tid) : :); -#elif defined(__x86_64__) -#if defined(__MACH__) - __asm__("movq %%gs:0, %0" : "=r"(tid) : :); -#else - __asm__("movq %%fs:0, %0" : "=r"(tid) : :); -#endif -#elif defined(__arm__) - __asm__ volatile("mrc p15, 0, %0, c13, c0, 3" : "=r"(tid)); -#elif defined(__aarch64__) -#if defined(__MACH__) - // tpidr_el0 likely unused, always return 0 on iOS - __asm__ volatile("mrs %0, tpidrro_el0" : "=r"(tid)); -#else - __asm__ volatile("mrs %0, tpidr_el0" : "=r"(tid)); -#endif -#else -#error This platform needs implementation of get_thread_id() -#endif - return tid; -#else -#error This platform needs implementation of get_thread_id() -#endif -} - -//! Set the current thread heap -static void set_thread_heap(heap_t *heap) { -#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ - defined(__TINYC__) - pthread_setspecific(_memory_thread_heap, heap); -#else - _memory_thread_heap = heap; -#endif - if (heap) - heap->owner_thread = get_thread_id(); -} - -//! Set main thread ID -extern void rpmalloc_set_main_thread(void); - -void rpmalloc_set_main_thread(void) { - _rpmalloc_main_thread_id = get_thread_id(); -} - -static void _rpmalloc_spin(void) { -#if defined(_MSC_VER) -#if defined(_M_ARM64) - __yield(); -#else - _mm_pause(); -#endif -#elif defined(__x86_64__) || defined(__i386__) - __asm__ volatile("pause" ::: "memory"); -#elif defined(__aarch64__) || (defined(__arm__) && __ARM_ARCH >= 7) - __asm__ volatile("yield" ::: "memory"); -#elif defined(__powerpc__) || defined(__powerpc64__) - // No idea if ever been compiled in such archs but ... 
as precaution - __asm__ volatile("or 27,27,27"); -#elif defined(__sparc__) - __asm__ volatile("rd %ccr, %g0 \n\trd %ccr, %g0 \n\trd %ccr, %g0"); -#else - struct timespec ts = {0}; - nanosleep(&ts, 0); -#endif -} - -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) -static void NTAPI _rpmalloc_thread_destructor(void *value) { -#if ENABLE_OVERRIDE - // If this is called on main thread it means rpmalloc_finalize - // has not been called and shutdown is forced (through _exit) or unclean - if (get_thread_id() == _rpmalloc_main_thread_id) - return; -#endif - if (value) - rpmalloc_thread_finalize(1); -} -#endif - -//////////// -/// -/// Low level memory map/unmap -/// -////// - -static void _rpmalloc_set_name(void *address, size_t size) { -#if defined(__linux__) || defined(__ANDROID__) - const char *name = _memory_huge_pages ? _memory_config.huge_page_name - : _memory_config.page_name; - if (address == MAP_FAILED || !name) - return; - // If the kernel does not support CONFIG_ANON_VMA_NAME or if the call fails - // (e.g. invalid name) it is a no-op basically. - (void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (uintptr_t)address, size, - (uintptr_t)name); -#else - (void)sizeof(size); - (void)sizeof(address); -#endif -} - -//! Map more virtual memory -// size is number of bytes to map -// offset receives the offset in bytes from start of mapped region -// returns address to start of mapped region to use -static void *_rpmalloc_mmap(size_t size, size_t *offset) { - rpmalloc_assert(!(size % _memory_page_size), "Invalid mmap size"); - rpmalloc_assert(size >= _memory_page_size, "Invalid mmap size"); - void *address = _memory_config.memory_map(size, offset); - if (EXPECTED(address != 0)) { - _rpmalloc_stat_add_peak(&_mapped_pages, (size >> _memory_page_size_shift), - _mapped_pages_peak); - _rpmalloc_stat_add(&_mapped_total, (size >> _memory_page_size_shift)); - } - return address; -} - -//! 
Unmap virtual memory -// address is the memory address to unmap, as returned from _memory_map -// size is the number of bytes to unmap, which might be less than full region -// for a partial unmap offset is the offset in bytes to the actual mapped -// region, as set by _memory_map release is set to 0 for partial unmap, or size -// of entire range for a full unmap -static void _rpmalloc_unmap(void *address, size_t size, size_t offset, - size_t release) { - rpmalloc_assert(!release || (release >= size), "Invalid unmap size"); - rpmalloc_assert(!release || (release >= _memory_page_size), - "Invalid unmap size"); - if (release) { - rpmalloc_assert(!(release % _memory_page_size), "Invalid unmap size"); - _rpmalloc_stat_sub(&_mapped_pages, (release >> _memory_page_size_shift)); - _rpmalloc_stat_add(&_unmapped_total, (release >> _memory_page_size_shift)); - } - _memory_config.memory_unmap(address, size, offset, release); -} - -//! Default implementation to map new pages to virtual memory -static void *_rpmalloc_mmap_os(size_t size, size_t *offset) { - // Either size is a heap (a single page) or a (multiple) span - we only need - // to align spans, and only if larger than map granularity - size_t padding = ((size >= _memory_span_size) && - (_memory_span_size > _memory_map_granularity)) - ? _memory_span_size - : 0; - rpmalloc_assert(size >= _memory_page_size, "Invalid mmap size"); -#if PLATFORM_WINDOWS - // Ok to MEM_COMMIT - according to MSDN, "actual physical pages are not - // allocated unless/until the virtual addresses are actually accessed" - void *ptr = VirtualAlloc(0, size + padding, - (_memory_huge_pages ? 
MEM_LARGE_PAGES : 0) | - MEM_RESERVE | MEM_COMMIT, - PAGE_READWRITE); - if (!ptr) { - if (_memory_config.map_fail_callback) { - if (_memory_config.map_fail_callback(size + padding)) - return _rpmalloc_mmap_os(size, offset); - } else { - rpmalloc_assert(ptr, "Failed to map virtual memory block"); - } - return 0; - } -#else - int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_UNINITIALIZED; -#if defined(__APPLE__) && !TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR - int fd = (int)VM_MAKE_TAG(240U); - if (_memory_huge_pages) - fd |= VM_FLAGS_SUPERPAGE_SIZE_2MB; - void *ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, fd, 0); -#elif defined(MAP_HUGETLB) - void *ptr = mmap(0, size + padding, - PROT_READ | PROT_WRITE | PROT_MAX(PROT_READ | PROT_WRITE), - (_memory_huge_pages ? MAP_HUGETLB : 0) | flags, -1, 0); -#if defined(MADV_HUGEPAGE) - // In some configurations, huge pages allocations might fail thus - // we fallback to normal allocations and promote the region as transparent - // huge page - if ((ptr == MAP_FAILED || !ptr) && _memory_huge_pages) { - ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, -1, 0); - if (ptr && ptr != MAP_FAILED) { - int prm = madvise(ptr, size + padding, MADV_HUGEPAGE); - (void)prm; - rpmalloc_assert((prm == 0), "Failed to promote the page to THP"); - } - } -#endif - _rpmalloc_set_name(ptr, size + padding); -#elif defined(MAP_ALIGNED) - const size_t align = - (sizeof(size_t) * 8) - (size_t)(__builtin_clzl(size - 1)); - void *ptr = - mmap(0, size + padding, PROT_READ | PROT_WRITE, - (_memory_huge_pages ? MAP_ALIGNED(align) : 0) | flags, -1, 0); -#elif defined(MAP_ALIGN) - caddr_t base = (_memory_huge_pages ? (caddr_t)(4 << 20) : 0); - void *ptr = mmap(base, size + padding, PROT_READ | PROT_WRITE, - (_memory_huge_pages ? 
MAP_ALIGN : 0) | flags, -1, 0); -#else - void *ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, -1, 0); -#endif - if ((ptr == MAP_FAILED) || !ptr) { - if (_memory_config.map_fail_callback) { - if (_memory_config.map_fail_callback(size + padding)) - return _rpmalloc_mmap_os(size, offset); - } else if (errno != ENOMEM) { - rpmalloc_assert((ptr != MAP_FAILED) && ptr, - "Failed to map virtual memory block"); - } - return 0; - } -#endif - _rpmalloc_stat_add(&_mapped_pages_os, - (int32_t)((size + padding) >> _memory_page_size_shift)); - if (padding) { - size_t final_padding = padding - ((uintptr_t)ptr & ~_memory_span_mask); - rpmalloc_assert(final_padding <= _memory_span_size, - "Internal failure in padding"); - rpmalloc_assert(final_padding <= padding, "Internal failure in padding"); - rpmalloc_assert(!(final_padding % 8), "Internal failure in padding"); - ptr = pointer_offset(ptr, final_padding); - *offset = final_padding >> 3; - } - rpmalloc_assert((size < _memory_span_size) || - !((uintptr_t)ptr & ~_memory_span_mask), - "Internal failure in padding"); - return ptr; -} - -//! Default implementation to unmap pages from virtual memory -static void _rpmalloc_unmap_os(void *address, size_t size, size_t offset, - size_t release) { - rpmalloc_assert(release || (offset == 0), "Invalid unmap size"); - rpmalloc_assert(!release || (release >= _memory_page_size), - "Invalid unmap size"); - rpmalloc_assert(size >= _memory_page_size, "Invalid unmap size"); - if (release && offset) { - offset <<= 3; - address = pointer_offset(address, -(int32_t)offset); - if ((release >= _memory_span_size) && - (_memory_span_size > _memory_map_granularity)) { - // Padding is always one span size - release += _memory_span_size; - } - } -#if !DISABLE_UNMAP -#if PLATFORM_WINDOWS - if (!VirtualFree(address, release ? 0 : size, - release ? 
MEM_RELEASE : MEM_DECOMMIT)) { - rpmalloc_assert(0, "Failed to unmap virtual memory block"); - } -#else - if (release) { - if (munmap(address, release)) { - rpmalloc_assert(0, "Failed to unmap virtual memory block"); - } - } else { -#if defined(MADV_FREE_REUSABLE) - int ret; - while ((ret = madvise(address, size, MADV_FREE_REUSABLE)) == -1 && - (errno == EAGAIN)) - errno = 0; - if ((ret == -1) && (errno != 0)) { -#elif defined(MADV_DONTNEED) - if (madvise(address, size, MADV_DONTNEED)) { -#elif defined(MADV_PAGEOUT) - if (madvise(address, size, MADV_PAGEOUT)) { -#elif defined(MADV_FREE) - if (madvise(address, size, MADV_FREE)) { -#else - if (posix_madvise(address, size, POSIX_MADV_DONTNEED)) { -#endif - rpmalloc_assert(0, "Failed to madvise virtual memory block as free"); - } - } -#endif -#endif - if (release) - _rpmalloc_stat_sub(&_mapped_pages_os, release >> _memory_page_size_shift); -} - -static void _rpmalloc_span_mark_as_subspan_unless_master(span_t *master, - span_t *subspan, - size_t span_count); - -//! Use global reserved spans to fulfill a memory map request (reserve size must -//! be checked by caller) -static span_t *_rpmalloc_global_get_reserved_spans(size_t span_count) { - span_t *span = _memory_global_reserve; - _rpmalloc_span_mark_as_subspan_unless_master(_memory_global_reserve_master, - span, span_count); - _memory_global_reserve_count -= span_count; - if (_memory_global_reserve_count) - _memory_global_reserve = - (span_t *)pointer_offset(span, span_count << _memory_span_size_shift); - else - _memory_global_reserve = 0; - return span; -} - -//! Store the given spans as global reserve (must only be called from within new -//! 
heap allocation, not thread safe) -static void _rpmalloc_global_set_reserved_spans(span_t *master, span_t *reserve, - size_t reserve_span_count) { - _memory_global_reserve_master = master; - _memory_global_reserve_count = reserve_span_count; - _memory_global_reserve = reserve; -} - -//////////// -/// -/// Span linked list management -/// -////// - -//! Add a span to double linked list at the head -static void _rpmalloc_span_double_link_list_add(span_t **head, span_t *span) { - if (*head) - (*head)->prev = span; - span->next = *head; - *head = span; -} - -//! Pop head span from double linked list -static void _rpmalloc_span_double_link_list_pop_head(span_t **head, - span_t *span) { - rpmalloc_assert(*head == span, "Linked list corrupted"); - span = *head; - *head = span->next; -} - -//! Remove a span from double linked list -static void _rpmalloc_span_double_link_list_remove(span_t **head, - span_t *span) { - rpmalloc_assert(*head, "Linked list corrupted"); - if (*head == span) { - *head = span->next; - } else { - span_t *next_span = span->next; - span_t *prev_span = span->prev; - prev_span->next = next_span; - if (EXPECTED(next_span != 0)) - next_span->prev = prev_span; - } -} - -//////////// -/// -/// Span control -/// -////// - -static void _rpmalloc_heap_cache_insert(heap_t *heap, span_t *span); - -static void _rpmalloc_heap_finalize(heap_t *heap); - -static void _rpmalloc_heap_set_reserved_spans(heap_t *heap, span_t *master, - span_t *reserve, - size_t reserve_span_count); - -//! Declare the span to be a subspan and store distance from master span and -//! 
span count -static void _rpmalloc_span_mark_as_subspan_unless_master(span_t *master, - span_t *subspan, - size_t span_count) { - rpmalloc_assert((subspan != master) || (subspan->flags & SPAN_FLAG_MASTER), - "Span master pointer and/or flag mismatch"); - if (subspan != master) { - subspan->flags = SPAN_FLAG_SUBSPAN; - subspan->offset_from_master = - (uint32_t)((uintptr_t)pointer_diff(subspan, master) >> - _memory_span_size_shift); - subspan->align_offset = 0; - } - subspan->span_count = (uint32_t)span_count; -} - -//! Use reserved spans to fulfill a memory map request (reserve size must be -//! checked by caller) -static span_t *_rpmalloc_span_map_from_reserve(heap_t *heap, - size_t span_count) { - // Update the heap span reserve - span_t *span = heap->span_reserve; - heap->span_reserve = - (span_t *)pointer_offset(span, span_count * _memory_span_size); - heap->spans_reserved -= (uint32_t)span_count; - - _rpmalloc_span_mark_as_subspan_unless_master(heap->span_reserve_master, span, - span_count); - if (span_count <= LARGE_CLASS_COUNT) - _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_from_reserved); - - return span; -} - -//! Get the aligned number of spans to map in based on wanted count, configured -//! mapping granularity and the page size -static size_t _rpmalloc_span_align_count(size_t span_count) { - size_t request_count = (span_count > _memory_span_map_count) - ? span_count - : _memory_span_map_count; - if ((_memory_page_size > _memory_span_size) && - ((request_count * _memory_span_size) % _memory_page_size)) - request_count += - _memory_span_map_count - (request_count % _memory_span_map_count); - return request_count; -} - -//! 
Setup a newly mapped span -static void _rpmalloc_span_initialize(span_t *span, size_t total_span_count, - size_t span_count, size_t align_offset) { - span->total_spans = (uint32_t)total_span_count; - span->span_count = (uint32_t)span_count; - span->align_offset = (uint32_t)align_offset; - span->flags = SPAN_FLAG_MASTER; - atomic_store32(&span->remaining_spans, (int32_t)total_span_count); -} - -static void _rpmalloc_span_unmap(span_t *span); - -//! Map an aligned set of spans, taking configured mapping granularity and the -//! page size into account -static span_t *_rpmalloc_span_map_aligned_count(heap_t *heap, - size_t span_count) { - // If we already have some, but not enough, reserved spans, release those to - // heap cache and map a new full set of spans. Otherwise we would waste memory - // if page size > span size (huge pages) - size_t aligned_span_count = _rpmalloc_span_align_count(span_count); - size_t align_offset = 0; - span_t *span = (span_t *)_rpmalloc_mmap( - aligned_span_count * _memory_span_size, &align_offset); - if (!span) - return 0; - _rpmalloc_span_initialize(span, aligned_span_count, span_count, align_offset); - _rpmalloc_stat_inc(&_master_spans); - if (span_count <= LARGE_CLASS_COUNT) - _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_map_calls); - if (aligned_span_count > span_count) { - span_t *reserved_spans = - (span_t *)pointer_offset(span, span_count * _memory_span_size); - size_t reserved_count = aligned_span_count - span_count; - if (heap->spans_reserved) { - _rpmalloc_span_mark_as_subspan_unless_master( - heap->span_reserve_master, heap->span_reserve, heap->spans_reserved); - _rpmalloc_heap_cache_insert(heap, heap->span_reserve); - } - if (reserved_count > _memory_heap_reserve_count) { - // If huge pages or eager spam map count, the global reserve spin lock is - // held by caller, _rpmalloc_span_map - rpmalloc_assert(atomic_load32(&_memory_global_lock) == 1, - "Global spin lock not held as expected"); - size_t remain_count = 
reserved_count - _memory_heap_reserve_count; - reserved_count = _memory_heap_reserve_count; - span_t *remain_span = (span_t *)pointer_offset( - reserved_spans, reserved_count * _memory_span_size); - if (_memory_global_reserve) { - _rpmalloc_span_mark_as_subspan_unless_master( - _memory_global_reserve_master, _memory_global_reserve, - _memory_global_reserve_count); - _rpmalloc_span_unmap(_memory_global_reserve); - } - _rpmalloc_global_set_reserved_spans(span, remain_span, remain_count); - } - _rpmalloc_heap_set_reserved_spans(heap, span, reserved_spans, - reserved_count); - } - return span; -} - -//! Map in memory pages for the given number of spans (or use previously -//! reserved pages) -static span_t *_rpmalloc_span_map(heap_t *heap, size_t span_count) { - if (span_count <= heap->spans_reserved) - return _rpmalloc_span_map_from_reserve(heap, span_count); - span_t *span = 0; - int use_global_reserve = - (_memory_page_size > _memory_span_size) || - (_memory_span_map_count > _memory_heap_reserve_count); - if (use_global_reserve) { - // If huge pages, make sure only one thread maps more memory to avoid bloat - while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) - _rpmalloc_spin(); - if (_memory_global_reserve_count >= span_count) { - size_t reserve_count = - (!heap->spans_reserved ? 
_memory_heap_reserve_count : span_count); - if (_memory_global_reserve_count < reserve_count) - reserve_count = _memory_global_reserve_count; - span = _rpmalloc_global_get_reserved_spans(reserve_count); - if (span) { - if (reserve_count > span_count) { - span_t *reserved_span = (span_t *)pointer_offset( - span, span_count << _memory_span_size_shift); - _rpmalloc_heap_set_reserved_spans(heap, _memory_global_reserve_master, - reserved_span, - reserve_count - span_count); - } - // Already marked as subspan in _rpmalloc_global_get_reserved_spans - span->span_count = (uint32_t)span_count; - } - } - } - if (!span) - span = _rpmalloc_span_map_aligned_count(heap, span_count); - if (use_global_reserve) - atomic_store32_release(&_memory_global_lock, 0); - return span; -} - -//! Unmap memory pages for the given number of spans (or mark as unused if no -//! partial unmappings) -static void _rpmalloc_span_unmap(span_t *span) { - rpmalloc_assert((span->flags & SPAN_FLAG_MASTER) || - (span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - rpmalloc_assert(!(span->flags & SPAN_FLAG_MASTER) || - !(span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - - int is_master = !!(span->flags & SPAN_FLAG_MASTER); - span_t *master = - is_master ? 
span - : ((span_t *)pointer_offset( - span, -(intptr_t)((uintptr_t)span->offset_from_master * - _memory_span_size))); - rpmalloc_assert(is_master || (span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - rpmalloc_assert(master->flags & SPAN_FLAG_MASTER, "Span flag corrupted"); - - size_t span_count = span->span_count; - if (!is_master) { - // Directly unmap subspans (unless huge pages, in which case we defer and - // unmap entire page range with master) - rpmalloc_assert(span->align_offset == 0, "Span align offset corrupted"); - if (_memory_span_size >= _memory_page_size) - _rpmalloc_unmap(span, span_count * _memory_span_size, 0, 0); - } else { - // Special double flag to denote an unmapped master - // It must be kept in memory since span header must be used - span->flags |= - SPAN_FLAG_MASTER | SPAN_FLAG_SUBSPAN | SPAN_FLAG_UNMAPPED_MASTER; - _rpmalloc_stat_add(&_unmapped_master_spans, 1); - } - - if (atomic_add32(&master->remaining_spans, -(int32_t)span_count) <= 0) { - // Everything unmapped, unmap the master span with release flag to unmap the - // entire range of the super span - rpmalloc_assert(!!(master->flags & SPAN_FLAG_MASTER) && - !!(master->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - size_t unmap_count = master->span_count; - if (_memory_span_size < _memory_page_size) - unmap_count = master->total_spans; - _rpmalloc_stat_sub(&_master_spans, 1); - _rpmalloc_stat_sub(&_unmapped_master_spans, 1); - _rpmalloc_unmap(master, unmap_count * _memory_span_size, - master->align_offset, - (size_t)master->total_spans * _memory_span_size); - } -} - -//! Move the span (used for small or medium allocations) to the heap thread -//! 
cache -static void _rpmalloc_span_release_to_cache(heap_t *heap, span_t *span) { - rpmalloc_assert(heap == span->heap, "Span heap pointer corrupted"); - rpmalloc_assert(span->size_class < SIZE_CLASS_COUNT, - "Invalid span size class"); - rpmalloc_assert(span->span_count == 1, "Invalid span count"); -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS - atomic_decr32(&heap->span_use[0].current); -#endif - _rpmalloc_stat_dec(&heap->size_class_use[span->size_class].spans_current); - if (!heap->finalize) { - _rpmalloc_stat_inc(&heap->span_use[0].spans_to_cache); - _rpmalloc_stat_inc(&heap->size_class_use[span->size_class].spans_to_cache); - if (heap->size_class[span->size_class].cache) - _rpmalloc_heap_cache_insert(heap, - heap->size_class[span->size_class].cache); - heap->size_class[span->size_class].cache = span; - } else { - _rpmalloc_span_unmap(span); - } -} - -//! Initialize a (partial) free list up to next system memory page, while -//! reserving the first block as allocated, returning number of blocks in list -static uint32_t free_list_partial_init(void **list, void **first_block, - void *page_start, void *block_start, - uint32_t block_count, - uint32_t block_size) { - rpmalloc_assert(block_count, "Internal failure"); - *first_block = block_start; - if (block_count > 1) { - void *free_block = pointer_offset(block_start, block_size); - void *block_end = - pointer_offset(block_start, (size_t)block_size * block_count); - // If block size is less than half a memory page, bound init to next memory - // page boundary - if (block_size < (_memory_page_size >> 1)) { - void *page_end = pointer_offset(page_start, _memory_page_size); - if (page_end < block_end) - block_end = page_end; - } - *list = free_block; - block_count = 2; - void *next_block = pointer_offset(free_block, block_size); - while (next_block < block_end) { - *((void **)free_block) = next_block; - free_block = next_block; - ++block_count; - next_block = pointer_offset(next_block, block_size); - } - *((void 
**)free_block) = 0; - } else { - *list = 0; - } - return block_count; -} - -//! Initialize an unused span (from cache or mapped) to be new active span, -//! putting the initial free list in heap class free list -static void *_rpmalloc_span_initialize_new(heap_t *heap, - heap_size_class_t *heap_size_class, - span_t *span, uint32_t class_idx) { - rpmalloc_assert(span->span_count == 1, "Internal failure"); - size_class_t *size_class = _memory_size_class + class_idx; - span->size_class = class_idx; - span->heap = heap; - span->flags &= ~SPAN_FLAG_ALIGNED_BLOCKS; - span->block_size = size_class->block_size; - span->block_count = size_class->block_count; - span->free_list = 0; - span->list_size = 0; - atomic_store_ptr_release(&span->free_list_deferred, 0); - - // Setup free list. Only initialize one system page worth of free blocks in - // list - void *block; - span->free_list_limit = - free_list_partial_init(&heap_size_class->free_list, &block, span, - pointer_offset(span, SPAN_HEADER_SIZE), - size_class->block_count, size_class->block_size); - // Link span as partial if there remains blocks to be initialized as free - // list, or full if fully initialized - if (span->free_list_limit < span->block_count) { - _rpmalloc_span_double_link_list_add(&heap_size_class->partial_span, span); - span->used_count = span->free_list_limit; - } else { -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_add(&heap->full_span[class_idx], span); -#endif - ++heap->full_span_count; - span->used_count = span->block_count; - } - return block; -} - -static void _rpmalloc_span_extract_free_list_deferred(span_t *span) { - // We need acquire semantics on the CAS operation since we are interested in - // the list size Refer to _rpmalloc_deallocate_defer_small_or_medium for - // further comments on this dependency - do { - span->free_list = - atomic_exchange_ptr_acquire(&span->free_list_deferred, INVALID_POINTER); - } while (span->free_list == INVALID_POINTER); - span->used_count -= 
span->list_size; - span->list_size = 0; - atomic_store_ptr_release(&span->free_list_deferred, 0); -} - -static int _rpmalloc_span_is_fully_utilized(span_t *span) { - rpmalloc_assert(span->free_list_limit <= span->block_count, - "Span free list corrupted"); - return !span->free_list && (span->free_list_limit >= span->block_count); -} - -static int _rpmalloc_span_finalize(heap_t *heap, size_t iclass, span_t *span, - span_t **list_head) { - void *free_list = heap->size_class[iclass].free_list; - span_t *class_span = (span_t *)((uintptr_t)free_list & _memory_span_mask); - if (span == class_span) { - // Adopt the heap class free list back into the span free list - void *block = span->free_list; - void *last_block = 0; - while (block) { - last_block = block; - block = *((void **)block); - } - uint32_t free_count = 0; - block = free_list; - while (block) { - ++free_count; - block = *((void **)block); - } - if (last_block) { - *((void **)last_block) = free_list; - } else { - span->free_list = free_list; - } - heap->size_class[iclass].free_list = 0; - span->used_count -= free_count; - } - // If this assert triggers you have memory leaks - rpmalloc_assert(span->list_size == span->used_count, "Memory leak detected"); - if (span->list_size == span->used_count) { - _rpmalloc_stat_dec(&heap->span_use[0].current); - _rpmalloc_stat_dec(&heap->size_class_use[iclass].spans_current); - // This function only used for spans in double linked lists - if (list_head) - _rpmalloc_span_double_link_list_remove(list_head, span); - _rpmalloc_span_unmap(span); - return 1; - } - return 0; -} - -//////////// -/// -/// Global cache -/// -////// - -#if ENABLE_GLOBAL_CACHE - -//! 
Finalize a global cache -static void _rpmalloc_global_cache_finalize(global_cache_t *cache) { - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - - for (size_t ispan = 0; ispan < cache->count; ++ispan) - _rpmalloc_span_unmap(cache->span[ispan]); - cache->count = 0; - - while (cache->overflow) { - span_t *span = cache->overflow; - cache->overflow = span->next; - _rpmalloc_span_unmap(span); - } - - atomic_store32_release(&cache->lock, 0); -} - -static void _rpmalloc_global_cache_insert_spans(span_t **span, - size_t span_count, - size_t count) { - const size_t cache_limit = - (span_count == 1) ? GLOBAL_CACHE_MULTIPLIER * MAX_THREAD_SPAN_CACHE - : GLOBAL_CACHE_MULTIPLIER * - (MAX_THREAD_SPAN_LARGE_CACHE - (span_count >> 1)); - - global_cache_t *cache = &_memory_span_cache[span_count - 1]; - - size_t insert_count = count; - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - -#if ENABLE_STATISTICS - cache->insert_count += count; -#endif - if ((cache->count + insert_count) > cache_limit) - insert_count = cache_limit - cache->count; - - memcpy(cache->span + cache->count, span, sizeof(span_t *) * insert_count); - cache->count += (uint32_t)insert_count; - -#if ENABLE_UNLIMITED_CACHE - while (insert_count < count) { -#else - // Enable unlimited cache if huge pages, or we will leak since it is unlikely - // that an entire huge page will be unmapped, and we're unable to partially - // decommit a huge page - while ((_memory_page_size > _memory_span_size) && (insert_count < count)) { -#endif - span_t *current_span = span[insert_count++]; - current_span->next = cache->overflow; - cache->overflow = current_span; - } - atomic_store32_release(&cache->lock, 0); - - span_t *keep = 0; - for (size_t ispan = insert_count; ispan < count; ++ispan) { - span_t *current_span = span[ispan]; - // Keep master spans that has remaining subspans to avoid dangling them - if ((current_span->flags & SPAN_FLAG_MASTER) && - 
(atomic_load32(&current_span->remaining_spans) > - (int32_t)current_span->span_count)) { - current_span->next = keep; - keep = current_span; - } else { - _rpmalloc_span_unmap(current_span); - } - } - - if (keep) { - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - - size_t islot = 0; - while (keep) { - for (; islot < cache->count; ++islot) { - span_t *current_span = cache->span[islot]; - if (!(current_span->flags & SPAN_FLAG_MASTER) || - ((current_span->flags & SPAN_FLAG_MASTER) && - (atomic_load32(&current_span->remaining_spans) <= - (int32_t)current_span->span_count))) { - _rpmalloc_span_unmap(current_span); - cache->span[islot] = keep; - break; - } - } - if (islot == cache->count) - break; - keep = keep->next; - } - - if (keep) { - span_t *tail = keep; - while (tail->next) - tail = tail->next; - tail->next = cache->overflow; - cache->overflow = keep; - } - - atomic_store32_release(&cache->lock, 0); - } -} - -static size_t _rpmalloc_global_cache_extract_spans(span_t **span, - size_t span_count, - size_t count) { - global_cache_t *cache = &_memory_span_cache[span_count - 1]; - - size_t extract_count = 0; - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - -#if ENABLE_STATISTICS - cache->extract_count += count; -#endif - size_t want = count - extract_count; - if (want > cache->count) - want = cache->count; - - memcpy(span + extract_count, cache->span + (cache->count - want), - sizeof(span_t *) * want); - cache->count -= (uint32_t)want; - extract_count += want; - - while ((extract_count < count) && cache->overflow) { - span_t *current_span = cache->overflow; - span[extract_count++] = current_span; - cache->overflow = current_span->next; - } - -#if ENABLE_ASSERTS - for (size_t ispan = 0; ispan < extract_count; ++ispan) { - rpmalloc_assert(span[ispan]->span_count == span_count, - "Global cache span count mismatch"); - } -#endif - - atomic_store32_release(&cache->lock, 0); - - return extract_count; -} - -#endif - -//////////// -/// -/// 
Heap control -/// -////// - -static void _rpmalloc_deallocate_huge(span_t *); - -//! Store the given spans as reserve in the given heap -static void _rpmalloc_heap_set_reserved_spans(heap_t *heap, span_t *master, - span_t *reserve, - size_t reserve_span_count) { - heap->span_reserve_master = master; - heap->span_reserve = reserve; - heap->spans_reserved = (uint32_t)reserve_span_count; -} - -//! Adopt the deferred span cache list, optionally extracting the first single -//! span for immediate re-use -static void _rpmalloc_heap_cache_adopt_deferred(heap_t *heap, - span_t **single_span) { - span_t *span = (span_t *)((void *)atomic_exchange_ptr_acquire( - &heap->span_free_deferred, 0)); - while (span) { - span_t *next_span = (span_t *)span->free_list; - rpmalloc_assert(span->heap == heap, "Span heap pointer corrupted"); - if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) { - rpmalloc_assert(heap->full_span_count, "Heap span counter corrupted"); - --heap->full_span_count; - _rpmalloc_stat_dec(&heap->span_use[0].spans_deferred); -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&heap->full_span[span->size_class], - span); -#endif - _rpmalloc_stat_dec(&heap->span_use[0].current); - _rpmalloc_stat_dec(&heap->size_class_use[span->size_class].spans_current); - if (single_span && !*single_span) - *single_span = span; - else - _rpmalloc_heap_cache_insert(heap, span); - } else { - if (span->size_class == SIZE_CLASS_HUGE) { - _rpmalloc_deallocate_huge(span); - } else { - rpmalloc_assert(span->size_class == SIZE_CLASS_LARGE, - "Span size class invalid"); - rpmalloc_assert(heap->full_span_count, "Heap span counter corrupted"); - --heap->full_span_count; -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&heap->large_huge_span, span); -#endif - uint32_t idx = span->span_count - 1; - _rpmalloc_stat_dec(&heap->span_use[idx].spans_deferred); - _rpmalloc_stat_dec(&heap->span_use[idx].current); - if (!idx && single_span && !*single_span) - 
*single_span = span; - else - _rpmalloc_heap_cache_insert(heap, span); - } - } - span = next_span; - } -} - -static void _rpmalloc_heap_unmap(heap_t *heap) { - if (!heap->master_heap) { - if ((heap->finalize > 1) && !atomic_load32(&heap->child_count)) { - span_t *span = (span_t *)((uintptr_t)heap & _memory_span_mask); - _rpmalloc_span_unmap(span); - } - } else { - if (atomic_decr32(&heap->master_heap->child_count) == 0) { - _rpmalloc_heap_unmap(heap->master_heap); - } - } -} - -static void _rpmalloc_heap_global_finalize(heap_t *heap) { - if (heap->finalize++ > 1) { - --heap->finalize; - return; - } - - _rpmalloc_heap_finalize(heap); - -#if ENABLE_THREAD_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - for (size_t ispan = 0; ispan < span_cache->count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[ispan]); - span_cache->count = 0; - } -#endif - - if (heap->full_span_count) { - --heap->finalize; - return; - } - - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - if (heap->size_class[iclass].free_list || - heap->size_class[iclass].partial_span) { - --heap->finalize; - return; - } - } - // Heap is now completely free, unmap and remove from heap list - size_t list_idx = (size_t)heap->id % HEAP_ARRAY_SIZE; - heap_t *list_heap = _memory_heaps[list_idx]; - if (list_heap == heap) { - _memory_heaps[list_idx] = heap->next_heap; - } else { - while (list_heap->next_heap != heap) - list_heap = list_heap->next_heap; - list_heap->next_heap = heap->next_heap; - } - - _rpmalloc_heap_unmap(heap); -} - -//! Insert a single span into thread heap cache, releasing to global cache if -//! 
overflow -static void _rpmalloc_heap_cache_insert(heap_t *heap, span_t *span) { - if (UNEXPECTED(heap->finalize != 0)) { - _rpmalloc_span_unmap(span); - _rpmalloc_heap_global_finalize(heap); - return; - } -#if ENABLE_THREAD_CACHE - size_t span_count = span->span_count; - _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_to_cache); - if (span_count == 1) { - span_cache_t *span_cache = &heap->span_cache; - span_cache->span[span_cache->count++] = span; - if (span_cache->count == MAX_THREAD_SPAN_CACHE) { - const size_t remain_count = - MAX_THREAD_SPAN_CACHE - THREAD_SPAN_CACHE_TRANSFER; -#if ENABLE_GLOBAL_CACHE - _rpmalloc_stat_add64(&heap->thread_to_global, - THREAD_SPAN_CACHE_TRANSFER * _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_to_global, - THREAD_SPAN_CACHE_TRANSFER); - _rpmalloc_global_cache_insert_spans(span_cache->span + remain_count, - span_count, - THREAD_SPAN_CACHE_TRANSFER); -#else - for (size_t ispan = 0; ispan < THREAD_SPAN_CACHE_TRANSFER; ++ispan) - _rpmalloc_span_unmap(span_cache->span[remain_count + ispan]); -#endif - span_cache->count = remain_count; - } - } else { - size_t cache_idx = span_count - 2; - span_large_cache_t *span_cache = heap->span_large_cache + cache_idx; - span_cache->span[span_cache->count++] = span; - const size_t cache_limit = - (MAX_THREAD_SPAN_LARGE_CACHE - (span_count >> 1)); - if (span_cache->count == cache_limit) { - const size_t transfer_limit = 2 + (cache_limit >> 2); - const size_t transfer_count = - (THREAD_SPAN_LARGE_CACHE_TRANSFER <= transfer_limit - ? 
THREAD_SPAN_LARGE_CACHE_TRANSFER - : transfer_limit); - const size_t remain_count = cache_limit - transfer_count; -#if ENABLE_GLOBAL_CACHE - _rpmalloc_stat_add64(&heap->thread_to_global, - transfer_count * span_count * _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_to_global, - transfer_count); - _rpmalloc_global_cache_insert_spans(span_cache->span + remain_count, - span_count, transfer_count); -#else - for (size_t ispan = 0; ispan < transfer_count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[remain_count + ispan]); -#endif - span_cache->count = remain_count; - } - } -#else - (void)sizeof(heap); - _rpmalloc_span_unmap(span); -#endif -} - -//! Extract the given number of spans from the different cache levels -static span_t *_rpmalloc_heap_thread_cache_extract(heap_t *heap, - size_t span_count) { - span_t *span = 0; -#if ENABLE_THREAD_CACHE - span_cache_t *span_cache; - if (span_count == 1) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (span_count - 2)); - if (span_cache->count) { - _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_from_cache); - return span_cache->span[--span_cache->count]; - } -#endif - return span; -} - -static span_t *_rpmalloc_heap_thread_cache_deferred_extract(heap_t *heap, - size_t span_count) { - span_t *span = 0; - if (span_count == 1) { - _rpmalloc_heap_cache_adopt_deferred(heap, &span); - } else { - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - span = _rpmalloc_heap_thread_cache_extract(heap, span_count); - } - return span; -} - -static span_t *_rpmalloc_heap_reserved_extract(heap_t *heap, - size_t span_count) { - if (heap->spans_reserved >= span_count) - return _rpmalloc_span_map(heap, span_count); - return 0; -} - -//! 
Extract a span from the global cache -static span_t *_rpmalloc_heap_global_cache_extract(heap_t *heap, - size_t span_count) { -#if ENABLE_GLOBAL_CACHE -#if ENABLE_THREAD_CACHE - span_cache_t *span_cache; - size_t wanted_count; - if (span_count == 1) { - span_cache = &heap->span_cache; - wanted_count = THREAD_SPAN_CACHE_TRANSFER; - } else { - span_cache = (span_cache_t *)(heap->span_large_cache + (span_count - 2)); - wanted_count = THREAD_SPAN_LARGE_CACHE_TRANSFER; - } - span_cache->count = _rpmalloc_global_cache_extract_spans( - span_cache->span, span_count, wanted_count); - if (span_cache->count) { - _rpmalloc_stat_add64(&heap->global_to_thread, - span_count * span_cache->count * _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_from_global, - span_cache->count); - return span_cache->span[--span_cache->count]; - } -#else - span_t *span = 0; - size_t count = _rpmalloc_global_cache_extract_spans(&span, span_count, 1); - if (count) { - _rpmalloc_stat_add64(&heap->global_to_thread, - span_count * count * _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_from_global, - count); - return span; - } -#endif -#endif - (void)sizeof(heap); - (void)sizeof(span_count); - return 0; -} - -static void _rpmalloc_inc_span_statistics(heap_t *heap, size_t span_count, - uint32_t class_idx) { - (void)sizeof(heap); - (void)sizeof(span_count); - (void)sizeof(class_idx); -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS - uint32_t idx = (uint32_t)span_count - 1; - uint32_t current_count = - (uint32_t)atomic_incr32(&heap->span_use[idx].current); - if (current_count > (uint32_t)atomic_load32(&heap->span_use[idx].high)) - atomic_store32(&heap->span_use[idx].high, (int32_t)current_count); - _rpmalloc_stat_add_peak(&heap->size_class_use[class_idx].spans_current, 1, - heap->size_class_use[class_idx].spans_peak); -#endif -} - -//! Get a span from one of the cache levels (thread cache, reserved, global -//! 
cache) or fallback to mapping more memory -static span_t * -_rpmalloc_heap_extract_new_span(heap_t *heap, - heap_size_class_t *heap_size_class, - size_t span_count, uint32_t class_idx) { - span_t *span; -#if ENABLE_THREAD_CACHE - if (heap_size_class && heap_size_class->cache) { - span = heap_size_class->cache; - heap_size_class->cache = - (heap->span_cache.count - ? heap->span_cache.span[--heap->span_cache.count] - : 0); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } -#endif - (void)sizeof(class_idx); - // Allow 50% overhead to increase cache hits - size_t base_span_count = span_count; - size_t limit_span_count = - (span_count > 2) ? (span_count + (span_count >> 1)) : span_count; - if (limit_span_count > LARGE_CLASS_COUNT) - limit_span_count = LARGE_CLASS_COUNT; - do { - span = _rpmalloc_heap_thread_cache_extract(heap, span_count); - if (EXPECTED(span != 0)) { - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } - span = _rpmalloc_heap_thread_cache_deferred_extract(heap, span_count); - if (EXPECTED(span != 0)) { - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } - span = _rpmalloc_heap_global_cache_extract(heap, span_count); - if (EXPECTED(span != 0)) { - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } - span = _rpmalloc_heap_reserved_extract(heap, span_count); - if (EXPECTED(span != 0)) { - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_reserved); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } - ++span_count; - } while (span_count <= limit_span_count); - // Final fallback, map in more virtual memory - span = _rpmalloc_span_map(heap, base_span_count); - 
_rpmalloc_inc_span_statistics(heap, base_span_count, class_idx); - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_map_calls); - return span; -} - -static void _rpmalloc_heap_initialize(heap_t *heap) { - _rpmalloc_memset_const(heap, 0, sizeof(heap_t)); - // Get a new heap ID - heap->id = 1 + atomic_incr32(&_memory_heap_id); - - // Link in heap in heap ID map - size_t list_idx = (size_t)heap->id % HEAP_ARRAY_SIZE; - heap->next_heap = _memory_heaps[list_idx]; - _memory_heaps[list_idx] = heap; -} - -static void _rpmalloc_heap_orphan(heap_t *heap, int first_class) { - heap->owner_thread = (uintptr_t)-1; -#if RPMALLOC_FIRST_CLASS_HEAPS - heap_t **heap_list = - (first_class ? &_memory_first_class_orphan_heaps : &_memory_orphan_heaps); -#else - (void)sizeof(first_class); - heap_t **heap_list = &_memory_orphan_heaps; -#endif - heap->next_orphan = *heap_list; - *heap_list = heap; -} - -//! Allocate a new heap from newly mapped memory pages -static heap_t *_rpmalloc_heap_allocate_new(void) { - // Map in pages for a 16 heaps. If page size is greater than required size for - // this, map a page and use first part for heaps and remaining part for spans - // for allocations. 
Adds a lot of complexity, but saves a lot of memory on - // systems where page size > 64 spans (4MiB) - size_t heap_size = sizeof(heap_t); - size_t aligned_heap_size = 16 * ((heap_size + 15) / 16); - size_t request_heap_count = 16; - size_t heap_span_count = ((aligned_heap_size * request_heap_count) + - sizeof(span_t) + _memory_span_size - 1) / - _memory_span_size; - size_t block_size = _memory_span_size * heap_span_count; - size_t span_count = heap_span_count; - span_t *span = 0; - // If there are global reserved spans, use these first - if (_memory_global_reserve_count >= heap_span_count) { - span = _rpmalloc_global_get_reserved_spans(heap_span_count); - } - if (!span) { - if (_memory_page_size > block_size) { - span_count = _memory_page_size / _memory_span_size; - block_size = _memory_page_size; - // If using huge pages, make sure to grab enough heaps to avoid - // reallocating a huge page just to serve new heaps - size_t possible_heap_count = - (block_size - sizeof(span_t)) / aligned_heap_size; - if (possible_heap_count >= (request_heap_count * 16)) - request_heap_count *= 16; - else if (possible_heap_count < request_heap_count) - request_heap_count = possible_heap_count; - heap_span_count = ((aligned_heap_size * request_heap_count) + - sizeof(span_t) + _memory_span_size - 1) / - _memory_span_size; - } - - size_t align_offset = 0; - span = (span_t *)_rpmalloc_mmap(block_size, &align_offset); - if (!span) - return 0; - - // Master span will contain the heaps - _rpmalloc_stat_inc(&_master_spans); - _rpmalloc_span_initialize(span, span_count, heap_span_count, align_offset); - } - - size_t remain_size = _memory_span_size - sizeof(span_t); - heap_t *heap = (heap_t *)pointer_offset(span, sizeof(span_t)); - _rpmalloc_heap_initialize(heap); - - // Put extra heaps as orphans - size_t num_heaps = remain_size / aligned_heap_size; - if (num_heaps < request_heap_count) - num_heaps = request_heap_count; - atomic_store32(&heap->child_count, (int32_t)num_heaps - 1); - heap_t 
*extra_heap = (heap_t *)pointer_offset(heap, aligned_heap_size); - while (num_heaps > 1) { - _rpmalloc_heap_initialize(extra_heap); - extra_heap->master_heap = heap; - _rpmalloc_heap_orphan(extra_heap, 1); - extra_heap = (heap_t *)pointer_offset(extra_heap, aligned_heap_size); - --num_heaps; - } - - if (span_count > heap_span_count) { - // Cap reserved spans - size_t remain_count = span_count - heap_span_count; - size_t reserve_count = - (remain_count > _memory_heap_reserve_count ? _memory_heap_reserve_count - : remain_count); - span_t *remain_span = - (span_t *)pointer_offset(span, heap_span_count * _memory_span_size); - _rpmalloc_heap_set_reserved_spans(heap, span, remain_span, reserve_count); - - if (remain_count > reserve_count) { - // Set to global reserved spans - remain_span = (span_t *)pointer_offset(remain_span, - reserve_count * _memory_span_size); - reserve_count = remain_count - reserve_count; - _rpmalloc_global_set_reserved_spans(span, remain_span, reserve_count); - } - } - - return heap; -} - -static heap_t *_rpmalloc_heap_extract_orphan(heap_t **heap_list) { - heap_t *heap = *heap_list; - *heap_list = (heap ? heap->next_orphan : 0); - return heap; -} - -//! 
Allocate a new heap, potentially reusing a previously orphaned heap -static heap_t *_rpmalloc_heap_allocate(int first_class) { - heap_t *heap = 0; - while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) - _rpmalloc_spin(); - if (first_class == 0) - heap = _rpmalloc_heap_extract_orphan(&_memory_orphan_heaps); -#if RPMALLOC_FIRST_CLASS_HEAPS - if (!heap) - heap = _rpmalloc_heap_extract_orphan(&_memory_first_class_orphan_heaps); -#endif - if (!heap) - heap = _rpmalloc_heap_allocate_new(); - atomic_store32_release(&_memory_global_lock, 0); - if (heap) - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - return heap; -} - -static void _rpmalloc_heap_release(void *heapptr, int first_class, - int release_cache) { - heap_t *heap = (heap_t *)heapptr; - if (!heap) - return; - // Release thread cache spans back to global cache - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - if (release_cache || heap->finalize) { -#if ENABLE_THREAD_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - if (!span_cache->count) - continue; -#if ENABLE_GLOBAL_CACHE - if (heap->finalize) { - for (size_t ispan = 0; ispan < span_cache->count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[ispan]); - } else { - _rpmalloc_stat_add64(&heap->thread_to_global, span_cache->count * - (iclass + 1) * - _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[iclass].spans_to_global, - span_cache->count); - _rpmalloc_global_cache_insert_spans(span_cache->span, iclass + 1, - span_cache->count); - } -#else - for (size_t ispan = 0; ispan < span_cache->count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[ispan]); -#endif - span_cache->count = 0; - } -#endif - } - - if (get_thread_heap_raw() == heap) - set_thread_heap(0); - -#if ENABLE_STATISTICS - atomic_decr32(&_memory_active_heaps); - 
rpmalloc_assert(atomic_load32(&_memory_active_heaps) >= 0, - "Still active heaps during finalization"); -#endif - - // If we are forcibly terminating with _exit the state of the - // lock atomic is unknown and it's best to just go ahead and exit - if (get_thread_id() != _rpmalloc_main_thread_id) { - while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) - _rpmalloc_spin(); - } - _rpmalloc_heap_orphan(heap, first_class); - atomic_store32_release(&_memory_global_lock, 0); -} - -static void _rpmalloc_heap_release_raw(void *heapptr, int release_cache) { - _rpmalloc_heap_release(heapptr, 0, release_cache); -} - -static void _rpmalloc_heap_release_raw_fc(void *heapptr) { - _rpmalloc_heap_release_raw(heapptr, 1); -} - -static void _rpmalloc_heap_finalize(heap_t *heap) { - if (heap->spans_reserved) { - span_t *span = _rpmalloc_span_map(heap, heap->spans_reserved); - _rpmalloc_span_unmap(span); - heap->spans_reserved = 0; - } - - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - if (heap->size_class[iclass].cache) - _rpmalloc_span_unmap(heap->size_class[iclass].cache); - heap->size_class[iclass].cache = 0; - span_t *span = heap->size_class[iclass].partial_span; - while (span) { - span_t *next = span->next; - _rpmalloc_span_finalize(heap, iclass, span, - &heap->size_class[iclass].partial_span); - span = next; - } - // If class still has a free list it must be a full span - if (heap->size_class[iclass].free_list) { - span_t *class_span = - (span_t *)((uintptr_t)heap->size_class[iclass].free_list & - _memory_span_mask); - span_t **list = 0; -#if RPMALLOC_FIRST_CLASS_HEAPS - list = &heap->full_span[iclass]; -#endif - --heap->full_span_count; - if (!_rpmalloc_span_finalize(heap, iclass, class_span, list)) { - if (list) - _rpmalloc_span_double_link_list_remove(list, class_span); - _rpmalloc_span_double_link_list_add( - &heap->size_class[iclass].partial_span, class_span); - } - } - } - -#if ENABLE_THREAD_CACHE - 
for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - for (size_t ispan = 0; ispan < span_cache->count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[ispan]); - span_cache->count = 0; - } -#endif - rpmalloc_assert(!atomic_load_ptr(&heap->span_free_deferred), - "Heaps still active during finalization"); -} - -//////////// -/// -/// Allocation entry points -/// -////// - -//! Pop first block from a free list -static void *free_list_pop(void **list) { - void *block = *list; - *list = *((void **)block); - return block; -} - -//! Allocate a small/medium sized memory block from the given heap -static void *_rpmalloc_allocate_from_heap_fallback( - heap_t *heap, heap_size_class_t *heap_size_class, uint32_t class_idx) { - span_t *span = heap_size_class->partial_span; - rpmalloc_assume(heap != 0); - if (EXPECTED(span != 0)) { - rpmalloc_assert(span->block_count == - _memory_size_class[span->size_class].block_count, - "Span block count corrupted"); - rpmalloc_assert(!_rpmalloc_span_is_fully_utilized(span), - "Internal failure"); - void *block; - if (span->free_list) { - // Span local free list is not empty, swap to size class free list - block = free_list_pop(&span->free_list); - heap_size_class->free_list = span->free_list; - span->free_list = 0; - } else { - // If the span did not fully initialize free list, link up another page - // worth of blocks - void *block_start = pointer_offset( - span, SPAN_HEADER_SIZE + - ((size_t)span->free_list_limit * span->block_size)); - span->free_list_limit += free_list_partial_init( - &heap_size_class->free_list, &block, - (void *)((uintptr_t)block_start & ~(_memory_page_size - 1)), - block_start, span->block_count - span->free_list_limit, - span->block_size); - } - rpmalloc_assert(span->free_list_limit <= span->block_count, - "Span block count corrupted"); - 
span->used_count = span->free_list_limit; - - // Swap in deferred free list if present - if (atomic_load_ptr(&span->free_list_deferred)) - _rpmalloc_span_extract_free_list_deferred(span); - - // If span is still not fully utilized keep it in partial list and early - // return block - if (!_rpmalloc_span_is_fully_utilized(span)) - return block; - - // The span is fully utilized, unlink from partial list and add to fully - // utilized list - _rpmalloc_span_double_link_list_pop_head(&heap_size_class->partial_span, - span); -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_add(&heap->full_span[class_idx], span); -#endif - ++heap->full_span_count; - return block; - } - - // Find a span in one of the cache levels - span = _rpmalloc_heap_extract_new_span(heap, heap_size_class, 1, class_idx); - if (EXPECTED(span != 0)) { - // Mark span as owned by this heap and set base data, return first block - return _rpmalloc_span_initialize_new(heap, heap_size_class, span, - class_idx); - } - - return 0; -} - -//! Allocate a small sized memory block from the given heap -static void *_rpmalloc_allocate_small(heap_t *heap, size_t size) { - rpmalloc_assert(heap, "No thread heap"); - // Small sizes have unique size classes - const uint32_t class_idx = - (uint32_t)((size + (SMALL_GRANULARITY - 1)) >> SMALL_GRANULARITY_SHIFT); - heap_size_class_t *heap_size_class = heap->size_class + class_idx; - _rpmalloc_stat_inc_alloc(heap, class_idx); - if (EXPECTED(heap_size_class->free_list != 0)) - return free_list_pop(&heap_size_class->free_list); - return _rpmalloc_allocate_from_heap_fallback(heap, heap_size_class, - class_idx); -} - -//! 
Allocate a medium sized memory block from the given heap -static void *_rpmalloc_allocate_medium(heap_t *heap, size_t size) { - rpmalloc_assert(heap, "No thread heap"); - // Calculate the size class index and do a dependent lookup of the final class - // index (in case of merged classes) - const uint32_t base_idx = - (uint32_t)(SMALL_CLASS_COUNT + - ((size - (SMALL_SIZE_LIMIT + 1)) >> MEDIUM_GRANULARITY_SHIFT)); - const uint32_t class_idx = _memory_size_class[base_idx].class_idx; - heap_size_class_t *heap_size_class = heap->size_class + class_idx; - _rpmalloc_stat_inc_alloc(heap, class_idx); - if (EXPECTED(heap_size_class->free_list != 0)) - return free_list_pop(&heap_size_class->free_list); - return _rpmalloc_allocate_from_heap_fallback(heap, heap_size_class, - class_idx); -} - -//! Allocate a large sized memory block from the given heap -static void *_rpmalloc_allocate_large(heap_t *heap, size_t size) { - rpmalloc_assert(heap, "No thread heap"); - // Calculate number of needed max sized spans (including header) - // Since this function is never called if size > LARGE_SIZE_LIMIT - // the span_count is guaranteed to be <= LARGE_CLASS_COUNT - size += SPAN_HEADER_SIZE; - size_t span_count = size >> _memory_span_size_shift; - if (size & (_memory_span_size - 1)) - ++span_count; - - // Find a span in one of the cache levels - span_t *span = - _rpmalloc_heap_extract_new_span(heap, 0, span_count, SIZE_CLASS_LARGE); - if (!span) - return span; - - // Mark span as owned by this heap and set base data - rpmalloc_assert(span->span_count >= span_count, "Internal failure"); - span->size_class = SIZE_CLASS_LARGE; - span->heap = heap; - -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); -#endif - ++heap->full_span_count; - - return pointer_offset(span, SPAN_HEADER_SIZE); -} - -//! 
Allocate a huge block by mapping memory pages directly -static void *_rpmalloc_allocate_huge(heap_t *heap, size_t size) { - rpmalloc_assert(heap, "No thread heap"); - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - size += SPAN_HEADER_SIZE; - size_t num_pages = size >> _memory_page_size_shift; - if (size & (_memory_page_size - 1)) - ++num_pages; - size_t align_offset = 0; - span_t *span = - (span_t *)_rpmalloc_mmap(num_pages * _memory_page_size, &align_offset); - if (!span) - return span; - - // Store page count in span_count - span->size_class = SIZE_CLASS_HUGE; - span->span_count = (uint32_t)num_pages; - span->align_offset = (uint32_t)align_offset; - span->heap = heap; - _rpmalloc_stat_add_peak(&_huge_pages_current, num_pages, _huge_pages_peak); - -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); -#endif - ++heap->full_span_count; - - return pointer_offset(span, SPAN_HEADER_SIZE); -} - -//! Allocate a block of the given size -static void *_rpmalloc_allocate(heap_t *heap, size_t size) { - _rpmalloc_stat_add64(&_allocation_counter, 1); - if (EXPECTED(size <= SMALL_SIZE_LIMIT)) - return _rpmalloc_allocate_small(heap, size); - else if (size <= _memory_medium_size_limit) - return _rpmalloc_allocate_medium(heap, size); - else if (size <= LARGE_SIZE_LIMIT) - return _rpmalloc_allocate_large(heap, size); - return _rpmalloc_allocate_huge(heap, size); -} - -static void *_rpmalloc_aligned_allocate(heap_t *heap, size_t alignment, - size_t size) { - if (alignment <= SMALL_GRANULARITY) - return _rpmalloc_allocate(heap, size); - -#if ENABLE_VALIDATE_ARGS - if ((size + alignment) < size) { - errno = EINVAL; - return 0; - } - if (alignment & (alignment - 1)) { - errno = EINVAL; - return 0; - } -#endif - - if ((alignment <= SPAN_HEADER_SIZE) && - ((size + SPAN_HEADER_SIZE) < _memory_medium_size_limit)) { - // If alignment is less or equal to span header size (which is power of - // two), and size aligned to span header size 
multiples is less than size + - // alignment, then use natural alignment of blocks to provide alignment - size_t multiple_size = size ? (size + (SPAN_HEADER_SIZE - 1)) & - ~(uintptr_t)(SPAN_HEADER_SIZE - 1) - : SPAN_HEADER_SIZE; - rpmalloc_assert(!(multiple_size % SPAN_HEADER_SIZE), - "Failed alignment calculation"); - if (multiple_size <= (size + alignment)) - return _rpmalloc_allocate(heap, multiple_size); - } - - void *ptr = 0; - size_t align_mask = alignment - 1; - if (alignment <= _memory_page_size) { - ptr = _rpmalloc_allocate(heap, size + alignment); - if ((uintptr_t)ptr & align_mask) { - ptr = (void *)(((uintptr_t)ptr & ~(uintptr_t)align_mask) + alignment); - // Mark as having aligned blocks - span_t *span = (span_t *)((uintptr_t)ptr & _memory_span_mask); - span->flags |= SPAN_FLAG_ALIGNED_BLOCKS; - } - return ptr; - } - - // Fallback to mapping new pages for this request. Since pointers passed - // to rpfree must be able to reach the start of the span by bitmasking of - // the address with the span size, the returned aligned pointer from this - // function must be with a span size of the start of the mapped area. - // In worst case this requires us to loop and map pages until we get a - // suitable memory address. 
It also means we can never align to span size - // or greater, since the span header will push alignment more than one - // span size away from span start (thus causing pointer mask to give us - // an invalid span start on free) - if (alignment & align_mask) { - errno = EINVAL; - return 0; - } - if (alignment >= _memory_span_size) { - errno = EINVAL; - return 0; - } - - size_t extra_pages = alignment / _memory_page_size; - - // Since each span has a header, we will at least need one extra memory page - size_t num_pages = 1 + (size / _memory_page_size); - if (size & (_memory_page_size - 1)) - ++num_pages; - - if (extra_pages > num_pages) - num_pages = 1 + extra_pages; - - size_t original_pages = num_pages; - size_t limit_pages = (_memory_span_size / _memory_page_size) * 2; - if (limit_pages < (original_pages * 2)) - limit_pages = original_pages * 2; - - size_t mapped_size, align_offset; - span_t *span; - -retry: - align_offset = 0; - mapped_size = num_pages * _memory_page_size; - - span = (span_t *)_rpmalloc_mmap(mapped_size, &align_offset); - if (!span) { - errno = ENOMEM; - return 0; - } - ptr = pointer_offset(span, SPAN_HEADER_SIZE); - - if ((uintptr_t)ptr & align_mask) - ptr = (void *)(((uintptr_t)ptr & ~(uintptr_t)align_mask) + alignment); - - if (((size_t)pointer_diff(ptr, span) >= _memory_span_size) || - (pointer_offset(ptr, size) > pointer_offset(span, mapped_size)) || - (((uintptr_t)ptr & _memory_span_mask) != (uintptr_t)span)) { - _rpmalloc_unmap(span, mapped_size, align_offset, mapped_size); - ++num_pages; - if (num_pages > limit_pages) { - errno = EINVAL; - return 0; - } - goto retry; - } - - // Store page count in span_count - span->size_class = SIZE_CLASS_HUGE; - span->span_count = (uint32_t)num_pages; - span->align_offset = (uint32_t)align_offset; - span->heap = heap; - _rpmalloc_stat_add_peak(&_huge_pages_current, num_pages, _huge_pages_peak); - -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); 
-#endif - ++heap->full_span_count; - - _rpmalloc_stat_add64(&_allocation_counter, 1); - - return ptr; -} - -//////////// -/// -/// Deallocation entry points -/// -////// - -//! Deallocate the given small/medium memory block in the current thread local -//! heap -static void _rpmalloc_deallocate_direct_small_or_medium(span_t *span, - void *block) { - heap_t *heap = span->heap; - rpmalloc_assert(heap->owner_thread == get_thread_id() || - !heap->owner_thread || heap->finalize, - "Internal failure"); - // Add block to free list - if (UNEXPECTED(_rpmalloc_span_is_fully_utilized(span))) { - span->used_count = span->block_count; -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&heap->full_span[span->size_class], - span); -#endif - _rpmalloc_span_double_link_list_add( - &heap->size_class[span->size_class].partial_span, span); - --heap->full_span_count; - } - *((void **)block) = span->free_list; - --span->used_count; - span->free_list = block; - if (UNEXPECTED(span->used_count == span->list_size)) { - // If there are no used blocks it is guaranteed that no other external - // thread is accessing the span - if (span->used_count) { - // Make sure we have synchronized the deferred list and list size by using - // acquire semantics and guarantee that no external thread is accessing - // span concurrently - void *free_list; - do { - free_list = atomic_exchange_ptr_acquire(&span->free_list_deferred, - INVALID_POINTER); - } while (free_list == INVALID_POINTER); - atomic_store_ptr_release(&span->free_list_deferred, free_list); - } - _rpmalloc_span_double_link_list_remove( - &heap->size_class[span->size_class].partial_span, span); - _rpmalloc_span_release_to_cache(heap, span); - } -} - -static void _rpmalloc_deallocate_defer_free_span(heap_t *heap, span_t *span) { - if (span->size_class != SIZE_CLASS_HUGE) - _rpmalloc_stat_inc(&heap->span_use[span->span_count - 1].spans_deferred); - // This list does not need ABA protection, no mutable side state - do { - 
span->free_list = (void *)atomic_load_ptr(&heap->span_free_deferred); - } while (!atomic_cas_ptr(&heap->span_free_deferred, span, span->free_list)); -} - -//! Put the block in the deferred free list of the owning span -static void _rpmalloc_deallocate_defer_small_or_medium(span_t *span, - void *block) { - // The memory ordering here is a bit tricky, to avoid having to ABA protect - // the deferred free list to avoid desynchronization of list and list size - // we need to have acquire semantics on successful CAS of the pointer to - // guarantee the list_size variable validity + release semantics on pointer - // store - void *free_list; - do { - free_list = - atomic_exchange_ptr_acquire(&span->free_list_deferred, INVALID_POINTER); - } while (free_list == INVALID_POINTER); - *((void **)block) = free_list; - uint32_t free_count = ++span->list_size; - int all_deferred_free = (free_count == span->block_count); - atomic_store_ptr_release(&span->free_list_deferred, block); - if (all_deferred_free) { - // Span was completely freed by this block. Due to the INVALID_POINTER spin - // lock no other thread can reach this state simultaneously on this span. 
- // Safe to move to owner heap deferred cache - _rpmalloc_deallocate_defer_free_span(span->heap, span); - } -} - -static void _rpmalloc_deallocate_small_or_medium(span_t *span, void *p) { - _rpmalloc_stat_inc_free(span->heap, span->size_class); - if (span->flags & SPAN_FLAG_ALIGNED_BLOCKS) { - // Realign pointer to block start - void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); - uint32_t block_offset = (uint32_t)pointer_diff(p, blocks_start); - p = pointer_offset(p, -(int32_t)(block_offset % span->block_size)); - } - // Check if block belongs to this heap or if deallocation should be deferred -#if RPMALLOC_FIRST_CLASS_HEAPS - int defer = - (span->heap->owner_thread && - (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#else - int defer = - ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#endif - if (!defer) - _rpmalloc_deallocate_direct_small_or_medium(span, p); - else - _rpmalloc_deallocate_defer_small_or_medium(span, p); -} - -//! 
Deallocate the given large memory block to the current heap -static void _rpmalloc_deallocate_large(span_t *span) { - rpmalloc_assert(span->size_class == SIZE_CLASS_LARGE, "Bad span size class"); - rpmalloc_assert(!(span->flags & SPAN_FLAG_MASTER) || - !(span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - rpmalloc_assert((span->flags & SPAN_FLAG_MASTER) || - (span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - // We must always defer (unless finalizing) if from another heap since we - // cannot touch the list or counters of another heap -#if RPMALLOC_FIRST_CLASS_HEAPS - int defer = - (span->heap->owner_thread && - (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#else - int defer = - ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#endif - if (defer) { - _rpmalloc_deallocate_defer_free_span(span->heap, span); - return; - } - rpmalloc_assert(span->heap->full_span_count, "Heap span counter corrupted"); - --span->heap->full_span_count; -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&span->heap->large_huge_span, span); -#endif -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS - // Decrease counter - size_t idx = span->span_count - 1; - atomic_decr32(&span->heap->span_use[idx].current); -#endif - heap_t *heap = span->heap; - rpmalloc_assert(heap, "No thread heap"); -#if ENABLE_THREAD_CACHE - const int set_as_reserved = - ((span->span_count > 1) && (heap->span_cache.count == 0) && - !heap->finalize && !heap->spans_reserved); -#else - const int set_as_reserved = - ((span->span_count > 1) && !heap->finalize && !heap->spans_reserved); -#endif - if (set_as_reserved) { - heap->span_reserve = span; - heap->spans_reserved = span->span_count; - if (span->flags & SPAN_FLAG_MASTER) { - heap->span_reserve_master = span; - } else { // SPAN_FLAG_SUBSPAN - span_t *master = (span_t *)pointer_offset( - span, - -(intptr_t)((size_t)span->offset_from_master * _memory_span_size)); - 
heap->span_reserve_master = master; - rpmalloc_assert(master->flags & SPAN_FLAG_MASTER, "Span flag corrupted"); - rpmalloc_assert(atomic_load32(&master->remaining_spans) >= - (int32_t)span->span_count, - "Master span count corrupted"); - } - _rpmalloc_stat_inc(&heap->span_use[idx].spans_to_reserved); - } else { - // Insert into cache list - _rpmalloc_heap_cache_insert(heap, span); - } -} - -//! Deallocate the given huge span -static void _rpmalloc_deallocate_huge(span_t *span) { - rpmalloc_assert(span->heap, "No span heap"); -#if RPMALLOC_FIRST_CLASS_HEAPS - int defer = - (span->heap->owner_thread && - (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#else - int defer = - ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#endif - if (defer) { - _rpmalloc_deallocate_defer_free_span(span->heap, span); - return; - } - rpmalloc_assert(span->heap->full_span_count, "Heap span counter corrupted"); - --span->heap->full_span_count; -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&span->heap->large_huge_span, span); -#endif - - // Oversized allocation, page count is stored in span_count - size_t num_pages = span->span_count; - _rpmalloc_unmap(span, num_pages * _memory_page_size, span->align_offset, - num_pages * _memory_page_size); - _rpmalloc_stat_sub(&_huge_pages_current, num_pages); -} - -//! 
Deallocate the given block -static void _rpmalloc_deallocate(void *p) { - _rpmalloc_stat_add64(&_deallocation_counter, 1); - // Grab the span (always at start of span, using span alignment) - span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); - if (UNEXPECTED(!span)) - return; - if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) - _rpmalloc_deallocate_small_or_medium(span, p); - else if (span->size_class == SIZE_CLASS_LARGE) - _rpmalloc_deallocate_large(span); - else - _rpmalloc_deallocate_huge(span); -} - -//////////// -/// -/// Reallocation entry points -/// -////// - -static size_t _rpmalloc_usable_size(void *p); - -//! Reallocate the given block to the given size -static void *_rpmalloc_reallocate(heap_t *heap, void *p, size_t size, - size_t oldsize, unsigned int flags) { - if (p) { - // Grab the span using guaranteed span alignment - span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); - if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) { - // Small/medium sized block - rpmalloc_assert(span->span_count == 1, "Span counter corrupted"); - void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); - uint32_t block_offset = (uint32_t)pointer_diff(p, blocks_start); - uint32_t block_idx = block_offset / span->block_size; - void *block = - pointer_offset(blocks_start, (size_t)block_idx * span->block_size); - if (!oldsize) - oldsize = - (size_t)((ptrdiff_t)span->block_size - pointer_diff(p, block)); - if ((size_t)span->block_size >= size) { - // Still fits in block, never mind trying to save memory, but preserve - // data if alignment changed - if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) - memmove(block, p, oldsize); - return block; - } - } else if (span->size_class == SIZE_CLASS_LARGE) { - // Large block - size_t total_size = size + SPAN_HEADER_SIZE; - size_t num_spans = total_size >> _memory_span_size_shift; - if (total_size & (_memory_span_mask - 1)) - ++num_spans; - size_t current_spans = span->span_count; - void *block = 
pointer_offset(span, SPAN_HEADER_SIZE); - if (!oldsize) - oldsize = (current_spans * _memory_span_size) - - (size_t)pointer_diff(p, block) - SPAN_HEADER_SIZE; - if ((current_spans >= num_spans) && (total_size >= (oldsize / 2))) { - // Still fits in block, never mind trying to save memory, but preserve - // data if alignment changed - if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) - memmove(block, p, oldsize); - return block; - } - } else { - // Oversized block - size_t total_size = size + SPAN_HEADER_SIZE; - size_t num_pages = total_size >> _memory_page_size_shift; - if (total_size & (_memory_page_size - 1)) - ++num_pages; - // Page count is stored in span_count - size_t current_pages = span->span_count; - void *block = pointer_offset(span, SPAN_HEADER_SIZE); - if (!oldsize) - oldsize = (current_pages * _memory_page_size) - - (size_t)pointer_diff(p, block) - SPAN_HEADER_SIZE; - if ((current_pages >= num_pages) && (num_pages >= (current_pages / 2))) { - // Still fits in block, never mind trying to save memory, but preserve - // data if alignment changed - if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) - memmove(block, p, oldsize); - return block; - } - } - } else { - oldsize = 0; - } - - if (!!(flags & RPMALLOC_GROW_OR_FAIL)) - return 0; - - // Size is greater than block size, need to allocate a new block and - // deallocate the old Avoid hysteresis by overallocating if increase is small - // (below 37%) - size_t lower_bound = oldsize + (oldsize >> 2) + (oldsize >> 3); - size_t new_size = - (size > lower_bound) ? size : ((size > oldsize) ? lower_bound : size); - void *block = _rpmalloc_allocate(heap, new_size); - if (p && block) { - if (!(flags & RPMALLOC_NO_PRESERVE)) - memcpy(block, p, oldsize < new_size ? 
oldsize : new_size); - _rpmalloc_deallocate(p); - } - - return block; -} - -static void *_rpmalloc_aligned_reallocate(heap_t *heap, void *ptr, - size_t alignment, size_t size, - size_t oldsize, unsigned int flags) { - if (alignment <= SMALL_GRANULARITY) - return _rpmalloc_reallocate(heap, ptr, size, oldsize, flags); - - int no_alloc = !!(flags & RPMALLOC_GROW_OR_FAIL); - size_t usablesize = (ptr ? _rpmalloc_usable_size(ptr) : 0); - if ((usablesize >= size) && !((uintptr_t)ptr & (alignment - 1))) { - if (no_alloc || (size >= (usablesize / 2))) - return ptr; - } - // Aligned alloc marks span as having aligned blocks - void *block = - (!no_alloc ? _rpmalloc_aligned_allocate(heap, alignment, size) : 0); - if (EXPECTED(block != 0)) { - if (!(flags & RPMALLOC_NO_PRESERVE) && ptr) { - if (!oldsize) - oldsize = usablesize; - memcpy(block, ptr, oldsize < size ? oldsize : size); - } - _rpmalloc_deallocate(ptr); - } - return block; -} - -//////////// -/// -/// Initialization, finalization and utility -/// -////// - -//! Get the usable size of the given block -static size_t _rpmalloc_usable_size(void *p) { - // Grab the span using guaranteed span alignment - span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); - if (span->size_class < SIZE_CLASS_COUNT) { - // Small/medium block - void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); - return span->block_size - - ((size_t)pointer_diff(p, blocks_start) % span->block_size); - } - if (span->size_class == SIZE_CLASS_LARGE) { - // Large block - size_t current_spans = span->span_count; - return (current_spans * _memory_span_size) - (size_t)pointer_diff(p, span); - } - // Oversized block, page count is stored in span_count - size_t current_pages = span->span_count; - return (current_pages * _memory_page_size) - (size_t)pointer_diff(p, span); -} - -//! 
Adjust and optimize the size class properties for the given class -static void _rpmalloc_adjust_size_class(size_t iclass) { - size_t block_size = _memory_size_class[iclass].block_size; - size_t block_count = (_memory_span_size - SPAN_HEADER_SIZE) / block_size; - - _memory_size_class[iclass].block_count = (uint16_t)block_count; - _memory_size_class[iclass].class_idx = (uint16_t)iclass; - - // Check if previous size classes can be merged - if (iclass >= SMALL_CLASS_COUNT) { - size_t prevclass = iclass; - while (prevclass > 0) { - --prevclass; - // A class can be merged if number of pages and number of blocks are equal - if (_memory_size_class[prevclass].block_count == - _memory_size_class[iclass].block_count) - _rpmalloc_memcpy_const(_memory_size_class + prevclass, - _memory_size_class + iclass, - sizeof(_memory_size_class[iclass])); - else - break; - } - } -} - -//! Initialize the allocator and setup global data -extern inline int rpmalloc_initialize(void) { - if (_rpmalloc_initialized) { - rpmalloc_thread_initialize(); - return 0; - } - return rpmalloc_initialize_config(0); -} - -int rpmalloc_initialize_config(const rpmalloc_config_t *config) { - if (_rpmalloc_initialized) { - rpmalloc_thread_initialize(); - return 0; - } - _rpmalloc_initialized = 1; - - if (config) - memcpy(&_memory_config, config, sizeof(rpmalloc_config_t)); - else - _rpmalloc_memset_const(&_memory_config, 0, sizeof(rpmalloc_config_t)); - - if (!_memory_config.memory_map || !_memory_config.memory_unmap) { - _memory_config.memory_map = _rpmalloc_mmap_os; - _memory_config.memory_unmap = _rpmalloc_unmap_os; - } - -#if PLATFORM_WINDOWS - SYSTEM_INFO system_info; - memset(&system_info, 0, sizeof(system_info)); - GetSystemInfo(&system_info); - _memory_map_granularity = system_info.dwAllocationGranularity; -#else - _memory_map_granularity = (size_t)sysconf(_SC_PAGESIZE); -#endif - -#if RPMALLOC_CONFIGURABLE - _memory_page_size = _memory_config.page_size; -#else - _memory_page_size = 0; -#endif - 
_memory_huge_pages = 0; - if (!_memory_page_size) { -#if PLATFORM_WINDOWS - _memory_page_size = system_info.dwPageSize; -#else - _memory_page_size = _memory_map_granularity; - if (_memory_config.enable_huge_pages) { -#if defined(__linux__) - size_t huge_page_size = 0; - FILE *meminfo = fopen("/proc/meminfo", "r"); - if (meminfo) { - char line[128]; - while (!huge_page_size && fgets(line, sizeof(line) - 1, meminfo)) { - line[sizeof(line) - 1] = 0; - if (strstr(line, "Hugepagesize:")) - huge_page_size = (size_t)strtol(line + 13, 0, 10) * 1024; - } - fclose(meminfo); - } - if (huge_page_size) { - _memory_huge_pages = 1; - _memory_page_size = huge_page_size; - _memory_map_granularity = huge_page_size; - } -#elif defined(__FreeBSD__) - int rc; - size_t sz = sizeof(rc); - - if (sysctlbyname("vm.pmap.pg_ps_enabled", &rc, &sz, NULL, 0) == 0 && - rc == 1) { - static size_t defsize = 2 * 1024 * 1024; - int nsize = 0; - size_t sizes[4] = {0}; - _memory_huge_pages = 1; - _memory_page_size = defsize; - if ((nsize = getpagesizes(sizes, 4)) >= 2) { - nsize--; - for (size_t csize = sizes[nsize]; nsize >= 0 && csize; - --nsize, csize = sizes[nsize]) { - //! Unlikely, but as a precaution.. 
- rpmalloc_assert(!(csize & (csize - 1)) && !(csize % 1024), - "Invalid page size"); - if (defsize < csize) { - _memory_page_size = csize; - break; - } - } - } - _memory_map_granularity = _memory_page_size; - } -#elif defined(__APPLE__) || defined(__NetBSD__) - _memory_huge_pages = 1; - _memory_page_size = 2 * 1024 * 1024; - _memory_map_granularity = _memory_page_size; -#endif - } -#endif - } else { - if (_memory_config.enable_huge_pages) - _memory_huge_pages = 1; - } - -#if PLATFORM_WINDOWS - if (_memory_config.enable_huge_pages) { - HANDLE token = 0; - size_t large_page_minimum = GetLargePageMinimum(); - if (large_page_minimum) - OpenProcessToken(GetCurrentProcess(), - TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token); - if (token) { - LUID luid; - if (LookupPrivilegeValue(0, SE_LOCK_MEMORY_NAME, &luid)) { - TOKEN_PRIVILEGES token_privileges; - memset(&token_privileges, 0, sizeof(token_privileges)); - token_privileges.PrivilegeCount = 1; - token_privileges.Privileges[0].Luid = luid; - token_privileges.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED; - if (AdjustTokenPrivileges(token, FALSE, &token_privileges, 0, 0, 0)) { - if (GetLastError() == ERROR_SUCCESS) - _memory_huge_pages = 1; - } - } - CloseHandle(token); - } - if (_memory_huge_pages) { - if (large_page_minimum > _memory_page_size) - _memory_page_size = large_page_minimum; - if (large_page_minimum > _memory_map_granularity) - _memory_map_granularity = large_page_minimum; - } - } -#endif - - size_t min_span_size = 256; - size_t max_page_size; -#if UINTPTR_MAX > 0xFFFFFFFF - max_page_size = 4096ULL * 1024ULL * 1024ULL; -#else - max_page_size = 4 * 1024 * 1024; -#endif - if (_memory_page_size < min_span_size) - _memory_page_size = min_span_size; - if (_memory_page_size > max_page_size) - _memory_page_size = max_page_size; - _memory_page_size_shift = 0; - size_t page_size_bit = _memory_page_size; - while (page_size_bit != 1) { - ++_memory_page_size_shift; - page_size_bit >>= 1; - } - _memory_page_size = 
((size_t)1 << _memory_page_size_shift); - -#if RPMALLOC_CONFIGURABLE - if (!_memory_config.span_size) { - _memory_span_size = _memory_default_span_size; - _memory_span_size_shift = _memory_default_span_size_shift; - _memory_span_mask = _memory_default_span_mask; - } else { - size_t span_size = _memory_config.span_size; - if (span_size > (256 * 1024)) - span_size = (256 * 1024); - _memory_span_size = 4096; - _memory_span_size_shift = 12; - while (_memory_span_size < span_size) { - _memory_span_size <<= 1; - ++_memory_span_size_shift; - } - _memory_span_mask = ~(uintptr_t)(_memory_span_size - 1); - } -#endif - - _memory_span_map_count = - (_memory_config.span_map_count ? _memory_config.span_map_count - : DEFAULT_SPAN_MAP_COUNT); - if ((_memory_span_size * _memory_span_map_count) < _memory_page_size) - _memory_span_map_count = (_memory_page_size / _memory_span_size); - if ((_memory_page_size >= _memory_span_size) && - ((_memory_span_map_count * _memory_span_size) % _memory_page_size)) - _memory_span_map_count = (_memory_page_size / _memory_span_size); - _memory_heap_reserve_count = (_memory_span_map_count > DEFAULT_SPAN_MAP_COUNT) - ? 
DEFAULT_SPAN_MAP_COUNT - : _memory_span_map_count; - - _memory_config.page_size = _memory_page_size; - _memory_config.span_size = _memory_span_size; - _memory_config.span_map_count = _memory_span_map_count; - _memory_config.enable_huge_pages = _memory_huge_pages; - -#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ - defined(__TINYC__) - if (pthread_key_create(&_memory_thread_heap, _rpmalloc_heap_release_raw_fc)) - return -1; -#endif -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) - fls_key = FlsAlloc(&_rpmalloc_thread_destructor); -#endif - - // Setup all small and medium size classes - size_t iclass = 0; - _memory_size_class[iclass].block_size = SMALL_GRANULARITY; - _rpmalloc_adjust_size_class(iclass); - for (iclass = 1; iclass < SMALL_CLASS_COUNT; ++iclass) { - size_t size = iclass * SMALL_GRANULARITY; - _memory_size_class[iclass].block_size = (uint32_t)size; - _rpmalloc_adjust_size_class(iclass); - } - // At least two blocks per span, then fall back to large allocations - _memory_medium_size_limit = (_memory_span_size - SPAN_HEADER_SIZE) >> 1; - if (_memory_medium_size_limit > MEDIUM_SIZE_LIMIT) - _memory_medium_size_limit = MEDIUM_SIZE_LIMIT; - for (iclass = 0; iclass < MEDIUM_CLASS_COUNT; ++iclass) { - size_t size = SMALL_SIZE_LIMIT + ((iclass + 1) * MEDIUM_GRANULARITY); - if (size > _memory_medium_size_limit) { - _memory_medium_size_limit = - SMALL_SIZE_LIMIT + (iclass * MEDIUM_GRANULARITY); - break; - } - _memory_size_class[SMALL_CLASS_COUNT + iclass].block_size = (uint32_t)size; - _rpmalloc_adjust_size_class(SMALL_CLASS_COUNT + iclass); - } - - _memory_orphan_heaps = 0; -#if RPMALLOC_FIRST_CLASS_HEAPS - _memory_first_class_orphan_heaps = 0; -#endif -#if ENABLE_STATISTICS - atomic_store32(&_memory_active_heaps, 0); - atomic_store32(&_mapped_pages, 0); - _mapped_pages_peak = 0; - atomic_store32(&_master_spans, 0); - atomic_store32(&_mapped_total, 0); - atomic_store32(&_unmapped_total, 0); - 
atomic_store32(&_mapped_pages_os, 0); - atomic_store32(&_huge_pages_current, 0); - _huge_pages_peak = 0; -#endif - memset(_memory_heaps, 0, sizeof(_memory_heaps)); - atomic_store32_release(&_memory_global_lock, 0); - - rpmalloc_linker_reference(); - - // Initialize this thread - rpmalloc_thread_initialize(); - return 0; -} - -//! Finalize the allocator -void rpmalloc_finalize(void) { - rpmalloc_thread_finalize(1); - // rpmalloc_dump_statistics(stdout); - - if (_memory_global_reserve) { - atomic_add32(&_memory_global_reserve_master->remaining_spans, - -(int32_t)_memory_global_reserve_count); - _memory_global_reserve_master = 0; - _memory_global_reserve_count = 0; - _memory_global_reserve = 0; - } - atomic_store32_release(&_memory_global_lock, 0); - - // Free all thread caches and fully free spans - for (size_t list_idx = 0; list_idx < HEAP_ARRAY_SIZE; ++list_idx) { - heap_t *heap = _memory_heaps[list_idx]; - while (heap) { - heap_t *next_heap = heap->next_heap; - heap->finalize = 1; - _rpmalloc_heap_global_finalize(heap); - heap = next_heap; - } - } - -#if ENABLE_GLOBAL_CACHE - // Free global caches - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) - _rpmalloc_global_cache_finalize(&_memory_span_cache[iclass]); -#endif - -#if (defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD - pthread_key_delete(_memory_thread_heap); -#endif -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) - FlsFree(fls_key); - fls_key = 0; -#endif -#if ENABLE_STATISTICS - // If you hit these asserts you probably have memory leaks (perhaps global - // scope data doing dynamic allocations) or double frees in your code - rpmalloc_assert(atomic_load32(&_mapped_pages) == 0, "Memory leak detected"); - rpmalloc_assert(atomic_load32(&_mapped_pages_os) == 0, - "Memory leak detected"); -#endif - - _rpmalloc_initialized = 0; -} - -//! 
Initialize thread, assign heap
-extern inline void rpmalloc_thread_initialize(void) {
-  if (!get_thread_heap_raw()) {
-    heap_t *heap = _rpmalloc_heap_allocate(0);
-    if (heap) {
-      _rpmalloc_stat_inc(&_memory_active_heaps);
-      set_thread_heap(heap);
-#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK)
-      FlsSetValue(fls_key, heap);
-#endif
-    }
-  }
-}
-
-//! Finalize thread, orphan heap
-void rpmalloc_thread_finalize(int release_caches) {
-  heap_t *heap = get_thread_heap_raw();
-  if (heap)
-    _rpmalloc_heap_release_raw(heap, release_caches);
-  set_thread_heap(0);
-#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK)
-  FlsSetValue(fls_key, 0);
-#endif
-}
-
-int rpmalloc_is_thread_initialized(void) {
-  return (get_thread_heap_raw() != 0) ? 1 : 0;
-}
-
-const rpmalloc_config_t *rpmalloc_config(void) { return &_memory_config; }
-
-// Extern interface
-
-extern inline RPMALLOC_ALLOCATOR void *rpmalloc(size_t size) {
-#if ENABLE_VALIDATE_ARGS
-  if (size >= MAX_ALLOC_SIZE) {
-    errno = EINVAL;
-    return 0;
-  }
-#endif
-  heap_t *heap = get_thread_heap();
-  return _rpmalloc_allocate(heap, size);
-}
-
-extern inline void rpfree(void *ptr) { _rpmalloc_deallocate(ptr); }
-
-extern inline RPMALLOC_ALLOCATOR void *rpcalloc(size_t num, size_t size) {
-  size_t total;
-#if ENABLE_VALIDATE_ARGS
-#if PLATFORM_WINDOWS
-  int err = SizeTMult(num, size, &total);
-  if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) {
-    errno = EINVAL;
-    return 0;
-  }
-#else
-  int err = __builtin_umull_overflow(num, size, &total);
-  if (err || (total >= MAX_ALLOC_SIZE)) {
-    errno = EINVAL;
-    return 0;
-  }
-#endif
-#else
-  total = num * size;
-#endif
-  heap_t *heap = get_thread_heap();
-  void *block = _rpmalloc_allocate(heap, total);
-  if (block)
-    memset(block, 0, total);
-  return block;
-}
-
-extern inline RPMALLOC_ALLOCATOR void *rprealloc(void *ptr, size_t size) {
-#if ENABLE_VALIDATE_ARGS
-  if (size >= MAX_ALLOC_SIZE) {
-    errno = EINVAL;
-    return ptr;
-  }
-#endif
-
heap_t *heap = get_thread_heap();
-  return _rpmalloc_reallocate(heap, ptr, size, 0, 0);
-}
-
-extern RPMALLOC_ALLOCATOR void *rpaligned_realloc(void *ptr, size_t alignment,
-                                                  size_t size, size_t oldsize,
-                                                  unsigned int flags) {
-#if ENABLE_VALIDATE_ARGS
-  if ((size + alignment < size) || (alignment > _memory_page_size)) {
-    errno = EINVAL;
-    return 0;
-  }
-#endif
-  heap_t *heap = get_thread_heap();
-  return _rpmalloc_aligned_reallocate(heap, ptr, alignment, size, oldsize,
-                                      flags);
-}
-
-extern RPMALLOC_ALLOCATOR void *rpaligned_alloc(size_t alignment, size_t size) {
-  heap_t *heap = get_thread_heap();
-  return _rpmalloc_aligned_allocate(heap, alignment, size);
-}
-
-extern inline RPMALLOC_ALLOCATOR void *
-rpaligned_calloc(size_t alignment, size_t num, size_t size) {
-  size_t total;
-#if ENABLE_VALIDATE_ARGS
-#if PLATFORM_WINDOWS
-  int err = SizeTMult(num, size, &total);
-  if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) {
-    errno = EINVAL;
-    return 0;
-  }
-#else
-  int err = __builtin_umull_overflow(num, size, &total);
-  if (err || (total >= MAX_ALLOC_SIZE)) {
-    errno = EINVAL;
-    return 0;
-  }
-#endif
-#else
-  total = num * size;
-#endif
-  void *block = rpaligned_alloc(alignment, total);
-  if (block)
-    memset(block, 0, total);
-  return block;
-}
-
-extern inline RPMALLOC_ALLOCATOR void *rpmemalign(size_t alignment,
-                                                  size_t size) {
-  return rpaligned_alloc(alignment, size);
-}
-
-extern inline int rpposix_memalign(void **memptr, size_t alignment,
-                                   size_t size) {
-  if (memptr)
-    *memptr = rpaligned_alloc(alignment, size);
-  else
-    return EINVAL;
-  return *memptr ? 0 : ENOMEM;
-}
-
-extern inline size_t rpmalloc_usable_size(void *ptr) {
-  return (ptr ?
_rpmalloc_usable_size(ptr) : 0); -} - -extern inline void rpmalloc_thread_collect(void) {} - -void rpmalloc_thread_statistics(rpmalloc_thread_statistics_t *stats) { - memset(stats, 0, sizeof(rpmalloc_thread_statistics_t)); - heap_t *heap = get_thread_heap_raw(); - if (!heap) - return; - - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - size_class_t *size_class = _memory_size_class + iclass; - span_t *span = heap->size_class[iclass].partial_span; - while (span) { - size_t free_count = span->list_size; - size_t block_count = size_class->block_count; - if (span->free_list_limit < block_count) - block_count = span->free_list_limit; - free_count += (block_count - span->used_count); - stats->sizecache += free_count * size_class->block_size; - span = span->next; - } - } - -#if ENABLE_THREAD_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - stats->spancache += span_cache->count * (iclass + 1) * _memory_span_size; - } -#endif - - span_t *deferred = (span_t *)atomic_load_ptr(&heap->span_free_deferred); - while (deferred) { - if (deferred->size_class != SIZE_CLASS_HUGE) - stats->spancache += (size_t)deferred->span_count * _memory_span_size; - deferred = (span_t *)deferred->free_list; - } - -#if ENABLE_STATISTICS - stats->thread_to_global = (size_t)atomic_load64(&heap->thread_to_global); - stats->global_to_thread = (size_t)atomic_load64(&heap->global_to_thread); - - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - stats->span_use[iclass].current = - (size_t)atomic_load32(&heap->span_use[iclass].current); - stats->span_use[iclass].peak = - (size_t)atomic_load32(&heap->span_use[iclass].high); - stats->span_use[iclass].to_global = - (size_t)atomic_load32(&heap->span_use[iclass].spans_to_global); - stats->span_use[iclass].from_global = - 
(size_t)atomic_load32(&heap->span_use[iclass].spans_from_global); - stats->span_use[iclass].to_cache = - (size_t)atomic_load32(&heap->span_use[iclass].spans_to_cache); - stats->span_use[iclass].from_cache = - (size_t)atomic_load32(&heap->span_use[iclass].spans_from_cache); - stats->span_use[iclass].to_reserved = - (size_t)atomic_load32(&heap->span_use[iclass].spans_to_reserved); - stats->span_use[iclass].from_reserved = - (size_t)atomic_load32(&heap->span_use[iclass].spans_from_reserved); - stats->span_use[iclass].map_calls = - (size_t)atomic_load32(&heap->span_use[iclass].spans_map_calls); - } - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - stats->size_use[iclass].alloc_current = - (size_t)atomic_load32(&heap->size_class_use[iclass].alloc_current); - stats->size_use[iclass].alloc_peak = - (size_t)heap->size_class_use[iclass].alloc_peak; - stats->size_use[iclass].alloc_total = - (size_t)atomic_load32(&heap->size_class_use[iclass].alloc_total); - stats->size_use[iclass].free_total = - (size_t)atomic_load32(&heap->size_class_use[iclass].free_total); - stats->size_use[iclass].spans_to_cache = - (size_t)atomic_load32(&heap->size_class_use[iclass].spans_to_cache); - stats->size_use[iclass].spans_from_cache = - (size_t)atomic_load32(&heap->size_class_use[iclass].spans_from_cache); - stats->size_use[iclass].spans_from_reserved = (size_t)atomic_load32( - &heap->size_class_use[iclass].spans_from_reserved); - stats->size_use[iclass].map_calls = - (size_t)atomic_load32(&heap->size_class_use[iclass].spans_map_calls); - } -#endif -} - -void rpmalloc_global_statistics(rpmalloc_global_statistics_t *stats) { - memset(stats, 0, sizeof(rpmalloc_global_statistics_t)); -#if ENABLE_STATISTICS - stats->mapped = (size_t)atomic_load32(&_mapped_pages) * _memory_page_size; - stats->mapped_peak = (size_t)_mapped_pages_peak * _memory_page_size; - stats->mapped_total = - (size_t)atomic_load32(&_mapped_total) * _memory_page_size; - stats->unmapped_total = - 
(size_t)atomic_load32(&_unmapped_total) * _memory_page_size; - stats->huge_alloc = - (size_t)atomic_load32(&_huge_pages_current) * _memory_page_size; - stats->huge_alloc_peak = (size_t)_huge_pages_peak * _memory_page_size; -#endif -#if ENABLE_GLOBAL_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - global_cache_t *cache = &_memory_span_cache[iclass]; - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - uint32_t count = cache->count; -#if ENABLE_UNLIMITED_CACHE - span_t *current_span = cache->overflow; - while (current_span) { - ++count; - current_span = current_span->next; - } -#endif - atomic_store32_release(&cache->lock, 0); - stats->cached += count * (iclass + 1) * _memory_span_size; - } -#endif -} - -#if ENABLE_STATISTICS - -static void _memory_heap_dump_statistics(heap_t *heap, void *file) { - fprintf(file, "Heap %d stats:\n", heap->id); - fprintf(file, "Class CurAlloc PeakAlloc TotAlloc TotFree BlkSize " - "BlkCount SpansCur SpansPeak PeakAllocMiB ToCacheMiB " - "FromCacheMiB FromReserveMiB MmapCalls\n"); - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - if (!atomic_load32(&heap->size_class_use[iclass].alloc_total)) - continue; - fprintf( - file, - "%3u: %10u %10u %10u %10u %8u %8u %8d %9d %13zu %11zu %12zu %14zu " - "%9u\n", - (uint32_t)iclass, - atomic_load32(&heap->size_class_use[iclass].alloc_current), - heap->size_class_use[iclass].alloc_peak, - atomic_load32(&heap->size_class_use[iclass].alloc_total), - atomic_load32(&heap->size_class_use[iclass].free_total), - _memory_size_class[iclass].block_size, - _memory_size_class[iclass].block_count, - atomic_load32(&heap->size_class_use[iclass].spans_current), - heap->size_class_use[iclass].spans_peak, - ((size_t)heap->size_class_use[iclass].alloc_peak * - (size_t)_memory_size_class[iclass].block_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->size_class_use[iclass].spans_to_cache) * - _memory_span_size) / - (size_t)(1024 * 1024), - 
((size_t)atomic_load32(&heap->size_class_use[iclass].spans_from_cache) * - _memory_span_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32( - &heap->size_class_use[iclass].spans_from_reserved) * - _memory_span_size) / - (size_t)(1024 * 1024), - atomic_load32(&heap->size_class_use[iclass].spans_map_calls)); - } - fprintf(file, "Spans Current Peak Deferred PeakMiB Cached ToCacheMiB " - "FromCacheMiB ToReserveMiB FromReserveMiB ToGlobalMiB " - "FromGlobalMiB MmapCalls\n"); - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - if (!atomic_load32(&heap->span_use[iclass].high) && - !atomic_load32(&heap->span_use[iclass].spans_map_calls)) - continue; - fprintf( - file, - "%4u: %8d %8u %8u %8zu %7u %11zu %12zu %12zu %14zu %11zu %13zu %10u\n", - (uint32_t)(iclass + 1), atomic_load32(&heap->span_use[iclass].current), - atomic_load32(&heap->span_use[iclass].high), - atomic_load32(&heap->span_use[iclass].spans_deferred), - ((size_t)atomic_load32(&heap->span_use[iclass].high) * - (size_t)_memory_span_size * (iclass + 1)) / - (size_t)(1024 * 1024), -#if ENABLE_THREAD_CACHE - (unsigned int)(!iclass ? 
heap->span_cache.count - : heap->span_large_cache[iclass - 1].count), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_cache) * - (iclass + 1) * _memory_span_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_cache) * - (iclass + 1) * _memory_span_size) / - (size_t)(1024 * 1024), -#else - 0, (size_t)0, (size_t)0, -#endif - ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_reserved) * - (iclass + 1) * _memory_span_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_reserved) * - (iclass + 1) * _memory_span_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_global) * - (size_t)_memory_span_size * (iclass + 1)) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_global) * - (size_t)_memory_span_size * (iclass + 1)) / - (size_t)(1024 * 1024), - atomic_load32(&heap->span_use[iclass].spans_map_calls)); - } - fprintf(file, "Full spans: %zu\n", heap->full_span_count); - fprintf(file, "ThreadToGlobalMiB GlobalToThreadMiB\n"); - fprintf( - file, "%17zu %17zu\n", - (size_t)atomic_load64(&heap->thread_to_global) / (size_t)(1024 * 1024), - (size_t)atomic_load64(&heap->global_to_thread) / (size_t)(1024 * 1024)); -} - -#endif - -void rpmalloc_dump_statistics(void *file) { -#if ENABLE_STATISTICS - for (size_t list_idx = 0; list_idx < HEAP_ARRAY_SIZE; ++list_idx) { - heap_t *heap = _memory_heaps[list_idx]; - while (heap) { - int need_dump = 0; - for (size_t iclass = 0; !need_dump && (iclass < SIZE_CLASS_COUNT); - ++iclass) { - if (!atomic_load32(&heap->size_class_use[iclass].alloc_total)) { - rpmalloc_assert( - !atomic_load32(&heap->size_class_use[iclass].free_total), - "Heap statistics counter mismatch"); - rpmalloc_assert( - !atomic_load32(&heap->size_class_use[iclass].spans_map_calls), - "Heap statistics counter mismatch"); - continue; - } - need_dump = 1; - } - for (size_t iclass = 0; !need_dump && (iclass < 
LARGE_CLASS_COUNT); - ++iclass) { - if (!atomic_load32(&heap->span_use[iclass].high) && - !atomic_load32(&heap->span_use[iclass].spans_map_calls)) - continue; - need_dump = 1; - } - if (need_dump) - _memory_heap_dump_statistics(heap, file); - heap = heap->next_heap; - } - } - fprintf(file, "Global stats:\n"); - size_t huge_current = - (size_t)atomic_load32(&_huge_pages_current) * _memory_page_size; - size_t huge_peak = (size_t)_huge_pages_peak * _memory_page_size; - fprintf(file, "HugeCurrentMiB HugePeakMiB\n"); - fprintf(file, "%14zu %11zu\n", huge_current / (size_t)(1024 * 1024), - huge_peak / (size_t)(1024 * 1024)); - -#if ENABLE_GLOBAL_CACHE - fprintf(file, "GlobalCacheMiB\n"); - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - global_cache_t *cache = _memory_span_cache + iclass; - size_t global_cache = (size_t)cache->count * iclass * _memory_span_size; - - size_t global_overflow_cache = 0; - span_t *span = cache->overflow; - while (span) { - global_overflow_cache += iclass * _memory_span_size; - span = span->next; - } - if (global_cache || global_overflow_cache || cache->insert_count || - cache->extract_count) - fprintf(file, - "%4zu: %8zuMiB (%8zuMiB overflow) %14zu insert %14zu extract\n", - iclass + 1, global_cache / (size_t)(1024 * 1024), - global_overflow_cache / (size_t)(1024 * 1024), - cache->insert_count, cache->extract_count); - } -#endif - - size_t mapped = (size_t)atomic_load32(&_mapped_pages) * _memory_page_size; - size_t mapped_os = - (size_t)atomic_load32(&_mapped_pages_os) * _memory_page_size; - size_t mapped_peak = (size_t)_mapped_pages_peak * _memory_page_size; - size_t mapped_total = - (size_t)atomic_load32(&_mapped_total) * _memory_page_size; - size_t unmapped_total = - (size_t)atomic_load32(&_unmapped_total) * _memory_page_size; - fprintf( - file, - "MappedMiB MappedOSMiB MappedPeakMiB MappedTotalMiB UnmappedTotalMiB\n"); - fprintf(file, "%9zu %11zu %13zu %14zu %16zu\n", - mapped / (size_t)(1024 * 1024), mapped_os / 
(size_t)(1024 * 1024),
-          mapped_peak / (size_t)(1024 * 1024),
-          mapped_total / (size_t)(1024 * 1024),
-          unmapped_total / (size_t)(1024 * 1024));
-
-  fprintf(file, "\n");
-#if 0
-  int64_t allocated = atomic_load64(&_allocation_counter);
-  int64_t deallocated = atomic_load64(&_deallocation_counter);
-  fprintf(file, "Allocation count: %lli\n", allocated);
-  fprintf(file, "Deallocation count: %lli\n", deallocated);
-  fprintf(file, "Current allocations: %lli\n", (allocated - deallocated));
-  fprintf(file, "Master spans: %d\n", atomic_load32(&_master_spans));
-  fprintf(file, "Dangling master spans: %d\n", atomic_load32(&_unmapped_master_spans));
-#endif
-#endif
-  (void)sizeof(file);
-}
-
-#if RPMALLOC_FIRST_CLASS_HEAPS
-
-extern inline rpmalloc_heap_t *rpmalloc_heap_acquire(void) {
-  // Must be a pristine heap from newly mapped memory pages, or else memory
-  // blocks could already be allocated from the heap which would (wrongly) be
-  // released when heap is cleared with rpmalloc_heap_free_all(). Also heaps
-  // guaranteed to be pristine from the dedicated orphan list can be used.
-  heap_t *heap = _rpmalloc_heap_allocate(1);
-  rpmalloc_assume(heap != NULL);
-  heap->owner_thread = 0;
-  _rpmalloc_stat_inc(&_memory_active_heaps);
-  return heap;
-}
-
-extern inline void rpmalloc_heap_release(rpmalloc_heap_t *heap) {
-  if (heap)
-    _rpmalloc_heap_release(heap, 1, 1);
-}
-
-extern inline RPMALLOC_ALLOCATOR void *
-rpmalloc_heap_alloc(rpmalloc_heap_t *heap, size_t size) {
-#if ENABLE_VALIDATE_ARGS
-  if (size >= MAX_ALLOC_SIZE) {
-    errno = EINVAL;
-    return 0;
-  }
-#endif
-  return _rpmalloc_allocate(heap, size);
-}
-
-extern inline RPMALLOC_ALLOCATOR void *
-rpmalloc_heap_aligned_alloc(rpmalloc_heap_t *heap, size_t alignment,
-                            size_t size) {
-#if ENABLE_VALIDATE_ARGS
-  if (size >= MAX_ALLOC_SIZE) {
-    errno = EINVAL;
-    return 0;
-  }
-#endif
-  return _rpmalloc_aligned_allocate(heap, alignment, size);
-}
-
-extern inline RPMALLOC_ALLOCATOR void *
-rpmalloc_heap_calloc(rpmalloc_heap_t *heap, size_t num, size_t size) {
-  return rpmalloc_heap_aligned_calloc(heap, 0, num, size);
-}
-
-extern inline RPMALLOC_ALLOCATOR void *
-rpmalloc_heap_aligned_calloc(rpmalloc_heap_t *heap, size_t alignment,
-                             size_t num, size_t size) {
-  size_t total;
-#if ENABLE_VALIDATE_ARGS
-#if PLATFORM_WINDOWS
-  int err = SizeTMult(num, size, &total);
-  if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) {
-    errno = EINVAL;
-    return 0;
-  }
-#else
-  int err = __builtin_umull_overflow(num, size, &total);
-  if (err || (total >= MAX_ALLOC_SIZE)) {
-    errno = EINVAL;
-    return 0;
-  }
-#endif
-#else
-  total = num * size;
-#endif
-  void *block = _rpmalloc_aligned_allocate(heap, alignment, total);
-  if (block)
-    memset(block, 0, total);
-  return block;
-}
-
-extern inline RPMALLOC_ALLOCATOR void *
-rpmalloc_heap_realloc(rpmalloc_heap_t *heap, void *ptr, size_t size,
-                      unsigned int flags) {
-#if ENABLE_VALIDATE_ARGS
-  if (size >= MAX_ALLOC_SIZE) {
-    errno = EINVAL;
-    return ptr;
-  }
-#endif
-  return _rpmalloc_reallocate(heap, ptr, size, 0, flags);
-}
-
-extern inline RPMALLOC_ALLOCATOR void *
-rpmalloc_heap_aligned_realloc(rpmalloc_heap_t *heap, void *ptr, - size_t alignment, size_t size, - unsigned int flags) { -#if ENABLE_VALIDATE_ARGS - if ((size + alignment < size) || (alignment > _memory_page_size)) { - errno = EINVAL; - return 0; - } -#endif - return _rpmalloc_aligned_reallocate(heap, ptr, alignment, size, 0, flags); -} - -extern inline void rpmalloc_heap_free(rpmalloc_heap_t *heap, void *ptr) { - (void)sizeof(heap); - _rpmalloc_deallocate(ptr); -} - -extern inline void rpmalloc_heap_free_all(rpmalloc_heap_t *heap) { - span_t *span; - span_t *next_span; - - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - span = heap->size_class[iclass].partial_span; - while (span) { - next_span = span->next; - _rpmalloc_heap_cache_insert(heap, span); - span = next_span; - } - heap->size_class[iclass].partial_span = 0; - span = heap->full_span[iclass]; - while (span) { - next_span = span->next; - _rpmalloc_heap_cache_insert(heap, span); - span = next_span; - } - - span = heap->size_class[iclass].cache; - if (span) - _rpmalloc_heap_cache_insert(heap, span); - heap->size_class[iclass].cache = 0; - } - memset(heap->size_class, 0, sizeof(heap->size_class)); - memset(heap->full_span, 0, sizeof(heap->full_span)); - - span = heap->large_huge_span; - while (span) { - next_span = span->next; - if (UNEXPECTED(span->size_class == SIZE_CLASS_HUGE)) - _rpmalloc_deallocate_huge(span); - else - _rpmalloc_heap_cache_insert(heap, span); - span = next_span; - } - heap->large_huge_span = 0; - heap->full_span_count = 0; - -#if ENABLE_THREAD_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - if (!span_cache->count) - continue; -#if ENABLE_GLOBAL_CACHE - _rpmalloc_stat_add64(&heap->thread_to_global, - span_cache->count * (iclass + 1) * 
_memory_span_size); - _rpmalloc_stat_add(&heap->span_use[iclass].spans_to_global, - span_cache->count); - _rpmalloc_global_cache_insert_spans(span_cache->span, iclass + 1, - span_cache->count); -#else - for (size_t ispan = 0; ispan < span_cache->count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[ispan]); -#endif - span_cache->count = 0; - } -#endif - -#if ENABLE_STATISTICS - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - atomic_store32(&heap->size_class_use[iclass].alloc_current, 0); - atomic_store32(&heap->size_class_use[iclass].spans_current, 0); - } - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - atomic_store32(&heap->span_use[iclass].current, 0); - } -#endif -} - -extern inline void rpmalloc_heap_thread_set_current(rpmalloc_heap_t *heap) { - heap_t *prev_heap = get_thread_heap_raw(); - if (prev_heap != heap) { - set_thread_heap(heap); - if (prev_heap) - rpmalloc_heap_release(prev_heap); - } -} - -extern inline rpmalloc_heap_t *rpmalloc_get_heap_for_ptr(void *ptr) { - // Grab the span, and then the heap from the span - span_t *span = (span_t *)((uintptr_t)ptr & _memory_span_mask); - if (span) { - return span->heap; - } - return 0; -} - -#endif - -#if ENABLE_PRELOAD || ENABLE_OVERRIDE - -#include "malloc.c" - -#endif - -void rpmalloc_linker_reference(void) { (void)sizeof(_rpmalloc_initialized); } +//===---------------------- rpmalloc.c ------------------*- C -*-=============// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This library provides a cross-platform lock free thread caching malloc +// implementation in C11. 
+//
+//===----------------------------------------------------------------------===//
+
+#include "rpmalloc.h"
+
+////////////
+///
+/// Build time configurable limits
+///
+//////
+
+#if defined(__clang__)
+#pragma clang diagnostic ignored "-Wunused-macros"
+#pragma clang diagnostic ignored "-Wunused-function"
+#if __has_warning("-Wreserved-identifier")
+#pragma clang diagnostic ignored "-Wreserved-identifier"
+#endif
+#if __has_warning("-Wstatic-in-inline")
+#pragma clang diagnostic ignored "-Wstatic-in-inline"
+#endif
+#elif defined(__GNUC__)
+#pragma GCC diagnostic ignored "-Wunused-macros"
+#pragma GCC diagnostic ignored "-Wunused-function"
+#endif
+
+#if !defined(__has_builtin)
+#define __has_builtin(b) 0
+#endif
+
+#if defined(__GNUC__) || defined(__clang__)
+
+#if __has_builtin(__builtin_memcpy_inline)
+#define _rpmalloc_memcpy_const(x, y, s) __builtin_memcpy_inline(x, y, s)
+#else
+#define _rpmalloc_memcpy_const(x, y, s) \
+  do { \
+    _Static_assert(__builtin_choose_expr(__builtin_constant_p(s), 1, 0), \
+                   "len must be a constant integer"); \
+    memcpy(x, y, s); \
+  } while (0)
+#endif
+
+#if __has_builtin(__builtin_memset_inline)
+#define _rpmalloc_memset_const(x, y, s) __builtin_memset_inline(x, y, s)
+#else
+#define _rpmalloc_memset_const(x, y, s) \
+  do { \
+    _Static_assert(__builtin_choose_expr(__builtin_constant_p(s), 1, 0), \
+                   "len must be a constant integer"); \
+    memset(x, y, s); \
+  } while (0)
+#endif
+#else
+#define _rpmalloc_memcpy_const(x, y, s) memcpy(x, y, s)
+#define _rpmalloc_memset_const(x, y, s) memset(x, y, s)
+#endif
+
+#if __has_builtin(__builtin_assume)
+#define rpmalloc_assume(cond) __builtin_assume(cond)
+#elif defined(__GNUC__)
+#define rpmalloc_assume(cond) \
+  do { \
+    if (!__builtin_expect(cond, 0)) \
+      __builtin_unreachable(); \
+  } while (0)
+#elif defined(_MSC_VER)
+#define rpmalloc_assume(cond) __assume(cond)
+#else
+#define rpmalloc_assume(cond) 0
+#endif
+
+#ifndef HEAP_ARRAY_SIZE
+//!
Size of heap hashmap
+#define HEAP_ARRAY_SIZE 47
+#endif
+#ifndef ENABLE_THREAD_CACHE
+//! Enable per-thread cache
+#define ENABLE_THREAD_CACHE 1
+#endif
+#ifndef ENABLE_GLOBAL_CACHE
+//! Enable global cache shared between all threads, requires thread cache
+#define ENABLE_GLOBAL_CACHE 1
+#endif
+#ifndef ENABLE_VALIDATE_ARGS
+//! Enable validation of args to public entry points
+#define ENABLE_VALIDATE_ARGS 0
+#endif
+#ifndef ENABLE_STATISTICS
+//! Enable statistics collection
+#define ENABLE_STATISTICS 0
+#endif
+#ifndef ENABLE_ASSERTS
+//! Enable asserts
+#define ENABLE_ASSERTS 0
+#endif
+#ifndef ENABLE_OVERRIDE
+//! Override standard library malloc/free and new/delete entry points
+#define ENABLE_OVERRIDE 0
+#endif
+#ifndef ENABLE_PRELOAD
+//! Support preloading
+#define ENABLE_PRELOAD 0
+#endif
+#ifndef DISABLE_UNMAP
+//! Disable unmapping memory pages (also enables unlimited cache)
+#define DISABLE_UNMAP 0
+#endif
+#ifndef ENABLE_UNLIMITED_CACHE
+//! Enable unlimited global cache (no unmapping until finalization)
+#define ENABLE_UNLIMITED_CACHE 0
+#endif
+#ifndef ENABLE_ADAPTIVE_THREAD_CACHE
+//! Enable adaptive thread cache size based on use heuristics
+#define ENABLE_ADAPTIVE_THREAD_CACHE 0
+#endif
+#ifndef DEFAULT_SPAN_MAP_COUNT
+//! Default number of spans to map in call to map more virtual memory (default
+//! values yield 4MiB here)
+#define DEFAULT_SPAN_MAP_COUNT 64
+#endif
+#ifndef GLOBAL_CACHE_MULTIPLIER
+//!
Multiplier for global cache
+#define GLOBAL_CACHE_MULTIPLIER 8
+#endif
+
+#if DISABLE_UNMAP && !ENABLE_GLOBAL_CACHE
+#error Must use global cache if unmap is disabled
+#endif
+
+#if DISABLE_UNMAP
+#undef ENABLE_UNLIMITED_CACHE
+#define ENABLE_UNLIMITED_CACHE 1
+#endif
+
+#if !ENABLE_GLOBAL_CACHE
+#undef ENABLE_UNLIMITED_CACHE
+#define ENABLE_UNLIMITED_CACHE 0
+#endif
+
+#if !ENABLE_THREAD_CACHE
+#undef ENABLE_ADAPTIVE_THREAD_CACHE
+#define ENABLE_ADAPTIVE_THREAD_CACHE 0
+#endif
+
+#if defined(_WIN32) || defined(__WIN32__) || defined(_WIN64)
+#define PLATFORM_WINDOWS 1
+#define PLATFORM_POSIX 0
+#else
+#define PLATFORM_WINDOWS 0
+#define PLATFORM_POSIX 1
+#endif
+
+/// Platform and arch specifics
+#if defined(_MSC_VER) && !defined(__clang__)
+#pragma warning(disable : 5105)
+#ifndef FORCEINLINE
+#define FORCEINLINE inline __forceinline
+#endif
+#define _Static_assert static_assert
+#else
+#ifndef FORCEINLINE
+#define FORCEINLINE inline __attribute__((__always_inline__))
+#endif
+#endif
+#if PLATFORM_WINDOWS
+#ifndef WIN32_LEAN_AND_MEAN
+#define WIN32_LEAN_AND_MEAN
+#endif
+#include
+#if ENABLE_VALIDATE_ARGS
+#include
+#endif
+#else
+#include
+#include
+#include
+#include
+#if defined(__linux__) || defined(__ANDROID__)
+#include
+#if !defined(PR_SET_VMA)
+#define PR_SET_VMA 0x53564d41
+#define PR_SET_VMA_ANON_NAME 0
+#endif
+#endif
+#if defined(__APPLE__)
+#include
+#if !TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR
+#include
+#include
+#endif
+#include
+#endif
+#if defined(__HAIKU__) || defined(__TINYC__)
+#include
+#endif
+#endif
+
+#include
+#include
+#include
+
+#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK)
+#include
+static DWORD fls_key;
+#endif
+
+#if PLATFORM_POSIX
+#include
+#include
+#ifdef __FreeBSD__
+#include
+#define MAP_HUGETLB MAP_ALIGNED_SUPER
+#ifndef PROT_MAX
+#define PROT_MAX(f) 0
+#endif
+#else
+#define PROT_MAX(f) 0
+#endif
+#ifdef __sun
+extern int madvise(caddr_t, size_t, int);
+#endif
+#ifndef MAP_UNINITIALIZED
+#define MAP_UNINITIALIZED 0
+#endif
+#endif
+#include
+
+#if ENABLE_ASSERTS
+#undef NDEBUG
+#if defined(_MSC_VER) && !defined(_DEBUG)
+#define _DEBUG
+#endif
+#include
+#define RPMALLOC_TOSTRING_M(x) #x
+#define RPMALLOC_TOSTRING(x) RPMALLOC_TOSTRING_M(x)
+#define rpmalloc_assert(truth, message) \
+  do { \
+    if (!(truth)) { \
+      if (_memory_config.error_callback) { \
+        _memory_config.error_callback(message " (" RPMALLOC_TOSTRING( \
+            truth) ") at " __FILE__ ":" RPMALLOC_TOSTRING(__LINE__)); \
+      } else { \
+        assert((truth) && message); \
+      } \
+    } \
+  } while (0)
+#else
+#define rpmalloc_assert(truth, message) \
+  do { \
+  } while (0)
+#endif
+#if ENABLE_STATISTICS
+#include
+#endif
+
+//////
+///
+/// Atomic access abstraction (since MSVC does not do C11 yet)
+///
+//////
+
+#if defined(_MSC_VER) && !defined(__clang__)
+
+typedef volatile long atomic32_t;
+typedef volatile long long atomic64_t;
+typedef volatile void *atomicptr_t;
+
+static FORCEINLINE int32_t atomic_load32(atomic32_t *src) { return *src; }
+static FORCEINLINE void atomic_store32(atomic32_t *dst, int32_t val) {
+  *dst = val;
+}
+static FORCEINLINE int32_t atomic_incr32(atomic32_t *val) {
+  return (int32_t)InterlockedIncrement(val);
+}
+static FORCEINLINE int32_t atomic_decr32(atomic32_t *val) {
+  return (int32_t)InterlockedDecrement(val);
+}
+static FORCEINLINE int32_t atomic_add32(atomic32_t *val, int32_t add) {
+  return (int32_t)InterlockedExchangeAdd(val, add) + add;
+}
+static FORCEINLINE int atomic_cas32_acquire(atomic32_t *dst, int32_t val,
+                                            int32_t ref) {
+  return (InterlockedCompareExchange(dst, val, ref) == ref) ?
1 : 0;
+}
+static FORCEINLINE void atomic_store32_release(atomic32_t *dst, int32_t val) {
+  *dst = val;
+}
+static FORCEINLINE int64_t atomic_load64(atomic64_t *src) { return *src; }
+static FORCEINLINE int64_t atomic_add64(atomic64_t *val, int64_t add) {
+  return (int64_t)InterlockedExchangeAdd64(val, add) + add;
+}
+static FORCEINLINE void *atomic_load_ptr(atomicptr_t *src) {
+  return (void *)*src;
+}
+static FORCEINLINE void atomic_store_ptr(atomicptr_t *dst, void *val) {
+  *dst = val;
+}
+static FORCEINLINE void atomic_store_ptr_release(atomicptr_t *dst, void *val) {
+  *dst = val;
+}
+static FORCEINLINE void *atomic_exchange_ptr_acquire(atomicptr_t *dst,
+                                                     void *val) {
+  return (void *)InterlockedExchangePointer((void *volatile *)dst, val);
+}
+static FORCEINLINE int atomic_cas_ptr(atomicptr_t *dst, void *val, void *ref) {
+  return (InterlockedCompareExchangePointer((void *volatile *)dst, val, ref) ==
+          ref)
+             ? 1
+             : 0;
+}
+
+#define EXPECTED(x) (x)
+#define UNEXPECTED(x) (x)
+
+#else
+
+#include
+
+typedef volatile _Atomic(int32_t) atomic32_t;
+typedef volatile _Atomic(int64_t) atomic64_t;
+typedef volatile _Atomic(void *) atomicptr_t;
+
+static FORCEINLINE int32_t atomic_load32(atomic32_t *src) {
+  return atomic_load_explicit(src, memory_order_relaxed);
+}
+static FORCEINLINE void atomic_store32(atomic32_t *dst, int32_t val) {
+  atomic_store_explicit(dst, val, memory_order_relaxed);
+}
+static FORCEINLINE int32_t atomic_incr32(atomic32_t *val) {
+  return atomic_fetch_add_explicit(val, 1, memory_order_relaxed) + 1;
+}
+static FORCEINLINE int32_t atomic_decr32(atomic32_t *val) {
+  return atomic_fetch_add_explicit(val, -1, memory_order_relaxed) - 1;
+}
+static FORCEINLINE int32_t atomic_add32(atomic32_t *val, int32_t add) {
+  return atomic_fetch_add_explicit(val, add, memory_order_relaxed) + add;
+}
+static FORCEINLINE int atomic_cas32_acquire(atomic32_t *dst, int32_t val,
+                                            int32_t ref) {
+  return atomic_compare_exchange_weak_explicit(
+      dst, &ref, val,
memory_order_acquire, memory_order_relaxed);
+}
+static FORCEINLINE void atomic_store32_release(atomic32_t *dst, int32_t val) {
+  atomic_store_explicit(dst, val, memory_order_release);
+}
+static FORCEINLINE int64_t atomic_load64(atomic64_t *val) {
+  return atomic_load_explicit(val, memory_order_relaxed);
+}
+static FORCEINLINE int64_t atomic_add64(atomic64_t *val, int64_t add) {
+  return atomic_fetch_add_explicit(val, add, memory_order_relaxed) + add;
+}
+static FORCEINLINE void *atomic_load_ptr(atomicptr_t *src) {
+  return atomic_load_explicit(src, memory_order_relaxed);
+}
+static FORCEINLINE void atomic_store_ptr(atomicptr_t *dst, void *val) {
+  atomic_store_explicit(dst, val, memory_order_relaxed);
+}
+static FORCEINLINE void atomic_store_ptr_release(atomicptr_t *dst, void *val) {
+  atomic_store_explicit(dst, val, memory_order_release);
+}
+static FORCEINLINE void *atomic_exchange_ptr_acquire(atomicptr_t *dst,
+                                                     void *val) {
+  return atomic_exchange_explicit(dst, val, memory_order_acquire);
+}
+static FORCEINLINE int atomic_cas_ptr(atomicptr_t *dst, void *val, void *ref) {
+  return atomic_compare_exchange_weak_explicit(
+      dst, &ref, val, memory_order_relaxed, memory_order_relaxed);
+}
+
+#define EXPECTED(x) __builtin_expect((x), 1)
+#define UNEXPECTED(x) __builtin_expect((x), 0)
+
+#endif
+
+////////////
+///
+/// Statistics related functions (evaluate to nothing when statistics not
+/// enabled)
+///
+//////
+
+#if ENABLE_STATISTICS
+#define _rpmalloc_stat_inc(counter) atomic_incr32(counter)
+#define _rpmalloc_stat_dec(counter) atomic_decr32(counter)
+#define _rpmalloc_stat_add(counter, value) \
+  atomic_add32(counter, (int32_t)(value))
+#define _rpmalloc_stat_add64(counter, value) \
+  atomic_add64(counter, (int64_t)(value))
+#define _rpmalloc_stat_add_peak(counter, value, peak) \
+  do { \
+    int32_t _cur_count = atomic_add32(counter, (int32_t)(value)); \
+    if (_cur_count > (peak)) \
+      peak = _cur_count; \
+  } while (0)
+#define _rpmalloc_stat_sub(counter,
value) \
+  atomic_add32(counter, -(int32_t)(value))
+#define _rpmalloc_stat_inc_alloc(heap, class_idx) \
+  do { \
+    int32_t alloc_current = \
+        atomic_incr32(&heap->size_class_use[class_idx].alloc_current); \
+    if (alloc_current > heap->size_class_use[class_idx].alloc_peak) \
+      heap->size_class_use[class_idx].alloc_peak = alloc_current; \
+    atomic_incr32(&heap->size_class_use[class_idx].alloc_total); \
+  } while (0)
+#define _rpmalloc_stat_inc_free(heap, class_idx) \
+  do { \
+    atomic_decr32(&heap->size_class_use[class_idx].alloc_current); \
+    atomic_incr32(&heap->size_class_use[class_idx].free_total); \
+  } while (0)
+#else
+#define _rpmalloc_stat_inc(counter) \
+  do { \
+  } while (0)
+#define _rpmalloc_stat_dec(counter) \
+  do { \
+  } while (0)
+#define _rpmalloc_stat_add(counter, value) \
+  do { \
+  } while (0)
+#define _rpmalloc_stat_add64(counter, value) \
+  do { \
+  } while (0)
+#define _rpmalloc_stat_add_peak(counter, value, peak) \
+  do { \
+  } while (0)
+#define _rpmalloc_stat_sub(counter, value) \
+  do { \
+  } while (0)
+#define _rpmalloc_stat_inc_alloc(heap, class_idx) \
+  do { \
+  } while (0)
+#define _rpmalloc_stat_inc_free(heap, class_idx) \
+  do { \
+  } while (0)
+#endif
+
+///
+/// Preconfigured limits and sizes
+///
+
+//! Granularity of a small allocation block (must be power of two)
+#define SMALL_GRANULARITY 16
+//! Small granularity shift count
+#define SMALL_GRANULARITY_SHIFT 4
+//! Number of small block size classes
+#define SMALL_CLASS_COUNT 65
+//! Maximum size of a small block
+#define SMALL_SIZE_LIMIT (SMALL_GRANULARITY * (SMALL_CLASS_COUNT - 1))
+//! Granularity of a medium allocation block
+#define MEDIUM_GRANULARITY 512
+//! Medium granularity shift count
+#define MEDIUM_GRANULARITY_SHIFT 9
+//! Number of medium block size classes
+#define MEDIUM_CLASS_COUNT 61
+//! Total number of small + medium size classes
+#define SIZE_CLASS_COUNT (SMALL_CLASS_COUNT + MEDIUM_CLASS_COUNT)
+//! 
Number of large block size classes
+#define LARGE_CLASS_COUNT 63
+//! Maximum size of a medium block
+#define MEDIUM_SIZE_LIMIT \
+  (SMALL_SIZE_LIMIT + (MEDIUM_GRANULARITY * MEDIUM_CLASS_COUNT))
+//! Maximum size of a large block
+#define LARGE_SIZE_LIMIT \
+  ((LARGE_CLASS_COUNT * _memory_span_size) - SPAN_HEADER_SIZE)
+//! Size of a span header (must be a multiple of SMALL_GRANULARITY and a power
+//! of two)
+#define SPAN_HEADER_SIZE 128
+//! Number of spans in thread cache
+#define MAX_THREAD_SPAN_CACHE 400
+//! Number of spans to transfer between thread and global cache
+#define THREAD_SPAN_CACHE_TRANSFER 64
+//! Number of spans in thread cache for large spans (must be greater than
+//! LARGE_CLASS_COUNT / 2)
+#define MAX_THREAD_SPAN_LARGE_CACHE 100
+//! Number of spans to transfer between thread and global cache for large spans
+#define THREAD_SPAN_LARGE_CACHE_TRANSFER 6
+
+_Static_assert((SMALL_GRANULARITY & (SMALL_GRANULARITY - 1)) == 0,
+               "Small granularity must be power of two");
+_Static_assert((SPAN_HEADER_SIZE & (SPAN_HEADER_SIZE - 1)) == 0,
+               "Span header size must be power of two");
+
+#if ENABLE_VALIDATE_ARGS
+//! Maximum allocation size to avoid integer overflow
+#undef MAX_ALLOC_SIZE
+#define MAX_ALLOC_SIZE (((size_t) - 1) - _memory_span_size)
+#endif
+
+#define pointer_offset(ptr, ofs) (void *)((char *)(ptr) + (ptrdiff_t)(ofs))
+#define pointer_diff(first, second) \
+  (ptrdiff_t)((const char *)(first) - (const char *)(second))
+
+#define INVALID_POINTER ((void *)((uintptr_t) - 1))
+
+#define SIZE_CLASS_LARGE SIZE_CLASS_COUNT
+#define SIZE_CLASS_HUGE ((uint32_t) - 1)
+
+////////////
+///
+/// Data types
+///
+//////
+
+//! A memory heap, per thread
+typedef struct heap_t heap_t;
+//! Span of memory pages
+typedef struct span_t span_t;
+//! Span list
+typedef struct span_list_t span_list_t;
+//! Span active data
+typedef struct span_active_t span_active_t;
+//! Size class definition
+typedef struct size_class_t size_class_t;
+//! 
Global cache
+typedef struct global_cache_t global_cache_t;
+
+//! Flag indicating span is the first (master) span of a split superspan
+#define SPAN_FLAG_MASTER 1U
+//! Flag indicating span is a secondary (sub) span of a split superspan
+#define SPAN_FLAG_SUBSPAN 2U
+//! Flag indicating span has blocks with increased alignment
+#define SPAN_FLAG_ALIGNED_BLOCKS 4U
+//! Flag indicating an unmapped master span
+#define SPAN_FLAG_UNMAPPED_MASTER 8U
+
+#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS
+struct span_use_t {
+  //! Current number of spans used (actually used, not in cache)
+  atomic32_t current;
+  //! High water mark of spans used
+  atomic32_t high;
+#if ENABLE_STATISTICS
+  //! Number of spans in deferred list
+  atomic32_t spans_deferred;
+  //! Number of spans transitioned to global cache
+  atomic32_t spans_to_global;
+  //! Number of spans transitioned from global cache
+  atomic32_t spans_from_global;
+  //! Number of spans transitioned to thread cache
+  atomic32_t spans_to_cache;
+  //! Number of spans transitioned from thread cache
+  atomic32_t spans_from_cache;
+  //! Number of spans transitioned to reserved state
+  atomic32_t spans_to_reserved;
+  //! Number of spans transitioned from reserved state
+  atomic32_t spans_from_reserved;
+  //! Number of raw memory map calls
+  atomic32_t spans_map_calls;
+#endif
+};
+typedef struct span_use_t span_use_t;
+#endif
+
+#if ENABLE_STATISTICS
+struct size_class_use_t {
+  //! Current number of allocations
+  atomic32_t alloc_current;
+  //! Peak number of allocations
+  int32_t alloc_peak;
+  //! Total number of allocations
+  atomic32_t alloc_total;
+  //! Total number of frees
+  atomic32_t free_total;
+  //! Number of spans in use
+  atomic32_t spans_current;
+  //! Peak number of spans in use
+  int32_t spans_peak;
+  //! Number of spans transitioned to cache
+  atomic32_t spans_to_cache;
+  //! Number of spans transitioned from cache
+  atomic32_t spans_from_cache;
+  //! 
Number of spans transitioned from reserved state
+  atomic32_t spans_from_reserved;
+  //! Number of spans mapped
+  atomic32_t spans_map_calls;
+  int32_t unused;
+};
+typedef struct size_class_use_t size_class_use_t;
+#endif
+
+// A span can either represent a single span of memory pages with size declared
+// by span_map_count configuration variable, or a set of spans in a contiguous
+// region, a super span. Any reference to the term "span" usually refers to
+// either a single span or a super span. A super span can further be divided
+// into multiple spans (or, in turn, super spans), where the first (super)span
+// is the master and subsequent (super)spans are subspans. The master span keeps
+// track of how many subspans are still alive and mapped in virtual memory, and
+// once all subspans and master have been unmapped the entire superspan region
+// is released and unmapped (on Windows for example, the entire superspan range
+// has to be released in the same call to release the virtual memory range, but
+// individual subranges can be decommitted individually to reduce physical
+// memory use).
+struct span_t {
+  //! Free list
+  void *free_list;
+  //! Total block count of size class
+  uint32_t block_count;
+  //! Size class
+  uint32_t size_class;
+  //! Index of last block initialized in free list
+  uint32_t free_list_limit;
+  //! Number of used blocks remaining when in partial state
+  uint32_t used_count;
+  //! Deferred free list
+  atomicptr_t free_list_deferred;
+  //! Size of deferred free list, or list of spans when part of a cache list
+  uint32_t list_size;
+  //! Size of a block
+  uint32_t block_size;
+  //! Flags and counters
+  uint32_t flags;
+  //! Number of spans
+  uint32_t span_count;
+  //! Total span counter for master spans
+  uint32_t total_spans;
+  //! Offset from master span for subspans
+  uint32_t offset_from_master;
+  //! Remaining span counter, for master spans
+  atomic32_t remaining_spans;
+  //! Alignment offset
+  uint32_t align_offset;
+  //! 
Owning heap
+  heap_t *heap;
+  //! Next span
+  span_t *next;
+  //! Previous span
+  span_t *prev;
+};
+_Static_assert(sizeof(span_t) <= SPAN_HEADER_SIZE, "span size mismatch");
+
+struct span_cache_t {
+  size_t count;
+  span_t *span[MAX_THREAD_SPAN_CACHE];
+};
+typedef struct span_cache_t span_cache_t;
+
+struct span_large_cache_t {
+  size_t count;
+  span_t *span[MAX_THREAD_SPAN_LARGE_CACHE];
+};
+typedef struct span_large_cache_t span_large_cache_t;
+
+struct heap_size_class_t {
+  //! Free list of active span
+  void *free_list;
+  //! Double linked list of partially used spans with free blocks.
+  // Previous span pointer in head points to tail span of list.
+  span_t *partial_span;
+  //! Early level cache of fully free spans
+  span_t *cache;
+};
+typedef struct heap_size_class_t heap_size_class_t;
+
+// Control structure for a heap, either a thread heap or a first class heap if
+// enabled
+struct heap_t {
+  //! Owning thread ID
+  uintptr_t owner_thread;
+  //! Free lists for each size class
+  heap_size_class_t size_class[SIZE_CLASS_COUNT];
+#if ENABLE_THREAD_CACHE
+  //! Arrays of fully freed spans, single span
+  span_cache_t span_cache;
+#endif
+  //! List of deferred free spans (single linked list)
+  atomicptr_t span_free_deferred;
+  //! Number of full spans
+  size_t full_span_count;
+  //! Mapped but unused spans
+  span_t *span_reserve;
+  //! Master span for mapped but unused spans
+  span_t *span_reserve_master;
+  //! Number of mapped but unused spans
+  uint32_t spans_reserved;
+  //! Child count
+  atomic32_t child_count;
+  //! Next heap in id list
+  heap_t *next_heap;
+  //! Next heap in orphan list
+  heap_t *next_orphan;
+  //! Heap ID
+  int32_t id;
+  //! Finalization state flag
+  int finalize;
+  //! Master heap owning the memory pages
+  heap_t *master_heap;
+#if ENABLE_THREAD_CACHE
+  //! Arrays of fully freed spans, large spans with > 1 span count
+  span_large_cache_t span_large_cache[LARGE_CLASS_COUNT - 1];
+#endif
+#if RPMALLOC_FIRST_CLASS_HEAPS
+  //! 
Double linked list of fully utilized spans with free blocks for each size
+  //! class.
+  // Previous span pointer in head points to tail span of list.
+  span_t *full_span[SIZE_CLASS_COUNT];
+  //! Double linked list of large and huge spans allocated by this heap
+  span_t *large_huge_span;
+#endif
+#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS
+  //! Current and high water mark of spans used per span count
+  span_use_t span_use[LARGE_CLASS_COUNT];
+#endif
+#if ENABLE_STATISTICS
+  //! Allocation stats per size class
+  size_class_use_t size_class_use[SIZE_CLASS_COUNT + 1];
+  //! Number of bytes transitioned thread -> global
+  atomic64_t thread_to_global;
+  //! Number of bytes transitioned global -> thread
+  atomic64_t global_to_thread;
+#endif
+};
+
+// Size class for defining a block size bucket
+struct size_class_t {
+  //! Size of blocks in this class
+  uint32_t block_size;
+  //! Number of blocks in each chunk
+  uint16_t block_count;
+  //! Class index this class is merged with
+  uint16_t class_idx;
+};
+_Static_assert(sizeof(size_class_t) == 8, "Size class size mismatch");
+
+struct global_cache_t {
+  //! Cache lock
+  atomic32_t lock;
+  //! Cache count
+  uint32_t count;
+#if ENABLE_STATISTICS
+  //! Insert count
+  size_t insert_count;
+  //! Extract count
+  size_t extract_count;
+#endif
+  //! Cached spans
+  span_t *span[GLOBAL_CACHE_MULTIPLIER * MAX_THREAD_SPAN_CACHE];
+  //! Unlimited cache overflow
+  span_t *overflow;
+};
+
+////////////
+///
+/// Global data
+///
+//////
+
+//! Default span size (64KiB)
+#define _memory_default_span_size (64 * 1024)
+#define _memory_default_span_size_shift 16
+#define _memory_default_span_mask (~((uintptr_t)(_memory_span_size - 1)))
+
+//! Initialized flag
+static int _rpmalloc_initialized;
+//! Main thread ID
+static uintptr_t _rpmalloc_main_thread_id;
+//! Configuration
+static rpmalloc_config_t _memory_config;
+//! Memory page size
+static size_t _memory_page_size;
+//! 
Shift to divide by page size
+static size_t _memory_page_size_shift;
+//! Granularity at which memory pages are mapped by OS
+static size_t _memory_map_granularity;
+#if RPMALLOC_CONFIGURABLE
+//! Size of a span of memory pages
+static size_t _memory_span_size;
+//! Shift to divide by span size
+static size_t _memory_span_size_shift;
+//! Mask to get to start of a memory span
+static uintptr_t _memory_span_mask;
+#else
+//! Hardwired span size
+#define _memory_span_size _memory_default_span_size
+#define _memory_span_size_shift _memory_default_span_size_shift
+#define _memory_span_mask _memory_default_span_mask
+#endif
+//! Number of spans to map in each map call
+static size_t _memory_span_map_count;
+//! Number of spans to keep reserved in each heap
+static size_t _memory_heap_reserve_count;
+//! Global size classes
+static size_class_t _memory_size_class[SIZE_CLASS_COUNT];
+//! Run-time size limit of medium blocks
+static size_t _memory_medium_size_limit;
+//! Heap ID counter
+static atomic32_t _memory_heap_id;
+//! Huge page support
+static int _memory_huge_pages;
+#if ENABLE_GLOBAL_CACHE
+//! Global span cache
+static global_cache_t _memory_span_cache[LARGE_CLASS_COUNT];
+#endif
+//! Global reserved spans
+static span_t *_memory_global_reserve;
+//! Global reserved count
+static size_t _memory_global_reserve_count;
+//! Global reserved master
+static span_t *_memory_global_reserve_master;
+//! All heaps
+static heap_t *_memory_heaps[HEAP_ARRAY_SIZE];
+//! Used to restrict access to mapping memory for huge pages
+static atomic32_t _memory_global_lock;
+//! Orphaned heaps
+static heap_t *_memory_orphan_heaps;
+#if RPMALLOC_FIRST_CLASS_HEAPS
+//! Orphaned heaps (first class heaps)
+static heap_t *_memory_first_class_orphan_heaps;
+#endif
+#if ENABLE_STATISTICS
+//! Allocations counter
+static atomic64_t _allocation_counter;
+//! Deallocations counter
+static atomic64_t _deallocation_counter;
+//! Active heap count
+static atomic32_t _memory_active_heaps;
+//! 
Number of currently mapped memory pages
+static atomic32_t _mapped_pages;
+//! Peak number of concurrently mapped memory pages
+static int32_t _mapped_pages_peak;
+//! Number of mapped master spans
+static atomic32_t _master_spans;
+//! Number of unmapped dangling master spans
+static atomic32_t _unmapped_master_spans;
+//! Running counter of total number of mapped memory pages since start
+static atomic32_t _mapped_total;
+//! Running counter of total number of unmapped memory pages since start
+static atomic32_t _unmapped_total;
+//! Number of currently mapped memory pages in OS calls
+static atomic32_t _mapped_pages_os;
+//! Number of currently allocated pages in huge allocations
+static atomic32_t _huge_pages_current;
+//! Peak number of currently allocated pages in huge allocations
+static int32_t _huge_pages_peak;
+#endif
+
+////////////
+///
+/// Thread local heap and ID
+///
+//////
+
+//! Current thread heap
+#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \
+    defined(__TINYC__)
+static pthread_key_t _memory_thread_heap;
+#else
+#ifdef _MSC_VER
+#define _Thread_local __declspec(thread)
+#define TLS_MODEL
+#else
+#ifndef __HAIKU__
+#define TLS_MODEL __attribute__((tls_model("initial-exec")))
+#else
+#define TLS_MODEL
+#endif
+#if !defined(__clang__) && defined(__GNUC__)
+#define _Thread_local __thread
+#endif
+#endif
+static _Thread_local heap_t *_memory_thread_heap TLS_MODEL;
+#endif
+
+static inline heap_t *get_thread_heap_raw(void) {
+#if (defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD
+  return pthread_getspecific(_memory_thread_heap);
+#else
+  return _memory_thread_heap;
+#endif
+}
+
+//! Get the current thread heap
+static inline heap_t *get_thread_heap(void) {
+  heap_t *heap = get_thread_heap_raw();
+#if ENABLE_PRELOAD
+  if (EXPECTED(heap != 0))
+    return heap;
+  rpmalloc_initialize();
+  return get_thread_heap_raw();
+#else
+  return heap;
+#endif
+}
+
+//! 
Fast thread ID
+static inline uintptr_t get_thread_id(void) {
+#if defined(_WIN32)
+  return (uintptr_t)((void *)NtCurrentTeb());
+#elif (defined(__GNUC__) || defined(__clang__)) && !defined(__CYGWIN__)
+  uintptr_t tid;
+#if defined(__i386__)
+  __asm__("movl %%gs:0, %0" : "=r"(tid) : :);
+#elif defined(__x86_64__)
+#if defined(__MACH__)
+  __asm__("movq %%gs:0, %0" : "=r"(tid) : :);
+#else
+  __asm__("movq %%fs:0, %0" : "=r"(tid) : :);
+#endif
+#elif defined(__arm__)
+  __asm__ volatile("mrc p15, 0, %0, c13, c0, 3" : "=r"(tid));
+#elif defined(__aarch64__)
+#if defined(__MACH__)
+  // tpidr_el0 likely unused, always return 0 on iOS
+  __asm__ volatile("mrs %0, tpidrro_el0" : "=r"(tid));
+#else
+  __asm__ volatile("mrs %0, tpidr_el0" : "=r"(tid));
+#endif
+#else
+#error This platform needs implementation of get_thread_id()
+#endif
+  return tid;
+#else
+#error This platform needs implementation of get_thread_id()
+#endif
+}
+
+//! Set the current thread heap
+static void set_thread_heap(heap_t *heap) {
+#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \
+    defined(__TINYC__)
+  pthread_setspecific(_memory_thread_heap, heap);
+#else
+  _memory_thread_heap = heap;
+#endif
+  if (heap)
+    heap->owner_thread = get_thread_id();
+}
+
+//! Set main thread ID
+extern void rpmalloc_set_main_thread(void);
+
+void rpmalloc_set_main_thread(void) {
+  _rpmalloc_main_thread_id = get_thread_id();
+}
+
+static void _rpmalloc_spin(void) {
+#if defined(_MSC_VER)
+#if defined(_M_ARM64)
+  __yield();
+#else
+  _mm_pause();
+#endif
+#elif defined(__x86_64__) || defined(__i386__)
+  __asm__ volatile("pause" ::: "memory");
+#elif defined(__aarch64__) || (defined(__arm__) && __ARM_ARCH >= 7)
+  __asm__ volatile("yield" ::: "memory");
+#elif defined(__powerpc__) || defined(__powerpc64__)
+  // No idea if ever been compiled in such archs but ... 
as precaution
+  __asm__ volatile("or 27,27,27");
+#elif defined(__sparc__)
+  __asm__ volatile("rd %ccr, %g0 \n\trd %ccr, %g0 \n\trd %ccr, %g0");
+#else
+  struct timespec ts = {0};
+  nanosleep(&ts, 0);
+#endif
+}
+
+#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK)
+static void NTAPI _rpmalloc_thread_destructor(void *value) {
+#if ENABLE_OVERRIDE
+  // If this is called on main thread it means rpmalloc_finalize
+  // has not been called and shutdown is forced (through _exit) or unclean
+  if (get_thread_id() == _rpmalloc_main_thread_id)
+    return;
+#endif
+  if (value)
+    rpmalloc_thread_finalize(1);
+}
+#endif
+
+////////////
+///
+/// Low level memory map/unmap
+///
+//////
+
+static void _rpmalloc_set_name(void *address, size_t size) {
+#if defined(__linux__) || defined(__ANDROID__)
+  const char *name = _memory_huge_pages ? _memory_config.huge_page_name
+                                        : _memory_config.page_name;
+  if (address == MAP_FAILED || !name)
+    return;
+  // If the kernel does not support CONFIG_ANON_VMA_NAME or if the call fails
+  // (e.g. invalid name) it is a no-op basically.
+  (void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (uintptr_t)address, size,
+              (uintptr_t)name);
+#else
+  (void)sizeof(size);
+  (void)sizeof(address);
+#endif
+}
+
+//! Map more virtual memory
+// size is number of bytes to map
+// offset receives the offset in bytes from start of mapped region
+// returns address to start of mapped region to use
+static void *_rpmalloc_mmap(size_t size, size_t *offset) {
+  rpmalloc_assert(!(size % _memory_page_size), "Invalid mmap size");
+  rpmalloc_assert(size >= _memory_page_size, "Invalid mmap size");
+  void *address = _memory_config.memory_map(size, offset);
+  if (EXPECTED(address != 0)) {
+    _rpmalloc_stat_add_peak(&_mapped_pages, (size >> _memory_page_size_shift),
+                            _mapped_pages_peak);
+    _rpmalloc_stat_add(&_mapped_total, (size >> _memory_page_size_shift));
+  }
+  return address;
+}
+
+//! 
Unmap virtual memory
+// address is the memory address to unmap, as returned from _memory_map
+// size is the number of bytes to unmap, which might be less than full region
+// for a partial unmap
+// offset is the offset in bytes to the actual mapped region, as set by
+// _memory_map
+// release is set to 0 for partial unmap, or size of entire range for a full
+// unmap
+static void _rpmalloc_unmap(void *address, size_t size, size_t offset,
+                            size_t release) {
+  rpmalloc_assert(!release || (release >= size), "Invalid unmap size");
+  rpmalloc_assert(!release || (release >= _memory_page_size),
+                  "Invalid unmap size");
+  if (release) {
+    rpmalloc_assert(!(release % _memory_page_size), "Invalid unmap size");
+    _rpmalloc_stat_sub(&_mapped_pages, (release >> _memory_page_size_shift));
+    _rpmalloc_stat_add(&_unmapped_total, (release >> _memory_page_size_shift));
+  }
+  _memory_config.memory_unmap(address, size, offset, release);
+}
+
+//! Default implementation to map new pages to virtual memory
+static void *_rpmalloc_mmap_os(size_t size, size_t *offset) {
+  // Either size is a heap (a single page) or a (multiple) span - we only need
+  // to align spans, and only if larger than map granularity
+  size_t padding = ((size >= _memory_span_size) &&
+                    (_memory_span_size > _memory_map_granularity))
+                       ? _memory_span_size
+                       : 0;
+  rpmalloc_assert(size >= _memory_page_size, "Invalid mmap size");
+#if PLATFORM_WINDOWS
+  // Ok to MEM_COMMIT - according to MSDN, "actual physical pages are not
+  // allocated unless/until the virtual addresses are actually accessed"
+  void *ptr = VirtualAlloc(0, size + padding,
+                           (_memory_huge_pages ? 
MEM_LARGE_PAGES : 0) |
+                               MEM_RESERVE | MEM_COMMIT,
+                           PAGE_READWRITE);
+  if (!ptr) {
+    if (_memory_config.map_fail_callback) {
+      if (_memory_config.map_fail_callback(size + padding))
+        return _rpmalloc_mmap_os(size, offset);
+    } else {
+      rpmalloc_assert(ptr, "Failed to map virtual memory block");
+    }
+    return 0;
+  }
+#else
+  int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_UNINITIALIZED;
+#if defined(__APPLE__) && !TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR
+  int fd = (int)VM_MAKE_TAG(240U);
+  if (_memory_huge_pages)
+    fd |= VM_FLAGS_SUPERPAGE_SIZE_2MB;
+  void *ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, fd, 0);
+#elif defined(MAP_HUGETLB)
+  void *ptr = mmap(0, size + padding,
+                   PROT_READ | PROT_WRITE | PROT_MAX(PROT_READ | PROT_WRITE),
+                   (_memory_huge_pages ? MAP_HUGETLB : 0) | flags, -1, 0);
+#if defined(MADV_HUGEPAGE)
+  // In some configurations, huge pages allocations might fail thus
+  // we fallback to normal allocations and promote the region as transparent
+  // huge page
+  if ((ptr == MAP_FAILED || !ptr) && _memory_huge_pages) {
+    ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, -1, 0);
+    if (ptr && ptr != MAP_FAILED) {
+      int prm = madvise(ptr, size + padding, MADV_HUGEPAGE);
+      (void)prm;
+      rpmalloc_assert((prm == 0), "Failed to promote the page to THP");
+    }
+  }
+#endif
+  _rpmalloc_set_name(ptr, size + padding);
+#elif defined(MAP_ALIGNED)
+  const size_t align =
+      (sizeof(size_t) * 8) - (size_t)(__builtin_clzl(size - 1));
+  void *ptr =
+      mmap(0, size + padding, PROT_READ | PROT_WRITE,
+           (_memory_huge_pages ? MAP_ALIGNED(align) : 0) | flags, -1, 0);
+#elif defined(MAP_ALIGN)
+  caddr_t base = (_memory_huge_pages ? (caddr_t)(4 << 20) : 0);
+  void *ptr = mmap(base, size + padding, PROT_READ | PROT_WRITE,
+                   (_memory_huge_pages ? 
MAP_ALIGN : 0) | flags, -1, 0);
+#else
+  void *ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, -1, 0);
+#endif
+  if ((ptr == MAP_FAILED) || !ptr) {
+    if (_memory_config.map_fail_callback) {
+      if (_memory_config.map_fail_callback(size + padding))
+        return _rpmalloc_mmap_os(size, offset);
+    } else if (errno != ENOMEM) {
+      rpmalloc_assert((ptr != MAP_FAILED) && ptr,
+                      "Failed to map virtual memory block");
+    }
+    return 0;
+  }
+#endif
+  _rpmalloc_stat_add(&_mapped_pages_os,
+                     (int32_t)((size + padding) >> _memory_page_size_shift));
+  if (padding) {
+    size_t final_padding = padding - ((uintptr_t)ptr & ~_memory_span_mask);
+    rpmalloc_assert(final_padding <= _memory_span_size,
+                    "Internal failure in padding");
+    rpmalloc_assert(final_padding <= padding, "Internal failure in padding");
+    rpmalloc_assert(!(final_padding % 8), "Internal failure in padding");
+    ptr = pointer_offset(ptr, final_padding);
+    *offset = final_padding >> 3;
+  }
+  rpmalloc_assert((size < _memory_span_size) ||
+                      !((uintptr_t)ptr & ~_memory_span_mask),
+                  "Internal failure in padding");
+  return ptr;
+}
+
+//! Default implementation to unmap pages from virtual memory
+static void _rpmalloc_unmap_os(void *address, size_t size, size_t offset,
+                               size_t release) {
+  rpmalloc_assert(release || (offset == 0), "Invalid unmap size");
+  rpmalloc_assert(!release || (release >= _memory_page_size),
+                  "Invalid unmap size");
+  rpmalloc_assert(size >= _memory_page_size, "Invalid unmap size");
+  if (release && offset) {
+    offset <<= 3;
+    address = pointer_offset(address, -(int32_t)offset);
+    if ((release >= _memory_span_size) &&
+        (_memory_span_size > _memory_map_granularity)) {
+      // Padding is always one span size
+      release += _memory_span_size;
+    }
+  }
+#if !DISABLE_UNMAP
+#if PLATFORM_WINDOWS
+  if (!VirtualFree(address, release ? 0 : size,
+                   release ? 
MEM_RELEASE : MEM_DECOMMIT)) {
+    rpmalloc_assert(0, "Failed to unmap virtual memory block");
+  }
+#else
+  if (release) {
+    if (munmap(address, release)) {
+      rpmalloc_assert(0, "Failed to unmap virtual memory block");
+    }
+  } else {
+#if defined(MADV_FREE_REUSABLE)
+    int ret;
+    while ((ret = madvise(address, size, MADV_FREE_REUSABLE)) == -1 &&
+           (errno == EAGAIN))
+      errno = 0;
+    if ((ret == -1) && (errno != 0)) {
+#elif defined(MADV_DONTNEED)
+    if (madvise(address, size, MADV_DONTNEED)) {
+#elif defined(MADV_PAGEOUT)
+    if (madvise(address, size, MADV_PAGEOUT)) {
+#elif defined(MADV_FREE)
+    if (madvise(address, size, MADV_FREE)) {
+#else
+    if (posix_madvise(address, size, POSIX_MADV_DONTNEED)) {
+#endif
+      rpmalloc_assert(0, "Failed to madvise virtual memory block as free");
+    }
+  }
+#endif
+#endif
+  if (release)
+    _rpmalloc_stat_sub(&_mapped_pages_os, release >> _memory_page_size_shift);
+}
+
+static void _rpmalloc_span_mark_as_subspan_unless_master(span_t *master,
+                                                         span_t *subspan,
+                                                         size_t span_count);
+
+//! Use global reserved spans to fulfill a memory map request (reserve size must
+//! be checked by caller)
+static span_t *_rpmalloc_global_get_reserved_spans(size_t span_count) {
+  span_t *span = _memory_global_reserve;
+  _rpmalloc_span_mark_as_subspan_unless_master(_memory_global_reserve_master,
+                                               span, span_count);
+  _memory_global_reserve_count -= span_count;
+  if (_memory_global_reserve_count)
+    _memory_global_reserve =
+        (span_t *)pointer_offset(span, span_count << _memory_span_size_shift);
+  else
+    _memory_global_reserve = 0;
+  return span;
+}
+
+//! Store the given spans as global reserve (must only be called from within new
+//! 
heap allocation, not thread safe)
+static void _rpmalloc_global_set_reserved_spans(span_t *master, span_t *reserve,
+                                                size_t reserve_span_count) {
+  _memory_global_reserve_master = master;
+  _memory_global_reserve_count = reserve_span_count;
+  _memory_global_reserve = reserve;
+}
+
+////////////
+///
+/// Span linked list management
+///
+//////
+
+//! Add a span to double linked list at the head
+static void _rpmalloc_span_double_link_list_add(span_t **head, span_t *span) {
+  if (*head)
+    (*head)->prev = span;
+  span->next = *head;
+  *head = span;
+}
+
+//! Pop head span from double linked list
+static void _rpmalloc_span_double_link_list_pop_head(span_t **head,
+                                                     span_t *span) {
+  rpmalloc_assert(*head == span, "Linked list corrupted");
+  span = *head;
+  *head = span->next;
+}
+
+//! Remove a span from double linked list
+static void _rpmalloc_span_double_link_list_remove(span_t **head,
+                                                   span_t *span) {
+  rpmalloc_assert(*head, "Linked list corrupted");
+  if (*head == span) {
+    *head = span->next;
+  } else {
+    span_t *next_span = span->next;
+    span_t *prev_span = span->prev;
+    prev_span->next = next_span;
+    if (EXPECTED(next_span != 0))
+      next_span->prev = prev_span;
+  }
+}
+
+////////////
+///
+/// Span control
+///
+//////
+
+static void _rpmalloc_heap_cache_insert(heap_t *heap, span_t *span);
+
+static void _rpmalloc_heap_finalize(heap_t *heap);
+
+static void _rpmalloc_heap_set_reserved_spans(heap_t *heap, span_t *master,
+                                              span_t *reserve,
+                                              size_t reserve_span_count);
+
+//! Declare the span to be a subspan and store distance from master span and
+//! 
span count
+static void _rpmalloc_span_mark_as_subspan_unless_master(span_t *master,
+                                                         span_t *subspan,
+                                                         size_t span_count) {
+  rpmalloc_assert((subspan != master) || (subspan->flags & SPAN_FLAG_MASTER),
+                  "Span master pointer and/or flag mismatch");
+  if (subspan != master) {
+    subspan->flags = SPAN_FLAG_SUBSPAN;
+    subspan->offset_from_master =
+        (uint32_t)((uintptr_t)pointer_diff(subspan, master) >>
+                   _memory_span_size_shift);
+    subspan->align_offset = 0;
+  }
+  subspan->span_count = (uint32_t)span_count;
+}
+
+//! Use reserved spans to fulfill a memory map request (reserve size must be
+//! checked by caller)
+static span_t *_rpmalloc_span_map_from_reserve(heap_t *heap,
+                                               size_t span_count) {
+  // Update the heap span reserve
+  span_t *span = heap->span_reserve;
+  heap->span_reserve =
+      (span_t *)pointer_offset(span, span_count * _memory_span_size);
+  heap->spans_reserved -= (uint32_t)span_count;
+
+  _rpmalloc_span_mark_as_subspan_unless_master(heap->span_reserve_master, span,
+                                               span_count);
+  if (span_count <= LARGE_CLASS_COUNT)
+    _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_from_reserved);
+
+  return span;
+}
+
+//! Get the aligned number of spans to map in based on wanted count, configured
+//! mapping granularity and the page size
+static size_t _rpmalloc_span_align_count(size_t span_count) {
+  size_t request_count = (span_count > _memory_span_map_count)
+                             ? span_count
+                             : _memory_span_map_count;
+  if ((_memory_page_size > _memory_span_size) &&
+      ((request_count * _memory_span_size) % _memory_page_size))
+    request_count +=
+        _memory_span_map_count - (request_count % _memory_span_map_count);
+  return request_count;
+}
+
+//! 
Setup a newly mapped span
+static void _rpmalloc_span_initialize(span_t *span, size_t total_span_count,
+                                      size_t span_count, size_t align_offset) {
+  span->total_spans = (uint32_t)total_span_count;
+  span->span_count = (uint32_t)span_count;
+  span->align_offset = (uint32_t)align_offset;
+  span->flags = SPAN_FLAG_MASTER;
+  atomic_store32(&span->remaining_spans, (int32_t)total_span_count);
+}
+
+static void _rpmalloc_span_unmap(span_t *span);
+
+//! Map an aligned set of spans, taking configured mapping granularity and the
+//! page size into account
+static span_t *_rpmalloc_span_map_aligned_count(heap_t *heap,
+                                                size_t span_count) {
+  // If we already have some, but not enough, reserved spans, release those to
+  // heap cache and map a new full set of spans. Otherwise we would waste memory
+  // if page size > span size (huge pages)
+  size_t aligned_span_count = _rpmalloc_span_align_count(span_count);
+  size_t align_offset = 0;
+  span_t *span = (span_t *)_rpmalloc_mmap(
+      aligned_span_count * _memory_span_size, &align_offset);
+  if (!span)
+    return 0;
+  _rpmalloc_span_initialize(span, aligned_span_count, span_count, align_offset);
+  _rpmalloc_stat_inc(&_master_spans);
+  if (span_count <= LARGE_CLASS_COUNT)
+    _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_map_calls);
+  if (aligned_span_count > span_count) {
+    span_t *reserved_spans =
+        (span_t *)pointer_offset(span, span_count * _memory_span_size);
+    size_t reserved_count = aligned_span_count - span_count;
+    if (heap->spans_reserved) {
+      _rpmalloc_span_mark_as_subspan_unless_master(
+          heap->span_reserve_master, heap->span_reserve, heap->spans_reserved);
+      _rpmalloc_heap_cache_insert(heap, heap->span_reserve);
+    }
+    if (reserved_count > _memory_heap_reserve_count) {
+      // If huge pages or eager span map count, the global reserve spin lock is
+      // held by caller, _rpmalloc_span_map
+      rpmalloc_assert(atomic_load32(&_memory_global_lock) == 1,
+                      "Global spin lock not held as expected");
+      size_t remain_count = 
reserved_count - _memory_heap_reserve_count;
+      reserved_count = _memory_heap_reserve_count;
+      span_t *remain_span = (span_t *)pointer_offset(
+          reserved_spans, reserved_count * _memory_span_size);
+      if (_memory_global_reserve) {
+        _rpmalloc_span_mark_as_subspan_unless_master(
+            _memory_global_reserve_master, _memory_global_reserve,
+            _memory_global_reserve_count);
+        _rpmalloc_span_unmap(_memory_global_reserve);
+      }
+      _rpmalloc_global_set_reserved_spans(span, remain_span, remain_count);
+    }
+    _rpmalloc_heap_set_reserved_spans(heap, span, reserved_spans,
+                                      reserved_count);
+  }
+  return span;
+}
+
+//! Map in memory pages for the given number of spans (or use previously
+//! reserved pages)
+static span_t *_rpmalloc_span_map(heap_t *heap, size_t span_count) {
+  if (span_count <= heap->spans_reserved)
+    return _rpmalloc_span_map_from_reserve(heap, span_count);
+  span_t *span = 0;
+  int use_global_reserve =
+      (_memory_page_size > _memory_span_size) ||
+      (_memory_span_map_count > _memory_heap_reserve_count);
+  if (use_global_reserve) {
+    // If huge pages, make sure only one thread maps more memory to avoid bloat
+    while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0))
+      _rpmalloc_spin();
+    if (_memory_global_reserve_count >= span_count) {
+      size_t reserve_count =
+          (!heap->spans_reserved ? 
_memory_heap_reserve_count : span_count); + if (_memory_global_reserve_count < reserve_count) + reserve_count = _memory_global_reserve_count; + span = _rpmalloc_global_get_reserved_spans(reserve_count); + if (span) { + if (reserve_count > span_count) { + span_t *reserved_span = (span_t *)pointer_offset( + span, span_count << _memory_span_size_shift); + _rpmalloc_heap_set_reserved_spans(heap, _memory_global_reserve_master, + reserved_span, + reserve_count - span_count); + } + // Already marked as subspan in _rpmalloc_global_get_reserved_spans + span->span_count = (uint32_t)span_count; + } + } + } + if (!span) + span = _rpmalloc_span_map_aligned_count(heap, span_count); + if (use_global_reserve) + atomic_store32_release(&_memory_global_lock, 0); + return span; +} + +//! Unmap memory pages for the given number of spans (or mark as unused if no +//! partial unmappings) +static void _rpmalloc_span_unmap(span_t *span) { + rpmalloc_assert((span->flags & SPAN_FLAG_MASTER) || + (span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + rpmalloc_assert(!(span->flags & SPAN_FLAG_MASTER) || + !(span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + + int is_master = !!(span->flags & SPAN_FLAG_MASTER); + span_t *master = + is_master ? 
span + : ((span_t *)pointer_offset( + span, -(intptr_t)((uintptr_t)span->offset_from_master * + _memory_span_size))); + rpmalloc_assert(is_master || (span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + rpmalloc_assert(master->flags & SPAN_FLAG_MASTER, "Span flag corrupted"); + + size_t span_count = span->span_count; + if (!is_master) { + // Directly unmap subspans (unless huge pages, in which case we defer and + // unmap entire page range with master) + rpmalloc_assert(span->align_offset == 0, "Span align offset corrupted"); + if (_memory_span_size >= _memory_page_size) + _rpmalloc_unmap(span, span_count * _memory_span_size, 0, 0); + } else { + // Special double flag to denote an unmapped master + // It must be kept in memory since span header must be used + span->flags |= + SPAN_FLAG_MASTER | SPAN_FLAG_SUBSPAN | SPAN_FLAG_UNMAPPED_MASTER; + _rpmalloc_stat_add(&_unmapped_master_spans, 1); + } + + if (atomic_add32(&master->remaining_spans, -(int32_t)span_count) <= 0) { + // Everything unmapped, unmap the master span with release flag to unmap the + // entire range of the super span + rpmalloc_assert(!!(master->flags & SPAN_FLAG_MASTER) && + !!(master->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + size_t unmap_count = master->span_count; + if (_memory_span_size < _memory_page_size) + unmap_count = master->total_spans; + _rpmalloc_stat_sub(&_master_spans, 1); + _rpmalloc_stat_sub(&_unmapped_master_spans, 1); + _rpmalloc_unmap(master, unmap_count * _memory_span_size, + master->align_offset, + (size_t)master->total_spans * _memory_span_size); + } +} + +//! Move the span (used for small or medium allocations) to the heap thread +//! 
cache +static void _rpmalloc_span_release_to_cache(heap_t *heap, span_t *span) { + rpmalloc_assert(heap == span->heap, "Span heap pointer corrupted"); + rpmalloc_assert(span->size_class < SIZE_CLASS_COUNT, + "Invalid span size class"); + rpmalloc_assert(span->span_count == 1, "Invalid span count"); +#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS + atomic_decr32(&heap->span_use[0].current); +#endif + _rpmalloc_stat_dec(&heap->size_class_use[span->size_class].spans_current); + if (!heap->finalize) { + _rpmalloc_stat_inc(&heap->span_use[0].spans_to_cache); + _rpmalloc_stat_inc(&heap->size_class_use[span->size_class].spans_to_cache); + if (heap->size_class[span->size_class].cache) + _rpmalloc_heap_cache_insert(heap, + heap->size_class[span->size_class].cache); + heap->size_class[span->size_class].cache = span; + } else { + _rpmalloc_span_unmap(span); + } +} + +//! Initialize a (partial) free list up to next system memory page, while +//! reserving the first block as allocated, returning number of blocks in list +static uint32_t free_list_partial_init(void **list, void **first_block, + void *page_start, void *block_start, + uint32_t block_count, + uint32_t block_size) { + rpmalloc_assert(block_count, "Internal failure"); + *first_block = block_start; + if (block_count > 1) { + void *free_block = pointer_offset(block_start, block_size); + void *block_end = + pointer_offset(block_start, (size_t)block_size * block_count); + // If block size is less than half a memory page, bound init to next memory + // page boundary + if (block_size < (_memory_page_size >> 1)) { + void *page_end = pointer_offset(page_start, _memory_page_size); + if (page_end < block_end) + block_end = page_end; + } + *list = free_block; + block_count = 2; + void *next_block = pointer_offset(free_block, block_size); + while (next_block < block_end) { + *((void **)free_block) = next_block; + free_block = next_block; + ++block_count; + next_block = pointer_offset(next_block, block_size); + } + *((void 
**)free_block) = 0; + } else { + *list = 0; + } + return block_count; +} + +//! Initialize an unused span (from cache or mapped) to be new active span, +//! putting the initial free list in heap class free list +static void *_rpmalloc_span_initialize_new(heap_t *heap, + heap_size_class_t *heap_size_class, + span_t *span, uint32_t class_idx) { + rpmalloc_assert(span->span_count == 1, "Internal failure"); + size_class_t *size_class = _memory_size_class + class_idx; + span->size_class = class_idx; + span->heap = heap; + span->flags &= ~SPAN_FLAG_ALIGNED_BLOCKS; + span->block_size = size_class->block_size; + span->block_count = size_class->block_count; + span->free_list = 0; + span->list_size = 0; + atomic_store_ptr_release(&span->free_list_deferred, 0); + + // Setup free list. Only initialize one system page worth of free blocks in + // list + void *block; + span->free_list_limit = + free_list_partial_init(&heap_size_class->free_list, &block, span, + pointer_offset(span, SPAN_HEADER_SIZE), + size_class->block_count, size_class->block_size); + // Link span as partial if there remains blocks to be initialized as free + // list, or full if fully initialized + if (span->free_list_limit < span->block_count) { + _rpmalloc_span_double_link_list_add(&heap_size_class->partial_span, span); + span->used_count = span->free_list_limit; + } else { +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->full_span[class_idx], span); +#endif + ++heap->full_span_count; + span->used_count = span->block_count; + } + return block; +} + +static void _rpmalloc_span_extract_free_list_deferred(span_t *span) { + // We need acquire semantics on the CAS operation since we are interested in + // the list size Refer to _rpmalloc_deallocate_defer_small_or_medium for + // further comments on this dependency + do { + span->free_list = + atomic_exchange_ptr_acquire(&span->free_list_deferred, INVALID_POINTER); + } while (span->free_list == INVALID_POINTER); + span->used_count -= 
span->list_size; + span->list_size = 0; + atomic_store_ptr_release(&span->free_list_deferred, 0); +} + +static int _rpmalloc_span_is_fully_utilized(span_t *span) { + rpmalloc_assert(span->free_list_limit <= span->block_count, + "Span free list corrupted"); + return !span->free_list && (span->free_list_limit >= span->block_count); +} + +static int _rpmalloc_span_finalize(heap_t *heap, size_t iclass, span_t *span, + span_t **list_head) { + void *free_list = heap->size_class[iclass].free_list; + span_t *class_span = (span_t *)((uintptr_t)free_list & _memory_span_mask); + if (span == class_span) { + // Adopt the heap class free list back into the span free list + void *block = span->free_list; + void *last_block = 0; + while (block) { + last_block = block; + block = *((void **)block); + } + uint32_t free_count = 0; + block = free_list; + while (block) { + ++free_count; + block = *((void **)block); + } + if (last_block) { + *((void **)last_block) = free_list; + } else { + span->free_list = free_list; + } + heap->size_class[iclass].free_list = 0; + span->used_count -= free_count; + } + // If this assert triggers you have memory leaks + rpmalloc_assert(span->list_size == span->used_count, "Memory leak detected"); + if (span->list_size == span->used_count) { + _rpmalloc_stat_dec(&heap->span_use[0].current); + _rpmalloc_stat_dec(&heap->size_class_use[iclass].spans_current); + // This function only used for spans in double linked lists + if (list_head) + _rpmalloc_span_double_link_list_remove(list_head, span); + _rpmalloc_span_unmap(span); + return 1; + } + return 0; +} + +//////////// +/// +/// Global cache +/// +////// + +#if ENABLE_GLOBAL_CACHE + +//! 
Finalize a global cache +static void _rpmalloc_global_cache_finalize(global_cache_t *cache) { + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + + for (size_t ispan = 0; ispan < cache->count; ++ispan) + _rpmalloc_span_unmap(cache->span[ispan]); + cache->count = 0; + + while (cache->overflow) { + span_t *span = cache->overflow; + cache->overflow = span->next; + _rpmalloc_span_unmap(span); + } + + atomic_store32_release(&cache->lock, 0); +} + +static void _rpmalloc_global_cache_insert_spans(span_t **span, + size_t span_count, + size_t count) { + const size_t cache_limit = + (span_count == 1) ? GLOBAL_CACHE_MULTIPLIER * MAX_THREAD_SPAN_CACHE + : GLOBAL_CACHE_MULTIPLIER * + (MAX_THREAD_SPAN_LARGE_CACHE - (span_count >> 1)); + + global_cache_t *cache = &_memory_span_cache[span_count - 1]; + + size_t insert_count = count; + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + +#if ENABLE_STATISTICS + cache->insert_count += count; +#endif + if ((cache->count + insert_count) > cache_limit) + insert_count = cache_limit - cache->count; + + memcpy(cache->span + cache->count, span, sizeof(span_t *) * insert_count); + cache->count += (uint32_t)insert_count; + +#if ENABLE_UNLIMITED_CACHE + while (insert_count < count) { +#else + // Enable unlimited cache if huge pages, or we will leak since it is unlikely + // that an entire huge page will be unmapped, and we're unable to partially + // decommit a huge page + while ((_memory_page_size > _memory_span_size) && (insert_count < count)) { +#endif + span_t *current_span = span[insert_count++]; + current_span->next = cache->overflow; + cache->overflow = current_span; + } + atomic_store32_release(&cache->lock, 0); + + span_t *keep = 0; + for (size_t ispan = insert_count; ispan < count; ++ispan) { + span_t *current_span = span[ispan]; + // Keep master spans that has remaining subspans to avoid dangling them + if ((current_span->flags & SPAN_FLAG_MASTER) && + 
(atomic_load32(&current_span->remaining_spans) > + (int32_t)current_span->span_count)) { + current_span->next = keep; + keep = current_span; + } else { + _rpmalloc_span_unmap(current_span); + } + } + + if (keep) { + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + + size_t islot = 0; + while (keep) { + for (; islot < cache->count; ++islot) { + span_t *current_span = cache->span[islot]; + if (!(current_span->flags & SPAN_FLAG_MASTER) || + ((current_span->flags & SPAN_FLAG_MASTER) && + (atomic_load32(&current_span->remaining_spans) <= + (int32_t)current_span->span_count))) { + _rpmalloc_span_unmap(current_span); + cache->span[islot] = keep; + break; + } + } + if (islot == cache->count) + break; + keep = keep->next; + } + + if (keep) { + span_t *tail = keep; + while (tail->next) + tail = tail->next; + tail->next = cache->overflow; + cache->overflow = keep; + } + + atomic_store32_release(&cache->lock, 0); + } +} + +static size_t _rpmalloc_global_cache_extract_spans(span_t **span, + size_t span_count, + size_t count) { + global_cache_t *cache = &_memory_span_cache[span_count - 1]; + + size_t extract_count = 0; + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + +#if ENABLE_STATISTICS + cache->extract_count += count; +#endif + size_t want = count - extract_count; + if (want > cache->count) + want = cache->count; + + memcpy(span + extract_count, cache->span + (cache->count - want), + sizeof(span_t *) * want); + cache->count -= (uint32_t)want; + extract_count += want; + + while ((extract_count < count) && cache->overflow) { + span_t *current_span = cache->overflow; + span[extract_count++] = current_span; + cache->overflow = current_span->next; + } + +#if ENABLE_ASSERTS + for (size_t ispan = 0; ispan < extract_count; ++ispan) { + rpmalloc_assert(span[ispan]->span_count == span_count, + "Global cache span count mismatch"); + } +#endif + + atomic_store32_release(&cache->lock, 0); + + return extract_count; +} + +#endif + +//////////// +/// +/// 
Heap control +/// +////// + +static void _rpmalloc_deallocate_huge(span_t *); + +//! Store the given spans as reserve in the given heap +static void _rpmalloc_heap_set_reserved_spans(heap_t *heap, span_t *master, + span_t *reserve, + size_t reserve_span_count) { + heap->span_reserve_master = master; + heap->span_reserve = reserve; + heap->spans_reserved = (uint32_t)reserve_span_count; +} + +//! Adopt the deferred span cache list, optionally extracting the first single +//! span for immediate re-use +static void _rpmalloc_heap_cache_adopt_deferred(heap_t *heap, + span_t **single_span) { + span_t *span = (span_t *)((void *)atomic_exchange_ptr_acquire( + &heap->span_free_deferred, 0)); + while (span) { + span_t *next_span = (span_t *)span->free_list; + rpmalloc_assert(span->heap == heap, "Span heap pointer corrupted"); + if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) { + rpmalloc_assert(heap->full_span_count, "Heap span counter corrupted"); + --heap->full_span_count; + _rpmalloc_stat_dec(&heap->span_use[0].spans_deferred); +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&heap->full_span[span->size_class], + span); +#endif + _rpmalloc_stat_dec(&heap->span_use[0].current); + _rpmalloc_stat_dec(&heap->size_class_use[span->size_class].spans_current); + if (single_span && !*single_span) + *single_span = span; + else + _rpmalloc_heap_cache_insert(heap, span); + } else { + if (span->size_class == SIZE_CLASS_HUGE) { + _rpmalloc_deallocate_huge(span); + } else { + rpmalloc_assert(span->size_class == SIZE_CLASS_LARGE, + "Span size class invalid"); + rpmalloc_assert(heap->full_span_count, "Heap span counter corrupted"); + --heap->full_span_count; +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&heap->large_huge_span, span); +#endif + uint32_t idx = span->span_count - 1; + _rpmalloc_stat_dec(&heap->span_use[idx].spans_deferred); + _rpmalloc_stat_dec(&heap->span_use[idx].current); + if (!idx && single_span && !*single_span) + 
*single_span = span; + else + _rpmalloc_heap_cache_insert(heap, span); + } + } + span = next_span; + } +} + +static void _rpmalloc_heap_unmap(heap_t *heap) { + if (!heap->master_heap) { + if ((heap->finalize > 1) && !atomic_load32(&heap->child_count)) { + span_t *span = (span_t *)((uintptr_t)heap & _memory_span_mask); + _rpmalloc_span_unmap(span); + } + } else { + if (atomic_decr32(&heap->master_heap->child_count) == 0) { + _rpmalloc_heap_unmap(heap->master_heap); + } + } +} + +static void _rpmalloc_heap_global_finalize(heap_t *heap) { + if (heap->finalize++ > 1) { + --heap->finalize; + return; + } + + _rpmalloc_heap_finalize(heap); + +#if ENABLE_THREAD_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); + span_cache->count = 0; + } +#endif + + if (heap->full_span_count) { + --heap->finalize; + return; + } + + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + if (heap->size_class[iclass].free_list || + heap->size_class[iclass].partial_span) { + --heap->finalize; + return; + } + } + // Heap is now completely free, unmap and remove from heap list + size_t list_idx = (size_t)heap->id % HEAP_ARRAY_SIZE; + heap_t *list_heap = _memory_heaps[list_idx]; + if (list_heap == heap) { + _memory_heaps[list_idx] = heap->next_heap; + } else { + while (list_heap->next_heap != heap) + list_heap = list_heap->next_heap; + list_heap->next_heap = heap->next_heap; + } + + _rpmalloc_heap_unmap(heap); +} + +//! Insert a single span into thread heap cache, releasing to global cache if +//! 
overflow +static void _rpmalloc_heap_cache_insert(heap_t *heap, span_t *span) { + if (UNEXPECTED(heap->finalize != 0)) { + _rpmalloc_span_unmap(span); + _rpmalloc_heap_global_finalize(heap); + return; + } +#if ENABLE_THREAD_CACHE + size_t span_count = span->span_count; + _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_to_cache); + if (span_count == 1) { + span_cache_t *span_cache = &heap->span_cache; + span_cache->span[span_cache->count++] = span; + if (span_cache->count == MAX_THREAD_SPAN_CACHE) { + const size_t remain_count = + MAX_THREAD_SPAN_CACHE - THREAD_SPAN_CACHE_TRANSFER; +#if ENABLE_GLOBAL_CACHE + _rpmalloc_stat_add64(&heap->thread_to_global, + THREAD_SPAN_CACHE_TRANSFER * _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_to_global, + THREAD_SPAN_CACHE_TRANSFER); + _rpmalloc_global_cache_insert_spans(span_cache->span + remain_count, + span_count, + THREAD_SPAN_CACHE_TRANSFER); +#else + for (size_t ispan = 0; ispan < THREAD_SPAN_CACHE_TRANSFER; ++ispan) + _rpmalloc_span_unmap(span_cache->span[remain_count + ispan]); +#endif + span_cache->count = remain_count; + } + } else { + size_t cache_idx = span_count - 2; + span_large_cache_t *span_cache = heap->span_large_cache + cache_idx; + span_cache->span[span_cache->count++] = span; + const size_t cache_limit = + (MAX_THREAD_SPAN_LARGE_CACHE - (span_count >> 1)); + if (span_cache->count == cache_limit) { + const size_t transfer_limit = 2 + (cache_limit >> 2); + const size_t transfer_count = + (THREAD_SPAN_LARGE_CACHE_TRANSFER <= transfer_limit + ? 
THREAD_SPAN_LARGE_CACHE_TRANSFER + : transfer_limit); + const size_t remain_count = cache_limit - transfer_count; +#if ENABLE_GLOBAL_CACHE + _rpmalloc_stat_add64(&heap->thread_to_global, + transfer_count * span_count * _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_to_global, + transfer_count); + _rpmalloc_global_cache_insert_spans(span_cache->span + remain_count, + span_count, transfer_count); +#else + for (size_t ispan = 0; ispan < transfer_count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[remain_count + ispan]); +#endif + span_cache->count = remain_count; + } + } +#else + (void)sizeof(heap); + _rpmalloc_span_unmap(span); +#endif +} + +//! Extract the given number of spans from the different cache levels +static span_t *_rpmalloc_heap_thread_cache_extract(heap_t *heap, + size_t span_count) { + span_t *span = 0; +#if ENABLE_THREAD_CACHE + span_cache_t *span_cache; + if (span_count == 1) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (span_count - 2)); + if (span_cache->count) { + _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_from_cache); + return span_cache->span[--span_cache->count]; + } +#endif + return span; +} + +static span_t *_rpmalloc_heap_thread_cache_deferred_extract(heap_t *heap, + size_t span_count) { + span_t *span = 0; + if (span_count == 1) { + _rpmalloc_heap_cache_adopt_deferred(heap, &span); + } else { + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + span = _rpmalloc_heap_thread_cache_extract(heap, span_count); + } + return span; +} + +static span_t *_rpmalloc_heap_reserved_extract(heap_t *heap, + size_t span_count) { + if (heap->spans_reserved >= span_count) + return _rpmalloc_span_map(heap, span_count); + return 0; +} + +//! 
Extract a span from the global cache +static span_t *_rpmalloc_heap_global_cache_extract(heap_t *heap, + size_t span_count) { +#if ENABLE_GLOBAL_CACHE +#if ENABLE_THREAD_CACHE + span_cache_t *span_cache; + size_t wanted_count; + if (span_count == 1) { + span_cache = &heap->span_cache; + wanted_count = THREAD_SPAN_CACHE_TRANSFER; + } else { + span_cache = (span_cache_t *)(heap->span_large_cache + (span_count - 2)); + wanted_count = THREAD_SPAN_LARGE_CACHE_TRANSFER; + } + span_cache->count = _rpmalloc_global_cache_extract_spans( + span_cache->span, span_count, wanted_count); + if (span_cache->count) { + _rpmalloc_stat_add64(&heap->global_to_thread, + span_count * span_cache->count * _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_from_global, + span_cache->count); + return span_cache->span[--span_cache->count]; + } +#else + span_t *span = 0; + size_t count = _rpmalloc_global_cache_extract_spans(&span, span_count, 1); + if (count) { + _rpmalloc_stat_add64(&heap->global_to_thread, + span_count * count * _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_from_global, + count); + return span; + } +#endif +#endif + (void)sizeof(heap); + (void)sizeof(span_count); + return 0; +} + +static void _rpmalloc_inc_span_statistics(heap_t *heap, size_t span_count, + uint32_t class_idx) { + (void)sizeof(heap); + (void)sizeof(span_count); + (void)sizeof(class_idx); +#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS + uint32_t idx = (uint32_t)span_count - 1; + uint32_t current_count = + (uint32_t)atomic_incr32(&heap->span_use[idx].current); + if (current_count > (uint32_t)atomic_load32(&heap->span_use[idx].high)) + atomic_store32(&heap->span_use[idx].high, (int32_t)current_count); + _rpmalloc_stat_add_peak(&heap->size_class_use[class_idx].spans_current, 1, + heap->size_class_use[class_idx].spans_peak); +#endif +} + +//! Get a span from one of the cache levels (thread cache, reserved, global +//! 
cache) or fallback to mapping more memory +static span_t * +_rpmalloc_heap_extract_new_span(heap_t *heap, + heap_size_class_t *heap_size_class, + size_t span_count, uint32_t class_idx) { + span_t *span; +#if ENABLE_THREAD_CACHE + if (heap_size_class && heap_size_class->cache) { + span = heap_size_class->cache; + heap_size_class->cache = + (heap->span_cache.count + ? heap->span_cache.span[--heap->span_cache.count] + : 0); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } +#endif + (void)sizeof(class_idx); + // Allow 50% overhead to increase cache hits + size_t base_span_count = span_count; + size_t limit_span_count = + (span_count > 2) ? (span_count + (span_count >> 1)) : span_count; + if (limit_span_count > LARGE_CLASS_COUNT) + limit_span_count = LARGE_CLASS_COUNT; + do { + span = _rpmalloc_heap_thread_cache_extract(heap, span_count); + if (EXPECTED(span != 0)) { + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } + span = _rpmalloc_heap_thread_cache_deferred_extract(heap, span_count); + if (EXPECTED(span != 0)) { + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } + span = _rpmalloc_heap_global_cache_extract(heap, span_count); + if (EXPECTED(span != 0)) { + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } + span = _rpmalloc_heap_reserved_extract(heap, span_count); + if (EXPECTED(span != 0)) { + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_reserved); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } + ++span_count; + } while (span_count <= limit_span_count); + // Final fallback, map in more virtual memory + span = _rpmalloc_span_map(heap, base_span_count); + 
_rpmalloc_inc_span_statistics(heap, base_span_count, class_idx); + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_map_calls); + return span; +} + +static void _rpmalloc_heap_initialize(heap_t *heap) { + _rpmalloc_memset_const(heap, 0, sizeof(heap_t)); + // Get a new heap ID + heap->id = 1 + atomic_incr32(&_memory_heap_id); + + // Link in heap in heap ID map + size_t list_idx = (size_t)heap->id % HEAP_ARRAY_SIZE; + heap->next_heap = _memory_heaps[list_idx]; + _memory_heaps[list_idx] = heap; +} + +static void _rpmalloc_heap_orphan(heap_t *heap, int first_class) { + heap->owner_thread = (uintptr_t)-1; +#if RPMALLOC_FIRST_CLASS_HEAPS + heap_t **heap_list = + (first_class ? &_memory_first_class_orphan_heaps : &_memory_orphan_heaps); +#else + (void)sizeof(first_class); + heap_t **heap_list = &_memory_orphan_heaps; +#endif + heap->next_orphan = *heap_list; + *heap_list = heap; +} + +//! Allocate a new heap from newly mapped memory pages +static heap_t *_rpmalloc_heap_allocate_new(void) { + // Map in pages for a 16 heaps. If page size is greater than required size for + // this, map a page and use first part for heaps and remaining part for spans + // for allocations. 
Adds a lot of complexity, but saves a lot of memory on + // systems where page size > 64 spans (4MiB) + size_t heap_size = sizeof(heap_t); + size_t aligned_heap_size = 16 * ((heap_size + 15) / 16); + size_t request_heap_count = 16; + size_t heap_span_count = ((aligned_heap_size * request_heap_count) + + sizeof(span_t) + _memory_span_size - 1) / + _memory_span_size; + size_t block_size = _memory_span_size * heap_span_count; + size_t span_count = heap_span_count; + span_t *span = 0; + // If there are global reserved spans, use these first + if (_memory_global_reserve_count >= heap_span_count) { + span = _rpmalloc_global_get_reserved_spans(heap_span_count); + } + if (!span) { + if (_memory_page_size > block_size) { + span_count = _memory_page_size / _memory_span_size; + block_size = _memory_page_size; + // If using huge pages, make sure to grab enough heaps to avoid + // reallocating a huge page just to serve new heaps + size_t possible_heap_count = + (block_size - sizeof(span_t)) / aligned_heap_size; + if (possible_heap_count >= (request_heap_count * 16)) + request_heap_count *= 16; + else if (possible_heap_count < request_heap_count) + request_heap_count = possible_heap_count; + heap_span_count = ((aligned_heap_size * request_heap_count) + + sizeof(span_t) + _memory_span_size - 1) / + _memory_span_size; + } + + size_t align_offset = 0; + span = (span_t *)_rpmalloc_mmap(block_size, &align_offset); + if (!span) + return 0; + + // Master span will contain the heaps + _rpmalloc_stat_inc(&_master_spans); + _rpmalloc_span_initialize(span, span_count, heap_span_count, align_offset); + } + + size_t remain_size = _memory_span_size - sizeof(span_t); + heap_t *heap = (heap_t *)pointer_offset(span, sizeof(span_t)); + _rpmalloc_heap_initialize(heap); + + // Put extra heaps as orphans + size_t num_heaps = remain_size / aligned_heap_size; + if (num_heaps < request_heap_count) + num_heaps = request_heap_count; + atomic_store32(&heap->child_count, (int32_t)num_heaps - 1); + heap_t 
*extra_heap = (heap_t *)pointer_offset(heap, aligned_heap_size); + while (num_heaps > 1) { + _rpmalloc_heap_initialize(extra_heap); + extra_heap->master_heap = heap; + _rpmalloc_heap_orphan(extra_heap, 1); + extra_heap = (heap_t *)pointer_offset(extra_heap, aligned_heap_size); + --num_heaps; + } + + if (span_count > heap_span_count) { + // Cap reserved spans + size_t remain_count = span_count - heap_span_count; + size_t reserve_count = + (remain_count > _memory_heap_reserve_count ? _memory_heap_reserve_count + : remain_count); + span_t *remain_span = + (span_t *)pointer_offset(span, heap_span_count * _memory_span_size); + _rpmalloc_heap_set_reserved_spans(heap, span, remain_span, reserve_count); + + if (remain_count > reserve_count) { + // Set to global reserved spans + remain_span = (span_t *)pointer_offset(remain_span, + reserve_count * _memory_span_size); + reserve_count = remain_count - reserve_count; + _rpmalloc_global_set_reserved_spans(span, remain_span, reserve_count); + } + } + + return heap; +} + +static heap_t *_rpmalloc_heap_extract_orphan(heap_t **heap_list) { + heap_t *heap = *heap_list; + *heap_list = (heap ? heap->next_orphan : 0); + return heap; +} + +//! 
Allocate a new heap, potentially reusing a previously orphaned heap +static heap_t *_rpmalloc_heap_allocate(int first_class) { + heap_t *heap = 0; + while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) + _rpmalloc_spin(); + if (first_class == 0) + heap = _rpmalloc_heap_extract_orphan(&_memory_orphan_heaps); +#if RPMALLOC_FIRST_CLASS_HEAPS + if (!heap) + heap = _rpmalloc_heap_extract_orphan(&_memory_first_class_orphan_heaps); +#endif + if (!heap) + heap = _rpmalloc_heap_allocate_new(); + atomic_store32_release(&_memory_global_lock, 0); + if (heap) + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + return heap; +} + +static void _rpmalloc_heap_release(void *heapptr, int first_class, + int release_cache) { + heap_t *heap = (heap_t *)heapptr; + if (!heap) + return; + // Release thread cache spans back to global cache + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + if (release_cache || heap->finalize) { +#if ENABLE_THREAD_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + if (!span_cache->count) + continue; +#if ENABLE_GLOBAL_CACHE + if (heap->finalize) { + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); + } else { + _rpmalloc_stat_add64(&heap->thread_to_global, span_cache->count * + (iclass + 1) * + _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[iclass].spans_to_global, + span_cache->count); + _rpmalloc_global_cache_insert_spans(span_cache->span, iclass + 1, + span_cache->count); + } +#else + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); +#endif + span_cache->count = 0; + } +#endif + } + + if (get_thread_heap_raw() == heap) + set_thread_heap(0); + +#if ENABLE_STATISTICS + atomic_decr32(&_memory_active_heaps); + 
rpmalloc_assert(atomic_load32(&_memory_active_heaps) >= 0, + "Still active heaps during finalization"); +#endif + + // If we are forcibly terminating with _exit the state of the + // lock atomic is unknown and it's best to just go ahead and exit + if (get_thread_id() != _rpmalloc_main_thread_id) { + while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) + _rpmalloc_spin(); + } + _rpmalloc_heap_orphan(heap, first_class); + atomic_store32_release(&_memory_global_lock, 0); +} + +static void _rpmalloc_heap_release_raw(void *heapptr, int release_cache) { + _rpmalloc_heap_release(heapptr, 0, release_cache); +} + +static void _rpmalloc_heap_release_raw_fc(void *heapptr) { + _rpmalloc_heap_release_raw(heapptr, 1); +} + +static void _rpmalloc_heap_finalize(heap_t *heap) { + if (heap->spans_reserved) { + span_t *span = _rpmalloc_span_map(heap, heap->spans_reserved); + _rpmalloc_span_unmap(span); + heap->spans_reserved = 0; + } + + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + if (heap->size_class[iclass].cache) + _rpmalloc_span_unmap(heap->size_class[iclass].cache); + heap->size_class[iclass].cache = 0; + span_t *span = heap->size_class[iclass].partial_span; + while (span) { + span_t *next = span->next; + _rpmalloc_span_finalize(heap, iclass, span, + &heap->size_class[iclass].partial_span); + span = next; + } + // If class still has a free list it must be a full span + if (heap->size_class[iclass].free_list) { + span_t *class_span = + (span_t *)((uintptr_t)heap->size_class[iclass].free_list & + _memory_span_mask); + span_t **list = 0; +#if RPMALLOC_FIRST_CLASS_HEAPS + list = &heap->full_span[iclass]; +#endif + --heap->full_span_count; + if (!_rpmalloc_span_finalize(heap, iclass, class_span, list)) { + if (list) + _rpmalloc_span_double_link_list_remove(list, class_span); + _rpmalloc_span_double_link_list_add( + &heap->size_class[iclass].partial_span, class_span); + } + } + } + +#if ENABLE_THREAD_CACHE + 
for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); + span_cache->count = 0; + } +#endif + rpmalloc_assert(!atomic_load_ptr(&heap->span_free_deferred), + "Heaps still active during finalization"); +} + +//////////// +/// +/// Allocation entry points +/// +////// + +//! Pop first block from a free list +static void *free_list_pop(void **list) { + void *block = *list; + *list = *((void **)block); + return block; +} + +//! Allocate a small/medium sized memory block from the given heap +static void *_rpmalloc_allocate_from_heap_fallback( + heap_t *heap, heap_size_class_t *heap_size_class, uint32_t class_idx) { + span_t *span = heap_size_class->partial_span; + rpmalloc_assume(heap != 0); + if (EXPECTED(span != 0)) { + rpmalloc_assert(span->block_count == + _memory_size_class[span->size_class].block_count, + "Span block count corrupted"); + rpmalloc_assert(!_rpmalloc_span_is_fully_utilized(span), + "Internal failure"); + void *block; + if (span->free_list) { + // Span local free list is not empty, swap to size class free list + block = free_list_pop(&span->free_list); + heap_size_class->free_list = span->free_list; + span->free_list = 0; + } else { + // If the span did not fully initialize free list, link up another page + // worth of blocks + void *block_start = pointer_offset( + span, SPAN_HEADER_SIZE + + ((size_t)span->free_list_limit * span->block_size)); + span->free_list_limit += free_list_partial_init( + &heap_size_class->free_list, &block, + (void *)((uintptr_t)block_start & ~(_memory_page_size - 1)), + block_start, span->block_count - span->free_list_limit, + span->block_size); + } + rpmalloc_assert(span->free_list_limit <= span->block_count, + "Span block count corrupted"); + 
span->used_count = span->free_list_limit; + + // Swap in deferred free list if present + if (atomic_load_ptr(&span->free_list_deferred)) + _rpmalloc_span_extract_free_list_deferred(span); + + // If span is still not fully utilized keep it in partial list and early + // return block + if (!_rpmalloc_span_is_fully_utilized(span)) + return block; + + // The span is fully utilized, unlink from partial list and add to fully + // utilized list + _rpmalloc_span_double_link_list_pop_head(&heap_size_class->partial_span, + span); +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->full_span[class_idx], span); +#endif + ++heap->full_span_count; + return block; + } + + // Find a span in one of the cache levels + span = _rpmalloc_heap_extract_new_span(heap, heap_size_class, 1, class_idx); + if (EXPECTED(span != 0)) { + // Mark span as owned by this heap and set base data, return first block + return _rpmalloc_span_initialize_new(heap, heap_size_class, span, + class_idx); + } + + return 0; +} + +//! Allocate a small sized memory block from the given heap +static void *_rpmalloc_allocate_small(heap_t *heap, size_t size) { + rpmalloc_assert(heap, "No thread heap"); + // Small sizes have unique size classes + const uint32_t class_idx = + (uint32_t)((size + (SMALL_GRANULARITY - 1)) >> SMALL_GRANULARITY_SHIFT); + heap_size_class_t *heap_size_class = heap->size_class + class_idx; + _rpmalloc_stat_inc_alloc(heap, class_idx); + if (EXPECTED(heap_size_class->free_list != 0)) + return free_list_pop(&heap_size_class->free_list); + return _rpmalloc_allocate_from_heap_fallback(heap, heap_size_class, + class_idx); +} + +//! 
Allocate a medium sized memory block from the given heap +static void *_rpmalloc_allocate_medium(heap_t *heap, size_t size) { + rpmalloc_assert(heap, "No thread heap"); + // Calculate the size class index and do a dependent lookup of the final class + // index (in case of merged classes) + const uint32_t base_idx = + (uint32_t)(SMALL_CLASS_COUNT + + ((size - (SMALL_SIZE_LIMIT + 1)) >> MEDIUM_GRANULARITY_SHIFT)); + const uint32_t class_idx = _memory_size_class[base_idx].class_idx; + heap_size_class_t *heap_size_class = heap->size_class + class_idx; + _rpmalloc_stat_inc_alloc(heap, class_idx); + if (EXPECTED(heap_size_class->free_list != 0)) + return free_list_pop(&heap_size_class->free_list); + return _rpmalloc_allocate_from_heap_fallback(heap, heap_size_class, + class_idx); +} + +//! Allocate a large sized memory block from the given heap +static void *_rpmalloc_allocate_large(heap_t *heap, size_t size) { + rpmalloc_assert(heap, "No thread heap"); + // Calculate number of needed max sized spans (including header) + // Since this function is never called if size > LARGE_SIZE_LIMIT + // the span_count is guaranteed to be <= LARGE_CLASS_COUNT + size += SPAN_HEADER_SIZE; + size_t span_count = size >> _memory_span_size_shift; + if (size & (_memory_span_size - 1)) + ++span_count; + + // Find a span in one of the cache levels + span_t *span = + _rpmalloc_heap_extract_new_span(heap, 0, span_count, SIZE_CLASS_LARGE); + if (!span) + return span; + + // Mark span as owned by this heap and set base data + rpmalloc_assert(span->span_count >= span_count, "Internal failure"); + span->size_class = SIZE_CLASS_LARGE; + span->heap = heap; + +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); +#endif + ++heap->full_span_count; + + return pointer_offset(span, SPAN_HEADER_SIZE); +} + +//! 
Allocate a huge block by mapping memory pages directly +static void *_rpmalloc_allocate_huge(heap_t *heap, size_t size) { + rpmalloc_assert(heap, "No thread heap"); + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + size += SPAN_HEADER_SIZE; + size_t num_pages = size >> _memory_page_size_shift; + if (size & (_memory_page_size - 1)) + ++num_pages; + size_t align_offset = 0; + span_t *span = + (span_t *)_rpmalloc_mmap(num_pages * _memory_page_size, &align_offset); + if (!span) + return span; + + // Store page count in span_count + span->size_class = SIZE_CLASS_HUGE; + span->span_count = (uint32_t)num_pages; + span->align_offset = (uint32_t)align_offset; + span->heap = heap; + _rpmalloc_stat_add_peak(&_huge_pages_current, num_pages, _huge_pages_peak); + +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); +#endif + ++heap->full_span_count; + + return pointer_offset(span, SPAN_HEADER_SIZE); +} + +//! Allocate a block of the given size +static void *_rpmalloc_allocate(heap_t *heap, size_t size) { + _rpmalloc_stat_add64(&_allocation_counter, 1); + if (EXPECTED(size <= SMALL_SIZE_LIMIT)) + return _rpmalloc_allocate_small(heap, size); + else if (size <= _memory_medium_size_limit) + return _rpmalloc_allocate_medium(heap, size); + else if (size <= LARGE_SIZE_LIMIT) + return _rpmalloc_allocate_large(heap, size); + return _rpmalloc_allocate_huge(heap, size); +} + +static void *_rpmalloc_aligned_allocate(heap_t *heap, size_t alignment, + size_t size) { + if (alignment <= SMALL_GRANULARITY) + return _rpmalloc_allocate(heap, size); + +#if ENABLE_VALIDATE_ARGS + if ((size + alignment) < size) { + errno = EINVAL; + return 0; + } + if (alignment & (alignment - 1)) { + errno = EINVAL; + return 0; + } +#endif + + if ((alignment <= SPAN_HEADER_SIZE) && + ((size + SPAN_HEADER_SIZE) < _memory_medium_size_limit)) { + // If alignment is less or equal to span header size (which is power of + // two), and size aligned to span header size 
multiples is less than size +
+    // alignment, then use natural alignment of blocks to provide alignment
+    size_t multiple_size = size ? (size + (SPAN_HEADER_SIZE - 1)) &
+                                      ~(uintptr_t)(SPAN_HEADER_SIZE - 1)
+                                : SPAN_HEADER_SIZE;
+    rpmalloc_assert(!(multiple_size % SPAN_HEADER_SIZE),
+                    "Failed alignment calculation");
+    if (multiple_size <= (size + alignment))
+      return _rpmalloc_allocate(heap, multiple_size);
+  }
+
+  void *ptr = 0;
+  size_t align_mask = alignment - 1;
+  if (alignment <= _memory_page_size) {
+    ptr = _rpmalloc_allocate(heap, size + alignment);
+    if ((uintptr_t)ptr & align_mask) {
+      ptr = (void *)(((uintptr_t)ptr & ~(uintptr_t)align_mask) + alignment);
+      // Mark as having aligned blocks
+      span_t *span = (span_t *)((uintptr_t)ptr & _memory_span_mask);
+      span->flags |= SPAN_FLAG_ALIGNED_BLOCKS;
+    }
+    return ptr;
+  }
+
+  // Fallback to mapping new pages for this request. Since pointers passed
+  // to rpfree must be able to reach the start of the span by bitmasking of
+  // the address with the span size, the returned aligned pointer from this
+  // function must be within a span size of the start of the mapped area.
+  // In worst case this requires us to loop and map pages until we get a
+  // suitable memory address.
It also means we can never align to span size + // or greater, since the span header will push alignment more than one + // span size away from span start (thus causing pointer mask to give us + // an invalid span start on free) + if (alignment & align_mask) { + errno = EINVAL; + return 0; + } + if (alignment >= _memory_span_size) { + errno = EINVAL; + return 0; + } + + size_t extra_pages = alignment / _memory_page_size; + + // Since each span has a header, we will at least need one extra memory page + size_t num_pages = 1 + (size / _memory_page_size); + if (size & (_memory_page_size - 1)) + ++num_pages; + + if (extra_pages > num_pages) + num_pages = 1 + extra_pages; + + size_t original_pages = num_pages; + size_t limit_pages = (_memory_span_size / _memory_page_size) * 2; + if (limit_pages < (original_pages * 2)) + limit_pages = original_pages * 2; + + size_t mapped_size, align_offset; + span_t *span; + +retry: + align_offset = 0; + mapped_size = num_pages * _memory_page_size; + + span = (span_t *)_rpmalloc_mmap(mapped_size, &align_offset); + if (!span) { + errno = ENOMEM; + return 0; + } + ptr = pointer_offset(span, SPAN_HEADER_SIZE); + + if ((uintptr_t)ptr & align_mask) + ptr = (void *)(((uintptr_t)ptr & ~(uintptr_t)align_mask) + alignment); + + if (((size_t)pointer_diff(ptr, span) >= _memory_span_size) || + (pointer_offset(ptr, size) > pointer_offset(span, mapped_size)) || + (((uintptr_t)ptr & _memory_span_mask) != (uintptr_t)span)) { + _rpmalloc_unmap(span, mapped_size, align_offset, mapped_size); + ++num_pages; + if (num_pages > limit_pages) { + errno = EINVAL; + return 0; + } + goto retry; + } + + // Store page count in span_count + span->size_class = SIZE_CLASS_HUGE; + span->span_count = (uint32_t)num_pages; + span->align_offset = (uint32_t)align_offset; + span->heap = heap; + _rpmalloc_stat_add_peak(&_huge_pages_current, num_pages, _huge_pages_peak); + +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); 
+#endif + ++heap->full_span_count; + + _rpmalloc_stat_add64(&_allocation_counter, 1); + + return ptr; +} + +//////////// +/// +/// Deallocation entry points +/// +////// + +//! Deallocate the given small/medium memory block in the current thread local +//! heap +static void _rpmalloc_deallocate_direct_small_or_medium(span_t *span, + void *block) { + heap_t *heap = span->heap; + rpmalloc_assert(heap->owner_thread == get_thread_id() || + !heap->owner_thread || heap->finalize, + "Internal failure"); + // Add block to free list + if (UNEXPECTED(_rpmalloc_span_is_fully_utilized(span))) { + span->used_count = span->block_count; +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&heap->full_span[span->size_class], + span); +#endif + _rpmalloc_span_double_link_list_add( + &heap->size_class[span->size_class].partial_span, span); + --heap->full_span_count; + } + *((void **)block) = span->free_list; + --span->used_count; + span->free_list = block; + if (UNEXPECTED(span->used_count == span->list_size)) { + // If there are no used blocks it is guaranteed that no other external + // thread is accessing the span + if (span->used_count) { + // Make sure we have synchronized the deferred list and list size by using + // acquire semantics and guarantee that no external thread is accessing + // span concurrently + void *free_list; + do { + free_list = atomic_exchange_ptr_acquire(&span->free_list_deferred, + INVALID_POINTER); + } while (free_list == INVALID_POINTER); + atomic_store_ptr_release(&span->free_list_deferred, free_list); + } + _rpmalloc_span_double_link_list_remove( + &heap->size_class[span->size_class].partial_span, span); + _rpmalloc_span_release_to_cache(heap, span); + } +} + +static void _rpmalloc_deallocate_defer_free_span(heap_t *heap, span_t *span) { + if (span->size_class != SIZE_CLASS_HUGE) + _rpmalloc_stat_inc(&heap->span_use[span->span_count - 1].spans_deferred); + // This list does not need ABA protection, no mutable side state + do { + 
span->free_list = (void *)atomic_load_ptr(&heap->span_free_deferred); + } while (!atomic_cas_ptr(&heap->span_free_deferred, span, span->free_list)); +} + +//! Put the block in the deferred free list of the owning span +static void _rpmalloc_deallocate_defer_small_or_medium(span_t *span, + void *block) { + // The memory ordering here is a bit tricky, to avoid having to ABA protect + // the deferred free list to avoid desynchronization of list and list size + // we need to have acquire semantics on successful CAS of the pointer to + // guarantee the list_size variable validity + release semantics on pointer + // store + void *free_list; + do { + free_list = + atomic_exchange_ptr_acquire(&span->free_list_deferred, INVALID_POINTER); + } while (free_list == INVALID_POINTER); + *((void **)block) = free_list; + uint32_t free_count = ++span->list_size; + int all_deferred_free = (free_count == span->block_count); + atomic_store_ptr_release(&span->free_list_deferred, block); + if (all_deferred_free) { + // Span was completely freed by this block. Due to the INVALID_POINTER spin + // lock no other thread can reach this state simultaneously on this span. 
+ // Safe to move to owner heap deferred cache + _rpmalloc_deallocate_defer_free_span(span->heap, span); + } +} + +static void _rpmalloc_deallocate_small_or_medium(span_t *span, void *p) { + _rpmalloc_stat_inc_free(span->heap, span->size_class); + if (span->flags & SPAN_FLAG_ALIGNED_BLOCKS) { + // Realign pointer to block start + void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); + uint32_t block_offset = (uint32_t)pointer_diff(p, blocks_start); + p = pointer_offset(p, -(int32_t)(block_offset % span->block_size)); + } + // Check if block belongs to this heap or if deallocation should be deferred +#if RPMALLOC_FIRST_CLASS_HEAPS + int defer = + (span->heap->owner_thread && + (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#else + int defer = + ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#endif + if (!defer) + _rpmalloc_deallocate_direct_small_or_medium(span, p); + else + _rpmalloc_deallocate_defer_small_or_medium(span, p); +} + +//! 
Deallocate the given large memory block to the current heap +static void _rpmalloc_deallocate_large(span_t *span) { + rpmalloc_assert(span->size_class == SIZE_CLASS_LARGE, "Bad span size class"); + rpmalloc_assert(!(span->flags & SPAN_FLAG_MASTER) || + !(span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + rpmalloc_assert((span->flags & SPAN_FLAG_MASTER) || + (span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + // We must always defer (unless finalizing) if from another heap since we + // cannot touch the list or counters of another heap +#if RPMALLOC_FIRST_CLASS_HEAPS + int defer = + (span->heap->owner_thread && + (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#else + int defer = + ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#endif + if (defer) { + _rpmalloc_deallocate_defer_free_span(span->heap, span); + return; + } + rpmalloc_assert(span->heap->full_span_count, "Heap span counter corrupted"); + --span->heap->full_span_count; +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&span->heap->large_huge_span, span); +#endif +#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS + // Decrease counter + size_t idx = span->span_count - 1; + atomic_decr32(&span->heap->span_use[idx].current); +#endif + heap_t *heap = span->heap; + rpmalloc_assert(heap, "No thread heap"); +#if ENABLE_THREAD_CACHE + const int set_as_reserved = + ((span->span_count > 1) && (heap->span_cache.count == 0) && + !heap->finalize && !heap->spans_reserved); +#else + const int set_as_reserved = + ((span->span_count > 1) && !heap->finalize && !heap->spans_reserved); +#endif + if (set_as_reserved) { + heap->span_reserve = span; + heap->spans_reserved = span->span_count; + if (span->flags & SPAN_FLAG_MASTER) { + heap->span_reserve_master = span; + } else { // SPAN_FLAG_SUBSPAN + span_t *master = (span_t *)pointer_offset( + span, + -(intptr_t)((size_t)span->offset_from_master * _memory_span_size)); + 
heap->span_reserve_master = master; + rpmalloc_assert(master->flags & SPAN_FLAG_MASTER, "Span flag corrupted"); + rpmalloc_assert(atomic_load32(&master->remaining_spans) >= + (int32_t)span->span_count, + "Master span count corrupted"); + } + _rpmalloc_stat_inc(&heap->span_use[idx].spans_to_reserved); + } else { + // Insert into cache list + _rpmalloc_heap_cache_insert(heap, span); + } +} + +//! Deallocate the given huge span +static void _rpmalloc_deallocate_huge(span_t *span) { + rpmalloc_assert(span->heap, "No span heap"); +#if RPMALLOC_FIRST_CLASS_HEAPS + int defer = + (span->heap->owner_thread && + (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#else + int defer = + ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#endif + if (defer) { + _rpmalloc_deallocate_defer_free_span(span->heap, span); + return; + } + rpmalloc_assert(span->heap->full_span_count, "Heap span counter corrupted"); + --span->heap->full_span_count; +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&span->heap->large_huge_span, span); +#endif + + // Oversized allocation, page count is stored in span_count + size_t num_pages = span->span_count; + _rpmalloc_unmap(span, num_pages * _memory_page_size, span->align_offset, + num_pages * _memory_page_size); + _rpmalloc_stat_sub(&_huge_pages_current, num_pages); +} + +//! 
Deallocate the given block
+static void _rpmalloc_deallocate(void *p) {
+  _rpmalloc_stat_add64(&_deallocation_counter, 1);
+  // Grab the span (always at start of span, using span alignment)
+  span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask);
+  if (UNEXPECTED(!span))
+    return;
+  if (EXPECTED(span->size_class < SIZE_CLASS_COUNT))
+    _rpmalloc_deallocate_small_or_medium(span, p);
+  else if (span->size_class == SIZE_CLASS_LARGE)
+    _rpmalloc_deallocate_large(span);
+  else
+    _rpmalloc_deallocate_huge(span);
+}
+
+////////////
+///
+/// Reallocation entry points
+///
+//////
+
+static size_t _rpmalloc_usable_size(void *p);
+
+//! Reallocate the given block to the given size
+static void *_rpmalloc_reallocate(heap_t *heap, void *p, size_t size,
+                                  size_t oldsize, unsigned int flags) {
+  if (p) {
+    // Grab the span using guaranteed span alignment
+    span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask);
+    if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) {
+      // Small/medium sized block
+      rpmalloc_assert(span->span_count == 1, "Span counter corrupted");
+      void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE);
+      uint32_t block_offset = (uint32_t)pointer_diff(p, blocks_start);
+      uint32_t block_idx = block_offset / span->block_size;
+      void *block =
+          pointer_offset(blocks_start, (size_t)block_idx * span->block_size);
+      if (!oldsize)
+        oldsize =
+            (size_t)((ptrdiff_t)span->block_size - pointer_diff(p, block));
+      if ((size_t)span->block_size >= size) {
+        // Still fits in block, never mind trying to save memory, but preserve
+        // data if alignment changed
+        if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE))
+          memmove(block, p, oldsize);
+        return block;
+      }
+    } else if (span->size_class == SIZE_CLASS_LARGE) {
+      // Large block
+      size_t total_size = size + SPAN_HEADER_SIZE;
+      size_t num_spans = total_size >> _memory_span_size_shift;
+      if (total_size & (_memory_span_size - 1))
+        ++num_spans;
+      size_t current_spans = span->span_count;
+      void *block =
pointer_offset(span, SPAN_HEADER_SIZE);
+      if (!oldsize)
+        oldsize = (current_spans * _memory_span_size) -
+                  (size_t)pointer_diff(p, block) - SPAN_HEADER_SIZE;
+      if ((current_spans >= num_spans) && (total_size >= (oldsize / 2))) {
+        // Still fits in block, never mind trying to save memory, but preserve
+        // data if alignment changed
+        if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE))
+          memmove(block, p, oldsize);
+        return block;
+      }
+    } else {
+      // Oversized block
+      size_t total_size = size + SPAN_HEADER_SIZE;
+      size_t num_pages = total_size >> _memory_page_size_shift;
+      if (total_size & (_memory_page_size - 1))
+        ++num_pages;
+      // Page count is stored in span_count
+      size_t current_pages = span->span_count;
+      void *block = pointer_offset(span, SPAN_HEADER_SIZE);
+      if (!oldsize)
+        oldsize = (current_pages * _memory_page_size) -
+                  (size_t)pointer_diff(p, block) - SPAN_HEADER_SIZE;
+      if ((current_pages >= num_pages) && (num_pages >= (current_pages / 2))) {
+        // Still fits in block, never mind trying to save memory, but preserve
+        // data if alignment changed
+        if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE))
+          memmove(block, p, oldsize);
+        return block;
+      }
+    }
+  } else {
+    oldsize = 0;
+  }
+
+  if (!!(flags & RPMALLOC_GROW_OR_FAIL))
+    return 0;
+
+  // Size is greater than block size, need to allocate a new block and
+  // deallocate the old. Avoid hysteresis by overallocating if increase is
+  // small (below 37%)
+  size_t lower_bound = oldsize + (oldsize >> 2) + (oldsize >> 3);
+  size_t new_size =
+      (size > lower_bound) ? size : ((size > oldsize) ? lower_bound : size);
+  void *block = _rpmalloc_allocate(heap, new_size);
+  if (p && block) {
+    if (!(flags & RPMALLOC_NO_PRESERVE))
+      memcpy(block, p, oldsize < new_size ?
oldsize : new_size); + _rpmalloc_deallocate(p); + } + + return block; +} + +static void *_rpmalloc_aligned_reallocate(heap_t *heap, void *ptr, + size_t alignment, size_t size, + size_t oldsize, unsigned int flags) { + if (alignment <= SMALL_GRANULARITY) + return _rpmalloc_reallocate(heap, ptr, size, oldsize, flags); + + int no_alloc = !!(flags & RPMALLOC_GROW_OR_FAIL); + size_t usablesize = (ptr ? _rpmalloc_usable_size(ptr) : 0); + if ((usablesize >= size) && !((uintptr_t)ptr & (alignment - 1))) { + if (no_alloc || (size >= (usablesize / 2))) + return ptr; + } + // Aligned alloc marks span as having aligned blocks + void *block = + (!no_alloc ? _rpmalloc_aligned_allocate(heap, alignment, size) : 0); + if (EXPECTED(block != 0)) { + if (!(flags & RPMALLOC_NO_PRESERVE) && ptr) { + if (!oldsize) + oldsize = usablesize; + memcpy(block, ptr, oldsize < size ? oldsize : size); + } + _rpmalloc_deallocate(ptr); + } + return block; +} + +//////////// +/// +/// Initialization, finalization and utility +/// +////// + +//! Get the usable size of the given block +static size_t _rpmalloc_usable_size(void *p) { + // Grab the span using guaranteed span alignment + span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); + if (span->size_class < SIZE_CLASS_COUNT) { + // Small/medium block + void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); + return span->block_size - + ((size_t)pointer_diff(p, blocks_start) % span->block_size); + } + if (span->size_class == SIZE_CLASS_LARGE) { + // Large block + size_t current_spans = span->span_count; + return (current_spans * _memory_span_size) - (size_t)pointer_diff(p, span); + } + // Oversized block, page count is stored in span_count + size_t current_pages = span->span_count; + return (current_pages * _memory_page_size) - (size_t)pointer_diff(p, span); +} + +//! 
Adjust and optimize the size class properties for the given class +static void _rpmalloc_adjust_size_class(size_t iclass) { + size_t block_size = _memory_size_class[iclass].block_size; + size_t block_count = (_memory_span_size - SPAN_HEADER_SIZE) / block_size; + + _memory_size_class[iclass].block_count = (uint16_t)block_count; + _memory_size_class[iclass].class_idx = (uint16_t)iclass; + + // Check if previous size classes can be merged + if (iclass >= SMALL_CLASS_COUNT) { + size_t prevclass = iclass; + while (prevclass > 0) { + --prevclass; + // A class can be merged if number of pages and number of blocks are equal + if (_memory_size_class[prevclass].block_count == + _memory_size_class[iclass].block_count) + _rpmalloc_memcpy_const(_memory_size_class + prevclass, + _memory_size_class + iclass, + sizeof(_memory_size_class[iclass])); + else + break; + } + } +} + +//! Initialize the allocator and setup global data +extern inline int rpmalloc_initialize(void) { + if (_rpmalloc_initialized) { + rpmalloc_thread_initialize(); + return 0; + } + return rpmalloc_initialize_config(0); +} + +int rpmalloc_initialize_config(const rpmalloc_config_t *config) { + if (_rpmalloc_initialized) { + rpmalloc_thread_initialize(); + return 0; + } + _rpmalloc_initialized = 1; + + if (config) + memcpy(&_memory_config, config, sizeof(rpmalloc_config_t)); + else + _rpmalloc_memset_const(&_memory_config, 0, sizeof(rpmalloc_config_t)); + + if (!_memory_config.memory_map || !_memory_config.memory_unmap) { + _memory_config.memory_map = _rpmalloc_mmap_os; + _memory_config.memory_unmap = _rpmalloc_unmap_os; + } + +#if PLATFORM_WINDOWS + SYSTEM_INFO system_info; + memset(&system_info, 0, sizeof(system_info)); + GetSystemInfo(&system_info); + _memory_map_granularity = system_info.dwAllocationGranularity; +#else + _memory_map_granularity = (size_t)sysconf(_SC_PAGESIZE); +#endif + +#if RPMALLOC_CONFIGURABLE + _memory_page_size = _memory_config.page_size; +#else + _memory_page_size = 0; +#endif + 
_memory_huge_pages = 0; + if (!_memory_page_size) { +#if PLATFORM_WINDOWS + _memory_page_size = system_info.dwPageSize; +#else + _memory_page_size = _memory_map_granularity; + if (_memory_config.enable_huge_pages) { +#if defined(__linux__) + size_t huge_page_size = 0; + FILE *meminfo = fopen("/proc/meminfo", "r"); + if (meminfo) { + char line[128]; + while (!huge_page_size && fgets(line, sizeof(line) - 1, meminfo)) { + line[sizeof(line) - 1] = 0; + if (strstr(line, "Hugepagesize:")) + huge_page_size = (size_t)strtol(line + 13, 0, 10) * 1024; + } + fclose(meminfo); + } + if (huge_page_size) { + _memory_huge_pages = 1; + _memory_page_size = huge_page_size; + _memory_map_granularity = huge_page_size; + } +#elif defined(__FreeBSD__) + int rc; + size_t sz = sizeof(rc); + + if (sysctlbyname("vm.pmap.pg_ps_enabled", &rc, &sz, NULL, 0) == 0 && + rc == 1) { + static size_t defsize = 2 * 1024 * 1024; + int nsize = 0; + size_t sizes[4] = {0}; + _memory_huge_pages = 1; + _memory_page_size = defsize; + if ((nsize = getpagesizes(sizes, 4)) >= 2) { + nsize--; + for (size_t csize = sizes[nsize]; nsize >= 0 && csize; + --nsize, csize = sizes[nsize]) { + //! Unlikely, but as a precaution.. 
+ rpmalloc_assert(!(csize & (csize - 1)) && !(csize % 1024), + "Invalid page size"); + if (defsize < csize) { + _memory_page_size = csize; + break; + } + } + } + _memory_map_granularity = _memory_page_size; + } +#elif defined(__APPLE__) || defined(__NetBSD__) + _memory_huge_pages = 1; + _memory_page_size = 2 * 1024 * 1024; + _memory_map_granularity = _memory_page_size; +#endif + } +#endif + } else { + if (_memory_config.enable_huge_pages) + _memory_huge_pages = 1; + } + +#if PLATFORM_WINDOWS + if (_memory_config.enable_huge_pages) { + HANDLE token = 0; + size_t large_page_minimum = GetLargePageMinimum(); + if (large_page_minimum) + OpenProcessToken(GetCurrentProcess(), + TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token); + if (token) { + LUID luid; + if (LookupPrivilegeValue(0, SE_LOCK_MEMORY_NAME, &luid)) { + TOKEN_PRIVILEGES token_privileges; + memset(&token_privileges, 0, sizeof(token_privileges)); + token_privileges.PrivilegeCount = 1; + token_privileges.Privileges[0].Luid = luid; + token_privileges.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED; + if (AdjustTokenPrivileges(token, FALSE, &token_privileges, 0, 0, 0)) { + if (GetLastError() == ERROR_SUCCESS) + _memory_huge_pages = 1; + } + } + CloseHandle(token); + } + if (_memory_huge_pages) { + if (large_page_minimum > _memory_page_size) + _memory_page_size = large_page_minimum; + if (large_page_minimum > _memory_map_granularity) + _memory_map_granularity = large_page_minimum; + } + } +#endif + + size_t min_span_size = 256; + size_t max_page_size; +#if UINTPTR_MAX > 0xFFFFFFFF + max_page_size = 4096ULL * 1024ULL * 1024ULL; +#else + max_page_size = 4 * 1024 * 1024; +#endif + if (_memory_page_size < min_span_size) + _memory_page_size = min_span_size; + if (_memory_page_size > max_page_size) + _memory_page_size = max_page_size; + _memory_page_size_shift = 0; + size_t page_size_bit = _memory_page_size; + while (page_size_bit != 1) { + ++_memory_page_size_shift; + page_size_bit >>= 1; + } + _memory_page_size = 
((size_t)1 << _memory_page_size_shift); + +#if RPMALLOC_CONFIGURABLE + if (!_memory_config.span_size) { + _memory_span_size = _memory_default_span_size; + _memory_span_size_shift = _memory_default_span_size_shift; + _memory_span_mask = _memory_default_span_mask; + } else { + size_t span_size = _memory_config.span_size; + if (span_size > (256 * 1024)) + span_size = (256 * 1024); + _memory_span_size = 4096; + _memory_span_size_shift = 12; + while (_memory_span_size < span_size) { + _memory_span_size <<= 1; + ++_memory_span_size_shift; + } + _memory_span_mask = ~(uintptr_t)(_memory_span_size - 1); + } +#endif + + _memory_span_map_count = + (_memory_config.span_map_count ? _memory_config.span_map_count + : DEFAULT_SPAN_MAP_COUNT); + if ((_memory_span_size * _memory_span_map_count) < _memory_page_size) + _memory_span_map_count = (_memory_page_size / _memory_span_size); + if ((_memory_page_size >= _memory_span_size) && + ((_memory_span_map_count * _memory_span_size) % _memory_page_size)) + _memory_span_map_count = (_memory_page_size / _memory_span_size); + _memory_heap_reserve_count = (_memory_span_map_count > DEFAULT_SPAN_MAP_COUNT) + ? 
DEFAULT_SPAN_MAP_COUNT + : _memory_span_map_count; + + _memory_config.page_size = _memory_page_size; + _memory_config.span_size = _memory_span_size; + _memory_config.span_map_count = _memory_span_map_count; + _memory_config.enable_huge_pages = _memory_huge_pages; + +#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ + defined(__TINYC__) + if (pthread_key_create(&_memory_thread_heap, _rpmalloc_heap_release_raw_fc)) + return -1; +#endif +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) + fls_key = FlsAlloc(&_rpmalloc_thread_destructor); +#endif + + // Setup all small and medium size classes + size_t iclass = 0; + _memory_size_class[iclass].block_size = SMALL_GRANULARITY; + _rpmalloc_adjust_size_class(iclass); + for (iclass = 1; iclass < SMALL_CLASS_COUNT; ++iclass) { + size_t size = iclass * SMALL_GRANULARITY; + _memory_size_class[iclass].block_size = (uint32_t)size; + _rpmalloc_adjust_size_class(iclass); + } + // At least two blocks per span, then fall back to large allocations + _memory_medium_size_limit = (_memory_span_size - SPAN_HEADER_SIZE) >> 1; + if (_memory_medium_size_limit > MEDIUM_SIZE_LIMIT) + _memory_medium_size_limit = MEDIUM_SIZE_LIMIT; + for (iclass = 0; iclass < MEDIUM_CLASS_COUNT; ++iclass) { + size_t size = SMALL_SIZE_LIMIT + ((iclass + 1) * MEDIUM_GRANULARITY); + if (size > _memory_medium_size_limit) { + _memory_medium_size_limit = + SMALL_SIZE_LIMIT + (iclass * MEDIUM_GRANULARITY); + break; + } + _memory_size_class[SMALL_CLASS_COUNT + iclass].block_size = (uint32_t)size; + _rpmalloc_adjust_size_class(SMALL_CLASS_COUNT + iclass); + } + + _memory_orphan_heaps = 0; +#if RPMALLOC_FIRST_CLASS_HEAPS + _memory_first_class_orphan_heaps = 0; +#endif +#if ENABLE_STATISTICS + atomic_store32(&_memory_active_heaps, 0); + atomic_store32(&_mapped_pages, 0); + _mapped_pages_peak = 0; + atomic_store32(&_master_spans, 0); + atomic_store32(&_mapped_total, 0); + atomic_store32(&_unmapped_total, 0); + 
atomic_store32(&_mapped_pages_os, 0); + atomic_store32(&_huge_pages_current, 0); + _huge_pages_peak = 0; +#endif + memset(_memory_heaps, 0, sizeof(_memory_heaps)); + atomic_store32_release(&_memory_global_lock, 0); + + rpmalloc_linker_reference(); + + // Initialize this thread + rpmalloc_thread_initialize(); + return 0; +} + +//! Finalize the allocator +void rpmalloc_finalize(void) { + rpmalloc_thread_finalize(1); + // rpmalloc_dump_statistics(stdout); + + if (_memory_global_reserve) { + atomic_add32(&_memory_global_reserve_master->remaining_spans, + -(int32_t)_memory_global_reserve_count); + _memory_global_reserve_master = 0; + _memory_global_reserve_count = 0; + _memory_global_reserve = 0; + } + atomic_store32_release(&_memory_global_lock, 0); + + // Free all thread caches and fully free spans + for (size_t list_idx = 0; list_idx < HEAP_ARRAY_SIZE; ++list_idx) { + heap_t *heap = _memory_heaps[list_idx]; + while (heap) { + heap_t *next_heap = heap->next_heap; + heap->finalize = 1; + _rpmalloc_heap_global_finalize(heap); + heap = next_heap; + } + } + +#if ENABLE_GLOBAL_CACHE + // Free global caches + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) + _rpmalloc_global_cache_finalize(&_memory_span_cache[iclass]); +#endif + +#if (defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD + pthread_key_delete(_memory_thread_heap); +#endif +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) + FlsFree(fls_key); + fls_key = 0; +#endif +#if ENABLE_STATISTICS + // If you hit these asserts you probably have memory leaks (perhaps global + // scope data doing dynamic allocations) or double frees in your code + rpmalloc_assert(atomic_load32(&_mapped_pages) == 0, "Memory leak detected"); + rpmalloc_assert(atomic_load32(&_mapped_pages_os) == 0, + "Memory leak detected"); +#endif + + _rpmalloc_initialized = 0; +} + +//! 
Initialize thread, assign heap +extern inline void rpmalloc_thread_initialize(void) { + if (!get_thread_heap_raw()) { + heap_t *heap = _rpmalloc_heap_allocate(0); + if (heap) { + _rpmalloc_stat_inc(&_memory_active_heaps); + set_thread_heap(heap); +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) + FlsSetValue(fls_key, heap); +#endif + } + } +} + +//! Finalize thread, orphan heap +void rpmalloc_thread_finalize(int release_caches) { + heap_t *heap = get_thread_heap_raw(); + if (heap) + _rpmalloc_heap_release_raw(heap, release_caches); + set_thread_heap(0); +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) + FlsSetValue(fls_key, 0); +#endif +} + +int rpmalloc_is_thread_initialized(void) { + return (get_thread_heap_raw() != 0) ? 1 : 0; +} + +const rpmalloc_config_t *rpmalloc_config(void) { return &_memory_config; } + +// Extern interface + +extern inline RPMALLOC_ALLOCATOR void *rpmalloc(size_t size) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return 0; + } +#endif + heap_t *heap = get_thread_heap(); + return _rpmalloc_allocate(heap, size); +} + +extern inline void rpfree(void *ptr) { _rpmalloc_deallocate(ptr); } + +extern inline RPMALLOC_ALLOCATOR void *rpcalloc(size_t num, size_t size) { + size_t total; +#if ENABLE_VALIDATE_ARGS +#if PLATFORM_WINDOWS + int err = SizeTMult(num, size, &total); + if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#else + int err = __builtin_umull_overflow(num, size, &total); + if (err || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#endif +#else + total = num * size; +#endif + heap_t *heap = get_thread_heap(); + void *block = _rpmalloc_allocate(heap, total); + if (block) + memset(block, 0, total); + return block; +} + +extern inline RPMALLOC_ALLOCATOR void *rprealloc(void *ptr, size_t size) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return ptr; + } +#endif + 
heap_t *heap = get_thread_heap(); + return _rpmalloc_reallocate(heap, ptr, size, 0, 0); +} + +extern RPMALLOC_ALLOCATOR void *rpaligned_realloc(void *ptr, size_t alignment, + size_t size, size_t oldsize, + unsigned int flags) { +#if ENABLE_VALIDATE_ARGS + if ((size + alignment < size) || (alignment > _memory_page_size)) { + errno = EINVAL; + return 0; + } +#endif + heap_t *heap = get_thread_heap(); + return _rpmalloc_aligned_reallocate(heap, ptr, alignment, size, oldsize, + flags); +} + +extern RPMALLOC_ALLOCATOR void *rpaligned_alloc(size_t alignment, size_t size) { + heap_t *heap = get_thread_heap(); + return _rpmalloc_aligned_allocate(heap, alignment, size); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpaligned_calloc(size_t alignment, size_t num, size_t size) { + size_t total; +#if ENABLE_VALIDATE_ARGS +#if PLATFORM_WINDOWS + int err = SizeTMult(num, size, &total); + if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#else + int err = __builtin_umull_overflow(num, size, &total); + if (err || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#endif +#else + total = num * size; +#endif + void *block = rpaligned_alloc(alignment, total); + if (block) + memset(block, 0, total); + return block; +} + +extern inline RPMALLOC_ALLOCATOR void *rpmemalign(size_t alignment, + size_t size) { + return rpaligned_alloc(alignment, size); +} + +extern inline int rpposix_memalign(void **memptr, size_t alignment, + size_t size) { + if (memptr) + *memptr = rpaligned_alloc(alignment, size); + else + return EINVAL; + return *memptr ? 0 : ENOMEM; +} + +extern inline size_t rpmalloc_usable_size(void *ptr) { + return (ptr ? 
_rpmalloc_usable_size(ptr) : 0); +} + +extern inline void rpmalloc_thread_collect(void) {} + +void rpmalloc_thread_statistics(rpmalloc_thread_statistics_t *stats) { + memset(stats, 0, sizeof(rpmalloc_thread_statistics_t)); + heap_t *heap = get_thread_heap_raw(); + if (!heap) + return; + + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + size_class_t *size_class = _memory_size_class + iclass; + span_t *span = heap->size_class[iclass].partial_span; + while (span) { + size_t free_count = span->list_size; + size_t block_count = size_class->block_count; + if (span->free_list_limit < block_count) + block_count = span->free_list_limit; + free_count += (block_count - span->used_count); + stats->sizecache += free_count * size_class->block_size; + span = span->next; + } + } + +#if ENABLE_THREAD_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + stats->spancache += span_cache->count * (iclass + 1) * _memory_span_size; + } +#endif + + span_t *deferred = (span_t *)atomic_load_ptr(&heap->span_free_deferred); + while (deferred) { + if (deferred->size_class != SIZE_CLASS_HUGE) + stats->spancache += (size_t)deferred->span_count * _memory_span_size; + deferred = (span_t *)deferred->free_list; + } + +#if ENABLE_STATISTICS + stats->thread_to_global = (size_t)atomic_load64(&heap->thread_to_global); + stats->global_to_thread = (size_t)atomic_load64(&heap->global_to_thread); + + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + stats->span_use[iclass].current = + (size_t)atomic_load32(&heap->span_use[iclass].current); + stats->span_use[iclass].peak = + (size_t)atomic_load32(&heap->span_use[iclass].high); + stats->span_use[iclass].to_global = + (size_t)atomic_load32(&heap->span_use[iclass].spans_to_global); + stats->span_use[iclass].from_global = + 
(size_t)atomic_load32(&heap->span_use[iclass].spans_from_global); + stats->span_use[iclass].to_cache = + (size_t)atomic_load32(&heap->span_use[iclass].spans_to_cache); + stats->span_use[iclass].from_cache = + (size_t)atomic_load32(&heap->span_use[iclass].spans_from_cache); + stats->span_use[iclass].to_reserved = + (size_t)atomic_load32(&heap->span_use[iclass].spans_to_reserved); + stats->span_use[iclass].from_reserved = + (size_t)atomic_load32(&heap->span_use[iclass].spans_from_reserved); + stats->span_use[iclass].map_calls = + (size_t)atomic_load32(&heap->span_use[iclass].spans_map_calls); + } + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + stats->size_use[iclass].alloc_current = + (size_t)atomic_load32(&heap->size_class_use[iclass].alloc_current); + stats->size_use[iclass].alloc_peak = + (size_t)heap->size_class_use[iclass].alloc_peak; + stats->size_use[iclass].alloc_total = + (size_t)atomic_load32(&heap->size_class_use[iclass].alloc_total); + stats->size_use[iclass].free_total = + (size_t)atomic_load32(&heap->size_class_use[iclass].free_total); + stats->size_use[iclass].spans_to_cache = + (size_t)atomic_load32(&heap->size_class_use[iclass].spans_to_cache); + stats->size_use[iclass].spans_from_cache = + (size_t)atomic_load32(&heap->size_class_use[iclass].spans_from_cache); + stats->size_use[iclass].spans_from_reserved = (size_t)atomic_load32( + &heap->size_class_use[iclass].spans_from_reserved); + stats->size_use[iclass].map_calls = + (size_t)atomic_load32(&heap->size_class_use[iclass].spans_map_calls); + } +#endif +} + +void rpmalloc_global_statistics(rpmalloc_global_statistics_t *stats) { + memset(stats, 0, sizeof(rpmalloc_global_statistics_t)); +#if ENABLE_STATISTICS + stats->mapped = (size_t)atomic_load32(&_mapped_pages) * _memory_page_size; + stats->mapped_peak = (size_t)_mapped_pages_peak * _memory_page_size; + stats->mapped_total = + (size_t)atomic_load32(&_mapped_total) * _memory_page_size; + stats->unmapped_total = + 
(size_t)atomic_load32(&_unmapped_total) * _memory_page_size; + stats->huge_alloc = + (size_t)atomic_load32(&_huge_pages_current) * _memory_page_size; + stats->huge_alloc_peak = (size_t)_huge_pages_peak * _memory_page_size; +#endif +#if ENABLE_GLOBAL_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + global_cache_t *cache = &_memory_span_cache[iclass]; + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + uint32_t count = cache->count; +#if ENABLE_UNLIMITED_CACHE + span_t *current_span = cache->overflow; + while (current_span) { + ++count; + current_span = current_span->next; + } +#endif + atomic_store32_release(&cache->lock, 0); + stats->cached += count * (iclass + 1) * _memory_span_size; + } +#endif +} + +#if ENABLE_STATISTICS + +static void _memory_heap_dump_statistics(heap_t *heap, void *file) { + fprintf(file, "Heap %d stats:\n", heap->id); + fprintf(file, "Class CurAlloc PeakAlloc TotAlloc TotFree BlkSize " + "BlkCount SpansCur SpansPeak PeakAllocMiB ToCacheMiB " + "FromCacheMiB FromReserveMiB MmapCalls\n"); + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + if (!atomic_load32(&heap->size_class_use[iclass].alloc_total)) + continue; + fprintf( + file, + "%3u: %10u %10u %10u %10u %8u %8u %8d %9d %13zu %11zu %12zu %14zu " + "%9u\n", + (uint32_t)iclass, + atomic_load32(&heap->size_class_use[iclass].alloc_current), + heap->size_class_use[iclass].alloc_peak, + atomic_load32(&heap->size_class_use[iclass].alloc_total), + atomic_load32(&heap->size_class_use[iclass].free_total), + _memory_size_class[iclass].block_size, + _memory_size_class[iclass].block_count, + atomic_load32(&heap->size_class_use[iclass].spans_current), + heap->size_class_use[iclass].spans_peak, + ((size_t)heap->size_class_use[iclass].alloc_peak * + (size_t)_memory_size_class[iclass].block_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->size_class_use[iclass].spans_to_cache) * + _memory_span_size) / + (size_t)(1024 * 1024), + 
((size_t)atomic_load32(&heap->size_class_use[iclass].spans_from_cache) * + _memory_span_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32( + &heap->size_class_use[iclass].spans_from_reserved) * + _memory_span_size) / + (size_t)(1024 * 1024), + atomic_load32(&heap->size_class_use[iclass].spans_map_calls)); + } + fprintf(file, "Spans Current Peak Deferred PeakMiB Cached ToCacheMiB " + "FromCacheMiB ToReserveMiB FromReserveMiB ToGlobalMiB " + "FromGlobalMiB MmapCalls\n"); + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + if (!atomic_load32(&heap->span_use[iclass].high) && + !atomic_load32(&heap->span_use[iclass].spans_map_calls)) + continue; + fprintf( + file, + "%4u: %8d %8u %8u %8zu %7u %11zu %12zu %12zu %14zu %11zu %13zu %10u\n", + (uint32_t)(iclass + 1), atomic_load32(&heap->span_use[iclass].current), + atomic_load32(&heap->span_use[iclass].high), + atomic_load32(&heap->span_use[iclass].spans_deferred), + ((size_t)atomic_load32(&heap->span_use[iclass].high) * + (size_t)_memory_span_size * (iclass + 1)) / + (size_t)(1024 * 1024), +#if ENABLE_THREAD_CACHE + (unsigned int)(!iclass ? 
heap->span_cache.count + : heap->span_large_cache[iclass - 1].count), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_cache) * + (iclass + 1) * _memory_span_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_cache) * + (iclass + 1) * _memory_span_size) / + (size_t)(1024 * 1024), +#else + 0, (size_t)0, (size_t)0, +#endif + ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_reserved) * + (iclass + 1) * _memory_span_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_reserved) * + (iclass + 1) * _memory_span_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_global) * + (size_t)_memory_span_size * (iclass + 1)) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_global) * + (size_t)_memory_span_size * (iclass + 1)) / + (size_t)(1024 * 1024), + atomic_load32(&heap->span_use[iclass].spans_map_calls)); + } + fprintf(file, "Full spans: %zu\n", heap->full_span_count); + fprintf(file, "ThreadToGlobalMiB GlobalToThreadMiB\n"); + fprintf( + file, "%17zu %17zu\n", + (size_t)atomic_load64(&heap->thread_to_global) / (size_t)(1024 * 1024), + (size_t)atomic_load64(&heap->global_to_thread) / (size_t)(1024 * 1024)); +} + +#endif + +void rpmalloc_dump_statistics(void *file) { +#if ENABLE_STATISTICS + for (size_t list_idx = 0; list_idx < HEAP_ARRAY_SIZE; ++list_idx) { + heap_t *heap = _memory_heaps[list_idx]; + while (heap) { + int need_dump = 0; + for (size_t iclass = 0; !need_dump && (iclass < SIZE_CLASS_COUNT); + ++iclass) { + if (!atomic_load32(&heap->size_class_use[iclass].alloc_total)) { + rpmalloc_assert( + !atomic_load32(&heap->size_class_use[iclass].free_total), + "Heap statistics counter mismatch"); + rpmalloc_assert( + !atomic_load32(&heap->size_class_use[iclass].spans_map_calls), + "Heap statistics counter mismatch"); + continue; + } + need_dump = 1; + } + for (size_t iclass = 0; !need_dump && (iclass < 
LARGE_CLASS_COUNT); + ++iclass) { + if (!atomic_load32(&heap->span_use[iclass].high) && + !atomic_load32(&heap->span_use[iclass].spans_map_calls)) + continue; + need_dump = 1; + } + if (need_dump) + _memory_heap_dump_statistics(heap, file); + heap = heap->next_heap; + } + } + fprintf(file, "Global stats:\n"); + size_t huge_current = + (size_t)atomic_load32(&_huge_pages_current) * _memory_page_size; + size_t huge_peak = (size_t)_huge_pages_peak * _memory_page_size; + fprintf(file, "HugeCurrentMiB HugePeakMiB\n"); + fprintf(file, "%14zu %11zu\n", huge_current / (size_t)(1024 * 1024), + huge_peak / (size_t)(1024 * 1024)); + +#if ENABLE_GLOBAL_CACHE + fprintf(file, "GlobalCacheMiB\n"); + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + global_cache_t *cache = _memory_span_cache + iclass; + size_t global_cache = (size_t)cache->count * iclass * _memory_span_size; + + size_t global_overflow_cache = 0; + span_t *span = cache->overflow; + while (span) { + global_overflow_cache += iclass * _memory_span_size; + span = span->next; + } + if (global_cache || global_overflow_cache || cache->insert_count || + cache->extract_count) + fprintf(file, + "%4zu: %8zuMiB (%8zuMiB overflow) %14zu insert %14zu extract\n", + iclass + 1, global_cache / (size_t)(1024 * 1024), + global_overflow_cache / (size_t)(1024 * 1024), + cache->insert_count, cache->extract_count); + } +#endif + + size_t mapped = (size_t)atomic_load32(&_mapped_pages) * _memory_page_size; + size_t mapped_os = + (size_t)atomic_load32(&_mapped_pages_os) * _memory_page_size; + size_t mapped_peak = (size_t)_mapped_pages_peak * _memory_page_size; + size_t mapped_total = + (size_t)atomic_load32(&_mapped_total) * _memory_page_size; + size_t unmapped_total = + (size_t)atomic_load32(&_unmapped_total) * _memory_page_size; + fprintf( + file, + "MappedMiB MappedOSMiB MappedPeakMiB MappedTotalMiB UnmappedTotalMiB\n"); + fprintf(file, "%9zu %11zu %13zu %14zu %16zu\n", + mapped / (size_t)(1024 * 1024), mapped_os / 
(size_t)(1024 * 1024), + mapped_peak / (size_t)(1024 * 1024), + mapped_total / (size_t)(1024 * 1024), + unmapped_total / (size_t)(1024 * 1024)); + + fprintf(file, "\n"); +#if 0 + int64_t allocated = atomic_load64(&_allocation_counter); + int64_t deallocated = atomic_load64(&_deallocation_counter); + fprintf(file, "Allocation count: %lli\n", allocated); + fprintf(file, "Deallocation count: %lli\n", deallocated); + fprintf(file, "Current allocations: %lli\n", (allocated - deallocated)); + fprintf(file, "Master spans: %d\n", atomic_load32(&_master_spans)); + fprintf(file, "Dangling master spans: %d\n", atomic_load32(&_unmapped_master_spans)); +#endif +#endif + (void)sizeof(file); +} + +#if RPMALLOC_FIRST_CLASS_HEAPS + +extern inline rpmalloc_heap_t *rpmalloc_heap_acquire(void) { + // Must be a pristine heap from newly mapped memory pages, or else memory + // blocks could already be allocated from the heap which would (wrongly) be + // released when heap is cleared with rpmalloc_heap_free_all(). Also heaps + // guaranteed to be pristine from the dedicated orphan list can be used. 
+ heap_t *heap = _rpmalloc_heap_allocate(1); + rpmalloc_assume(heap != NULL); + heap->owner_thread = 0; + _rpmalloc_stat_inc(&_memory_active_heaps); + return heap; +} + +extern inline void rpmalloc_heap_release(rpmalloc_heap_t *heap) { + if (heap) + _rpmalloc_heap_release(heap, 1, 1); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_alloc(rpmalloc_heap_t *heap, size_t size) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return 0; + } +#endif + return _rpmalloc_allocate(heap, size); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_aligned_alloc(rpmalloc_heap_t *heap, size_t alignment, + size_t size) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return 0; + } +#endif + return _rpmalloc_aligned_allocate(heap, alignment, size); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_calloc(rpmalloc_heap_t *heap, size_t num, size_t size) { + return rpmalloc_heap_aligned_calloc(heap, 0, num, size); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_aligned_calloc(rpmalloc_heap_t *heap, size_t alignment, + size_t num, size_t size) { + size_t total; +#if ENABLE_VALIDATE_ARGS +#if PLATFORM_WINDOWS + int err = SizeTMult(num, size, &total); + if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#else + int err = __builtin_umull_overflow(num, size, &total); + if (err || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#endif +#else + total = num * size; +#endif + void *block = _rpmalloc_aligned_allocate(heap, alignment, total); + if (block) + memset(block, 0, total); + return block; +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_realloc(rpmalloc_heap_t *heap, void *ptr, size_t size, + unsigned int flags) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return ptr; + } +#endif + return _rpmalloc_reallocate(heap, ptr, size, 0, flags); +} + +extern inline RPMALLOC_ALLOCATOR void * 
+rpmalloc_heap_aligned_realloc(rpmalloc_heap_t *heap, void *ptr, + size_t alignment, size_t size, + unsigned int flags) { +#if ENABLE_VALIDATE_ARGS + if ((size + alignment < size) || (alignment > _memory_page_size)) { + errno = EINVAL; + return 0; + } +#endif + return _rpmalloc_aligned_reallocate(heap, ptr, alignment, size, 0, flags); +} + +extern inline void rpmalloc_heap_free(rpmalloc_heap_t *heap, void *ptr) { + (void)sizeof(heap); + _rpmalloc_deallocate(ptr); +} + +extern inline void rpmalloc_heap_free_all(rpmalloc_heap_t *heap) { + span_t *span; + span_t *next_span; + + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + span = heap->size_class[iclass].partial_span; + while (span) { + next_span = span->next; + _rpmalloc_heap_cache_insert(heap, span); + span = next_span; + } + heap->size_class[iclass].partial_span = 0; + span = heap->full_span[iclass]; + while (span) { + next_span = span->next; + _rpmalloc_heap_cache_insert(heap, span); + span = next_span; + } + + span = heap->size_class[iclass].cache; + if (span) + _rpmalloc_heap_cache_insert(heap, span); + heap->size_class[iclass].cache = 0; + } + memset(heap->size_class, 0, sizeof(heap->size_class)); + memset(heap->full_span, 0, sizeof(heap->full_span)); + + span = heap->large_huge_span; + while (span) { + next_span = span->next; + if (UNEXPECTED(span->size_class == SIZE_CLASS_HUGE)) + _rpmalloc_deallocate_huge(span); + else + _rpmalloc_heap_cache_insert(heap, span); + span = next_span; + } + heap->large_huge_span = 0; + heap->full_span_count = 0; + +#if ENABLE_THREAD_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + if (!span_cache->count) + continue; +#if ENABLE_GLOBAL_CACHE + _rpmalloc_stat_add64(&heap->thread_to_global, + span_cache->count * (iclass + 1) * 
_memory_span_size); + _rpmalloc_stat_add(&heap->span_use[iclass].spans_to_global, + span_cache->count); + _rpmalloc_global_cache_insert_spans(span_cache->span, iclass + 1, + span_cache->count); +#else + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); +#endif + span_cache->count = 0; + } +#endif + +#if ENABLE_STATISTICS + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + atomic_store32(&heap->size_class_use[iclass].alloc_current, 0); + atomic_store32(&heap->size_class_use[iclass].spans_current, 0); + } + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + atomic_store32(&heap->span_use[iclass].current, 0); + } +#endif +} + +extern inline void rpmalloc_heap_thread_set_current(rpmalloc_heap_t *heap) { + heap_t *prev_heap = get_thread_heap_raw(); + if (prev_heap != heap) { + set_thread_heap(heap); + if (prev_heap) + rpmalloc_heap_release(prev_heap); + } +} + +extern inline rpmalloc_heap_t *rpmalloc_get_heap_for_ptr(void *ptr) { + // Grab the span, and then the heap from the span + span_t *span = (span_t *)((uintptr_t)ptr & _memory_span_mask); + if (span) { + return span->heap; + } + return 0; +} + +#endif + +#if ENABLE_PRELOAD || ENABLE_OVERRIDE + +#include "malloc.c" + +#endif + +void rpmalloc_linker_reference(void) { (void)sizeof(_rpmalloc_initialized); } diff --git a/llvm/lib/Support/rpmalloc/rpmalloc.h b/llvm/lib/Support/rpmalloc/rpmalloc.h index 3911c53b779b36..5b7fe1ff4286ba 100644 --- a/llvm/lib/Support/rpmalloc/rpmalloc.h +++ b/llvm/lib/Support/rpmalloc/rpmalloc.h @@ -1,428 +1,428 @@ -//===---------------------- rpmalloc.h ------------------*- C -*-=============// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. 
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -// -// This library provides a cross-platform lock free thread caching malloc -// implementation in C11. -// -//===----------------------------------------------------------------------===// - -#pragma once - -#include <stddef.h> - -#ifdef __cplusplus -extern "C" { -#endif - -#if defined(__clang__) || defined(__GNUC__) -#define RPMALLOC_EXPORT __attribute__((visibility("default"))) -#define RPMALLOC_ALLOCATOR -#if (defined(__clang_major__) && (__clang_major__ < 4)) || \ - (defined(__GNUC__) && defined(ENABLE_PRELOAD) && ENABLE_PRELOAD) -#define RPMALLOC_ATTRIB_MALLOC -#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) -#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) -#else -#define RPMALLOC_ATTRIB_MALLOC __attribute__((__malloc__)) -#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) __attribute__((alloc_size(size))) -#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) \ - __attribute__((alloc_size(count, size))) -#endif -#define RPMALLOC_CDECL -#elif defined(_MSC_VER) -#define RPMALLOC_EXPORT -#define RPMALLOC_ALLOCATOR __declspec(allocator) __declspec(restrict) -#define RPMALLOC_ATTRIB_MALLOC -#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) -#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) -#define RPMALLOC_CDECL __cdecl -#else -#define RPMALLOC_EXPORT -#define RPMALLOC_ALLOCATOR -#define RPMALLOC_ATTRIB_MALLOC -#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) -#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) -#define RPMALLOC_CDECL -#endif - -//! Define RPMALLOC_CONFIGURABLE to enable configuring sizes. Will introduce -// a very small overhead due to some size calculations not being compile time -// constants -#ifndef RPMALLOC_CONFIGURABLE -#define RPMALLOC_CONFIGURABLE 0 -#endif - -//! Define RPMALLOC_FIRST_CLASS_HEAPS to enable heap based API (rpmalloc_heap_* -//! functions).
-// Will introduce a very small overhead to track fully allocated spans in heaps -#ifndef RPMALLOC_FIRST_CLASS_HEAPS -#define RPMALLOC_FIRST_CLASS_HEAPS 0 -#endif - -//! Flag to rpaligned_realloc to not preserve content in reallocation -#define RPMALLOC_NO_PRESERVE 1 -//! Flag to rpaligned_realloc to fail and return null pointer if grow cannot be -//! done in-place, -// in which case the original pointer is still valid (just like a call to -// realloc which failes to allocate a new block). -#define RPMALLOC_GROW_OR_FAIL 2 - -typedef struct rpmalloc_global_statistics_t { - //! Current amount of virtual memory mapped, all of which might not have been - //! committed (only if ENABLE_STATISTICS=1) - size_t mapped; - //! Peak amount of virtual memory mapped, all of which might not have been - //! committed (only if ENABLE_STATISTICS=1) - size_t mapped_peak; - //! Current amount of memory in global caches for small and medium sizes - //! (<32KiB) - size_t cached; - //! Current amount of memory allocated in huge allocations, i.e larger than - //! LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1) - size_t huge_alloc; - //! Peak amount of memory allocated in huge allocations, i.e larger than - //! LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1) - size_t huge_alloc_peak; - //! Total amount of memory mapped since initialization (only if - //! ENABLE_STATISTICS=1) - size_t mapped_total; - //! Total amount of memory unmapped since initialization (only if - //! ENABLE_STATISTICS=1) - size_t unmapped_total; -} rpmalloc_global_statistics_t; - -typedef struct rpmalloc_thread_statistics_t { - //! Current number of bytes available in thread size class caches for small - //! and medium sizes (<32KiB) - size_t sizecache; - //! Current number of bytes available in thread span caches for small and - //! medium sizes (<32KiB) - size_t spancache; - //! Total number of bytes transitioned from thread cache to global cache (only - //! 
if ENABLE_STATISTICS=1) - size_t thread_to_global; - //! Total number of bytes transitioned from global cache to thread cache (only - //! if ENABLE_STATISTICS=1) - size_t global_to_thread; - //! Per span count statistics (only if ENABLE_STATISTICS=1) - struct { - //! Currently used number of spans - size_t current; - //! High water mark of spans used - size_t peak; - //! Number of spans transitioned to global cache - size_t to_global; - //! Number of spans transitioned from global cache - size_t from_global; - //! Number of spans transitioned to thread cache - size_t to_cache; - //! Number of spans transitioned from thread cache - size_t from_cache; - //! Number of spans transitioned to reserved state - size_t to_reserved; - //! Number of spans transitioned from reserved state - size_t from_reserved; - //! Number of raw memory map calls (not hitting the reserve spans but - //! resulting in actual OS mmap calls) - size_t map_calls; - } span_use[64]; - //! Per size class statistics (only if ENABLE_STATISTICS=1) - struct { - //! Current number of allocations - size_t alloc_current; - //! Peak number of allocations - size_t alloc_peak; - //! Total number of allocations - size_t alloc_total; - //! Total number of frees - size_t free_total; - //! Number of spans transitioned to cache - size_t spans_to_cache; - //! Number of spans transitioned from cache - size_t spans_from_cache; - //! Number of spans transitioned from reserved state - size_t spans_from_reserved; - //! Number of raw memory map calls (not hitting the reserve spans but - //! resulting in actual OS mmap calls) - size_t map_calls; - } size_use[128]; -} rpmalloc_thread_statistics_t; - -typedef struct rpmalloc_config_t { - //! Map memory pages for the given number of bytes. The returned address MUST - //! be - // aligned to the rpmalloc span size, which will always be a power of two. 
- // Optionally the function can store an alignment offset in the offset - // variable in case it performs alignment and the returned pointer is offset - // from the actual start of the memory region due to this alignment. The - // alignment offset will be passed to the memory unmap function. The - // alignment offset MUST NOT be larger than 65535 (storable in an uint16_t), - // if it is you must use natural alignment to shift it into 16 bits. If you - // set a memory_map function, you must also set a memory_unmap function or - // else the default implementation will be used for both. This function must - // be thread safe, it can be called by multiple threads simultaneously. - void *(*memory_map)(size_t size, size_t *offset); - //! Unmap the memory pages starting at address and spanning the given number - //! of bytes. - // If release is set to non-zero, the unmap is for an entire span range as - // returned by a previous call to memory_map and that the entire range should - // be released. The release argument holds the size of the entire span range. - // If release is set to 0, the unmap is a partial decommit of a subset of the - // mapped memory range. If you set a memory_unmap function, you must also set - // a memory_map function or else the default implementation will be used for - // both. This function must be thread safe, it can be called by multiple - // threads simultaneously. - void (*memory_unmap)(void *address, size_t size, size_t offset, - size_t release); - //! Called when an assert fails, if asserts are enabled. Will use the standard - //! assert() - // if this is not set. - void (*error_callback)(const char *message); - //! Called when a call to map memory pages fails (out of memory). If this - //! callback is - // not set or returns zero the library will return a null pointer in the - // allocation call. If this callback returns non-zero the map call will be - // retried. 
The argument passed is the number of bytes that was requested in - // the map call. Only used if the default system memory map function is used - // (memory_map callback is not set). - int (*map_fail_callback)(size_t size); - //! Size of memory pages. The page size MUST be a power of two. All memory - //! mapping - // requests to memory_map will be made with size set to a multiple of the - // page size. Used if RPMALLOC_CONFIGURABLE is defined to 1, otherwise system - // page size is used. - size_t page_size; - //! Size of a span of memory blocks. MUST be a power of two, and in - //! [4096,262144] - // range (unless 0 - set to 0 to use the default span size). Used if - // RPMALLOC_CONFIGURABLE is defined to 1. - size_t span_size; - //! Number of spans to map at each request to map new virtual memory blocks. - //! This can - // be used to minimize the system call overhead at the cost of virtual memory - // address space. The extra mapped pages will not be written until actually - // used, so physical committed memory should not be affected in the default - // implementation. Will be aligned to a multiple of spans that match memory - // page size in case of huge pages. - size_t span_map_count; - //! Enable use of large/huge pages. If this flag is set to non-zero and page - //! size is - // zero, the allocator will try to enable huge pages and auto detect the - // configuration. If this is set to non-zero and page_size is also non-zero, - // the allocator will assume huge pages have been configured and enabled - // prior to initializing the allocator. For Windows, see - // https://docs.microsoft.com/en-us/windows/desktop/memory/large-page-support - // For Linux, see https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt - int enable_huge_pages; - //! Respectively allocated pages and huge allocated pages names for systems - // supporting it to be able to distinguish among anonymous regions. 
- const char *page_name; - const char *huge_page_name; -} rpmalloc_config_t; - -//! Initialize allocator with default configuration -RPMALLOC_EXPORT int rpmalloc_initialize(void); - -//! Initialize allocator with given configuration -RPMALLOC_EXPORT int rpmalloc_initialize_config(const rpmalloc_config_t *config); - -//! Get allocator configuration -RPMALLOC_EXPORT const rpmalloc_config_t *rpmalloc_config(void); - -//! Finalize allocator -RPMALLOC_EXPORT void rpmalloc_finalize(void); - -//! Initialize allocator for calling thread -RPMALLOC_EXPORT void rpmalloc_thread_initialize(void); - -//! Finalize allocator for calling thread -RPMALLOC_EXPORT void rpmalloc_thread_finalize(int release_caches); - -//! Perform deferred deallocations pending for the calling thread heap -RPMALLOC_EXPORT void rpmalloc_thread_collect(void); - -//! Query if allocator is initialized for calling thread -RPMALLOC_EXPORT int rpmalloc_is_thread_initialized(void); - -//! Get per-thread statistics -RPMALLOC_EXPORT void -rpmalloc_thread_statistics(rpmalloc_thread_statistics_t *stats); - -//! Get global statistics -RPMALLOC_EXPORT void -rpmalloc_global_statistics(rpmalloc_global_statistics_t *stats); - -//! Dump all statistics in human readable format to file (should be a FILE*) -RPMALLOC_EXPORT void rpmalloc_dump_statistics(void *file); - -//! Allocate a memory block of at least the given size -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc(size_t size) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(1); - -//! Free the given memory block -RPMALLOC_EXPORT void rpfree(void *ptr); - -//! Allocate a memory block of at least the given size and zero initialize it -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpcalloc(size_t num, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE2(1, 2); - -//! 
Reallocate the given block to at least the given size -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rprealloc(void *ptr, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2); - -//! Reallocate the given block to at least the given size and alignment, -// with optional control flags (see RPMALLOC_NO_PRESERVE). -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpaligned_realloc(void *ptr, size_t alignment, size_t size, size_t oldsize, - unsigned int flags) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(3); - -//! Allocate a memory block of at least the given size and alignment. -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpaligned_alloc(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2); - -//! Allocate a memory block of at least the given size and alignment, and zero -//! initialize it. -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpaligned_calloc(size_t alignment, size_t num, - size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); - -//! Allocate a memory block of at least the given size and alignment. -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. 
A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmemalign(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2); - -//! Allocate a memory block of at least the given size and alignment. -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT int rpposix_memalign(void **memptr, size_t alignment, - size_t size); - -//! Query the usable size of the given memory block (from given pointer to the -//! end of block) -RPMALLOC_EXPORT size_t rpmalloc_usable_size(void *ptr); - -//! Dummy empty function for forcing linker symbol inclusion -RPMALLOC_EXPORT void rpmalloc_linker_reference(void); - -#if RPMALLOC_FIRST_CLASS_HEAPS - -//! Heap type -typedef struct heap_t rpmalloc_heap_t; - -//! Acquire a new heap. Will reuse existing released heaps or allocate memory -//! for a new heap -// if none available. Heap API is implemented with the strict assumption that -// only one single thread will call heap functions for a given heap at any -// given time, no functions are thread safe. -RPMALLOC_EXPORT rpmalloc_heap_t *rpmalloc_heap_acquire(void); - -//! Release a heap (does NOT free the memory allocated by the heap, use -//! rpmalloc_heap_free_all before destroying the heap). -// Releasing a heap will enable it to be reused by other threads. Safe to pass -// a null pointer. -RPMALLOC_EXPORT void rpmalloc_heap_release(rpmalloc_heap_t *heap); - -//! Allocate a memory block of at least the given size using the given heap. -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_alloc(rpmalloc_heap_t *heap, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2); - -//! 
Allocate a memory block of at least the given size using the given heap. The -//! returned -// block will have the requested alignment. Alignment must be a power of two -// and a multiple of sizeof(void*), and should ideally be less than memory page -// size. A caveat of rpmalloc internals is that this must also be strictly less -// than the span size (default 64KiB). -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_aligned_alloc(rpmalloc_heap_t *heap, size_t alignment, - size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(3); - -//! Allocate a memory block of at least the given size using the given heap and -//! zero initialize it. -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_calloc(rpmalloc_heap_t *heap, size_t num, - size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); - -//! Allocate a memory block of at least the given size using the given heap and -//! zero initialize it. The returned -// block will have the requested alignment. Alignment must either be zero, or a -// power of two and a multiple of sizeof(void*), and should ideally be less -// than memory page size. A caveat of rpmalloc internals is that this must also -// be strictly less than the span size (default 64KiB). -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_aligned_calloc(rpmalloc_heap_t *heap, size_t alignment, - size_t num, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); - -//! Reallocate the given block to at least the given size. The memory block MUST -//! be allocated -// by the same heap given to this function. -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_realloc(rpmalloc_heap_t *heap, void *ptr, size_t size, - unsigned int flags) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(3); - -//! Reallocate the given block to at least the given size. The memory block MUST -//! be allocated -// by the same heap given to this function. The returned block will have the -// requested alignment. 
Alignment must be either zero, or a power of two and a -// multiple of sizeof(void*), and should ideally be less than memory page size. -// A caveat of rpmalloc internals is that this must also be strictly less than -// the span size (default 64KiB). -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void *rpmalloc_heap_aligned_realloc( - rpmalloc_heap_t *heap, void *ptr, size_t alignment, size_t size, - unsigned int flags) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(4); - -//! Free the given memory block from the given heap. The memory block MUST be -//! allocated -// by the same heap given to this function. -RPMALLOC_EXPORT void rpmalloc_heap_free(rpmalloc_heap_t *heap, void *ptr); - -//! Free all memory allocated by the heap -RPMALLOC_EXPORT void rpmalloc_heap_free_all(rpmalloc_heap_t *heap); - -//! Set the given heap as the current heap for the calling thread. A heap MUST -//! only be current heap -// for a single thread, a heap can never be shared between multiple threads. -// The previous current heap for the calling thread is released to be reused by -// other threads. -RPMALLOC_EXPORT void rpmalloc_heap_thread_set_current(rpmalloc_heap_t *heap); - -//! Returns which heap the given pointer is allocated on -RPMALLOC_EXPORT rpmalloc_heap_t *rpmalloc_get_heap_for_ptr(void *ptr); - -#endif - -#ifdef __cplusplus -} -#endif +//===---------------------- rpmalloc.h ------------------*- C -*-=============// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This library provides a cross-platform lock free thread caching malloc +// implementation in C11. 
+// +//===----------------------------------------------------------------------===// + +#pragma once + +#include + +#ifdef __cplusplus +extern "C" { +#endif + +#if defined(__clang__) || defined(__GNUC__) +#define RPMALLOC_EXPORT __attribute__((visibility("default"))) +#define RPMALLOC_ALLOCATOR +#if (defined(__clang_major__) && (__clang_major__ < 4)) || \ + (defined(__GNUC__) && defined(ENABLE_PRELOAD) && ENABLE_PRELOAD) +#define RPMALLOC_ATTRIB_MALLOC +#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) +#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) +#else +#define RPMALLOC_ATTRIB_MALLOC __attribute__((__malloc__)) +#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) __attribute__((alloc_size(size))) +#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) \ + __attribute__((alloc_size(count, size))) +#endif +#define RPMALLOC_CDECL +#elif defined(_MSC_VER) +#define RPMALLOC_EXPORT +#define RPMALLOC_ALLOCATOR __declspec(allocator) __declspec(restrict) +#define RPMALLOC_ATTRIB_MALLOC +#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) +#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) +#define RPMALLOC_CDECL __cdecl +#else +#define RPMALLOC_EXPORT +#define RPMALLOC_ALLOCATOR +#define RPMALLOC_ATTRIB_MALLOC +#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) +#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) +#define RPMALLOC_CDECL +#endif + +//! Define RPMALLOC_CONFIGURABLE to enable configuring sizes. Will introduce +// a very small overhead due to some size calculations not being compile time +// constants +#ifndef RPMALLOC_CONFIGURABLE +#define RPMALLOC_CONFIGURABLE 0 +#endif + +//! Define RPMALLOC_FIRST_CLASS_HEAPS to enable heap based API (rpmalloc_heap_* +//! functions). +// Will introduce a very small overhead to track fully allocated spans in heaps +#ifndef RPMALLOC_FIRST_CLASS_HEAPS +#define RPMALLOC_FIRST_CLASS_HEAPS 0 +#endif + +//! Flag to rpaligned_realloc to not preserve content in reallocation +#define RPMALLOC_NO_PRESERVE 1 +//! 
Flag to rpaligned_realloc to fail and return null pointer if grow cannot be +//! done in-place, +// in which case the original pointer is still valid (just like a call to +// realloc which failes to allocate a new block). +#define RPMALLOC_GROW_OR_FAIL 2 + +typedef struct rpmalloc_global_statistics_t { + //! Current amount of virtual memory mapped, all of which might not have been + //! committed (only if ENABLE_STATISTICS=1) + size_t mapped; + //! Peak amount of virtual memory mapped, all of which might not have been + //! committed (only if ENABLE_STATISTICS=1) + size_t mapped_peak; + //! Current amount of memory in global caches for small and medium sizes + //! (<32KiB) + size_t cached; + //! Current amount of memory allocated in huge allocations, i.e larger than + //! LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1) + size_t huge_alloc; + //! Peak amount of memory allocated in huge allocations, i.e larger than + //! LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1) + size_t huge_alloc_peak; + //! Total amount of memory mapped since initialization (only if + //! ENABLE_STATISTICS=1) + size_t mapped_total; + //! Total amount of memory unmapped since initialization (only if + //! ENABLE_STATISTICS=1) + size_t unmapped_total; +} rpmalloc_global_statistics_t; + +typedef struct rpmalloc_thread_statistics_t { + //! Current number of bytes available in thread size class caches for small + //! and medium sizes (<32KiB) + size_t sizecache; + //! Current number of bytes available in thread span caches for small and + //! medium sizes (<32KiB) + size_t spancache; + //! Total number of bytes transitioned from thread cache to global cache (only + //! if ENABLE_STATISTICS=1) + size_t thread_to_global; + //! Total number of bytes transitioned from global cache to thread cache (only + //! if ENABLE_STATISTICS=1) + size_t global_to_thread; + //! Per span count statistics (only if ENABLE_STATISTICS=1) + struct { + //! 
Currently used number of spans + size_t current; + //! High water mark of spans used + size_t peak; + //! Number of spans transitioned to global cache + size_t to_global; + //! Number of spans transitioned from global cache + size_t from_global; + //! Number of spans transitioned to thread cache + size_t to_cache; + //! Number of spans transitioned from thread cache + size_t from_cache; + //! Number of spans transitioned to reserved state + size_t to_reserved; + //! Number of spans transitioned from reserved state + size_t from_reserved; + //! Number of raw memory map calls (not hitting the reserve spans but + //! resulting in actual OS mmap calls) + size_t map_calls; + } span_use[64]; + //! Per size class statistics (only if ENABLE_STATISTICS=1) + struct { + //! Current number of allocations + size_t alloc_current; + //! Peak number of allocations + size_t alloc_peak; + //! Total number of allocations + size_t alloc_total; + //! Total number of frees + size_t free_total; + //! Number of spans transitioned to cache + size_t spans_to_cache; + //! Number of spans transitioned from cache + size_t spans_from_cache; + //! Number of spans transitioned from reserved state + size_t spans_from_reserved; + //! Number of raw memory map calls (not hitting the reserve spans but + //! resulting in actual OS mmap calls) + size_t map_calls; + } size_use[128]; +} rpmalloc_thread_statistics_t; + +typedef struct rpmalloc_config_t { + //! Map memory pages for the given number of bytes. The returned address MUST + //! be + // aligned to the rpmalloc span size, which will always be a power of two. + // Optionally the function can store an alignment offset in the offset + // variable in case it performs alignment and the returned pointer is offset + // from the actual start of the memory region due to this alignment. The + // alignment offset will be passed to the memory unmap function. 
The + // alignment offset MUST NOT be larger than 65535 (storable in an uint16_t), + // if it is you must use natural alignment to shift it into 16 bits. If you + // set a memory_map function, you must also set a memory_unmap function or + // else the default implementation will be used for both. This function must + // be thread safe, it can be called by multiple threads simultaneously. + void *(*memory_map)(size_t size, size_t *offset); + //! Unmap the memory pages starting at address and spanning the given number + //! of bytes. + // If release is set to non-zero, the unmap is for an entire span range as + // returned by a previous call to memory_map and that the entire range should + // be released. The release argument holds the size of the entire span range. + // If release is set to 0, the unmap is a partial decommit of a subset of the + // mapped memory range. If you set a memory_unmap function, you must also set + // a memory_map function or else the default implementation will be used for + // both. This function must be thread safe, it can be called by multiple + // threads simultaneously. + void (*memory_unmap)(void *address, size_t size, size_t offset, + size_t release); + //! Called when an assert fails, if asserts are enabled. Will use the standard + //! assert() + // if this is not set. + void (*error_callback)(const char *message); + //! Called when a call to map memory pages fails (out of memory). If this + //! callback is + // not set or returns zero the library will return a null pointer in the + // allocation call. If this callback returns non-zero the map call will be + // retried. The argument passed is the number of bytes that was requested in + // the map call. Only used if the default system memory map function is used + // (memory_map callback is not set). + int (*map_fail_callback)(size_t size); + //! Size of memory pages. The page size MUST be a power of two. All memory + //! 
mapping + // requests to memory_map will be made with size set to a multiple of the + // page size. Used if RPMALLOC_CONFIGURABLE is defined to 1, otherwise system + // page size is used. + size_t page_size; + //! Size of a span of memory blocks. MUST be a power of two, and in + //! [4096,262144] + // range (unless 0 - set to 0 to use the default span size). Used if + // RPMALLOC_CONFIGURABLE is defined to 1. + size_t span_size; + //! Number of spans to map at each request to map new virtual memory blocks. + //! This can + // be used to minimize the system call overhead at the cost of virtual memory + // address space. The extra mapped pages will not be written until actually + // used, so physical committed memory should not be affected in the default + // implementation. Will be aligned to a multiple of spans that match memory + // page size in case of huge pages. + size_t span_map_count; + //! Enable use of large/huge pages. If this flag is set to non-zero and page + //! size is + // zero, the allocator will try to enable huge pages and auto detect the + // configuration. If this is set to non-zero and page_size is also non-zero, + // the allocator will assume huge pages have been configured and enabled + // prior to initializing the allocator. For Windows, see + // https://docs.microsoft.com/en-us/windows/desktop/memory/large-page-support + // For Linux, see https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt + int enable_huge_pages; + //! Respectively allocated pages and huge allocated pages names for systems + // supporting it to be able to distinguish among anonymous regions. + const char *page_name; + const char *huge_page_name; +} rpmalloc_config_t; + +//! Initialize allocator with default configuration +RPMALLOC_EXPORT int rpmalloc_initialize(void); + +//! Initialize allocator with given configuration +RPMALLOC_EXPORT int rpmalloc_initialize_config(const rpmalloc_config_t *config); + +//! 
Get allocator configuration +RPMALLOC_EXPORT const rpmalloc_config_t *rpmalloc_config(void); + +//! Finalize allocator +RPMALLOC_EXPORT void rpmalloc_finalize(void); + +//! Initialize allocator for calling thread +RPMALLOC_EXPORT void rpmalloc_thread_initialize(void); + +//! Finalize allocator for calling thread +RPMALLOC_EXPORT void rpmalloc_thread_finalize(int release_caches); + +//! Perform deferred deallocations pending for the calling thread heap +RPMALLOC_EXPORT void rpmalloc_thread_collect(void); + +//! Query if allocator is initialized for calling thread +RPMALLOC_EXPORT int rpmalloc_is_thread_initialized(void); + +//! Get per-thread statistics +RPMALLOC_EXPORT void +rpmalloc_thread_statistics(rpmalloc_thread_statistics_t *stats); + +//! Get global statistics +RPMALLOC_EXPORT void +rpmalloc_global_statistics(rpmalloc_global_statistics_t *stats); + +//! Dump all statistics in human readable format to file (should be a FILE*) +RPMALLOC_EXPORT void rpmalloc_dump_statistics(void *file); + +//! Allocate a memory block of at least the given size +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc(size_t size) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(1); + +//! Free the given memory block +RPMALLOC_EXPORT void rpfree(void *ptr); + +//! Allocate a memory block of at least the given size and zero initialize it +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpcalloc(size_t num, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE2(1, 2); + +//! Reallocate the given block to at least the given size +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rprealloc(void *ptr, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2); + +//! Reallocate the given block to at least the given size and alignment, +// with optional control flags (see RPMALLOC_NO_PRESERVE). +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. 
A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpaligned_realloc(void *ptr, size_t alignment, size_t size, size_t oldsize, + unsigned int flags) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(3); + +//! Allocate a memory block of at least the given size and alignment. +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpaligned_alloc(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2); + +//! Allocate a memory block of at least the given size and alignment, and zero +//! initialize it. +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpaligned_calloc(size_t alignment, size_t num, + size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); + +//! Allocate a memory block of at least the given size and alignment. +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmemalign(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2); + +//! Allocate a memory block of at least the given size and alignment. +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. 
A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT int rpposix_memalign(void **memptr, size_t alignment, + size_t size); + +//! Query the usable size of the given memory block (from given pointer to the +//! end of block) +RPMALLOC_EXPORT size_t rpmalloc_usable_size(void *ptr); + +//! Dummy empty function for forcing linker symbol inclusion +RPMALLOC_EXPORT void rpmalloc_linker_reference(void); + +#if RPMALLOC_FIRST_CLASS_HEAPS + +//! Heap type +typedef struct heap_t rpmalloc_heap_t; + +//! Acquire a new heap. Will reuse existing released heaps or allocate memory +//! for a new heap +// if none available. Heap API is implemented with the strict assumption that +// only one single thread will call heap functions for a given heap at any +// given time, no functions are thread safe. +RPMALLOC_EXPORT rpmalloc_heap_t *rpmalloc_heap_acquire(void); + +//! Release a heap (does NOT free the memory allocated by the heap, use +//! rpmalloc_heap_free_all before destroying the heap). +// Releasing a heap will enable it to be reused by other threads. Safe to pass +// a null pointer. +RPMALLOC_EXPORT void rpmalloc_heap_release(rpmalloc_heap_t *heap); + +//! Allocate a memory block of at least the given size using the given heap. +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_alloc(rpmalloc_heap_t *heap, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2); + +//! Allocate a memory block of at least the given size using the given heap. The +//! returned +// block will have the requested alignment. Alignment must be a power of two +// and a multiple of sizeof(void*), and should ideally be less than memory page +// size. A caveat of rpmalloc internals is that this must also be strictly less +// than the span size (default 64KiB). 
+RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_aligned_alloc(rpmalloc_heap_t *heap, size_t alignment, + size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(3); + +//! Allocate a memory block of at least the given size using the given heap and +//! zero initialize it. +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_calloc(rpmalloc_heap_t *heap, size_t num, + size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); + +//! Allocate a memory block of at least the given size using the given heap and +//! zero initialize it. The returned +// block will have the requested alignment. Alignment must either be zero, or a +// power of two and a multiple of sizeof(void*), and should ideally be less +// than memory page size. A caveat of rpmalloc internals is that this must also +// be strictly less than the span size (default 64KiB). +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_aligned_calloc(rpmalloc_heap_t *heap, size_t alignment, + size_t num, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); + +//! Reallocate the given block to at least the given size. The memory block MUST +//! be allocated +// by the same heap given to this function. +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_realloc(rpmalloc_heap_t *heap, void *ptr, size_t size, + unsigned int flags) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(3); + +//! Reallocate the given block to at least the given size. The memory block MUST +//! be allocated +// by the same heap given to this function. The returned block will have the +// requested alignment. Alignment must be either zero, or a power of two and a +// multiple of sizeof(void*), and should ideally be less than memory page size. +// A caveat of rpmalloc internals is that this must also be strictly less than +// the span size (default 64KiB). 
+RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void *rpmalloc_heap_aligned_realloc( + rpmalloc_heap_t *heap, void *ptr, size_t alignment, size_t size, + unsigned int flags) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(4); + +//! Free the given memory block from the given heap. The memory block MUST be +//! allocated +// by the same heap given to this function. +RPMALLOC_EXPORT void rpmalloc_heap_free(rpmalloc_heap_t *heap, void *ptr); + +//! Free all memory allocated by the heap +RPMALLOC_EXPORT void rpmalloc_heap_free_all(rpmalloc_heap_t *heap); + +//! Set the given heap as the current heap for the calling thread. A heap MUST +//! only be current heap +// for a single thread, a heap can never be shared between multiple threads. +// The previous current heap for the calling thread is released to be reused by +// other threads. +RPMALLOC_EXPORT void rpmalloc_heap_thread_set_current(rpmalloc_heap_t *heap); + +//! Returns which heap the given pointer is allocated on +RPMALLOC_EXPORT rpmalloc_heap_t *rpmalloc_get_heap_for_ptr(void *ptr); + +#endif + +#ifdef __cplusplus +} +#endif diff --git a/llvm/lib/Support/rpmalloc/rpnew.h b/llvm/lib/Support/rpmalloc/rpnew.h index d8303c6f95652f..a18f0799d56d1f 100644 --- a/llvm/lib/Support/rpmalloc/rpnew.h +++ b/llvm/lib/Support/rpmalloc/rpnew.h @@ -1,113 +1,113 @@ -//===-------------------------- rpnew.h -----------------*- C -*-=============// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -// -// This library provides a cross-platform lock free thread caching malloc -// implementation in C11. 
-// -//===----------------------------------------------------------------------===// - -#ifdef __cplusplus - -#include -#include - -#ifndef __CRTDECL -#define __CRTDECL -#endif - -extern void __CRTDECL operator delete(void *p) noexcept { rpfree(p); } - -extern void __CRTDECL operator delete[](void *p) noexcept { rpfree(p); } - -extern void *__CRTDECL operator new(std::size_t size) noexcept(false) { - return rpmalloc(size); -} - -extern void *__CRTDECL operator new[](std::size_t size) noexcept(false) { - return rpmalloc(size); -} - -extern void *__CRTDECL operator new(std::size_t size, - const std::nothrow_t &tag) noexcept { - (void)sizeof(tag); - return rpmalloc(size); -} - -extern void *__CRTDECL operator new[](std::size_t size, - const std::nothrow_t &tag) noexcept { - (void)sizeof(tag); - return rpmalloc(size); -} - -#if (__cplusplus >= 201402L || _MSC_VER >= 1916) - -extern void __CRTDECL operator delete(void *p, std::size_t size) noexcept { - (void)sizeof(size); - rpfree(p); -} - -extern void __CRTDECL operator delete[](void *p, std::size_t size) noexcept { - (void)sizeof(size); - rpfree(p); -} - -#endif - -#if (__cplusplus > 201402L || defined(__cpp_aligned_new)) - -extern void __CRTDECL operator delete(void *p, - std::align_val_t align) noexcept { - (void)sizeof(align); - rpfree(p); -} - -extern void __CRTDECL operator delete[](void *p, - std::align_val_t align) noexcept { - (void)sizeof(align); - rpfree(p); -} - -extern void __CRTDECL operator delete(void *p, std::size_t size, - std::align_val_t align) noexcept { - (void)sizeof(size); - (void)sizeof(align); - rpfree(p); -} - -extern void __CRTDECL operator delete[](void *p, std::size_t size, - std::align_val_t align) noexcept { - (void)sizeof(size); - (void)sizeof(align); - rpfree(p); -} - -extern void *__CRTDECL operator new(std::size_t size, - std::align_val_t align) noexcept(false) { - return rpaligned_alloc(static_cast<size_t>(align), size); -} - -extern void *__CRTDECL operator new[](std::size_t size, -
std::align_val_t align) noexcept(false) { - return rpaligned_alloc(static_cast<size_t>(align), size); -} - -extern void *__CRTDECL operator new(std::size_t size, std::align_val_t align, - const std::nothrow_t &tag) noexcept { - (void)sizeof(tag); - return rpaligned_alloc(static_cast<size_t>(align), size); -} - -extern void *__CRTDECL operator new[](std::size_t size, std::align_val_t align, - const std::nothrow_t &tag) noexcept { - (void)sizeof(tag); - return rpaligned_alloc(static_cast<size_t>(align), size); -} - -#endif - -#endif +//===-------------------------- rpnew.h -----------------*- C -*-=============// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This library provides a cross-platform lock free thread caching malloc +// implementation in C11. +// +//===----------------------------------------------------------------------===// + +#ifdef __cplusplus + +#include +#include + +#ifndef __CRTDECL +#define __CRTDECL +#endif + +extern void __CRTDECL operator delete(void *p) noexcept { rpfree(p); } + +extern void __CRTDECL operator delete[](void *p) noexcept { rpfree(p); } + +extern void *__CRTDECL operator new(std::size_t size) noexcept(false) { + return rpmalloc(size); +} + +extern void *__CRTDECL operator new[](std::size_t size) noexcept(false) { + return rpmalloc(size); +} + +extern void *__CRTDECL operator new(std::size_t size, + const std::nothrow_t &tag) noexcept { + (void)sizeof(tag); + return rpmalloc(size); +} + +extern void *__CRTDECL operator new[](std::size_t size, + const std::nothrow_t &tag) noexcept { + (void)sizeof(tag); + return rpmalloc(size); +} + +#if (__cplusplus >= 201402L || _MSC_VER >= 1916) + +extern void __CRTDECL operator delete(void *p, std::size_t size) noexcept { + (void)sizeof(size); + rpfree(p); +} +
+extern void __CRTDECL operator delete[](void *p, std::size_t size) noexcept { + (void)sizeof(size); + rpfree(p); +} + +#endif + +#if (__cplusplus > 201402L || defined(__cpp_aligned_new)) + +extern void __CRTDECL operator delete(void *p, + std::align_val_t align) noexcept { + (void)sizeof(align); + rpfree(p); +} + +extern void __CRTDECL operator delete[](void *p, + std::align_val_t align) noexcept { + (void)sizeof(align); + rpfree(p); +} + +extern void __CRTDECL operator delete(void *p, std::size_t size, + std::align_val_t align) noexcept { + (void)sizeof(size); + (void)sizeof(align); + rpfree(p); +} + +extern void __CRTDECL operator delete[](void *p, std::size_t size, + std::align_val_t align) noexcept { + (void)sizeof(size); + (void)sizeof(align); + rpfree(p); +} + +extern void *__CRTDECL operator new(std::size_t size, + std::align_val_t align) noexcept(false) { + return rpaligned_alloc(static_cast<size_t>(align), size); +} + +extern void *__CRTDECL operator new[](std::size_t size, + std::align_val_t align) noexcept(false) { + return rpaligned_alloc(static_cast<size_t>(align), size); +} + +extern void *__CRTDECL operator new(std::size_t size, std::align_val_t align, + const std::nothrow_t &tag) noexcept { + (void)sizeof(tag); + return rpaligned_alloc(static_cast<size_t>(align), size); +} + +extern void *__CRTDECL operator new[](std::size_t size, std::align_val_t align, + const std::nothrow_t &tag) noexcept { + (void)sizeof(tag); + return rpaligned_alloc(static_cast<size_t>(align), size); +} + +#endif + +#endif diff --git a/llvm/lib/Target/DirectX/DXILFinalizeLinkage.cpp b/llvm/lib/Target/DirectX/DXILFinalizeLinkage.cpp index d315d9bd16f439..d32dda2a67c951 100644 --- a/llvm/lib/Target/DirectX/DXILFinalizeLinkage.cpp +++ b/llvm/lib/Target/DirectX/DXILFinalizeLinkage.cpp @@ -1,65 +1,65 @@ -//===- DXILFinalizeLinkage.cpp - Finalize linkage of functions ------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -#include "DXILFinalizeLinkage.h" -#include "DirectX.h" -#include "llvm/Analysis/DXILResource.h" -#include "llvm/IR/Function.h" -#include "llvm/IR/GlobalValue.h" -#include "llvm/IR/Metadata.h" -#include "llvm/IR/Module.h" - -#define DEBUG_TYPE "dxil-finalize-linkage" - -using namespace llvm; - -static bool finalizeLinkage(Module &M) { - SmallPtrSet EntriesAndExports; - - // Find all entry points and export functions - for (Function &EF : M.functions()) { - if (!EF.hasFnAttribute("hlsl.shader") && !EF.hasFnAttribute("hlsl.export")) - continue; - EntriesAndExports.insert(&EF); - } - - for (Function &F : M.functions()) { - if (F.getLinkage() == GlobalValue::ExternalLinkage && - !EntriesAndExports.contains(&F)) { - F.setLinkage(GlobalValue::InternalLinkage); - } - } - - return false; -} - -PreservedAnalyses DXILFinalizeLinkage::run(Module &M, - ModuleAnalysisManager &AM) { - if (finalizeLinkage(M)) - return PreservedAnalyses::none(); - return PreservedAnalyses::all(); -} - -bool DXILFinalizeLinkageLegacy::runOnModule(Module &M) { - return finalizeLinkage(M); -} - -void DXILFinalizeLinkageLegacy::getAnalysisUsage(AnalysisUsage &AU) const { - AU.addPreserved(); -} - -char DXILFinalizeLinkageLegacy::ID = 0; - -INITIALIZE_PASS_BEGIN(DXILFinalizeLinkageLegacy, DEBUG_TYPE, - "DXIL Finalize Linkage", false, false) -INITIALIZE_PASS_END(DXILFinalizeLinkageLegacy, DEBUG_TYPE, - "DXIL Finalize Linkage", false, false) - -ModulePass *llvm::createDXILFinalizeLinkageLegacyPass() { - return new DXILFinalizeLinkageLegacy(); -} +//===- DXILFinalizeLinkage.cpp - Finalize linkage of functions ------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "DXILFinalizeLinkage.h" +#include "DirectX.h" +#include "llvm/Analysis/DXILResource.h" +#include "llvm/IR/Function.h" +#include "llvm/IR/GlobalValue.h" +#include "llvm/IR/Metadata.h" +#include "llvm/IR/Module.h" + +#define DEBUG_TYPE "dxil-finalize-linkage" + +using namespace llvm; + +static bool finalizeLinkage(Module &M) { + SmallPtrSet EntriesAndExports; + + // Find all entry points and export functions + for (Function &EF : M.functions()) { + if (!EF.hasFnAttribute("hlsl.shader") && !EF.hasFnAttribute("hlsl.export")) + continue; + EntriesAndExports.insert(&EF); + } + + for (Function &F : M.functions()) { + if (F.getLinkage() == GlobalValue::ExternalLinkage && + !EntriesAndExports.contains(&F)) { + F.setLinkage(GlobalValue::InternalLinkage); + } + } + + return false; +} + +PreservedAnalyses DXILFinalizeLinkage::run(Module &M, + ModuleAnalysisManager &AM) { + if (finalizeLinkage(M)) + return PreservedAnalyses::none(); + return PreservedAnalyses::all(); +} + +bool DXILFinalizeLinkageLegacy::runOnModule(Module &M) { + return finalizeLinkage(M); +} + +void DXILFinalizeLinkageLegacy::getAnalysisUsage(AnalysisUsage &AU) const { + AU.addPreserved(); +} + +char DXILFinalizeLinkageLegacy::ID = 0; + +INITIALIZE_PASS_BEGIN(DXILFinalizeLinkageLegacy, DEBUG_TYPE, + "DXIL Finalize Linkage", false, false) +INITIALIZE_PASS_END(DXILFinalizeLinkageLegacy, DEBUG_TYPE, + "DXIL Finalize Linkage", false, false) + +ModulePass *llvm::createDXILFinalizeLinkageLegacyPass() { + return new DXILFinalizeLinkageLegacy(); +} diff --git a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp index 8ea31401121bce..9844fd394aa4c5 100644 --- a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp +++ b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp @@ -1,38 +1,38 @@ -//===- 
DirectXTargetTransformInfo.cpp - DirectX TTI ---------------*- C++ -//-*-===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -/// -//===----------------------------------------------------------------------===// - -#include "DirectXTargetTransformInfo.h" -#include "llvm/IR/Intrinsics.h" -#include "llvm/IR/IntrinsicsDirectX.h" - -using namespace llvm; - -bool DirectXTTIImpl::isTargetIntrinsicWithScalarOpAtArg(Intrinsic::ID ID, - unsigned ScalarOpdIdx) { - switch (ID) { - case Intrinsic::dx_wave_readlane: - return ScalarOpdIdx == 1; - default: - return false; - } -} - -bool DirectXTTIImpl::isTargetIntrinsicTriviallyScalarizable( - Intrinsic::ID ID) const { - switch (ID) { - case Intrinsic::dx_frac: - case Intrinsic::dx_rsqrt: - case Intrinsic::dx_wave_readlane: - return true; - default: - return false; - } -} +//===- DirectXTargetTransformInfo.cpp - DirectX TTI ---------------*- C++ +//-*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +/// +//===----------------------------------------------------------------------===// + +#include "DirectXTargetTransformInfo.h" +#include "llvm/IR/Intrinsics.h" +#include "llvm/IR/IntrinsicsDirectX.h" + +using namespace llvm; + +bool DirectXTTIImpl::isTargetIntrinsicWithScalarOpAtArg(Intrinsic::ID ID, + unsigned ScalarOpdIdx) { + switch (ID) { + case Intrinsic::dx_wave_readlane: + return ScalarOpdIdx == 1; + default: + return false; + } +} + +bool DirectXTTIImpl::isTargetIntrinsicTriviallyScalarizable( + Intrinsic::ID ID) const { + switch (ID) { + case Intrinsic::dx_frac: + case Intrinsic::dx_rsqrt: + case Intrinsic::dx_wave_readlane: + return true; + default: + return false; + } +} diff --git a/llvm/test/CodeGen/DirectX/atan2.ll b/llvm/test/CodeGen/DirectX/atan2.ll index 9d86f87f3ed50e..b2c650d1162655 100644 --- a/llvm/test/CodeGen/DirectX/atan2.ll +++ b/llvm/test/CodeGen/DirectX/atan2.ll @@ -1,87 +1,87 @@ -; RUN: opt -S -dxil-intrinsic-expansion -mtriple=dxil-pc-shadermodel6.3-library %s | FileCheck %s --check-prefixes=CHECK,EXPCHECK -; RUN: opt -S -dxil-intrinsic-expansion -scalarizer -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s | FileCheck %s --check-prefixes=CHECK,DOPCHECK - -; Make sure correct dxil expansions for atan2 are generated for float and half. 
- -define noundef float @atan2_float(float noundef %y, float noundef %x) { -entry: -; CHECK: [[DIV:%.+]] = fdiv float %y, %x -; EXPCHECK: [[ATAN:%.+]] = call float @llvm.atan.f32(float [[DIV]]) -; DOPCHECK: [[ATAN:%.+]] = call float @dx.op.unary.f32(i32 17, float [[DIV]]) -; CHECK-DAG: [[ADD_PI:%.+]] = fadd float [[ATAN]], 0x400921FB60000000 -; CHECK-DAG: [[SUB_PI:%.+]] = fsub float [[ATAN]], 0x400921FB60000000 -; CHECK-DAG: [[X_LT_0:%.+]] = fcmp olt float %x, 0.000000e+00 -; CHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq float %x, 0.000000e+00 -; CHECK-DAG: [[Y_GE_0:%.+]] = fcmp oge float %y, 0.000000e+00 -; CHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt float %y, 0.000000e+00 -; CHECK: [[XLT0_AND_YGE0:%.+]] = and i1 [[X_LT_0]], [[Y_GE_0]] -; CHECK: [[SELECT_ADD_PI:%.+]] = select i1 [[XLT0_AND_YGE0]], float [[ADD_PI]], float [[ATAN]] -; CHECK: [[XLT0_AND_YLT0:%.+]] = and i1 [[X_LT_0]], [[Y_LT_0]] -; CHECK: [[SELECT_SUB_PI:%.+]] = select i1 [[XLT0_AND_YLT0]], float [[SUB_PI]], float [[SELECT_ADD_PI]] -; CHECK: [[XEQ0_AND_YLT0:%.+]] = and i1 [[X_EQ_0]], [[Y_LT_0]] -; CHECK: [[SELECT_NEGHPI:%.+]] = select i1 [[XEQ0_AND_YLT0]], float 0xBFF921FB60000000, float [[SELECT_SUB_PI]] -; CHECK: [[XEQ0_AND_YGE0:%.+]] = and i1 [[X_EQ_0]], [[Y_GE_0]] -; CHECK: [[SELECT_HPI:%.+]] = select i1 [[XEQ0_AND_YGE0]], float 0x3FF921FB60000000, float [[SELECT_NEGHPI]] -; CHECK: ret float [[SELECT_HPI]] - %elt.atan2 = call float @llvm.atan2.f32(float %y, float %x) - ret float %elt.atan2 -} - -define noundef half @atan2_half(half noundef %y, half noundef %x) { -entry: -; CHECK: [[DIV:%.+]] = fdiv half %y, %x -; EXPCHECK: [[ATAN:%.+]] = call half @llvm.atan.f16(half [[DIV]]) -; DOPCHECK: [[ATAN:%.+]] = call half @dx.op.unary.f16(i32 17, half [[DIV]]) -; CHECK-DAG: [[ADD_PI:%.+]] = fadd half [[ATAN]], 0xH4248 -; CHECK-DAG: [[SUB_PI:%.+]] = fsub half [[ATAN]], 0xH4248 -; CHECK-DAG: [[X_LT_0:%.+]] = fcmp olt half %x, 0xH0000 -; CHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq half %x, 0xH0000 -; CHECK-DAG: [[Y_GE_0:%.+]] = 
fcmp oge half %y, 0xH0000 -; CHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt half %y, 0xH0000 -; CHECK: [[XLT0_AND_YGE0:%.+]] = and i1 [[X_LT_0]], [[Y_GE_0]] -; CHECK: [[SELECT_ADD_PI:%.+]] = select i1 [[XLT0_AND_YGE0]], half [[ADD_PI]], half [[ATAN]] -; CHECK: [[XLT0_AND_YLT0:%.+]] = and i1 [[X_LT_0]], [[Y_LT_0]] -; CHECK: [[SELECT_SUB_PI:%.+]] = select i1 [[XLT0_AND_YLT0]], half [[SUB_PI]], half [[SELECT_ADD_PI]] -; CHECK: [[XEQ0_AND_YLT0:%.+]] = and i1 [[X_EQ_0]], [[Y_LT_0]] -; CHECK: [[SELECT_NEGHPI:%.+]] = select i1 [[XEQ0_AND_YLT0]], half 0xHBE48, half [[SELECT_SUB_PI]] -; CHECK: [[XEQ0_AND_YGE0:%.+]] = and i1 [[X_EQ_0]], [[Y_GE_0]] -; CHECK: [[SELECT_HPI:%.+]] = select i1 [[XEQ0_AND_YGE0]], half 0xH3E48, half [[SELECT_NEGHPI]] -; CHECK: ret half [[SELECT_HPI]] - %elt.atan2 = call half @llvm.atan2.f16(half %y, half %x) - ret half %elt.atan2 -} - -define noundef <4 x float> @atan2_float4(<4 x float> noundef %y, <4 x float> noundef %x) { -entry: -; Just Expansion, no scalarization or lowering: -; EXPCHECK: [[DIV:%.+]] = fdiv <4 x float> %y, %x -; EXPCHECK: [[ATAN:%.+]] = call <4 x float> @llvm.atan.v4f32(<4 x float> [[DIV]]) -; EXPCHECK-DAG: [[ADD_PI:%.+]] = fadd <4 x float> [[ATAN]], -; EXPCHECK-DAG: [[SUB_PI:%.+]] = fsub <4 x float> [[ATAN]], -; EXPCHECK-DAG: [[X_LT_0:%.+]] = fcmp olt <4 x float> %x, zeroinitializer -; EXPCHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq <4 x float> %x, zeroinitializer -; EXPCHECK-DAG: [[Y_GE_0:%.+]] = fcmp oge <4 x float> %y, zeroinitializer -; EXPCHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt <4 x float> %y, zeroinitializer -; EXPCHECK: [[XLT0_AND_YGE0:%.+]] = and <4 x i1> [[X_LT_0]], [[Y_GE_0]] -; EXPCHECK: [[SELECT_ADD_PI:%.+]] = select <4 x i1> [[XLT0_AND_YGE0]], <4 x float> [[ADD_PI]], <4 x float> [[ATAN]] -; EXPCHECK: [[XLT0_AND_YLT0:%.+]] = and <4 x i1> [[X_LT_0]], [[Y_LT_0]] -; EXPCHECK: [[SELECT_SUB_PI:%.+]] = select <4 x i1> [[XLT0_AND_YLT0]], <4 x float> [[SUB_PI]], <4 x float> [[SELECT_ADD_PI]] -; EXPCHECK: [[XEQ0_AND_YLT0:%.+]] = and <4 x i1> 
[[X_EQ_0]], [[Y_LT_0]] -; EXPCHECK: [[SELECT_NEGHPI:%.+]] = select <4 x i1> [[XEQ0_AND_YLT0]], <4 x float> , <4 x float> [[SELECT_SUB_PI]] -; EXPCHECK: [[XEQ0_AND_YGE0:%.+]] = and <4 x i1> [[X_EQ_0]], [[Y_GE_0]] -; EXPCHECK: [[SELECT_HPI:%.+]] = select <4 x i1> [[XEQ0_AND_YGE0]], <4 x float> , <4 x float> [[SELECT_NEGHPI]] -; EXPCHECK: ret <4 x float> [[SELECT_HPI]] - -; Scalarization occurs after expansion, so atan scalarization is tested separately. -; Expansion, scalarization and lowering: -; Just make sure this expands to exactly 4 scalar DXIL atan (OpCode=17) calls. -; DOPCHECK-COUNT-4: call float @dx.op.unary.f32(i32 17, float %{{.*}}) -; DOPCHECK-NOT: call float @dx.op.unary.f32(i32 17, - - %elt.atan2 = call <4 x float> @llvm.atan2.v4f32(<4 x float> %y, <4 x float> %x) - ret <4 x float> %elt.atan2 -} - -declare half @llvm.atan2.f16(half, half) -declare float @llvm.atan2.f32(float, float) -declare <4 x float> @llvm.atan2.v4f32(<4 x float>, <4 x float>) +; RUN: opt -S -dxil-intrinsic-expansion -mtriple=dxil-pc-shadermodel6.3-library %s | FileCheck %s --check-prefixes=CHECK,EXPCHECK +; RUN: opt -S -dxil-intrinsic-expansion -scalarizer -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s | FileCheck %s --check-prefixes=CHECK,DOPCHECK + +; Make sure correct dxil expansions for atan2 are generated for float and half. 
+ +define noundef float @atan2_float(float noundef %y, float noundef %x) { +entry: +; CHECK: [[DIV:%.+]] = fdiv float %y, %x +; EXPCHECK: [[ATAN:%.+]] = call float @llvm.atan.f32(float [[DIV]]) +; DOPCHECK: [[ATAN:%.+]] = call float @dx.op.unary.f32(i32 17, float [[DIV]]) +; CHECK-DAG: [[ADD_PI:%.+]] = fadd float [[ATAN]], 0x400921FB60000000 +; CHECK-DAG: [[SUB_PI:%.+]] = fsub float [[ATAN]], 0x400921FB60000000 +; CHECK-DAG: [[X_LT_0:%.+]] = fcmp olt float %x, 0.000000e+00 +; CHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq float %x, 0.000000e+00 +; CHECK-DAG: [[Y_GE_0:%.+]] = fcmp oge float %y, 0.000000e+00 +; CHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt float %y, 0.000000e+00 +; CHECK: [[XLT0_AND_YGE0:%.+]] = and i1 [[X_LT_0]], [[Y_GE_0]] +; CHECK: [[SELECT_ADD_PI:%.+]] = select i1 [[XLT0_AND_YGE0]], float [[ADD_PI]], float [[ATAN]] +; CHECK: [[XLT0_AND_YLT0:%.+]] = and i1 [[X_LT_0]], [[Y_LT_0]] +; CHECK: [[SELECT_SUB_PI:%.+]] = select i1 [[XLT0_AND_YLT0]], float [[SUB_PI]], float [[SELECT_ADD_PI]] +; CHECK: [[XEQ0_AND_YLT0:%.+]] = and i1 [[X_EQ_0]], [[Y_LT_0]] +; CHECK: [[SELECT_NEGHPI:%.+]] = select i1 [[XEQ0_AND_YLT0]], float 0xBFF921FB60000000, float [[SELECT_SUB_PI]] +; CHECK: [[XEQ0_AND_YGE0:%.+]] = and i1 [[X_EQ_0]], [[Y_GE_0]] +; CHECK: [[SELECT_HPI:%.+]] = select i1 [[XEQ0_AND_YGE0]], float 0x3FF921FB60000000, float [[SELECT_NEGHPI]] +; CHECK: ret float [[SELECT_HPI]] + %elt.atan2 = call float @llvm.atan2.f32(float %y, float %x) + ret float %elt.atan2 +} + +define noundef half @atan2_half(half noundef %y, half noundef %x) { +entry: +; CHECK: [[DIV:%.+]] = fdiv half %y, %x +; EXPCHECK: [[ATAN:%.+]] = call half @llvm.atan.f16(half [[DIV]]) +; DOPCHECK: [[ATAN:%.+]] = call half @dx.op.unary.f16(i32 17, half [[DIV]]) +; CHECK-DAG: [[ADD_PI:%.+]] = fadd half [[ATAN]], 0xH4248 +; CHECK-DAG: [[SUB_PI:%.+]] = fsub half [[ATAN]], 0xH4248 +; CHECK-DAG: [[X_LT_0:%.+]] = fcmp olt half %x, 0xH0000 +; CHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq half %x, 0xH0000 +; CHECK-DAG: [[Y_GE_0:%.+]] = 
fcmp oge half %y, 0xH0000 +; CHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt half %y, 0xH0000 +; CHECK: [[XLT0_AND_YGE0:%.+]] = and i1 [[X_LT_0]], [[Y_GE_0]] +; CHECK: [[SELECT_ADD_PI:%.+]] = select i1 [[XLT0_AND_YGE0]], half [[ADD_PI]], half [[ATAN]] +; CHECK: [[XLT0_AND_YLT0:%.+]] = and i1 [[X_LT_0]], [[Y_LT_0]] +; CHECK: [[SELECT_SUB_PI:%.+]] = select i1 [[XLT0_AND_YLT0]], half [[SUB_PI]], half [[SELECT_ADD_PI]] +; CHECK: [[XEQ0_AND_YLT0:%.+]] = and i1 [[X_EQ_0]], [[Y_LT_0]] +; CHECK: [[SELECT_NEGHPI:%.+]] = select i1 [[XEQ0_AND_YLT0]], half 0xHBE48, half [[SELECT_SUB_PI]] +; CHECK: [[XEQ0_AND_YGE0:%.+]] = and i1 [[X_EQ_0]], [[Y_GE_0]] +; CHECK: [[SELECT_HPI:%.+]] = select i1 [[XEQ0_AND_YGE0]], half 0xH3E48, half [[SELECT_NEGHPI]] +; CHECK: ret half [[SELECT_HPI]] + %elt.atan2 = call half @llvm.atan2.f16(half %y, half %x) + ret half %elt.atan2 +} + +define noundef <4 x float> @atan2_float4(<4 x float> noundef %y, <4 x float> noundef %x) { +entry: +; Just Expansion, no scalarization or lowering: +; EXPCHECK: [[DIV:%.+]] = fdiv <4 x float> %y, %x +; EXPCHECK: [[ATAN:%.+]] = call <4 x float> @llvm.atan.v4f32(<4 x float> [[DIV]]) +; EXPCHECK-DAG: [[ADD_PI:%.+]] = fadd <4 x float> [[ATAN]], +; EXPCHECK-DAG: [[SUB_PI:%.+]] = fsub <4 x float> [[ATAN]], +; EXPCHECK-DAG: [[X_LT_0:%.+]] = fcmp olt <4 x float> %x, zeroinitializer +; EXPCHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq <4 x float> %x, zeroinitializer +; EXPCHECK-DAG: [[Y_GE_0:%.+]] = fcmp oge <4 x float> %y, zeroinitializer +; EXPCHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt <4 x float> %y, zeroinitializer +; EXPCHECK: [[XLT0_AND_YGE0:%.+]] = and <4 x i1> [[X_LT_0]], [[Y_GE_0]] +; EXPCHECK: [[SELECT_ADD_PI:%.+]] = select <4 x i1> [[XLT0_AND_YGE0]], <4 x float> [[ADD_PI]], <4 x float> [[ATAN]] +; EXPCHECK: [[XLT0_AND_YLT0:%.+]] = and <4 x i1> [[X_LT_0]], [[Y_LT_0]] +; EXPCHECK: [[SELECT_SUB_PI:%.+]] = select <4 x i1> [[XLT0_AND_YLT0]], <4 x float> [[SUB_PI]], <4 x float> [[SELECT_ADD_PI]] +; EXPCHECK: [[XEQ0_AND_YLT0:%.+]] = and <4 x i1> 
[[X_EQ_0]], [[Y_LT_0]] +; EXPCHECK: [[SELECT_NEGHPI:%.+]] = select <4 x i1> [[XEQ0_AND_YLT0]], <4 x float> , <4 x float> [[SELECT_SUB_PI]] +; EXPCHECK: [[XEQ0_AND_YGE0:%.+]] = and <4 x i1> [[X_EQ_0]], [[Y_GE_0]] +; EXPCHECK: [[SELECT_HPI:%.+]] = select <4 x i1> [[XEQ0_AND_YGE0]], <4 x float> , <4 x float> [[SELECT_NEGHPI]] +; EXPCHECK: ret <4 x float> [[SELECT_HPI]] + +; Scalarization occurs after expansion, so atan scalarization is tested separately. +; Expansion, scalarization and lowering: +; Just make sure this expands to exactly 4 scalar DXIL atan (OpCode=17) calls. +; DOPCHECK-COUNT-4: call float @dx.op.unary.f32(i32 17, float %{{.*}}) +; DOPCHECK-NOT: call float @dx.op.unary.f32(i32 17, + + %elt.atan2 = call <4 x float> @llvm.atan2.v4f32(<4 x float> %y, <4 x float> %x) + ret <4 x float> %elt.atan2 +} + +declare half @llvm.atan2.f16(half, half) +declare float @llvm.atan2.f32(float, float) +declare <4 x float> @llvm.atan2.v4f32(<4 x float>, <4 x float>) diff --git a/llvm/test/CodeGen/DirectX/atan2_error.ll b/llvm/test/CodeGen/DirectX/atan2_error.ll index 372934098b7cab..9b66f9f1dd45a7 100644 --- a/llvm/test/CodeGen/DirectX/atan2_error.ll +++ b/llvm/test/CodeGen/DirectX/atan2_error.ll @@ -1,11 +1,11 @@ -; RUN: not opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s 2>&1 | FileCheck %s - -; DXIL operation atan does not support double overload type -; CHECK: in function atan2_double -; CHECK-SAME: Cannot create ATan operation: Invalid overload type - -define noundef double @atan2_double(double noundef %a, double noundef %b) #0 { -entry: - %1 = call double @llvm.atan2.f64(double %a, double %b) - ret double %1 -} +; RUN: not opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s 2>&1 | FileCheck %s + +; DXIL operation atan does not support double overload type +; CHECK: in function atan2_double +; CHECK-SAME: Cannot create ATan operation: Invalid overload type + +define noundef double 
@atan2_double(double noundef %a, double noundef %b) #0 { +entry: + %1 = call double @llvm.atan2.f64(double %a, double %b) + ret double %1 +} diff --git a/llvm/test/CodeGen/DirectX/cross.ll b/llvm/test/CodeGen/DirectX/cross.ll index 6ec3ec4d3594af..6153cf7cddc9d5 100644 --- a/llvm/test/CodeGen/DirectX/cross.ll +++ b/llvm/test/CodeGen/DirectX/cross.ll @@ -1,56 +1,56 @@ -; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s - -; Make sure dxil operation function calls for cross are generated for half/float. - -declare <3 x half> @llvm.dx.cross.v3f16(<3 x half>, <3 x half>) -declare <3 x float> @llvm.dx.cross.v3f32(<3 x float>, <3 x float>) - -define noundef <3 x half> @test_cross_half3(<3 x half> noundef %p0, <3 x half> noundef %p1) { -entry: - ; CHECK: %x0 = extractelement <3 x half> %p0, i64 0 - ; CHECK: %x1 = extractelement <3 x half> %p0, i64 1 - ; CHECK: %x2 = extractelement <3 x half> %p0, i64 2 - ; CHECK: %y0 = extractelement <3 x half> %p1, i64 0 - ; CHECK: %y1 = extractelement <3 x half> %p1, i64 1 - ; CHECK: %y2 = extractelement <3 x half> %p1, i64 2 - ; CHECK: %0 = fmul half %x1, %y2 - ; CHECK: %1 = fmul half %x2, %y1 - ; CHECK: %hlsl.cross1 = fsub half %0, %1 - ; CHECK: %2 = fmul half %x2, %y0 - ; CHECK: %3 = fmul half %x0, %y2 - ; CHECK: %hlsl.cross2 = fsub half %2, %3 - ; CHECK: %4 = fmul half %x0, %y1 - ; CHECK: %5 = fmul half %x1, %y0 - ; CHECK: %hlsl.cross3 = fsub half %4, %5 - ; CHECK: %6 = insertelement <3 x half> undef, half %hlsl.cross1, i64 0 - ; CHECK: %7 = insertelement <3 x half> %6, half %hlsl.cross2, i64 1 - ; CHECK: %8 = insertelement <3 x half> %7, half %hlsl.cross3, i64 2 - ; CHECK: ret <3 x half> %8 - %hlsl.cross = call <3 x half> @llvm.dx.cross.v3f16(<3 x half> %p0, <3 x half> %p1) - ret <3 x half> %hlsl.cross -} - -define noundef <3 x float> @test_cross_float3(<3 x float> noundef %p0, <3 x float> noundef %p1) { -entry: - ; CHECK: %x0 = extractelement <3 x float> %p0, i64 0 - ; CHECK: %x1 = extractelement <3 x float> %p0, i64 1 - 
; CHECK: %x2 = extractelement <3 x float> %p0, i64 2 - ; CHECK: %y0 = extractelement <3 x float> %p1, i64 0 - ; CHECK: %y1 = extractelement <3 x float> %p1, i64 1 - ; CHECK: %y2 = extractelement <3 x float> %p1, i64 2 - ; CHECK: %0 = fmul float %x1, %y2 - ; CHECK: %1 = fmul float %x2, %y1 - ; CHECK: %hlsl.cross1 = fsub float %0, %1 - ; CHECK: %2 = fmul float %x2, %y0 - ; CHECK: %3 = fmul float %x0, %y2 - ; CHECK: %hlsl.cross2 = fsub float %2, %3 - ; CHECK: %4 = fmul float %x0, %y1 - ; CHECK: %5 = fmul float %x1, %y0 - ; CHECK: %hlsl.cross3 = fsub float %4, %5 - ; CHECK: %6 = insertelement <3 x float> undef, float %hlsl.cross1, i64 0 - ; CHECK: %7 = insertelement <3 x float> %6, float %hlsl.cross2, i64 1 - ; CHECK: %8 = insertelement <3 x float> %7, float %hlsl.cross3, i64 2 - ; CHECK: ret <3 x float> %8 - %hlsl.cross = call <3 x float> @llvm.dx.cross.v3f32(<3 x float> %p0, <3 x float> %p1) - ret <3 x float> %hlsl.cross -} +; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s + +; Make sure dxil operation function calls for cross are generated for half/float. 
+ +declare <3 x half> @llvm.dx.cross.v3f16(<3 x half>, <3 x half>) +declare <3 x float> @llvm.dx.cross.v3f32(<3 x float>, <3 x float>) + +define noundef <3 x half> @test_cross_half3(<3 x half> noundef %p0, <3 x half> noundef %p1) { +entry: + ; CHECK: %x0 = extractelement <3 x half> %p0, i64 0 + ; CHECK: %x1 = extractelement <3 x half> %p0, i64 1 + ; CHECK: %x2 = extractelement <3 x half> %p0, i64 2 + ; CHECK: %y0 = extractelement <3 x half> %p1, i64 0 + ; CHECK: %y1 = extractelement <3 x half> %p1, i64 1 + ; CHECK: %y2 = extractelement <3 x half> %p1, i64 2 + ; CHECK: %0 = fmul half %x1, %y2 + ; CHECK: %1 = fmul half %x2, %y1 + ; CHECK: %hlsl.cross1 = fsub half %0, %1 + ; CHECK: %2 = fmul half %x2, %y0 + ; CHECK: %3 = fmul half %x0, %y2 + ; CHECK: %hlsl.cross2 = fsub half %2, %3 + ; CHECK: %4 = fmul half %x0, %y1 + ; CHECK: %5 = fmul half %x1, %y0 + ; CHECK: %hlsl.cross3 = fsub half %4, %5 + ; CHECK: %6 = insertelement <3 x half> undef, half %hlsl.cross1, i64 0 + ; CHECK: %7 = insertelement <3 x half> %6, half %hlsl.cross2, i64 1 + ; CHECK: %8 = insertelement <3 x half> %7, half %hlsl.cross3, i64 2 + ; CHECK: ret <3 x half> %8 + %hlsl.cross = call <3 x half> @llvm.dx.cross.v3f16(<3 x half> %p0, <3 x half> %p1) + ret <3 x half> %hlsl.cross +} + +define noundef <3 x float> @test_cross_float3(<3 x float> noundef %p0, <3 x float> noundef %p1) { +entry: + ; CHECK: %x0 = extractelement <3 x float> %p0, i64 0 + ; CHECK: %x1 = extractelement <3 x float> %p0, i64 1 + ; CHECK: %x2 = extractelement <3 x float> %p0, i64 2 + ; CHECK: %y0 = extractelement <3 x float> %p1, i64 0 + ; CHECK: %y1 = extractelement <3 x float> %p1, i64 1 + ; CHECK: %y2 = extractelement <3 x float> %p1, i64 2 + ; CHECK: %0 = fmul float %x1, %y2 + ; CHECK: %1 = fmul float %x2, %y1 + ; CHECK: %hlsl.cross1 = fsub float %0, %1 + ; CHECK: %2 = fmul float %x2, %y0 + ; CHECK: %3 = fmul float %x0, %y2 + ; CHECK: %hlsl.cross2 = fsub float %2, %3 + ; CHECK: %4 = fmul float %x0, %y1 + ; CHECK: %5 = fmul float 
%x1, %y0 + ; CHECK: %hlsl.cross3 = fsub float %4, %5 + ; CHECK: %6 = insertelement <3 x float> undef, float %hlsl.cross1, i64 0 + ; CHECK: %7 = insertelement <3 x float> %6, float %hlsl.cross2, i64 1 + ; CHECK: %8 = insertelement <3 x float> %7, float %hlsl.cross3, i64 2 + ; CHECK: ret <3 x float> %8 + %hlsl.cross = call <3 x float> @llvm.dx.cross.v3f32(<3 x float> %p0, <3 x float> %p1) + ret <3 x float> %hlsl.cross +} diff --git a/llvm/test/CodeGen/DirectX/finalize_linkage.ll b/llvm/test/CodeGen/DirectX/finalize_linkage.ll index 0ee8a5f44593ba..b6da9f6cb3926a 100644 --- a/llvm/test/CodeGen/DirectX/finalize_linkage.ll +++ b/llvm/test/CodeGen/DirectX/finalize_linkage.ll @@ -1,64 +1,64 @@ -; RUN: opt -S -dxil-finalize-linkage -mtriple=dxil-unknown-shadermodel6.5-compute %s | FileCheck %s -; RUN: llc %s --filetype=asm -o - | FileCheck %s --check-prefixes=CHECK-LLC - -target triple = "dxilv1.5-pc-shadermodel6.5-compute" - -; DXILFinalizeLinkage changes linkage of all functions that are not -; entry points or exported function to internal. 
- -; CHECK: define internal void @"?f1@@YAXXZ"() -define void @"?f1@@YAXXZ"() #0 { -entry: - ret void -} - -; CHECK: define internal void @"?f2@@YAXXZ"() -define void @"?f2@@YAXXZ"() #0 { -entry: - ret void -} - -; CHECK: define internal void @"?f3@@YAXXZ"() -define void @"?f3@@YAXXZ"() #0 { -entry: - ret void -} - -; CHECK: define internal void @"?foo@@YAXXZ"() -define void @"?foo@@YAXXZ"() #0 { -entry: - call void @"?f2@@YAXXZ"() #3 - ret void -} - -; Exported function - do not change linkage -; CHECK: define void @"?bar@@YAXXZ"() -define void @"?bar@@YAXXZ"() #1 { -entry: - call void @"?f3@@YAXXZ"() #3 - ret void -} - -; CHECK: define internal void @"?main@@YAXXZ"() #0 -define internal void @"?main@@YAXXZ"() #0 { -entry: - call void @"?foo@@YAXXZ"() #3 - call void @"?bar@@YAXXZ"() #3 - ret void -} - -; Entry point function - do not change linkage -; CHECK: define void @main() #2 -define void @main() #2 { -entry: - call void @"?main@@YAXXZ"() - ret void -} - -attributes #0 = { convergent noinline nounwind optnone} -attributes #1 = { convergent noinline nounwind optnone "hlsl.export"} -attributes #2 = { convergent "hlsl.numthreads"="4,1,1" "hlsl.shader"="compute"} -attributes #3 = { convergent } - -; Make sure "hlsl.export" attribute is stripped by llc -; CHECK-LLC-NOT: "hlsl.export" +; RUN: opt -S -dxil-finalize-linkage -mtriple=dxil-unknown-shadermodel6.5-compute %s | FileCheck %s +; RUN: llc %s --filetype=asm -o - | FileCheck %s --check-prefixes=CHECK-LLC + +target triple = "dxilv1.5-pc-shadermodel6.5-compute" + +; DXILFinalizeLinkage changes linkage of all functions that are not +; entry points or exported function to internal. 
+ +; CHECK: define internal void @"?f1@@YAXXZ"() +define void @"?f1@@YAXXZ"() #0 { +entry: + ret void +} + +; CHECK: define internal void @"?f2@@YAXXZ"() +define void @"?f2@@YAXXZ"() #0 { +entry: + ret void +} + +; CHECK: define internal void @"?f3@@YAXXZ"() +define void @"?f3@@YAXXZ"() #0 { +entry: + ret void +} + +; CHECK: define internal void @"?foo@@YAXXZ"() +define void @"?foo@@YAXXZ"() #0 { +entry: + call void @"?f2@@YAXXZ"() #3 + ret void +} + +; Exported function - do not change linkage +; CHECK: define void @"?bar@@YAXXZ"() +define void @"?bar@@YAXXZ"() #1 { +entry: + call void @"?f3@@YAXXZ"() #3 + ret void +} + +; CHECK: define internal void @"?main@@YAXXZ"() #0 +define internal void @"?main@@YAXXZ"() #0 { +entry: + call void @"?foo@@YAXXZ"() #3 + call void @"?bar@@YAXXZ"() #3 + ret void +} + +; Entry point function - do not change linkage +; CHECK: define void @main() #2 +define void @main() #2 { +entry: + call void @"?main@@YAXXZ"() + ret void +} + +attributes #0 = { convergent noinline nounwind optnone} +attributes #1 = { convergent noinline nounwind optnone "hlsl.export"} +attributes #2 = { convergent "hlsl.numthreads"="4,1,1" "hlsl.shader"="compute"} +attributes #3 = { convergent } + +; Make sure "hlsl.export" attribute is stripped by llc +; CHECK-LLC-NOT: "hlsl.export" diff --git a/llvm/test/CodeGen/DirectX/normalize.ll b/llvm/test/CodeGen/DirectX/normalize.ll index 2aba9d5f74d78e..de106be1243712 100644 --- a/llvm/test/CodeGen/DirectX/normalize.ll +++ b/llvm/test/CodeGen/DirectX/normalize.ll @@ -1,112 +1,112 @@ -; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s --check-prefixes=CHECK,EXPCHECK -; RUN: opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library < %s | FileCheck %s --check-prefixes=CHECK,DOPCHECK - -; Make sure dxil operation function calls for normalize are generated for half/float. 
- -declare half @llvm.dx.normalize.f16(half) -declare <2 x half> @llvm.dx.normalize.v2f16(<2 x half>) -declare <3 x half> @llvm.dx.normalize.v3f16(<3 x half>) -declare <4 x half> @llvm.dx.normalize.v4f16(<4 x half>) - -declare float @llvm.dx.normalize.f32(float) -declare <2 x float> @llvm.dx.normalize.v2f32(<2 x float>) -declare <3 x float> @llvm.dx.normalize.v3f32(<3 x float>) -declare <4 x float> @llvm.dx.normalize.v4f32(<4 x float>) - -define noundef half @test_normalize_half(half noundef %p0) { -entry: - ; CHECK: fdiv half %p0, %p0 - %hlsl.normalize = call half @llvm.dx.normalize.f16(half %p0) - ret half %hlsl.normalize -} - -define noundef <2 x half> @test_normalize_half2(<2 x half> noundef %p0) { -entry: - ; EXPCHECK: [[doth2:%.*]] = call half @llvm.dx.dot2.v2f16(<2 x half> %{{.*}}, <2 x half> %{{.*}}) - ; DOPCHECK: [[doth2:%.*]] = call half @dx.op.dot2.f16(i32 54, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth2]]) - ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth2]]) - ; CHECK: [[splatinserth2:%.*]] = insertelement <2 x half> poison, half [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] = shufflevector <2 x half> [[splatinserth2]], <2 x half> poison, <2 x i32> zeroinitializer - ; CHECK: fmul <2 x half> %p0, [[splat]] - - %hlsl.normalize = call <2 x half> @llvm.dx.normalize.v2f16(<2 x half> %p0) - ret <2 x half> %hlsl.normalize -} - -define noundef <3 x half> @test_normalize_half3(<3 x half> noundef %p0) { -entry: - ; EXPCHECK: [[doth3:%.*]] = call half @llvm.dx.dot3.v3f16(<3 x half> %{{.*}}, <3 x half> %{{.*}}) - ; DOPCHECK: [[doth3:%.*]] = call half @dx.op.dot3.f16(i32 55, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth3]]) - ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth3]]) - ; CHECK: [[splatinserth3:%.*]] = insertelement <3 x half> poison, half 
[[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <3 x half> [[splatinserth3]], <3 x half> poison, <3 x i32> zeroinitializer - ; CHECK: fmul <3 x half> %p0, %.splat - - %hlsl.normalize = call <3 x half> @llvm.dx.normalize.v3f16(<3 x half> %p0) - ret <3 x half> %hlsl.normalize -} - -define noundef <4 x half> @test_normalize_half4(<4 x half> noundef %p0) { -entry: - ; EXPCHECK: [[doth4:%.*]] = call half @llvm.dx.dot4.v4f16(<4 x half> %{{.*}}, <4 x half> %{{.*}}) - ; DOPCHECK: [[doth4:%.*]] = call half @dx.op.dot4.f16(i32 56, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth4]]) - ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth4]]) - ; CHECK: [[splatinserth4:%.*]] = insertelement <4 x half> poison, half [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <4 x half> [[splatinserth4]], <4 x half> poison, <4 x i32> zeroinitializer - ; CHECK: fmul <4 x half> %p0, %.splat - - %hlsl.normalize = call <4 x half> @llvm.dx.normalize.v4f16(<4 x half> %p0) - ret <4 x half> %hlsl.normalize -} - -define noundef float @test_normalize_float(float noundef %p0) { -entry: - ; CHECK: fdiv float %p0, %p0 - %hlsl.normalize = call float @llvm.dx.normalize.f32(float %p0) - ret float %hlsl.normalize -} - -define noundef <2 x float> @test_normalize_float2(<2 x float> noundef %p0) { -entry: - ; EXPCHECK: [[dotf2:%.*]] = call float @llvm.dx.dot2.v2f32(<2 x float> %{{.*}}, <2 x float> %{{.*}}) - ; DOPCHECK: [[dotf2:%.*]] = call float @dx.op.dot2.f32(i32 54, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf2]]) - ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf2]]) - ; CHECK: [[splatinsertf2:%.*]] = insertelement <2 x float> poison, float [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <2 x float> [[splatinsertf2]], <2 x float> poison, <2 x i32> zeroinitializer - ; 
CHECK: fmul <2 x float> %p0, %.splat - - %hlsl.normalize = call <2 x float> @llvm.dx.normalize.v2f32(<2 x float> %p0) - ret <2 x float> %hlsl.normalize -} - -define noundef <3 x float> @test_normalize_float3(<3 x float> noundef %p0) { -entry: - ; EXPCHECK: [[dotf3:%.*]] = call float @llvm.dx.dot3.v3f32(<3 x float> %{{.*}}, <3 x float> %{{.*}}) - ; DOPCHECK: [[dotf3:%.*]] = call float @dx.op.dot3.f32(i32 55, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf3]]) - ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf3]]) - ; CHECK: [[splatinsertf3:%.*]] = insertelement <3 x float> poison, float [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <3 x float> [[splatinsertf3]], <3 x float> poison, <3 x i32> zeroinitializer - ; CHECK: fmul <3 x float> %p0, %.splat - - %hlsl.normalize = call <3 x float> @llvm.dx.normalize.v3f32(<3 x float> %p0) - ret <3 x float> %hlsl.normalize -} - -define noundef <4 x float> @test_normalize_float4(<4 x float> noundef %p0) { -entry: - ; EXPCHECK: [[dotf4:%.*]] = call float @llvm.dx.dot4.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}) - ; DOPCHECK: [[dotf4:%.*]] = call float @dx.op.dot4.f32(i32 56, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf4]]) - ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf4]]) - ; CHECK: [[splatinsertf4:%.*]] = insertelement <4 x float> poison, float [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <4 x float> [[splatinsertf4]], <4 x float> poison, <4 x i32> zeroinitializer - ; CHECK: fmul <4 x float> %p0, %.splat - - %hlsl.normalize = call <4 x float> @llvm.dx.normalize.v4f32(<4 x float> %p0) - ret <4 x float> %hlsl.normalize -} +; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s --check-prefixes=CHECK,EXPCHECK +; RUN: opt -S -dxil-intrinsic-expansion -dxil-op-lower 
-mtriple=dxil-pc-shadermodel6.3-library < %s | FileCheck %s --check-prefixes=CHECK,DOPCHECK + +; Make sure dxil operation function calls for normalize are generated for half/float. + +declare half @llvm.dx.normalize.f16(half) +declare <2 x half> @llvm.dx.normalize.v2f16(<2 x half>) +declare <3 x half> @llvm.dx.normalize.v3f16(<3 x half>) +declare <4 x half> @llvm.dx.normalize.v4f16(<4 x half>) + +declare float @llvm.dx.normalize.f32(float) +declare <2 x float> @llvm.dx.normalize.v2f32(<2 x float>) +declare <3 x float> @llvm.dx.normalize.v3f32(<3 x float>) +declare <4 x float> @llvm.dx.normalize.v4f32(<4 x float>) + +define noundef half @test_normalize_half(half noundef %p0) { +entry: + ; CHECK: fdiv half %p0, %p0 + %hlsl.normalize = call half @llvm.dx.normalize.f16(half %p0) + ret half %hlsl.normalize +} + +define noundef <2 x half> @test_normalize_half2(<2 x half> noundef %p0) { +entry: + ; EXPCHECK: [[doth2:%.*]] = call half @llvm.dx.dot2.v2f16(<2 x half> %{{.*}}, <2 x half> %{{.*}}) + ; DOPCHECK: [[doth2:%.*]] = call half @dx.op.dot2.f16(i32 54, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth2]]) + ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth2]]) + ; CHECK: [[splatinserth2:%.*]] = insertelement <2 x half> poison, half [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] = shufflevector <2 x half> [[splatinserth2]], <2 x half> poison, <2 x i32> zeroinitializer + ; CHECK: fmul <2 x half> %p0, [[splat]] + + %hlsl.normalize = call <2 x half> @llvm.dx.normalize.v2f16(<2 x half> %p0) + ret <2 x half> %hlsl.normalize +} + +define noundef <3 x half> @test_normalize_half3(<3 x half> noundef %p0) { +entry: + ; EXPCHECK: [[doth3:%.*]] = call half @llvm.dx.dot3.v3f16(<3 x half> %{{.*}}, <3 x half> %{{.*}}) + ; DOPCHECK: [[doth3:%.*]] = call half @dx.op.dot3.f16(i32 55, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call half 
@llvm.dx.rsqrt.f16(half [[doth3]]) + ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth3]]) + ; CHECK: [[splatinserth3:%.*]] = insertelement <3 x half> poison, half [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <3 x half> [[splatinserth3]], <3 x half> poison, <3 x i32> zeroinitializer + ; CHECK: fmul <3 x half> %p0, %.splat + + %hlsl.normalize = call <3 x half> @llvm.dx.normalize.v3f16(<3 x half> %p0) + ret <3 x half> %hlsl.normalize +} + +define noundef <4 x half> @test_normalize_half4(<4 x half> noundef %p0) { +entry: + ; EXPCHECK: [[doth4:%.*]] = call half @llvm.dx.dot4.v4f16(<4 x half> %{{.*}}, <4 x half> %{{.*}}) + ; DOPCHECK: [[doth4:%.*]] = call half @dx.op.dot4.f16(i32 56, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth4]]) + ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth4]]) + ; CHECK: [[splatinserth4:%.*]] = insertelement <4 x half> poison, half [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <4 x half> [[splatinserth4]], <4 x half> poison, <4 x i32> zeroinitializer + ; CHECK: fmul <4 x half> %p0, %.splat + + %hlsl.normalize = call <4 x half> @llvm.dx.normalize.v4f16(<4 x half> %p0) + ret <4 x half> %hlsl.normalize +} + +define noundef float @test_normalize_float(float noundef %p0) { +entry: + ; CHECK: fdiv float %p0, %p0 + %hlsl.normalize = call float @llvm.dx.normalize.f32(float %p0) + ret float %hlsl.normalize +} + +define noundef <2 x float> @test_normalize_float2(<2 x float> noundef %p0) { +entry: + ; EXPCHECK: [[dotf2:%.*]] = call float @llvm.dx.dot2.v2f32(<2 x float> %{{.*}}, <2 x float> %{{.*}}) + ; DOPCHECK: [[dotf2:%.*]] = call float @dx.op.dot2.f32(i32 54, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf2]]) + ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf2]]) + ; CHECK: 
[[splatinsertf2:%.*]] = insertelement <2 x float> poison, float [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <2 x float> [[splatinsertf2]], <2 x float> poison, <2 x i32> zeroinitializer + ; CHECK: fmul <2 x float> %p0, %.splat + + %hlsl.normalize = call <2 x float> @llvm.dx.normalize.v2f32(<2 x float> %p0) + ret <2 x float> %hlsl.normalize +} + +define noundef <3 x float> @test_normalize_float3(<3 x float> noundef %p0) { +entry: + ; EXPCHECK: [[dotf3:%.*]] = call float @llvm.dx.dot3.v3f32(<3 x float> %{{.*}}, <3 x float> %{{.*}}) + ; DOPCHECK: [[dotf3:%.*]] = call float @dx.op.dot3.f32(i32 55, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf3]]) + ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf3]]) + ; CHECK: [[splatinsertf3:%.*]] = insertelement <3 x float> poison, float [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <3 x float> [[splatinsertf3]], <3 x float> poison, <3 x i32> zeroinitializer + ; CHECK: fmul <3 x float> %p0, %.splat + + %hlsl.normalize = call <3 x float> @llvm.dx.normalize.v3f32(<3 x float> %p0) + ret <3 x float> %hlsl.normalize +} + +define noundef <4 x float> @test_normalize_float4(<4 x float> noundef %p0) { +entry: + ; EXPCHECK: [[dotf4:%.*]] = call float @llvm.dx.dot4.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}) + ; DOPCHECK: [[dotf4:%.*]] = call float @dx.op.dot4.f32(i32 56, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf4]]) + ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf4]]) + ; CHECK: [[splatinsertf4:%.*]] = insertelement <4 x float> poison, float [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <4 x float> [[splatinsertf4]], <4 x float> poison, <4 x i32> zeroinitializer + ; CHECK: fmul <4 x float> %p0, %.splat + + %hlsl.normalize = call <4 x float> @llvm.dx.normalize.v4f32(<4 x float> 
%p0) + ret <4 x float> %hlsl.normalize +} diff --git a/llvm/test/CodeGen/DirectX/normalize_error.ll b/llvm/test/CodeGen/DirectX/normalize_error.ll index 35a91c0cdc24df..3041d2ecdd923a 100644 --- a/llvm/test/CodeGen/DirectX/normalize_error.ll +++ b/llvm/test/CodeGen/DirectX/normalize_error.ll @@ -1,10 +1,10 @@ -; RUN: not opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s 2>&1 | FileCheck %s - -; DXIL operation normalize does not support double overload type -; CHECK: Cannot create Dot2 operation: Invalid overload type - -define noundef <2 x double> @test_normalize_double2(<2 x double> noundef %p0) { -entry: - %hlsl.normalize = call <2 x double> @llvm.dx.normalize.v2f32(<2 x double> %p0) - ret <2 x double> %hlsl.normalize -} +; RUN: not opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s 2>&1 | FileCheck %s + +; DXIL operation normalize does not support double overload type +; CHECK: Cannot create Dot2 operation: Invalid overload type + +define noundef <2 x double> @test_normalize_double2(<2 x double> noundef %p0) { +entry: + %hlsl.normalize = call <2 x double> @llvm.dx.normalize.v2f32(<2 x double> %p0) + ret <2 x double> %hlsl.normalize +} diff --git a/llvm/test/CodeGen/DirectX/step.ll b/llvm/test/CodeGen/DirectX/step.ll index 1c9894026c62ec..6a9b5bf71da899 100644 --- a/llvm/test/CodeGen/DirectX/step.ll +++ b/llvm/test/CodeGen/DirectX/step.ll @@ -1,78 +1,78 @@ -; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s --check-prefix=CHECK -; RUN: opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library < %s | FileCheck %s --check-prefix=CHECK - -; Make sure dxil operation function calls for step are generated for half/float. 
- -declare half @llvm.dx.step.f16(half, half) -declare <2 x half> @llvm.dx.step.v2f16(<2 x half>, <2 x half>) -declare <3 x half> @llvm.dx.step.v3f16(<3 x half>, <3 x half>) -declare <4 x half> @llvm.dx.step.v4f16(<4 x half>, <4 x half>) - -declare float @llvm.dx.step.f32(float, float) -declare <2 x float> @llvm.dx.step.v2f32(<2 x float>, <2 x float>) -declare <3 x float> @llvm.dx.step.v3f32(<3 x float>, <3 x float>) -declare <4 x float> @llvm.dx.step.v4f32(<4 x float>, <4 x float>) - -define noundef half @test_step_half(half noundef %p0, half noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt half %p1, %p0 - ; CHECK: %1 = select i1 %0, half 0xH0000, half 0xH3C00 - %hlsl.step = call half @llvm.dx.step.f16(half %p0, half %p1) - ret half %hlsl.step -} - -define noundef <2 x half> @test_step_half2(<2 x half> noundef %p0, <2 x half> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <2 x half> %p1, %p0 - ; CHECK: %1 = select <2 x i1> %0, <2 x half> zeroinitializer, <2 x half> - %hlsl.step = call <2 x half> @llvm.dx.step.v2f16(<2 x half> %p0, <2 x half> %p1) - ret <2 x half> %hlsl.step -} - -define noundef <3 x half> @test_step_half3(<3 x half> noundef %p0, <3 x half> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <3 x half> %p1, %p0 - ; CHECK: %1 = select <3 x i1> %0, <3 x half> zeroinitializer, <3 x half> - %hlsl.step = call <3 x half> @llvm.dx.step.v3f16(<3 x half> %p0, <3 x half> %p1) - ret <3 x half> %hlsl.step -} - -define noundef <4 x half> @test_step_half4(<4 x half> noundef %p0, <4 x half> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <4 x half> %p1, %p0 - ; CHECK: %1 = select <4 x i1> %0, <4 x half> zeroinitializer, <4 x half> - %hlsl.step = call <4 x half> @llvm.dx.step.v4f16(<4 x half> %p0, <4 x half> %p1) - ret <4 x half> %hlsl.step -} - -define noundef float @test_step_float(float noundef %p0, float noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt float %p1, %p0 - ; CHECK: %1 = select i1 %0, float 0.000000e+00, float 1.000000e+00 - %hlsl.step = call float 
@llvm.dx.step.f32(float %p0, float %p1) - ret float %hlsl.step -} - -define noundef <2 x float> @test_step_float2(<2 x float> noundef %p0, <2 x float> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <2 x float> %p1, %p0 - ; CHECK: %1 = select <2 x i1> %0, <2 x float> zeroinitializer, <2 x float> - %hlsl.step = call <2 x float> @llvm.dx.step.v2f32(<2 x float> %p0, <2 x float> %p1) - ret <2 x float> %hlsl.step -} - -define noundef <3 x float> @test_step_float3(<3 x float> noundef %p0, <3 x float> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <3 x float> %p1, %p0 - ; CHECK: %1 = select <3 x i1> %0, <3 x float> zeroinitializer, <3 x float> - %hlsl.step = call <3 x float> @llvm.dx.step.v3f32(<3 x float> %p0, <3 x float> %p1) - ret <3 x float> %hlsl.step -} - -define noundef <4 x float> @test_step_float4(<4 x float> noundef %p0, <4 x float> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <4 x float> %p1, %p0 - ; CHECK: %1 = select <4 x i1> %0, <4 x float> zeroinitializer, <4 x float> - %hlsl.step = call <4 x float> @llvm.dx.step.v4f32(<4 x float> %p0, <4 x float> %p1) - ret <4 x float> %hlsl.step -} +; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s --check-prefix=CHECK +; RUN: opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library < %s | FileCheck %s --check-prefix=CHECK + +; Make sure dxil operation function calls for step are generated for half/float. 
+ +declare half @llvm.dx.step.f16(half, half) +declare <2 x half> @llvm.dx.step.v2f16(<2 x half>, <2 x half>) +declare <3 x half> @llvm.dx.step.v3f16(<3 x half>, <3 x half>) +declare <4 x half> @llvm.dx.step.v4f16(<4 x half>, <4 x half>) + +declare float @llvm.dx.step.f32(float, float) +declare <2 x float> @llvm.dx.step.v2f32(<2 x float>, <2 x float>) +declare <3 x float> @llvm.dx.step.v3f32(<3 x float>, <3 x float>) +declare <4 x float> @llvm.dx.step.v4f32(<4 x float>, <4 x float>) + +define noundef half @test_step_half(half noundef %p0, half noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt half %p1, %p0 + ; CHECK: %1 = select i1 %0, half 0xH0000, half 0xH3C00 + %hlsl.step = call half @llvm.dx.step.f16(half %p0, half %p1) + ret half %hlsl.step +} + +define noundef <2 x half> @test_step_half2(<2 x half> noundef %p0, <2 x half> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <2 x half> %p1, %p0 + ; CHECK: %1 = select <2 x i1> %0, <2 x half> zeroinitializer, <2 x half> + %hlsl.step = call <2 x half> @llvm.dx.step.v2f16(<2 x half> %p0, <2 x half> %p1) + ret <2 x half> %hlsl.step +} + +define noundef <3 x half> @test_step_half3(<3 x half> noundef %p0, <3 x half> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <3 x half> %p1, %p0 + ; CHECK: %1 = select <3 x i1> %0, <3 x half> zeroinitializer, <3 x half> + %hlsl.step = call <3 x half> @llvm.dx.step.v3f16(<3 x half> %p0, <3 x half> %p1) + ret <3 x half> %hlsl.step +} + +define noundef <4 x half> @test_step_half4(<4 x half> noundef %p0, <4 x half> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <4 x half> %p1, %p0 + ; CHECK: %1 = select <4 x i1> %0, <4 x half> zeroinitializer, <4 x half> + %hlsl.step = call <4 x half> @llvm.dx.step.v4f16(<4 x half> %p0, <4 x half> %p1) + ret <4 x half> %hlsl.step +} + +define noundef float @test_step_float(float noundef %p0, float noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt float %p1, %p0 + ; CHECK: %1 = select i1 %0, float 0.000000e+00, float 1.000000e+00 + %hlsl.step = call float 
@llvm.dx.step.f32(float %p0, float %p1) + ret float %hlsl.step +} + +define noundef <2 x float> @test_step_float2(<2 x float> noundef %p0, <2 x float> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <2 x float> %p1, %p0 + ; CHECK: %1 = select <2 x i1> %0, <2 x float> zeroinitializer, <2 x float> + %hlsl.step = call <2 x float> @llvm.dx.step.v2f32(<2 x float> %p0, <2 x float> %p1) + ret <2 x float> %hlsl.step +} + +define noundef <3 x float> @test_step_float3(<3 x float> noundef %p0, <3 x float> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <3 x float> %p1, %p0 + ; CHECK: %1 = select <3 x i1> %0, <3 x float> zeroinitializer, <3 x float> + %hlsl.step = call <3 x float> @llvm.dx.step.v3f32(<3 x float> %p0, <3 x float> %p1) + ret <3 x float> %hlsl.step +} + +define noundef <4 x float> @test_step_float4(<4 x float> noundef %p0, <4 x float> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <4 x float> %p1, %p0 + ; CHECK: %1 = select <4 x i1> %0, <4 x float> zeroinitializer, <4 x float> + %hlsl.step = call <4 x float> @llvm.dx.step.v4f32(<4 x float> %p0, <4 x float> %p1) + ret <4 x float> %hlsl.step +} diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/atan2.ll b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/atan2.ll index bdbfc133efa29b..a0306bae4a22de 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/atan2.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/atan2.ll @@ -1,49 +1,49 @@ -; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 -; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 - -define noundef float @atan2_float(float noundef %a, float noundef %b) { -entry: -; CHECK: %[[#arg0:]] = 
OpFunctionParameter %[[#]] -; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] -; CHECK: %[[#]] = OpExtInst %[[#float_32]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] - %elt.atan2 = call float @llvm.atan2.f32(float %a, float %b) - ret float %elt.atan2 -} - -define noundef half @atan2_half(half noundef %a, half noundef %b) { -entry: -; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] -; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] -; CHECK: %[[#]] = OpExtInst %[[#float_16]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] - %elt.atan2 = call half @llvm.atan2.f16(half %a, half %b) - ret half %elt.atan2 -} - -define noundef <4 x float> @atan2_float4(<4 x float> noundef %a, <4 x float> noundef %b) { -entry: - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] - %elt.atan2 = call <4 x float> @llvm.atan2.v4f32(<4 x float> %a, <4 x float> %b) - ret <4 x float> %elt.atan2 -} - -define noundef <4 x half> @atan2_half4(<4 x half> noundef %a, <4 x half> noundef %b) { -entry: - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] - %elt.atan2 = call <4 x half> @llvm.atan2.v4f16(<4 x half> %a, <4 x half> %b) - ret <4 x half> %elt.atan2 -} - -declare half @llvm.atan2.f16(half, half) -declare float @llvm.atan2.f32(float, float) -declare <4 x half> @llvm.atan2.v4f16(<4 x half>, <4 x half>) -declare <4 x float> @llvm.atan2.v4f32(<4 x float>, <4 x float>) +; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = 
OpTypeFloat 16 +; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 +; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 + +define noundef float @atan2_float(float noundef %a, float noundef %b) { +entry: +; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] +; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] +; CHECK: %[[#]] = OpExtInst %[[#float_32]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] + %elt.atan2 = call float @llvm.atan2.f32(float %a, float %b) + ret float %elt.atan2 +} + +define noundef half @atan2_half(half noundef %a, half noundef %b) { +entry: +; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] +; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] +; CHECK: %[[#]] = OpExtInst %[[#float_16]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] + %elt.atan2 = call half @llvm.atan2.f16(half %a, half %b) + ret half %elt.atan2 +} + +define noundef <4 x float> @atan2_float4(<4 x float> noundef %a, <4 x float> noundef %b) { +entry: + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] + %elt.atan2 = call <4 x float> @llvm.atan2.v4f32(<4 x float> %a, <4 x float> %b) + ret <4 x float> %elt.atan2 +} + +define noundef <4 x half> @atan2_half4(<4 x half> noundef %a, <4 x half> noundef %b) { +entry: + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] + %elt.atan2 = call <4 x half> @llvm.atan2.v4f16(<4 x half> %a, <4 x half> %b) + ret <4 x half> %elt.atan2 +} + +declare half @llvm.atan2.f16(half, half) +declare float @llvm.atan2.f32(float, float) +declare <4 x half> @llvm.atan2.v4f16(<4 x half>, <4 x half>) +declare <4 x float> @llvm.atan2.v4f32(<4 x float>, <4 x float>) diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/cross.ll 
b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/cross.ll index 2e0eb8c429ac27..7c06c14bb968d1 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/cross.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/cross.ll @@ -1,33 +1,33 @@ -; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; Make sure SPIRV operation function calls for cross are lowered correctly. - -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec3_float_16:]] = OpTypeVector %[[#float_16]] 3 -; CHECK-DAG: %[[#vec3_float_32:]] = OpTypeVector %[[#float_32]] 3 - -define noundef <3 x half> @cross_half4(<3 x half> noundef %a, <3 x half> noundef %b) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec3_float_16]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec3_float_16]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec3_float_16]] - ; CHECK: %[[#]] = OpExtInst %[[#vec3_float_16]] %[[#op_ext_glsl]] Cross %[[#arg0]] %[[#arg1]] - %hlsl.cross = call <3 x half> @llvm.spv.cross.v4f16(<3 x half> %a, <3 x half> %b) - ret <3 x half> %hlsl.cross -} - -define noundef <3 x float> @cross_float4(<3 x float> noundef %a, <3 x float> noundef %b) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec3_float_32]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec3_float_32]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec3_float_32]] - ; CHECK: %[[#]] = OpExtInst %[[#vec3_float_32]] %[[#op_ext_glsl]] Cross %[[#arg0]] %[[#arg1]] - %hlsl.cross = call <3 x float> @llvm.spv.cross.v4f32(<3 x float> %a, <3 x float> %b) - ret <3 x float> %hlsl.cross -} - -declare <3 x half> @llvm.spv.cross.v4f16(<3 x half>, <3 x half>) -declare <3 x float> @llvm.spv.cross.v4f32(<3 x float>, <3 x float>) +; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: 
%if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; Make sure SPIRV operation function calls for cross are lowered correctly. + +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 +; CHECK-DAG: %[[#vec3_float_16:]] = OpTypeVector %[[#float_16]] 3 +; CHECK-DAG: %[[#vec3_float_32:]] = OpTypeVector %[[#float_32]] 3 + +define noundef <3 x half> @cross_half4(<3 x half> noundef %a, <3 x half> noundef %b) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec3_float_16]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec3_float_16]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec3_float_16]] + ; CHECK: %[[#]] = OpExtInst %[[#vec3_float_16]] %[[#op_ext_glsl]] Cross %[[#arg0]] %[[#arg1]] + %hlsl.cross = call <3 x half> @llvm.spv.cross.v4f16(<3 x half> %a, <3 x half> %b) + ret <3 x half> %hlsl.cross +} + +define noundef <3 x float> @cross_float4(<3 x float> noundef %a, <3 x float> noundef %b) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec3_float_32]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec3_float_32]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec3_float_32]] + ; CHECK: %[[#]] = OpExtInst %[[#vec3_float_32]] %[[#op_ext_glsl]] Cross %[[#arg0]] %[[#arg1]] + %hlsl.cross = call <3 x float> @llvm.spv.cross.v4f32(<3 x float> %a, <3 x float> %b) + ret <3 x float> %hlsl.cross +} + +declare <3 x half> @llvm.spv.cross.v4f16(<3 x half>, <3 x half>) +declare <3 x float> @llvm.spv.cross.v4f32(<3 x float>, <3 x float>) diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/length.ll b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/length.ll index b4a9d8e0664b7e..df1ef3a7287c3b 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/length.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/length.ll @@ -1,29 +1,29 @@ -; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if 
spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; Make sure SPIRV operation function calls for length are lowered correctly. - -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 -; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 - -define noundef half @length_half4(<4 x half> noundef %a) { -entry: - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#]] = OpExtInst %[[#float_16]] %[[#op_ext_glsl]] Length %[[#arg0]] - %hlsl.length = call half @llvm.spv.length.v4f16(<4 x half> %a) - ret half %hlsl.length -} - -define noundef float @length_float4(<4 x float> noundef %a) { -entry: - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#]] = OpExtInst %[[#float_32]] %[[#op_ext_glsl]] Length %[[#arg0]] - %hlsl.length = call float @llvm.spv.length.v4f32(<4 x float> %a) - ret float %hlsl.length -} - -declare half @llvm.spv.length.v4f16(<4 x half>) -declare float @llvm.spv.length.v4f32(<4 x float>) +; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; Make sure SPIRV operation function calls for length are lowered correctly. 
+ +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 +; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 +; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 + +define noundef half @length_half4(<4 x half> noundef %a) { +entry: + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#]] = OpExtInst %[[#float_16]] %[[#op_ext_glsl]] Length %[[#arg0]] + %hlsl.length = call half @llvm.spv.length.v4f16(<4 x half> %a) + ret half %hlsl.length +} + +define noundef float @length_float4(<4 x float> noundef %a) { +entry: + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#]] = OpExtInst %[[#float_32]] %[[#op_ext_glsl]] Length %[[#arg0]] + %hlsl.length = call float @llvm.spv.length.v4f32(<4 x float> %a) + ret float %hlsl.length +} + +declare half @llvm.spv.length.v4f16(<4 x half>) +declare float @llvm.spv.length.v4f32(<4 x float>) diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/normalize.ll b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/normalize.ll index fa73b9c2a4d3ab..4659b5146e4327 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/normalize.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/normalize.ll @@ -1,31 +1,31 @@ -; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; Make sure SPIRV operation function calls for normalize are lowered correctly. 
- -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 -; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 - -define noundef <4 x half> @normalize_half4(<4 x half> noundef %a) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec4_float_16]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_16]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Normalize %[[#arg0]] - %hlsl.normalize = call <4 x half> @llvm.spv.normalize.v4f16(<4 x half> %a) - ret <4 x half> %hlsl.normalize -} - -define noundef <4 x float> @normalize_float4(<4 x float> noundef %a) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec4_float_32]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_32]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Normalize %[[#arg0]] - %hlsl.normalize = call <4 x float> @llvm.spv.normalize.v4f32(<4 x float> %a) - ret <4 x float> %hlsl.normalize -} - -declare <4 x half> @llvm.spv.normalize.v4f16(<4 x half>) -declare <4 x float> @llvm.spv.normalize.v4f32(<4 x float>) +; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; Make sure SPIRV operation function calls for normalize are lowered correctly. 
+ +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 +; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 +; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 + +define noundef <4 x half> @normalize_half4(<4 x half> noundef %a) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec4_float_16]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_16]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Normalize %[[#arg0]] + %hlsl.normalize = call <4 x half> @llvm.spv.normalize.v4f16(<4 x half> %a) + ret <4 x half> %hlsl.normalize +} + +define noundef <4 x float> @normalize_float4(<4 x float> noundef %a) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec4_float_32]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_32]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Normalize %[[#arg0]] + %hlsl.normalize = call <4 x float> @llvm.spv.normalize.v4f32(<4 x float> %a) + ret <4 x float> %hlsl.normalize +} + +declare <4 x half> @llvm.spv.normalize.v4f16(<4 x half>) +declare <4 x float> @llvm.spv.normalize.v4f32(<4 x float>) diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/step.ll b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/step.ll index bb50d8c790f8ad..7c0ee9398d15fc 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/step.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/step.ll @@ -1,33 +1,33 @@ -; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; Make sure SPIRV operation function calls for step are lowered correctly. 
- -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 -; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 - -define noundef <4 x half> @step_half4(<4 x half> noundef %a, <4 x half> noundef %b) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec4_float_16]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_16]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec4_float_16]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Step %[[#arg0]] %[[#arg1]] - %hlsl.step = call <4 x half> @llvm.spv.step.v4f16(<4 x half> %a, <4 x half> %b) - ret <4 x half> %hlsl.step -} - -define noundef <4 x float> @step_float4(<4 x float> noundef %a, <4 x float> noundef %b) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec4_float_32]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_32]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec4_float_32]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Step %[[#arg0]] %[[#arg1]] - %hlsl.step = call <4 x float> @llvm.spv.step.v4f32(<4 x float> %a, <4 x float> %b) - ret <4 x float> %hlsl.step -} - -declare <4 x half> @llvm.spv.step.v4f16(<4 x half>, <4 x half>) -declare <4 x float> @llvm.spv.step.v4f32(<4 x float>, <4 x float>) +; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; Make sure SPIRV operation function calls for step are lowered correctly. 
+ +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 +; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 +; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 + +define noundef <4 x half> @step_half4(<4 x half> noundef %a, <4 x half> noundef %b) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec4_float_16]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_16]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec4_float_16]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Step %[[#arg0]] %[[#arg1]] + %hlsl.step = call <4 x half> @llvm.spv.step.v4f16(<4 x half> %a, <4 x half> %b) + ret <4 x half> %hlsl.step +} + +define noundef <4 x float> @step_float4(<4 x float> noundef %a, <4 x float> noundef %b) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec4_float_32]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_32]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec4_float_32]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Step %[[#arg0]] %[[#arg1]] + %hlsl.step = call <4 x float> @llvm.spv.step.v4f32(<4 x float> %a, <4 x float> %b) + ret <4 x float> %hlsl.step +} + +declare <4 x half> @llvm.spv.step.v4f16(<4 x half>, <4 x half>) +declare <4 x float> @llvm.spv.step.v4f32(<4 x float>, <4 x float>) diff --git a/llvm/test/Demangle/ms-placeholder-return-type.test b/llvm/test/Demangle/ms-placeholder-return-type.test index 18038e636c8d5a..a656400fe140fb 100644 --- a/llvm/test/Demangle/ms-placeholder-return-type.test +++ b/llvm/test/Demangle/ms-placeholder-return-type.test @@ -1,18 +1,18 @@ -; RUN: llvm-undname < %s | FileCheck %s - -; CHECK-NOT: Invalid mangled name - -?TestNonTemplateAuto@@YA@XZ -; CHECK: __cdecl TestNonTemplateAuto(void) - -??$AutoT@X@@YA?A_PXZ -; CHECK: auto __cdecl AutoT(void) - -??$AutoT@X@@YA?B_PXZ -; CHECK: auto const
__cdecl AutoT(void) - -??$AutoT@X@@YA?A_TXZ -; CHECK: decltype(auto) __cdecl AutoT(void) - -??$AutoT@X@@YA?B_TXZ -; CHECK: decltype(auto) const __cdecl AutoT(void) +; RUN: llvm-undname < %s | FileCheck %s + +; CHECK-NOT: Invalid mangled name + +?TestNonTemplateAuto@@YA@XZ +; CHECK: __cdecl TestNonTemplateAuto(void) + +??$AutoT@X@@YA?A_PXZ +; CHECK: auto __cdecl AutoT(void) + +??$AutoT@X@@YA?B_PXZ +; CHECK: auto const __cdecl AutoT(void) + +??$AutoT@X@@YA?A_TXZ +; CHECK: decltype(auto) __cdecl AutoT(void) + +??$AutoT@X@@YA?B_TXZ +; CHECK: decltype(auto) const __cdecl AutoT(void) diff --git a/llvm/test/FileCheck/dos-style-eol.txt b/llvm/test/FileCheck/dos-style-eol.txt index 4252aad4d3e7bf..52184f465c3fdf 100644 --- a/llvm/test/FileCheck/dos-style-eol.txt +++ b/llvm/test/FileCheck/dos-style-eol.txt @@ -1,11 +1,11 @@ -// Test for using FileCheck on DOS style end-of-line -// This test was deliberately committed with DOS style end of line. -// Don't change line endings! -// RUN: FileCheck -input-file %s %s -// RUN: FileCheck --strict-whitespace -input-file %s %s - -LINE 1 -; CHECK: {{^}}LINE 1{{$}} - -LINE 2 +// Test for using FileCheck on DOS style end-of-line +// This test was deliberately committed with DOS style end of line. +// Don't change line endings! 
+// RUN: FileCheck -input-file %s %s +// RUN: FileCheck --strict-whitespace -input-file %s %s + +LINE 1 +; CHECK: {{^}}LINE 1{{$}} + +LINE 2 ; CHECK: {{^}}LINE 2{{$}} \ No newline at end of file diff --git a/llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri b/llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri index 72d23d041ae807..857c4ff87b6cf8 100644 --- a/llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri +++ b/llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri @@ -1,4 +1,4 @@ -; this file intentionally has crlf line endings -create crlf.a -addmod foo.txt -end +; this file intentionally has crlf line endings +create crlf.a +addmod foo.txt +end diff --git a/llvm/test/tools/llvm-cvtres/Inputs/languages.rc b/llvm/test/tools/llvm-cvtres/Inputs/languages.rc index 081b3a77bebc10..82031d0e208395 100644 --- a/llvm/test/tools/llvm-cvtres/Inputs/languages.rc +++ b/llvm/test/tools/llvm-cvtres/Inputs/languages.rc @@ -1,36 +1,36 @@ -#include "windows.h" - -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US -randomdat RCDATA -{ - "this is a random bit of data that means nothing\0", - 0x23a9, - 0x140e, - 194292, -} - -LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED -randomdat RCDATA -{ - "zhe4 shi4 yi1ge4 sui2ji1 de shu4ju4, zhe4 yi4wei4zhe shen2me\0", - 0x23a9, - 0x140e, - 194292, -} - -LANGUAGE LANG_GERMAN, SUBLANG_GERMAN_LUXEMBOURG -randomdat RCDATA -{ - "Dies ist ein zufälliges Bit von Daten, die nichts bedeutet\0", - 0x23a9, - 0x140e, - 194292, -} - -LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED -myaccelerators ACCELERATORS -{ - "^C", 999, VIRTKEY, ALT - "D", 1100, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} +#include "windows.h" + +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US +randomdat RCDATA +{ + "this is a random bit of data that means nothing\0", + 0x23a9, + 0x140e, + 194292, +} + +LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED +randomdat RCDATA +{ + "zhe4 shi4 yi1ge4 sui2ji1 de shu4ju4, zhe4 yi4wei4zhe shen2me\0", + 0x23a9, + 0x140e, + 194292, +} + +LANGUAGE LANG_GERMAN, 
SUBLANG_GERMAN_LUXEMBOURG +randomdat RCDATA +{ + "Dies ist ein zufälliges Bit von Daten, die nichts bedeutet\0", + 0x23a9, + 0x140e, + 194292, +} + +LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED +myaccelerators ACCELERATORS +{ + "^C", 999, VIRTKEY, ALT + "D", 1100, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} diff --git a/llvm/test/tools/llvm-cvtres/Inputs/test_resource.rc b/llvm/test/tools/llvm-cvtres/Inputs/test_resource.rc index 5ca097baa0f736..494849f57a0a9e 100644 --- a/llvm/test/tools/llvm-cvtres/Inputs/test_resource.rc +++ b/llvm/test/tools/llvm-cvtres/Inputs/test_resource.rc @@ -1,50 +1,50 @@ -#include "windows.h" - -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US - -myaccelerators ACCELERATORS -{ - "^C", 999, VIRTKEY, ALT - "D", 1100, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} - -cursor BITMAP "cursor_small.bmp" -okay BITMAP "okay_small.bmp" - -14432 MENU -LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED -{ - MENUITEM "yu", 100 - MENUITEM "shala", 101 - MENUITEM "kaoya", 102 -} - -testdialog DIALOG 10, 10, 200, 300 -STYLE WS_POPUP | WS_BORDER -CAPTION "Test" -{ - CTEXT "Continue:", 1, 10, 10, 230, 14 - PUSHBUTTON "&OK", 2, 66, 134, 161, 13 -} - -12 ACCELERATORS -{ - "X", 164, VIRTKEY, ALT - "H", 5678, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} - -"eat" MENU -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_AUS -{ - MENUITEM "fish", 100 - MENUITEM "salad", 101 - MENUITEM "duck", 102 -} - - -myresource stringarray { - "this is a user defined resource\0", - "it contains many strings\0", +#include "windows.h" + +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US + +myaccelerators ACCELERATORS +{ + "^C", 999, VIRTKEY, ALT + "D", 1100, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} + +cursor BITMAP "cursor_small.bmp" +okay BITMAP "okay_small.bmp" + +14432 MENU +LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED +{ + MENUITEM "yu", 100 + MENUITEM "shala", 101 + MENUITEM "kaoya", 102 +} + +testdialog DIALOG 10, 10, 200, 300 +STYLE WS_POPUP 
| WS_BORDER +CAPTION "Test" +{ + CTEXT "Continue:", 1, 10, 10, 230, 14 + PUSHBUTTON "&OK", 2, 66, 134, 161, 13 +} + +12 ACCELERATORS +{ + "X", 164, VIRTKEY, ALT + "H", 5678, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} + +"eat" MENU +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_AUS +{ + MENUITEM "fish", 100 + MENUITEM "salad", 101 + MENUITEM "duck", 102 +} + + +myresource stringarray { + "this is a user defined resource\0", + "it contains many strings\0", } \ No newline at end of file diff --git a/llvm/test/tools/llvm-rc/Inputs/dialog-with-menu.rc b/llvm/test/tools/llvm-rc/Inputs/dialog-with-menu.rc index bb79dca399c219..c700b587af6483 100644 --- a/llvm/test/tools/llvm-rc/Inputs/dialog-with-menu.rc +++ b/llvm/test/tools/llvm-rc/Inputs/dialog-with-menu.rc @@ -1,16 +1,16 @@ -101 DIALOG 0, 0, 362, 246 -STYLE 0x40l | 0x0004l | 0x0008l | 0x0800l | 0x00020000l | - 0x00010000l | 0x80000000l | 0x10000000l | 0x02000000l | 0x00C00000l | - 0x00080000l | 0x00040000l -CAPTION "MakeNSISW" -MENU 104 -FONT 8, "MS Shell Dlg" -BEGIN - CONTROL "",202,"RichEdit20A",0x0004l | 0x0040l | - 0x0100l | 0x0800l | 0x00008000 | - 0x00010000l | 0x00800000l | 0x00200000l,7,22,348,190 - CONTROL "",-1,"Static",0x00000010l,7,220,346,1 - LTEXT "",200,7,230,200,12,0x08000000l - DEFPUSHBUTTON "Test &Installer",203,230,226,60,15,0x08000000l | 0x00010000l - PUSHBUTTON "&Close",2,296,226,49,15,0x00010000l -END +101 DIALOG 0, 0, 362, 246 +STYLE 0x40l | 0x0004l | 0x0008l | 0x0800l | 0x00020000l | + 0x00010000l | 0x80000000l | 0x10000000l | 0x02000000l | 0x00C00000l | + 0x00080000l | 0x00040000l +CAPTION "MakeNSISW" +MENU 104 +FONT 8, "MS Shell Dlg" +BEGIN + CONTROL "",202,"RichEdit20A",0x0004l | 0x0040l | + 0x0100l | 0x0800l | 0x00008000 | + 0x00010000l | 0x00800000l | 0x00200000l,7,22,348,190 + CONTROL "",-1,"Static",0x00000010l,7,220,346,1 + LTEXT "",200,7,230,200,12,0x08000000l + DEFPUSHBUTTON "Test &Installer",203,230,226,60,15,0x08000000l | 0x00010000l + PUSHBUTTON 
"&Close",2,296,226,49,15,0x00010000l +END diff --git a/llvm/test/tools/llvm-readobj/COFF/Inputs/resources/test_resource.rc b/llvm/test/tools/llvm-readobj/COFF/Inputs/resources/test_resource.rc index fd616520dbe1b3..6ad56bc02d73ca 100644 --- a/llvm/test/tools/llvm-readobj/COFF/Inputs/resources/test_resource.rc +++ b/llvm/test/tools/llvm-readobj/COFF/Inputs/resources/test_resource.rc @@ -1,44 +1,44 @@ -#include "windows.h" - -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US - -myaccelerators ACCELERATORS -{ - "^C", 999, VIRTKEY, ALT - "D", 1100, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} - -cursor BITMAP "cursor_small.bmp" -okay BITMAP "okay_small.bmp" - -14432 MENU -LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED -{ - MENUITEM "yu", 100 - MENUITEM "shala", 101 - MENUITEM "kaoya", 102 -} - -testdialog DIALOG 10, 10, 200, 300 -STYLE WS_POPUP | WS_BORDER -CAPTION "Test" -{ - CTEXT "Continue:", 1, 10, 10, 230, 14 - PUSHBUTTON "&OK", 2, 66, 134, 161, 13 -} - -12 ACCELERATORS -{ - "X", 164, VIRTKEY, ALT - "H", 5678, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} - -"eat" MENU -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_AUS -{ - MENUITEM "fish", 100 - MENUITEM "salad", 101 - MENUITEM "duck", 102 -} +#include "windows.h" + +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US + +myaccelerators ACCELERATORS +{ + "^C", 999, VIRTKEY, ALT + "D", 1100, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} + +cursor BITMAP "cursor_small.bmp" +okay BITMAP "okay_small.bmp" + +14432 MENU +LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED +{ + MENUITEM "yu", 100 + MENUITEM "shala", 101 + MENUITEM "kaoya", 102 +} + +testdialog DIALOG 10, 10, 200, 300 +STYLE WS_POPUP | WS_BORDER +CAPTION "Test" +{ + CTEXT "Continue:", 1, 10, 10, 230, 14 + PUSHBUTTON "&OK", 2, 66, 134, 161, 13 +} + +12 ACCELERATORS +{ + "X", 164, VIRTKEY, ALT + "H", 5678, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} + +"eat" MENU +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_AUS +{ + MENUITEM "fish", 100 + 
MENUITEM "salad", 101 + MENUITEM "duck", 102 +} diff --git a/llvm/unittests/Support/ModRefTest.cpp b/llvm/unittests/Support/ModRefTest.cpp index 35107e50b32db7..f77e7e39e14eab 100644 --- a/llvm/unittests/Support/ModRefTest.cpp +++ b/llvm/unittests/Support/ModRefTest.cpp @@ -1,27 +1,27 @@ -//===- llvm/unittest/Support/ModRefTest.cpp - ModRef tests ----------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -#include "llvm/Support/ModRef.h" -#include "llvm/ADT/SmallString.h" -#include "llvm/Support/raw_ostream.h" -#include "gtest/gtest.h" -#include - -using namespace llvm; - -namespace { - -// Verify that printing a MemoryEffects does not end with a ,. -TEST(ModRefTest, PrintMemoryEffects) { - std::string S; - raw_string_ostream OS(S); - OS << MemoryEffects::none(); - EXPECT_EQ(S, "ArgMem: NoModRef, InaccessibleMem: NoModRef, Other: NoModRef"); -} - -} // namespace +//===- llvm/unittest/Support/ModRefTest.cpp - ModRef tests ----------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "llvm/Support/ModRef.h" +#include "llvm/ADT/SmallString.h" +#include "llvm/Support/raw_ostream.h" +#include "gtest/gtest.h" +#include + +using namespace llvm; + +namespace { + +// Verify that printing a MemoryEffects does not end with a ,. 
+TEST(ModRefTest, PrintMemoryEffects) { + std::string S; + raw_string_ostream OS(S); + OS << MemoryEffects::none(); + EXPECT_EQ(S, "ArgMem: NoModRef, InaccessibleMem: NoModRef, Other: NoModRef"); +} + +} // namespace diff --git a/llvm/utils/LLVMVisualizers/llvm.natvis b/llvm/utils/LLVMVisualizers/llvm.natvis index d83ae8013c51e2..03ca2d33a80ba6 100644 --- a/llvm/utils/LLVMVisualizers/llvm.natvis +++ b/llvm/utils/LLVMVisualizers/llvm.natvis @@ -1,408 +1,408 @@ - - - - - empty - {(value_type*)BeginX,[Size]} - {Size} elements - Uninitialized - - Size - Capacity - - Size - (value_type*)BeginX - - - - - - {U.VAL} - Cannot visualize APInts longer than 64 bits - - - {Data,[Length]} - {Length} elements - Uninitialized - - Length - - Length - Data - - - - - {(const char*)BeginX,[Size]s8} - (const char*)BeginX,[Size] - - Size - Capacity - - Size - (char*)BeginX - - - - - - {First,[Last - First]s8} - - - - {Data,[Length]s8} - Data,[Length]s8 - - Length - - Length - Data - - - - - - {($T1)*(intptr_t *)Data} - - - - - - {($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask)} - {($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask)} - {$T6::IntMask}: {($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask)} [{($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask)}] - - ($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask) - ($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask) - - - - - {($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask)} - {((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask)} - {$T5::IntMask}: {($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask)} [{((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask)}] - - ($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask) - ((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask) - - - - - - {($T4)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask)} - - - {($T5)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask)} - - Unexpected index in PointerUnion: 
{(*(intptr_t *)Val.Value.Data>>$T2::InfoTy::IntShift) & $T2::InfoTy::IntMask} - - "$T4",s8b - - ($T4)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask) - - "$T5",s8b - - ($T5)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask) - - - - - - {{ empty }} - {{ head={Head} }} - - - Head - Next - this - - - - - - empty - RefPtr [1 ref] {*Obj} - RefPtr [{Obj->RefCount} refs] {*Obj} - - Obj->RefCount - Obj - - - - - {{ [Small Mode] size={NumNonEmpty}, capacity={CurArraySize} }} - {{ [Big Mode] size={NumNonEmpty}, capacity={CurArraySize} }} - - NumNonEmpty - CurArraySize - - NumNonEmpty - ($T1*)CurArray - - - - - - empty - {{ size={NumEntries}, buckets={NumBuckets} }} - - NumEntries - NumBuckets - - NumBuckets - Buckets - - - - - - {{ size={NumItems}, buckets={NumBuckets} }} - - NumItems - NumBuckets - - NumBuckets - (MapEntryTy**)TheTable - - - - - - empty - ({this+1,s8}, {second}) - - this+1,s - second - - - - - {Data} - - - - None - {Storage.value} - - Storage.value - - - - - Error - {*((storage_type *)TStorage.buffer)} - - *((storage_type *)TStorage.buffer) - *((error_type *)ErrorStorage.buffer) - - - - - - - {{little endian value = {*(($T1*)(unsigned char *)Value.buffer)} }} - - (unsigned char *)Value.buffer,1 - (unsigned char *)Value.buffer,2 - (unsigned char *)Value.buffer,4 - (unsigned char *)Value.buffer,8 - - - - - - {{ big endian value = {*(unsigned char *)Value.buffer} }} - {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 8) - | ($T1)(*((unsigned char *)Value.buffer+1))} }} - {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 24) - | (($T1)(*((unsigned char *)Value.buffer+1)) << 16) - | (($T1)(*((unsigned char *)Value.buffer+2)) << 8) - | ($T1)(*((unsigned char *)Value.buffer+3))} }} - {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 56) - | (($T1)(*((unsigned char *)Value.buffer+1)) << 48) - | (($T1)(*((unsigned char *)Value.buffer+2)) << 40) - | (($T1)(*((unsigned char *)Value.buffer+3)) << 32) - | 
(($T1)(*((unsigned char *)Value.buffer+4)) << 24) - | (($T1)(*((unsigned char *)Value.buffer+5)) << 16) - | (($T1)(*((unsigned char *)Value.buffer+6)) << 8) - | ($T1)(*((unsigned char *)Value.buffer+7))} }} - - (unsigned char *)Value.buffer,1 - (unsigned char *)Value.buffer,2 - (unsigned char *)Value.buffer,4 - (unsigned char *)Value.buffer,8 - - - - - {ID} - - ID - - SubclassData - - *ContainedTys - - {NumContainedTys - 1} - - - NumContainedTys - 1 - ContainedTys + 1 - - - - SubclassData == 1 - - (SubclassData & llvm::StructType::SCDB_HasBody) != 0 - (SubclassData & llvm::StructType::SCDB_Packed) != 0 - (SubclassData & llvm::StructType::SCDB_IsLiteral) != 0 - (SubclassData & llvm::StructType::SCDB_IsSized) != 0 - - {NumContainedTys} - - - NumContainedTys - ContainedTys - - - - - *ContainedTys - ((llvm::ArrayType*)this)->NumElements - - *ContainedTys - ((llvm::VectorType*)this)->ElementQuantity - - *ContainedTys - ((llvm::VectorType*)this)->ElementQuantity - - SubclassData - *ContainedTys - - Context - - - - - $(Type) {*Value} - - - - $(Type) {(llvm::ISD::NodeType)this->NodeType} - - - NumOperands - OperandList - - - - - - i{Val.BitWidth} {Val.VAL} - - - - {IDAndSubclassData >> 8}bit integer type - - - - $(Type) {*VTy} {this->getName()} {SubclassData} - $(Type) {*VTy} anon {SubclassData} - - (Instruction*)this - (User*)this - - UseList - Next - Prev.Value & 3 == 3 ? 
(User*)(this + 1) : (User*)(this + 2) - - - - - - - Val - - - - - - - $(Type) {*VTy} {this->getName()} {SubclassData} - $(Type) {*VTy} anon {SubclassData} - - (Value*)this,nd - *VTy - - NumUserOperands - (llvm::Use*)this - NumUserOperands - - - NumUserOperands - *((llvm::Use**)this - 1) - - - - - - {getOpcodeName(SubclassID - InstructionVal)} - - (User*)this,nd - - - - - {this->getName()} {(LinkageTypes)Linkage} {(VisibilityTypes)Visibility} {(DLLStorageClassTypes)DllStorageClass} {(llvm::GlobalValue::ThreadLocalMode) ThreadLocal} - - - - - - - this - Next - this - - - - - - - pImpl - - - - - {ModuleID,s8} {TargetTriple} - - - - $(Type) {PassID} {Kind} - - + + + + + empty + {(value_type*)BeginX,[Size]} + {Size} elements + Uninitialized + + Size + Capacity + + Size + (value_type*)BeginX + + + + + + {U.VAL} + Cannot visualize APInts longer than 64 bits + + + {Data,[Length]} + {Length} elements + Uninitialized + + Length + + Length + Data + + + + + {(const char*)BeginX,[Size]s8} + (const char*)BeginX,[Size] + + Size + Capacity + + Size + (char*)BeginX + + + + + + {First,[Last - First]s8} + + + + {Data,[Length]s8} + Data,[Length]s8 + + Length + + Length + Data + + + + + + {($T1)*(intptr_t *)Data} + + + + + + {($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask)} + {($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask)} + {$T6::IntMask}: {($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask)} [{($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask)}] + + ($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask) + ($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask) + + + + + {($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask)} + {((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask)} + {$T5::IntMask}: {($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask)} [{((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask)}] + + ($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask) + ((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask) 
+ + + + + + {($T4)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask)} + + + {($T5)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask)} + + Unexpected index in PointerUnion: {(*(intptr_t *)Val.Value.Data>>$T2::InfoTy::IntShift) & $T2::InfoTy::IntMask} + + "$T4",s8b + + ($T4)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask) + + "$T5",s8b + + ($T5)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask) + + + + + + {{ empty }} + {{ head={Head} }} + + + Head + Next + this + + + + + + empty + RefPtr [1 ref] {*Obj} + RefPtr [{Obj->RefCount} refs] {*Obj} + + Obj->RefCount + Obj + + + + + {{ [Small Mode] size={NumNonEmpty}, capacity={CurArraySize} }} + {{ [Big Mode] size={NumNonEmpty}, capacity={CurArraySize} }} + + NumNonEmpty + CurArraySize + + NumNonEmpty + ($T1*)CurArray + + + + + + empty + {{ size={NumEntries}, buckets={NumBuckets} }} + + NumEntries + NumBuckets + + NumBuckets + Buckets + + + + + + {{ size={NumItems}, buckets={NumBuckets} }} + + NumItems + NumBuckets + + NumBuckets + (MapEntryTy**)TheTable + + + + + + empty + ({this+1,s8}, {second}) + + this+1,s + second + + + + + {Data} + + + + None + {Storage.value} + + Storage.value + + + + + Error + {*((storage_type *)TStorage.buffer)} + + *((storage_type *)TStorage.buffer) + *((error_type *)ErrorStorage.buffer) + + + + + + + {{little endian value = {*(($T1*)(unsigned char *)Value.buffer)} }} + + (unsigned char *)Value.buffer,1 + (unsigned char *)Value.buffer,2 + (unsigned char *)Value.buffer,4 + (unsigned char *)Value.buffer,8 + + + + + + {{ big endian value = {*(unsigned char *)Value.buffer} }} + {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 8) + | ($T1)(*((unsigned char *)Value.buffer+1))} }} + {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 24) + | (($T1)(*((unsigned char *)Value.buffer+1)) << 16) + | (($T1)(*((unsigned char *)Value.buffer+2)) << 8) + | ($T1)(*((unsigned char *)Value.buffer+3))} }} + {{ big endian value = {(($T1)(*(unsigned char 
*)Value.buffer) << 56) + | (($T1)(*((unsigned char *)Value.buffer+1)) << 48) + | (($T1)(*((unsigned char *)Value.buffer+2)) << 40) + | (($T1)(*((unsigned char *)Value.buffer+3)) << 32) + | (($T1)(*((unsigned char *)Value.buffer+4)) << 24) + | (($T1)(*((unsigned char *)Value.buffer+5)) << 16) + | (($T1)(*((unsigned char *)Value.buffer+6)) << 8) + | ($T1)(*((unsigned char *)Value.buffer+7))} }} + + (unsigned char *)Value.buffer,1 + (unsigned char *)Value.buffer,2 + (unsigned char *)Value.buffer,4 + (unsigned char *)Value.buffer,8 + + + + + {ID} + + ID + + SubclassData + + *ContainedTys + + {NumContainedTys - 1} + + + NumContainedTys - 1 + ContainedTys + 1 + + + + SubclassData == 1 + + (SubclassData & llvm::StructType::SCDB_HasBody) != 0 + (SubclassData & llvm::StructType::SCDB_Packed) != 0 + (SubclassData & llvm::StructType::SCDB_IsLiteral) != 0 + (SubclassData & llvm::StructType::SCDB_IsSized) != 0 + + {NumContainedTys} + + + NumContainedTys + ContainedTys + + + + + *ContainedTys + ((llvm::ArrayType*)this)->NumElements + + *ContainedTys + ((llvm::VectorType*)this)->ElementQuantity + + *ContainedTys + ((llvm::VectorType*)this)->ElementQuantity + + SubclassData + *ContainedTys + + Context + + + + + $(Type) {*Value} + + + + $(Type) {(llvm::ISD::NodeType)this->NodeType} + + + NumOperands + OperandList + + + + + + i{Val.BitWidth} {Val.VAL} + + + + {IDAndSubclassData >> 8}bit integer type + + + + $(Type) {*VTy} {this->getName()} {SubclassData} + $(Type) {*VTy} anon {SubclassData} + + (Instruction*)this + (User*)this + + UseList + Next + Prev.Value & 3 == 3 ? 
(User*)(this + 1) : (User*)(this + 2) + + + + + + + Val + + + + + + + $(Type) {*VTy} {this->getName()} {SubclassData} + $(Type) {*VTy} anon {SubclassData} + + (Value*)this,nd + *VTy + + NumUserOperands + (llvm::Use*)this - NumUserOperands + + + NumUserOperands + *((llvm::Use**)this - 1) + + + + + + {getOpcodeName(SubclassID - InstructionVal)} + + (User*)this,nd + + + + + {this->getName()} {(LinkageTypes)Linkage} {(VisibilityTypes)Visibility} {(DLLStorageClassTypes)DllStorageClass} {(llvm::GlobalValue::ThreadLocalMode) ThreadLocal} + + + + + + + this + Next + this + + + + + + + pImpl + + + + + {ModuleID,s8} {TargetTriple} + + + + $(Type) {PassID} {Kind} + + diff --git a/llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.dos b/llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.dos index 7a0560654c5c70..0f25621c787ed3 100644 --- a/llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.dos +++ b/llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.dos @@ -1,3 +1,3 @@ -In this file, the -sequence "\r\n" -terminates lines. +In this file, the +sequence "\r\n" +terminates lines. diff --git a/llvm/utils/release/build_llvm_release.bat b/llvm/utils/release/build_llvm_release.bat index dd041d7d384ec4..3718673ae7a28d 100755 --- a/llvm/utils/release/build_llvm_release.bat +++ b/llvm/utils/release/build_llvm_release.bat @@ -1,515 +1,515 @@ - at echo off -setlocal enabledelayedexpansion - -goto begin - -:usage -echo Script for building the LLVM installer on Windows, -echo used for the releases at https://github.com/llvm/llvm-project/releases -echo. -echo Usage: build_llvm_release.bat --version ^ [--x86,--x64, --arm64] [--skip-checkout] [--local-python] -echo. 
-echo Options: -echo --version: [required] version to build -echo --help: display this help -echo --x86: build and test x86 variant -echo --x64: build and test x64 variant -echo --arm64: build and test arm64 variant -echo --skip-checkout: use local git checkout instead of downloading src.zip -echo --local-python: use installed Python and does not try to use a specific version (3.10) -echo. -echo Note: At least one variant to build is required. -echo. -echo Example: build_llvm_release.bat --version 15.0.0 --x86 --x64 -exit /b 1 - -:begin - -::============================================================================== -:: parse args -set version= -set help= -set x86= -set x64= -set arm64= -set skip-checkout= -set local-python= -call :parse_args %* - -if "%help%" NEQ "" goto usage - -if "%version%" == "" ( - echo --version option is required - echo ============================= - goto usage -) - -if "%arm64%" == "" if "%x64%" == "" if "%x86%" == "" ( - echo nothing to build! - echo choose one or several variants from: --x86 --x64 --arm64 - exit /b 1 -) - -::============================================================================== -:: check prerequisites -REM Note: -REM 7zip versions 21.x and higher will try to extract the symlinks in -REM llvm's git archive, which requires running as administrator. - -REM Check 7-zip version and/or administrator permissions. -for /f "delims=" %%i in ('7z.exe ^| findstr /r "2[1-9].[0-9][0-9]"') do set version_7z=%%i -if not "%version_7z%"=="" ( - REM Unique temporary filename to use by the 'mklink' command. - set "link_name=%temp%\%username%_%random%_%random%.tmp" - - REM As the 'mklink' requires elevated permissions, the symbolic link - REM creation will fail if the script is not running as administrator. - mklink /d "!link_name!" . 1>nul 2>nul - if errorlevel 1 ( - echo. - echo Script requires administrator permissions, or a 7-zip version 20.x or older. 
- echo Current version is "%version_7z%" - exit /b 1 - ) else ( - REM Remove the temporary symbolic link. - rd "!link_name!" - ) -) - -REM Prerequisites: -REM -REM Visual Studio 2019, CMake, Ninja, GNUWin32, SWIG, Python 3, -REM NSIS with the strlen_8192 patch, -REM Perl (for the OpenMP run-time). -REM -REM -REM For LLDB, SWIG version 4.1.1 should be used. -REM - -:: Detect Visual Studio -set vsinstall= -set vswhere=%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe - -if "%VSINSTALLDIR%" NEQ "" ( - echo using enabled Visual Studio installation - set "vsinstall=%VSINSTALLDIR%" -) else ( - echo using vswhere to detect Visual Studio installation - FOR /F "delims=" %%r IN ('^""%vswhere%" -nologo -latest -products "*" -all -property installationPath^"') DO set vsinstall=%%r -) -set "vsdevcmd=%vsinstall%\Common7\Tools\VsDevCmd.bat" - -if not exist "%vsdevcmd%" ( - echo Can't find any installation of Visual Studio - exit /b 1 -) -echo Using VS devcmd: %vsdevcmd% - -::============================================================================== -:: start echoing what we do - at echo on - -set python32_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python310-32 -set python64_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python310 -set pythonarm64_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python311-arm64 - -set revision=llvmorg-%version% -set package_version=%version% -set build_dir=%cd%\llvm_package_%package_version% - -echo Revision: %revision% -echo Package version: %package_version% -echo Build dir: %build_dir% -echo. - -if exist %build_dir% ( - echo Build directory already exists: %build_dir% - exit /b 1 -) -mkdir %build_dir% -cd %build_dir% || exit /b 1 - -if "%skip-checkout%" == "true" ( - echo Using local source - set llvm_src=%~dp0..\..\.. 
-) else ( - echo Checking out %revision% - curl -L https://github.com/llvm/llvm-project/archive/%revision%.zip -o src.zip || exit /b 1 - 7z x src.zip || exit /b 1 - mv llvm-project-* llvm-project || exit /b 1 - set llvm_src=%build_dir%\llvm-project -) - -curl -O https://gitlab.gnome.org/GNOME/libxml2/-/archive/v2.9.12/libxml2-v2.9.12.tar.gz || exit /b 1 -tar zxf libxml2-v2.9.12.tar.gz - -REM Setting CMAKE_CL_SHOWINCLUDES_PREFIX to work around PR27226. -REM Common flags for all builds. -set common_compiler_flags=-DLIBXML_STATIC -set common_cmake_flags=^ - -DCMAKE_BUILD_TYPE=Release ^ - -DLLVM_ENABLE_ASSERTIONS=OFF ^ - -DLLVM_INSTALL_TOOLCHAIN_ONLY=ON ^ - -DLLVM_TARGETS_TO_BUILD="AArch64;ARM;X86" ^ - -DLLVM_BUILD_LLVM_C_DYLIB=ON ^ - -DCMAKE_INSTALL_UCRT_LIBRARIES=ON ^ - -DPython3_FIND_REGISTRY=NEVER ^ - -DPACKAGE_VERSION=%package_version% ^ - -DLLDB_RELOCATABLE_PYTHON=1 ^ - -DLLDB_EMBED_PYTHON_HOME=OFF ^ - -DCMAKE_CL_SHOWINCLUDES_PREFIX="Note: including file: " ^ - -DLLVM_ENABLE_LIBXML2=FORCE_ON ^ - -DLLDB_ENABLE_LIBXML2=OFF ^ - -DCLANG_ENABLE_LIBXML2=OFF ^ - -DCMAKE_C_FLAGS="%common_compiler_flags%" ^ - -DCMAKE_CXX_FLAGS="%common_compiler_flags%" ^ - -DLLVM_ENABLE_RPMALLOC=ON ^ - -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;compiler-rt;lldb;openmp" - -set cmake_profile_flags="" - -REM Preserve original path -set OLDPATH=%PATH% - -REM Build the 32-bits and/or 64-bits binaries. -if "%x86%" == "true" call :do_build_32 || exit /b 1 -if "%x64%" == "true" call :do_build_64 || exit /b 1 -if "%arm64%" == "true" call :do_build_arm64 || exit /b 1 -exit /b 0 - -::============================================================================== -:: Build 32-bits binaries. 
-::============================================================================== -:do_build_32 -call :set_environment %python32_dir% || exit /b 1 -call "%vsdevcmd%" -arch=x86 || exit /b 1 -@echo on -mkdir build32_stage0 -cd build32_stage0 -call :do_build_libxml || exit /b 1 - -REM Stage0 binaries directory; used in stage1. -set "stage0_bin_dir=%build_dir%/build32_stage0/bin" -set cmake_flags=^ - %common_cmake_flags% ^ - -DLLVM_ENABLE_RPMALLOC=OFF ^ - -DLLDB_TEST_COMPILER=%stage0_bin_dir%/clang.exe ^ - -DPYTHON_HOME=%PYTHONHOME% ^ - -DPython3_ROOT_DIR=%PYTHONHOME% ^ - -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ - -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib - -cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 -ninja || ninja || ninja || exit /b 1 -REM ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 -REM ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 -ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 -ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 -REM ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 -cd.. - -REM CMake expects the paths that specifies the compiler and linker to be -REM with forward slash. 
-set all_cmake_flags=^ - %cmake_flags% ^ - -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ - -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ - -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe -set cmake_flags=%all_cmake_flags:\=/% - -mkdir build32 -cd build32 -cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 -ninja || ninja || ninja || exit /b 1 -REM ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 -REM ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 -ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 -ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 -REM ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 -ninja package || exit /b 1 -cd .. - -exit /b 0 -::============================================================================== - -::============================================================================== -:: Build 64-bits binaries. -::============================================================================== -:do_build_64 -call :set_environment %python64_dir% || exit /b 1 -call "%vsdevcmd%" -arch=amd64 || exit /b 1 -@echo on -mkdir build64_stage0 -cd build64_stage0 -call :do_build_libxml || exit /b 1 - -REM Stage0 binaries directory; used in stage1. 
-set "stage0_bin_dir=%build_dir%/build64_stage0/bin" -set cmake_flags=^ - %common_cmake_flags% ^ - -DLLDB_TEST_COMPILER=%stage0_bin_dir%/clang.exe ^ - -DPYTHON_HOME=%PYTHONHOME% ^ - -DPython3_ROOT_DIR=%PYTHONHOME% ^ - -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ - -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib - -cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 -ninja || ninja || ninja || exit /b 1 -ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 -ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 -ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 -ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 -ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 -ninja check-clangd || ninja check-clangd || ninja check-clangd || exit /b 1 -cd.. - -REM CMake expects the paths that specifies the compiler and linker to be -REM with forward slash. -set all_cmake_flags=^ - %cmake_flags% ^ - -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ - -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ - -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe -set cmake_flags=%all_cmake_flags:\=/% - - -mkdir build64 -cd build64 -call :do_generate_profile || exit /b 1 -cmake -GNinja %cmake_flags% %cmake_profile_flags% %llvm_src%\llvm || exit /b 1 -ninja || ninja || ninja || exit /b 1 -ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 -ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 -ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 -ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 -ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 -ninja check-clangd || ninja check-clangd || ninja check-clangd || exit /b 1 -ninja package || 
exit /b 1 - -:: generate tarball with install toolchain only off -set filename=clang+llvm-%version%-x86_64-pc-windows-msvc -cmake -GNinja %cmake_flags% %cmake_profile_flags% -DLLVM_INSTALL_TOOLCHAIN_ONLY=OFF ^ - -DCMAKE_INSTALL_PREFIX=%build_dir%/%filename% ..\llvm-project\llvm || exit /b 1 -ninja install || exit /b 1 -:: check llvm_config is present & returns something -%build_dir%/%filename%/bin/llvm-config.exe --bindir || exit /b 1 -cd .. -7z a -ttar -so %filename%.tar %filename% | 7z a -txz -si %filename%.tar.xz - -exit /b 0 -::============================================================================== - -::============================================================================== -:: Build arm64 binaries. -::============================================================================== -:do_build_arm64 -call :set_environment %pythonarm64_dir% || exit /b 1 -call "%vsdevcmd%" -host_arch=x64 -arch=arm64 || exit /b 1 -@echo on -mkdir build_arm64_stage0 -cd build_arm64_stage0 -call :do_build_libxml || exit /b 1 - -REM Stage0 binaries directory; used in stage1. -set "stage0_bin_dir=%build_dir%/build_arm64_stage0/bin" -set cmake_flags=^ - %common_cmake_flags% ^ - -DCLANG_DEFAULT_LINKER=lld ^ - -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ - -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib ^ - -DPython3_ROOT_DIR=%PYTHONHOME% ^ - -DCOMPILER_RT_BUILD_PROFILE=OFF ^ - -DCOMPILER_RT_BUILD_SANITIZERS=OFF - -REM We need to build stage0 compiler-rt with clang-cl (msvc lacks some builtins). -cmake -GNinja %cmake_flags% ^ - -DCMAKE_C_COMPILER=clang-cl.exe ^ - -DCMAKE_CXX_COMPILER=clang-cl.exe ^ - %llvm_src%\llvm || exit /b 1 -ninja || exit /b 1 -::ninja check-llvm || exit /b 1 -::ninja check-clang || exit /b 1 -::ninja check-lld || exit /b 1 -::ninja check-sanitizer || exit /b 1 -::ninja check-clang-tools || exit /b 1 -::ninja check-clangd || exit /b 1 -cd.. - -REM CMake expects the paths that specifies the compiler and linker to be -REM with forward slash.
-REM CPACK_SYSTEM_NAME is set to have a correct name for installer generated. -set all_cmake_flags=^ - %cmake_flags% ^ - -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ - -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ - -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe ^ - -DCPACK_SYSTEM_NAME=woa64 -set cmake_flags=%all_cmake_flags:\=/% - -mkdir build_arm64 -cd build_arm64 -cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 -ninja || exit /b 1 -REM Check but do not fail on errors. -ninja check-lldb -::ninja check-llvm || exit /b 1 -::ninja check-clang || exit /b 1 -::ninja check-lld || exit /b 1 -::ninja check-sanitizer || exit /b 1 -::ninja check-clang-tools || exit /b 1 -::ninja check-clangd || exit /b 1 -ninja package || exit /b 1 -cd .. - -exit /b 0 -::============================================================================== -:: -::============================================================================== -:: Set PATH and some environment variables. -::============================================================================== -:set_environment -REM Restore original path -set PATH=%OLDPATH% - -set python_dir=%1 - -REM Set Python environment -if "%local-python%" == "true" ( - FOR /F "delims=" %%i IN ('where python.exe ^| head -1') DO set python_exe=%%i - set PYTHONHOME=!python_exe:~0,-11! -) else ( - %python_dir%/python.exe --version || exit /b 1 - set PYTHONHOME=%python_dir% -) -set PATH=%PYTHONHOME%;%PATH% - -set "VSCMD_START_DIR=%build_dir%" - -exit /b 0 - -::============================================================================= - -::============================================================================== -:: Build libxml. 
-::============================================================================== -:do_build_libxml -mkdir libxmlbuild -cd libxmlbuild -cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install ^ - -DBUILD_SHARED_LIBS=OFF -DLIBXML2_WITH_C14N=OFF -DLIBXML2_WITH_CATALOG=OFF ^ - -DLIBXML2_WITH_DEBUG=OFF -DLIBXML2_WITH_DOCB=OFF -DLIBXML2_WITH_FTP=OFF ^ - -DLIBXML2_WITH_HTML=OFF -DLIBXML2_WITH_HTTP=OFF -DLIBXML2_WITH_ICONV=OFF ^ - -DLIBXML2_WITH_ICU=OFF -DLIBXML2_WITH_ISO8859X=OFF -DLIBXML2_WITH_LEGACY=OFF ^ - -DLIBXML2_WITH_LZMA=OFF -DLIBXML2_WITH_MEM_DEBUG=OFF -DLIBXML2_WITH_MODULES=OFF ^ - -DLIBXML2_WITH_OUTPUT=ON -DLIBXML2_WITH_PATTERN=OFF -DLIBXML2_WITH_PROGRAMS=OFF ^ - -DLIBXML2_WITH_PUSH=OFF -DLIBXML2_WITH_PYTHON=OFF -DLIBXML2_WITH_READER=OFF ^ - -DLIBXML2_WITH_REGEXPS=OFF -DLIBXML2_WITH_RUN_DEBUG=OFF -DLIBXML2_WITH_SAX1=OFF ^ - -DLIBXML2_WITH_SCHEMAS=OFF -DLIBXML2_WITH_SCHEMATRON=OFF -DLIBXML2_WITH_TESTS=OFF ^ - -DLIBXML2_WITH_THREADS=ON -DLIBXML2_WITH_THREAD_ALLOC=OFF -DLIBXML2_WITH_TREE=ON ^ - -DLIBXML2_WITH_VALID=OFF -DLIBXML2_WITH_WRITER=OFF -DLIBXML2_WITH_XINCLUDE=OFF ^ - -DLIBXML2_WITH_XPATH=OFF -DLIBXML2_WITH_XPTR=OFF -DLIBXML2_WITH_ZLIB=OFF ^ - -DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded ^ - ../../libxml2-v2.9.12 || exit /b 1 -ninja install || exit /b 1 -set libxmldir=%cd%\install -set "libxmldir=%libxmldir:\=/%" -cd .. -exit /b 0 - -::============================================================================== -:: Generate a PGO profile. -::============================================================================== -:do_generate_profile -REM Build Clang with instrumentation. -mkdir instrument -cd instrument -cmake -GNinja %cmake_flags% -DLLVM_TARGETS_TO_BUILD=Native ^ - -DLLVM_BUILD_INSTRUMENTED=IR %llvm_src%\llvm || exit /b 1 -ninja clang || ninja clang || ninja clang || exit /b 1 -set instrumented_clang=%cd:\=/%/bin/clang-cl.exe -cd .. -REM Use that to build part of llvm to generate a profile. 
-mkdir train -cd train -cmake -GNinja %cmake_flags% ^ - -DCMAKE_C_COMPILER=%instrumented_clang% ^ - -DCMAKE_CXX_COMPILER=%instrumented_clang% ^ - -DLLVM_ENABLE_PROJECTS=clang ^ - -DLLVM_TARGETS_TO_BUILD=Native ^ - %llvm_src%\llvm || exit /b 1 -REM Drop profiles generated from running cmake; those are not representative. -del ..\instrument\profiles\*.profraw -ninja tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/Sema.cpp.obj -cd .. -set profile=%cd:\=/%/profile.profdata -%stage0_bin_dir%\llvm-profdata merge -output=%profile% instrument\profiles\*.profraw || exit /b 1 -set common_compiler_flags=%common_compiler_flags% -Wno-backend-plugin -set cmake_profile_flags=-DLLVM_PROFDATA_FILE=%profile% ^ - -DCMAKE_C_FLAGS="%common_compiler_flags%" ^ - -DCMAKE_CXX_FLAGS="%common_compiler_flags%" -exit /b 0 - -::============================================================================= -:: Parse command line arguments. -:: The format for the arguments is: -:: Boolean: --option -:: Value: --option<separator>value -:: with <separator> being: space, colon, semicolon or equal sign -:: -:: Command line usage example: -:: my-batch-file.bat --build --type=release --version 123 -:: It will create 3 variables: -:: 'build' with the value 'true' -:: 'type' with the value 'release' -:: 'version' with the value '123' -:: -:: Usage: -:: set "build=" -:: set "type=" -:: set "version=" -:: -:: REM Parse arguments. -:: call :parse_args %* -:: -:: if defined build ( -:: ... -:: ) -:: if %type%=='release' ( -:: ... -:: ) -:: if %version%=='123' ( -:: ... -:: ) -::============================================================================= -:parse_args - set "arg_name=" - :parse_args_start - if "%1" == "" ( - :: Set a seen boolean argument. - if "%arg_name%" neq "" ( - set "%arg_name%=true" - ) - goto :parse_args_done - ) - set aux=%1 - if "%aux:~0,2%" == "--" ( - :: Set a seen boolean argument.
- if "%arg_name%" neq "" ( - set "%arg_name%=true" - ) - set "arg_name=%aux:~2,250%" - ) else ( - set "%arg_name%=%1" - set "arg_name=" - ) - shift - goto :parse_args_start - -:parse_args_done -exit /b 0 +@echo off +setlocal enabledelayedexpansion + +goto begin + +:usage +echo Script for building the LLVM installer on Windows, +echo used for the releases at https://github.com/llvm/llvm-project/releases +echo. +echo Usage: build_llvm_release.bat --version ^<version^> [--x86,--x64, --arm64] [--skip-checkout] [--local-python] +echo. +echo Options: +echo --version: [required] version to build +echo --help: display this help +echo --x86: build and test x86 variant +echo --x64: build and test x64 variant +echo --arm64: build and test arm64 variant +echo --skip-checkout: use local git checkout instead of downloading src.zip +echo --local-python: use installed Python and does not try to use a specific version (3.10) +echo. +echo Note: At least one variant to build is required. +echo. +echo Example: build_llvm_release.bat --version 15.0.0 --x86 --x64 +exit /b 1 + +:begin + +::============================================================================== +:: parse args +set version= +set help= +set x86= +set x64= +set arm64= +set skip-checkout= +set local-python= +call :parse_args %* + +if "%help%" NEQ "" goto usage + +if "%version%" == "" ( + echo --version option is required + echo ============================= + goto usage +) + +if "%arm64%" == "" if "%x64%" == "" if "%x86%" == "" ( + echo nothing to build! + echo choose one or several variants from: --x86 --x64 --arm64 + exit /b 1 +) + +::============================================================================== +:: check prerequisites +REM Note: +REM 7zip versions 21.x and higher will try to extract the symlinks in +REM llvm's git archive, which requires running as administrator. + +REM Check 7-zip version and/or administrator permissions.
+for /f "delims=" %%i in ('7z.exe ^| findstr /r "2[1-9].[0-9][0-9]"') do set version_7z=%%i +if not "%version_7z%"=="" ( + REM Unique temporary filename to use by the 'mklink' command. + set "link_name=%temp%\%username%_%random%_%random%.tmp" + + REM As the 'mklink' requires elevated permissions, the symbolic link + REM creation will fail if the script is not running as administrator. + mklink /d "!link_name!" . 1>nul 2>nul + if errorlevel 1 ( + echo. + echo Script requires administrator permissions, or a 7-zip version 20.x or older. + echo Current version is "%version_7z%" + exit /b 1 + ) else ( + REM Remove the temporary symbolic link. + rd "!link_name!" + ) +) + +REM Prerequisites: +REM +REM Visual Studio 2019, CMake, Ninja, GNUWin32, SWIG, Python 3, +REM NSIS with the strlen_8192 patch, +REM Perl (for the OpenMP run-time). +REM +REM +REM For LLDB, SWIG version 4.1.1 should be used. +REM + +:: Detect Visual Studio +set vsinstall= +set vswhere=%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe + +if "%VSINSTALLDIR%" NEQ "" ( + echo using enabled Visual Studio installation + set "vsinstall=%VSINSTALLDIR%" +) else ( + echo using vswhere to detect Visual Studio installation + FOR /F "delims=" %%r IN ('^""%vswhere%" -nologo -latest -products "*" -all -property installationPath^"') DO set vsinstall=%%r +) +set "vsdevcmd=%vsinstall%\Common7\Tools\VsDevCmd.bat" + +if not exist "%vsdevcmd%" ( + echo Can't find any installation of Visual Studio + exit /b 1 +) +echo Using VS devcmd: %vsdevcmd% + +::============================================================================== +:: start echoing what we do +@echo on + +set python32_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python310-32 +set python64_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python310 +set pythonarm64_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python311-arm64 + +set revision=llvmorg-%version% +set package_version=%version% +set
build_dir=%cd%\llvm_package_%package_version% + +echo Revision: %revision% +echo Package version: %package_version% +echo Build dir: %build_dir% +echo. + +if exist %build_dir% ( + echo Build directory already exists: %build_dir% + exit /b 1 +) +mkdir %build_dir% +cd %build_dir% || exit /b 1 + +if "%skip-checkout%" == "true" ( + echo Using local source + set llvm_src=%~dp0..\..\.. +) else ( + echo Checking out %revision% + curl -L https://github.com/llvm/llvm-project/archive/%revision%.zip -o src.zip || exit /b 1 + 7z x src.zip || exit /b 1 + mv llvm-project-* llvm-project || exit /b 1 + set llvm_src=%build_dir%\llvm-project +) + +curl -O https://gitlab.gnome.org/GNOME/libxml2/-/archive/v2.9.12/libxml2-v2.9.12.tar.gz || exit /b 1 +tar zxf libxml2-v2.9.12.tar.gz + +REM Setting CMAKE_CL_SHOWINCLUDES_PREFIX to work around PR27226. +REM Common flags for all builds. +set common_compiler_flags=-DLIBXML_STATIC +set common_cmake_flags=^ + -DCMAKE_BUILD_TYPE=Release ^ + -DLLVM_ENABLE_ASSERTIONS=OFF ^ + -DLLVM_INSTALL_TOOLCHAIN_ONLY=ON ^ + -DLLVM_TARGETS_TO_BUILD="AArch64;ARM;X86" ^ + -DLLVM_BUILD_LLVM_C_DYLIB=ON ^ + -DCMAKE_INSTALL_UCRT_LIBRARIES=ON ^ + -DPython3_FIND_REGISTRY=NEVER ^ + -DPACKAGE_VERSION=%package_version% ^ + -DLLDB_RELOCATABLE_PYTHON=1 ^ + -DLLDB_EMBED_PYTHON_HOME=OFF ^ + -DCMAKE_CL_SHOWINCLUDES_PREFIX="Note: including file: " ^ + -DLLVM_ENABLE_LIBXML2=FORCE_ON ^ + -DLLDB_ENABLE_LIBXML2=OFF ^ + -DCLANG_ENABLE_LIBXML2=OFF ^ + -DCMAKE_C_FLAGS="%common_compiler_flags%" ^ + -DCMAKE_CXX_FLAGS="%common_compiler_flags%" ^ + -DLLVM_ENABLE_RPMALLOC=ON ^ + -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;compiler-rt;lldb;openmp" + +set cmake_profile_flags="" + +REM Preserve original path +set OLDPATH=%PATH% + +REM Build the 32-bits and/or 64-bits binaries. 
+if "%x86%" == "true" call :do_build_32 || exit /b 1 +if "%x64%" == "true" call :do_build_64 || exit /b 1 +if "%arm64%" == "true" call :do_build_arm64 || exit /b 1 +exit /b 0 + +::============================================================================== +:: Build 32-bits binaries. +::============================================================================== +:do_build_32 +call :set_environment %python32_dir% || exit /b 1 +call "%vsdevcmd%" -arch=x86 || exit /b 1 +@echo on +mkdir build32_stage0 +cd build32_stage0 +call :do_build_libxml || exit /b 1 + +REM Stage0 binaries directory; used in stage1. +set "stage0_bin_dir=%build_dir%/build32_stage0/bin" +set cmake_flags=^ + %common_cmake_flags% ^ + -DLLVM_ENABLE_RPMALLOC=OFF ^ + -DLLDB_TEST_COMPILER=%stage0_bin_dir%/clang.exe ^ + -DPYTHON_HOME=%PYTHONHOME% ^ + -DPython3_ROOT_DIR=%PYTHONHOME% ^ + -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ + -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib + +cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 +ninja || ninja || ninja || exit /b 1 +REM ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 +REM ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 +ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 +ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 +REM ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 +cd.. + +REM CMake expects the paths that specifies the compiler and linker to be +REM with forward slash.
+set all_cmake_flags=^ + %cmake_flags% ^ + -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ + -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ + -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ + -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ + -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe +set cmake_flags=%all_cmake_flags:\=/% + +mkdir build32 +cd build32 +cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 +ninja || ninja || ninja || exit /b 1 +REM ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 +REM ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 +ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 +ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 +REM ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 +ninja package || exit /b 1 +cd .. + +exit /b 0 +::============================================================================== + +::============================================================================== +:: Build 64-bits binaries. +::============================================================================== +:do_build_64 +call :set_environment %python64_dir% || exit /b 1 +call "%vsdevcmd%" -arch=amd64 || exit /b 1 +@echo on +mkdir build64_stage0 +cd build64_stage0 +call :do_build_libxml || exit /b 1 + +REM Stage0 binaries directory; used in stage1.
+set "stage0_bin_dir=%build_dir%/build64_stage0/bin" +set cmake_flags=^ + %common_cmake_flags% ^ + -DLLDB_TEST_COMPILER=%stage0_bin_dir%/clang.exe ^ + -DPYTHON_HOME=%PYTHONHOME% ^ + -DPython3_ROOT_DIR=%PYTHONHOME% ^ + -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ + -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib + +cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 +ninja || ninja || ninja || exit /b 1 +ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 +ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 +ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 +ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 +ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 +ninja check-clangd || ninja check-clangd || ninja check-clangd || exit /b 1 +cd.. + +REM CMake expects the paths that specifies the compiler and linker to be +REM with forward slash. +set all_cmake_flags=^ + %cmake_flags% ^ + -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ + -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ + -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ + -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ + -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe +set cmake_flags=%all_cmake_flags:\=/% + + +mkdir build64 +cd build64 +call :do_generate_profile || exit /b 1 +cmake -GNinja %cmake_flags% %cmake_profile_flags% %llvm_src%\llvm || exit /b 1 +ninja || ninja || ninja || exit /b 1 +ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 +ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 +ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 +ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 +ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 +ninja check-clangd || ninja check-clangd || ninja check-clangd || exit /b 1 +ninja package || 
exit /b 1 + +:: generate tarball with install toolchain only off +set filename=clang+llvm-%version%-x86_64-pc-windows-msvc +cmake -GNinja %cmake_flags% %cmake_profile_flags% -DLLVM_INSTALL_TOOLCHAIN_ONLY=OFF ^ + -DCMAKE_INSTALL_PREFIX=%build_dir%/%filename% ..\llvm-project\llvm || exit /b 1 +ninja install || exit /b 1 +:: check llvm_config is present & returns something +%build_dir%/%filename%/bin/llvm-config.exe --bindir || exit /b 1 +cd .. +7z a -ttar -so %filename%.tar %filename% | 7z a -txz -si %filename%.tar.xz + +exit /b 0 +::============================================================================== + +::============================================================================== +:: Build arm64 binaries. +::============================================================================== +:do_build_arm64 +call :set_environment %pythonarm64_dir% || exit /b 1 +call "%vsdevcmd%" -host_arch=x64 -arch=arm64 || exit /b 1 +@echo on +mkdir build_arm64_stage0 +cd build_arm64_stage0 +call :do_build_libxml || exit /b 1 + +REM Stage0 binaries directory; used in stage1. +set "stage0_bin_dir=%build_dir%/build_arm64_stage0/bin" +set cmake_flags=^ + %common_cmake_flags% ^ + -DCLANG_DEFAULT_LINKER=lld ^ + -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ + -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib ^ + -DPython3_ROOT_DIR=%PYTHONHOME% ^ + -DCOMPILER_RT_BUILD_PROFILE=OFF ^ + -DCOMPILER_RT_BUILD_SANITIZERS=OFF + +REM We need to build stage0 compiler-rt with clang-cl (msvc lacks some builtins). +cmake -GNinja %cmake_flags% ^ + -DCMAKE_C_COMPILER=clang-cl.exe ^ + -DCMAKE_CXX_COMPILER=clang-cl.exe ^ + %llvm_src%\llvm || exit /b 1 +ninja || exit /b 1 +::ninja check-llvm || exit /b 1 +::ninja check-clang || exit /b 1 +::ninja check-lld || exit /b 1 +::ninja check-sanitizer || exit /b 1 +::ninja check-clang-tools || exit /b 1 +::ninja check-clangd || exit /b 1 +cd.. + +REM CMake expects the paths that specifies the compiler and linker to be +REM with forward slash.
+REM CPACK_SYSTEM_NAME is set to have a correct name for installer generated. +set all_cmake_flags=^ + %cmake_flags% ^ + -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ + -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ + -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ + -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ + -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe ^ + -DCPACK_SYSTEM_NAME=woa64 +set cmake_flags=%all_cmake_flags:\=/% + +mkdir build_arm64 +cd build_arm64 +cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 +ninja || exit /b 1 +REM Check but do not fail on errors. +ninja check-lldb +::ninja check-llvm || exit /b 1 +::ninja check-clang || exit /b 1 +::ninja check-lld || exit /b 1 +::ninja check-sanitizer || exit /b 1 +::ninja check-clang-tools || exit /b 1 +::ninja check-clangd || exit /b 1 +ninja package || exit /b 1 +cd .. + +exit /b 0 +::============================================================================== +:: +::============================================================================== +:: Set PATH and some environment variables. +::============================================================================== +:set_environment +REM Restore original path +set PATH=%OLDPATH% + +set python_dir=%1 + +REM Set Python environment +if "%local-python%" == "true" ( + FOR /F "delims=" %%i IN ('where python.exe ^| head -1') DO set python_exe=%%i + set PYTHONHOME=!python_exe:~0,-11! +) else ( + %python_dir%/python.exe --version || exit /b 1 + set PYTHONHOME=%python_dir% +) +set PATH=%PYTHONHOME%;%PATH% + +set "VSCMD_START_DIR=%build_dir%" + +exit /b 0 + +::============================================================================= + +::============================================================================== +:: Build libxml. 
+::============================================================================== +:do_build_libxml +mkdir libxmlbuild +cd libxmlbuild +cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install ^ + -DBUILD_SHARED_LIBS=OFF -DLIBXML2_WITH_C14N=OFF -DLIBXML2_WITH_CATALOG=OFF ^ + -DLIBXML2_WITH_DEBUG=OFF -DLIBXML2_WITH_DOCB=OFF -DLIBXML2_WITH_FTP=OFF ^ + -DLIBXML2_WITH_HTML=OFF -DLIBXML2_WITH_HTTP=OFF -DLIBXML2_WITH_ICONV=OFF ^ + -DLIBXML2_WITH_ICU=OFF -DLIBXML2_WITH_ISO8859X=OFF -DLIBXML2_WITH_LEGACY=OFF ^ + -DLIBXML2_WITH_LZMA=OFF -DLIBXML2_WITH_MEM_DEBUG=OFF -DLIBXML2_WITH_MODULES=OFF ^ + -DLIBXML2_WITH_OUTPUT=ON -DLIBXML2_WITH_PATTERN=OFF -DLIBXML2_WITH_PROGRAMS=OFF ^ + -DLIBXML2_WITH_PUSH=OFF -DLIBXML2_WITH_PYTHON=OFF -DLIBXML2_WITH_READER=OFF ^ + -DLIBXML2_WITH_REGEXPS=OFF -DLIBXML2_WITH_RUN_DEBUG=OFF -DLIBXML2_WITH_SAX1=OFF ^ + -DLIBXML2_WITH_SCHEMAS=OFF -DLIBXML2_WITH_SCHEMATRON=OFF -DLIBXML2_WITH_TESTS=OFF ^ + -DLIBXML2_WITH_THREADS=ON -DLIBXML2_WITH_THREAD_ALLOC=OFF -DLIBXML2_WITH_TREE=ON ^ + -DLIBXML2_WITH_VALID=OFF -DLIBXML2_WITH_WRITER=OFF -DLIBXML2_WITH_XINCLUDE=OFF ^ + -DLIBXML2_WITH_XPATH=OFF -DLIBXML2_WITH_XPTR=OFF -DLIBXML2_WITH_ZLIB=OFF ^ + -DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded ^ + ../../libxml2-v2.9.12 || exit /b 1 +ninja install || exit /b 1 +set libxmldir=%cd%\install +set "libxmldir=%libxmldir:\=/%" +cd .. +exit /b 0 + +::============================================================================== +:: Generate a PGO profile. +::============================================================================== +:do_generate_profile +REM Build Clang with instrumentation. +mkdir instrument +cd instrument +cmake -GNinja %cmake_flags% -DLLVM_TARGETS_TO_BUILD=Native ^ + -DLLVM_BUILD_INSTRUMENTED=IR %llvm_src%\llvm || exit /b 1 +ninja clang || ninja clang || ninja clang || exit /b 1 +set instrumented_clang=%cd:\=/%/bin/clang-cl.exe +cd .. +REM Use that to build part of llvm to generate a profile. 
+mkdir train +cd train +cmake -GNinja %cmake_flags% ^ + -DCMAKE_C_COMPILER=%instrumented_clang% ^ + -DCMAKE_CXX_COMPILER=%instrumented_clang% ^ + -DLLVM_ENABLE_PROJECTS=clang ^ + -DLLVM_TARGETS_TO_BUILD=Native ^ + %llvm_src%\llvm || exit /b 1 +REM Drop profiles generated from running cmake; those are not representative. +del ..\instrument\profiles\*.profraw +ninja tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/Sema.cpp.obj +cd .. +set profile=%cd:\=/%/profile.profdata +%stage0_bin_dir%\llvm-profdata merge -output=%profile% instrument\profiles\*.profraw || exit /b 1 +set common_compiler_flags=%common_compiler_flags% -Wno-backend-plugin +set cmake_profile_flags=-DLLVM_PROFDATA_FILE=%profile% ^ + -DCMAKE_C_FLAGS="%common_compiler_flags%" ^ + -DCMAKE_CXX_FLAGS="%common_compiler_flags%" +exit /b 0 + +::============================================================================= +:: Parse command line arguments. +:: The format for the arguments is: +:: Boolean: --option +:: Value: --option<separator>value +:: with <separator> being: space, colon, semicolon or equal sign +:: +:: Command line usage example: +:: my-batch-file.bat --build --type=release --version 123 +:: It will create 3 variables: +:: 'build' with the value 'true' +:: 'type' with the value 'release' +:: 'version' with the value '123' +:: +:: Usage: +:: set "build=" +:: set "type=" +:: set "version=" +:: +:: REM Parse arguments. +:: call :parse_args %* +:: +:: if defined build ( +:: ... +:: ) +:: if %type%=='release' ( +:: ... +:: ) +:: if %version%=='123' ( +:: ... +:: ) +::============================================================================= +:parse_args + set "arg_name=" + :parse_args_start + if "%1" == "" ( + :: Set a seen boolean argument. + if "%arg_name%" neq "" ( + set "%arg_name%=true" + ) + goto :parse_args_done + ) + set aux=%1 + if "%aux:~0,2%" == "--" ( + :: Set a seen boolean argument.
+ if "%arg_name%" neq "" ( + set "%arg_name%=true" + ) + set "arg_name=%aux:~2,250%" + ) else ( + set "%arg_name%=%1" + set "arg_name=" + ) + shift + goto :parse_args_start + +:parse_args_done +exit /b 0 diff --git a/openmp/runtime/doc/doxygen/config b/openmp/runtime/doc/doxygen/config index 04c966766ba6ef..8d79dc143cc1a0 100644 --- a/openmp/runtime/doc/doxygen/config +++ b/openmp/runtime/doc/doxygen/config @@ -1,1822 +1,1822 @@ -# Doxyfile 1.o8.2 - -# This file describes the settings to be used by the documentation system -# doxygen (www.doxygen.org) for a project. -# -# All text after a hash (#) is considered a comment and will be ignored. -# The format is: -# TAG = value [value, ...] -# For lists items can also be appended using: -# TAG += value [value, ...] -# Values that contain spaces should be placed between quotes (" "). - -#--------------------------------------------------------------------------- -# Project related configuration options -#--------------------------------------------------------------------------- - -# This tag specifies the encoding used for all characters in the config file -# that follow. The default is UTF-8 which is also the encoding used for all -# text before the first occurrence of this tag. Doxygen uses libiconv (or the -# iconv built into libc) for the transcoding. See -# http://www.gnu.org/software/libiconv for the list of possible encodings. - -DOXYFILE_ENCODING = UTF-8 - -# The PROJECT_NAME tag is a single word (or sequence of words) that should -# identify the project. Note that if you do not use Doxywizard you need -# to put quotes around the project name if it contains spaces. - -PROJECT_NAME = "LLVM OpenMP* Runtime Library" - -# The PROJECT_NUMBER tag can be used to enter a project or revision number. -# This could be handy for archiving the generated documentation or -# if some version control system is used. 
- -PROJECT_NUMBER = - -# Using the PROJECT_BRIEF tag one can provide an optional one line description -# for a project that appears at the top of each page and should give viewer -# a quick idea about the purpose of the project. Keep the description short. - -PROJECT_BRIEF = - -# With the PROJECT_LOGO tag one can specify an logo or icon that is -# included in the documentation. The maximum height of the logo should not -# exceed 55 pixels and the maximum width should not exceed 200 pixels. -# Doxygen will copy the logo to the output directory. - -PROJECT_LOGO = - -# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) -# base path where the generated documentation will be put. -# If a relative path is entered, it will be relative to the location -# where doxygen was started. If left blank the current directory will be used. - -OUTPUT_DIRECTORY = doc/doxygen/generated - -# If the CREATE_SUBDIRS tag is set to YES, then doxygen will create -# 4096 sub-directories (in 2 levels) under the output directory of each output -# format and will distribute the generated files over these directories. -# Enabling this option can be useful when feeding doxygen a huge amount of -# source files, where putting all generated files in the same directory would -# otherwise cause performance problems for the file system. - -CREATE_SUBDIRS = NO - -# The OUTPUT_LANGUAGE tag is used to specify the language in which all -# documentation generated by doxygen is written. Doxygen will use this -# information to generate all constant output in the proper language. 
-# The default language is English, other supported languages are: -# Afrikaans, Arabic, Brazilian, Catalan, Chinese, Chinese-Traditional, -# Croatian, Czech, Danish, Dutch, Esperanto, Farsi, Finnish, French, German, -# Greek, Hungarian, Italian, Japanese, Japanese-en (Japanese with English -# messages), Korean, Korean-en, Lithuanian, Norwegian, Macedonian, Persian, -# Polish, Portuguese, Romanian, Russian, Serbian, Serbian-Cyrillic, Slovak, -# Slovene, Spanish, Swedish, Ukrainian, and Vietnamese. - -OUTPUT_LANGUAGE = English - -# If the BRIEF_MEMBER_DESC tag is set to YES (the default) Doxygen will -# include brief member descriptions after the members that are listed in -# the file and class documentation (similar to JavaDoc). -# Set to NO to disable this. - -BRIEF_MEMBER_DESC = YES - -# If the REPEAT_BRIEF tag is set to YES (the default) Doxygen will prepend -# the brief description of a member or function before the detailed description. -# Note: if both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the -# brief descriptions will be completely suppressed. - -REPEAT_BRIEF = YES - -# This tag implements a quasi-intelligent brief description abbreviator -# that is used to form the text in various listings. Each string -# in this list, if found as the leading text of the brief description, will be -# stripped from the text and the result after processing the whole list, is -# used as the annotated text. Otherwise, the brief description is used as-is. -# If left blank, the following values are used ("$name" is automatically -# replaced with the name of the entity): "The $name class" "The $name widget" -# "The $name file" "is" "provides" "specifies" "contains" -# "represents" "a" "an" "the" - -ABBREVIATE_BRIEF = - -# If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then -# Doxygen will generate a detailed section even if there is only a brief -# description. 
- -ALWAYS_DETAILED_SEC = NO - -# If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all -# inherited members of a class in the documentation of that class as if those -# members were ordinary class members. Constructors, destructors and assignment -# operators of the base classes will not be shown. - -INLINE_INHERITED_MEMB = NO - -# If the FULL_PATH_NAMES tag is set to YES then Doxygen will prepend the full -# path before files name in the file list and in the header files. If set -# to NO the shortest path that makes the file name unique will be used. - -FULL_PATH_NAMES = NO - -# If the FULL_PATH_NAMES tag is set to YES then the STRIP_FROM_PATH tag -# can be used to strip a user-defined part of the path. Stripping is -# only done if one of the specified strings matches the left-hand part of -# the path. The tag can be used to show relative paths in the file list. -# If left blank the directory from which doxygen is run is used as the -# path to strip. Note that you specify absolute paths here, but also -# relative paths, which will be relative from the directory where doxygen is -# started. - -STRIP_FROM_PATH = - -# The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of -# the path mentioned in the documentation of a class, which tells -# the reader which header file to include in order to use a class. -# If left blank only the name of the header file containing the class -# definition is used. Otherwise one should specify the include paths that -# are normally passed to the compiler using the -I flag. - -STRIP_FROM_INC_PATH = - -# If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter -# (but less readable) file names. This can be useful if your file system -# doesn't support long names like on DOS, Mac, or CD-ROM. - -SHORT_NAMES = NO - -# If the JAVADOC_AUTOBRIEF tag is set to YES then Doxygen -# will interpret the first line (until the first dot) of a JavaDoc-style -# comment as the brief description. 
If set to NO, the JavaDoc -# comments will behave just like regular Qt-style comments -# (thus requiring an explicit @brief command for a brief description.) - -JAVADOC_AUTOBRIEF = NO - -# If the QT_AUTOBRIEF tag is set to YES then Doxygen will -# interpret the first line (until the first dot) of a Qt-style -# comment as the brief description. If set to NO, the comments -# will behave just like regular Qt-style comments (thus requiring -# an explicit \brief command for a brief description.) - -QT_AUTOBRIEF = NO - -# The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make Doxygen -# treat a multi-line C++ special comment block (i.e. a block of //! or /// -# comments) as a brief description. This used to be the default behaviour. -# The new default is to treat a multi-line C++ comment block as a detailed -# description. Set this tag to YES if you prefer the old behaviour instead. - -MULTILINE_CPP_IS_BRIEF = NO - -# If the INHERIT_DOCS tag is set to YES (the default) then an undocumented -# member inherits the documentation from any documented member that it -# re-implements. - -INHERIT_DOCS = YES - -# If the SEPARATE_MEMBER_PAGES tag is set to YES, then doxygen will produce -# a new page for each member. If set to NO, the documentation of a member will -# be part of the file/class/namespace that contains it. - -SEPARATE_MEMBER_PAGES = NO - -# The TAB_SIZE tag can be used to set the number of spaces in a tab. -# Doxygen uses this value to replace tabs by spaces in code fragments. - -TAB_SIZE = 8 - -# This tag can be used to specify a number of aliases that acts -# as commands in the documentation. An alias has the form "name=value". -# For example adding "sideeffect=\par Side Effects:\n" will allow you to -# put the command \sideeffect (or @sideeffect) in the documentation, which -# will result in a user-defined paragraph with heading "Side Effects:". -# You can put \n's in the value part of an alias to insert newlines. 
- -ALIASES = "other=*" - -# This tag can be used to specify a number of word-keyword mappings (TCL only). -# A mapping has the form "name=value". For example adding -# "class=itcl::class" will allow you to use the command class in the -# itcl::class meaning. - -TCL_SUBST = - -# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C -# sources only. Doxygen will then generate output that is more tailored for C. -# For instance, some of the names that are used will be different. The list -# of all members will be omitted, etc. - -OPTIMIZE_OUTPUT_FOR_C = NO - -# Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java -# sources only. Doxygen will then generate output that is more tailored for -# Java. For instance, namespaces will be presented as packages, qualified -# scopes will look different, etc. - -OPTIMIZE_OUTPUT_JAVA = NO - -# Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran -# sources only. Doxygen will then generate output that is more tailored for -# Fortran. - -OPTIMIZE_FOR_FORTRAN = NO - -# Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL -# sources. Doxygen will then generate output that is tailored for -# VHDL. - -OPTIMIZE_OUTPUT_VHDL = NO - -# Doxygen selects the parser to use depending on the extension of the files it -# parses. With this tag you can assign which parser to use for a given -# extension. Doxygen has a built-in mapping, but you can override or extend it -# using this tag. The format is ext=language, where ext is a file extension, -# and language is one of the parsers supported by doxygen: IDL, Java, -# Javascript, CSharp, C, C++, D, PHP, Objective-C, Python, Fortran, VHDL, C, -# C++. For instance to make doxygen treat .inc files as Fortran files (default -# is PHP), and .f files as C (default is Fortran), use: inc=Fortran f=C. Note -# that for custom extensions you also need to set FILE_PATTERNS otherwise the -# files are not read by doxygen. 
- -EXTENSION_MAPPING = - -# If MARKDOWN_SUPPORT is enabled (the default) then doxygen pre-processes all -# comments according to the Markdown format, which allows for more readable -# documentation. See http://daringfireball.net/projects/markdown/ for details. -# The output of markdown processing is further processed by doxygen, so you -# can mix doxygen, HTML, and XML commands with Markdown formatting. -# Disable only in case of backward compatibilities issues. - -MARKDOWN_SUPPORT = YES - -# When enabled doxygen tries to link words that correspond to documented classes, -# or namespaces to their corresponding documentation. Such a link can be -# prevented in individual cases by by putting a % sign in front of the word or -# globally by setting AUTOLINK_SUPPORT to NO. - -AUTOLINK_SUPPORT = YES - -# If you use STL classes (i.e. std::string, std::vector, etc.) but do not want -# to include (a tag file for) the STL sources as input, then you should -# set this tag to YES in order to let doxygen match functions declarations and -# definitions whose arguments contain STL classes (e.g. func(std::string); v.s. -# func(std::string) {}). This also makes the inheritance and collaboration -# diagrams that involve STL classes more complete and accurate. - -BUILTIN_STL_SUPPORT = NO - -# If you use Microsoft's C++/CLI language, you should set this option to YES to -# enable parsing support. - -CPP_CLI_SUPPORT = NO - -# Set the SIP_SUPPORT tag to YES if your project consists of sip sources only. -# Doxygen will parse them like normal C++ but will assume all classes use public -# instead of private inheritance when no explicit protection keyword is present. - -SIP_SUPPORT = NO - -# For Microsoft's IDL there are propget and propput attributes to -# indicate getter and setter methods for a property. Setting this -# option to YES (the default) will make doxygen replace the get and -# set methods by a property in the documentation. 
This will only work -# if the methods are indeed getting or setting a simple type. If this -# is not the case, or you want to show the methods anyway, you should -# set this option to NO. - -IDL_PROPERTY_SUPPORT = YES - -# If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC -# tag is set to YES, then doxygen will reuse the documentation of the first -# member in the group (if any) for the other members of the group. By default -# all members of a group must be documented explicitly. - -DISTRIBUTE_GROUP_DOC = NO - -# Set the SUBGROUPING tag to YES (the default) to allow class member groups of -# the same type (for instance a group of public functions) to be put as a -# subgroup of that type (e.g. under the Public Functions section). Set it to -# NO to prevent subgrouping. Alternatively, this can be done per class using -# the \nosubgrouping command. - -SUBGROUPING = YES - -# When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and -# unions are shown inside the group in which they are included (e.g. using -# @ingroup) instead of on a separate page (for HTML and Man pages) or -# section (for LaTeX and RTF). - -INLINE_GROUPED_CLASSES = NO - -# When the INLINE_SIMPLE_STRUCTS tag is set to YES, structs, classes, and -# unions with only public data fields will be shown inline in the documentation -# of the scope in which they are defined (i.e. file, namespace, or group -# documentation), provided this scope is documented. If set to NO (the default), -# structs, classes, and unions are shown on a separate page (for HTML and Man -# pages) or section (for LaTeX and RTF). - -INLINE_SIMPLE_STRUCTS = NO - -# When TYPEDEF_HIDES_STRUCT is enabled, a typedef of a struct, union, or enum -# is documented as struct, union, or enum with the name of the typedef. So -# typedef struct TypeS {} TypeT, will appear in the documentation as a struct -# with name TypeT. When disabled the typedef will appear as a member of a file, -# namespace, or class. 
And the struct will be named TypeS. This can typically -# be useful for C code in case the coding convention dictates that all compound -# types are typedef'ed and only the typedef is referenced, never the tag name. - -TYPEDEF_HIDES_STRUCT = NO - -# The SYMBOL_CACHE_SIZE determines the size of the internal cache use to -# determine which symbols to keep in memory and which to flush to disk. -# When the cache is full, less often used symbols will be written to disk. -# For small to medium size projects (<1000 input files) the default value is -# probably good enough. For larger projects a too small cache size can cause -# doxygen to be busy swapping symbols to and from disk most of the time -# causing a significant performance penalty. -# If the system has enough physical memory increasing the cache will improve the -# performance by keeping more symbols in memory. Note that the value works on -# a logarithmic scale so increasing the size by one will roughly double the -# memory usage. The cache size is given by this formula: -# 2^(16+SYMBOL_CACHE_SIZE). The valid range is 0..9, the default is 0, -# corresponding to a cache size of 2^16 = 65536 symbols. - -SYMBOL_CACHE_SIZE = 0 - -# Similar to the SYMBOL_CACHE_SIZE the size of the symbol lookup cache can be -# set using LOOKUP_CACHE_SIZE. This cache is used to resolve symbols given -# their name and scope. Since this can be an expensive process and often the -# same symbol appear multiple times in the code, doxygen keeps a cache of -# pre-resolved symbols. If the cache is too small doxygen will become slower. -# If the cache is too large, memory is wasted. The cache size is given by this -# formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range is 0..9, the default is 0, -# corresponding to a cache size of 2^16 = 65536 symbols. 
- -LOOKUP_CACHE_SIZE = 0 - -#--------------------------------------------------------------------------- -# Build related configuration options -#--------------------------------------------------------------------------- - -# If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in -# documentation are documented, even if no documentation was available. -# Private class members and static file members will be hidden unless -# the EXTRACT_PRIVATE and EXTRACT_STATIC tags are set to YES - -EXTRACT_ALL = NO - -# If the EXTRACT_PRIVATE tag is set to YES all private members of a class -# will be included in the documentation. - -EXTRACT_PRIVATE = YES - -# If the EXTRACT_PACKAGE tag is set to YES all members with package or internal -# scope will be included in the documentation. - -EXTRACT_PACKAGE = NO - -# If the EXTRACT_STATIC tag is set to YES all static members of a file -# will be included in the documentation. - -EXTRACT_STATIC = YES - -# If the EXTRACT_LOCAL_CLASSES tag is set to YES classes (and structs) -# defined locally in source files will be included in the documentation. -# If set to NO only classes defined in header files are included. - -EXTRACT_LOCAL_CLASSES = YES - -# This flag is only useful for Objective-C code. When set to YES local -# methods, which are defined in the implementation section but not in -# the interface are included in the documentation. -# If set to NO (the default) only methods in the interface are included. - -EXTRACT_LOCAL_METHODS = NO - -# If this flag is set to YES, the members of anonymous namespaces will be -# extracted and appear in the documentation as a namespace called -# 'anonymous_namespace{file}', where file will be replaced with the base -# name of the file that contains the anonymous namespace. By default -# anonymous namespaces are hidden. 
- -EXTRACT_ANON_NSPACES = NO - -# If the HIDE_UNDOC_MEMBERS tag is set to YES, Doxygen will hide all -# undocumented members of documented classes, files or namespaces. -# If set to NO (the default) these members will be included in the -# various overviews, but no documentation section is generated. -# This option has no effect if EXTRACT_ALL is enabled. - -HIDE_UNDOC_MEMBERS = YES - -# If the HIDE_UNDOC_CLASSES tag is set to YES, Doxygen will hide all -# undocumented classes that are normally visible in the class hierarchy. -# If set to NO (the default) these classes will be included in the various -# overviews. This option has no effect if EXTRACT_ALL is enabled. - -HIDE_UNDOC_CLASSES = YES - -# If the HIDE_FRIEND_COMPOUNDS tag is set to YES, Doxygen will hide all -# friend (class|struct|union) declarations. -# If set to NO (the default) these declarations will be included in the -# documentation. - -HIDE_FRIEND_COMPOUNDS = NO - -# If the HIDE_IN_BODY_DOCS tag is set to YES, Doxygen will hide any -# documentation blocks found inside the body of a function. -# If set to NO (the default) these blocks will be appended to the -# function's detailed documentation block. - -HIDE_IN_BODY_DOCS = NO - -# The INTERNAL_DOCS tag determines if documentation -# that is typed after a \internal command is included. If the tag is set -# to NO (the default) then the documentation will be excluded. -# Set it to YES to include the internal documentation. - -INTERNAL_DOCS = NO - -# If the CASE_SENSE_NAMES tag is set to NO then Doxygen will only generate -# file names in lower-case letters. If set to YES upper-case letters are also -# allowed. This is useful if you have classes or files whose names only differ -# in case and if your file system supports case sensitive file names. Windows -# and Mac users are advised to set this option to NO. 
- -CASE_SENSE_NAMES = YES - -# If the HIDE_SCOPE_NAMES tag is set to NO (the default) then Doxygen -# will show members with their full class and namespace scopes in the -# documentation. If set to YES the scope will be hidden. - -HIDE_SCOPE_NAMES = NO - -# If the SHOW_INCLUDE_FILES tag is set to YES (the default) then Doxygen -# will put a list of the files that are included by a file in the documentation -# of that file. - -SHOW_INCLUDE_FILES = YES - -# If the FORCE_LOCAL_INCLUDES tag is set to YES then Doxygen -# will list include files with double quotes in the documentation -# rather than with sharp brackets. - -FORCE_LOCAL_INCLUDES = NO - -# If the INLINE_INFO tag is set to YES (the default) then a tag [inline] -# is inserted in the documentation for inline members. - -INLINE_INFO = YES - -# If the SORT_MEMBER_DOCS tag is set to YES (the default) then doxygen -# will sort the (detailed) documentation of file and class members -# alphabetically by member name. If set to NO the members will appear in -# declaration order. - -SORT_MEMBER_DOCS = YES - -# If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the -# brief documentation of file, namespace and class members alphabetically -# by member name. If set to NO (the default) the members will appear in -# declaration order. - -SORT_BRIEF_DOCS = NO - -# If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen -# will sort the (brief and detailed) documentation of class members so that -# constructors and destructors are listed first. If set to NO (the default) -# the constructors will appear in the respective orders defined by -# SORT_MEMBER_DOCS and SORT_BRIEF_DOCS. -# This tag will be ignored for brief docs if SORT_BRIEF_DOCS is set to NO -# and ignored for detailed docs if SORT_MEMBER_DOCS is set to NO. - -SORT_MEMBERS_CTORS_1ST = NO - -# If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the -# hierarchy of group names into alphabetical order. 
If set to NO (the default) -# the group names will appear in their defined order. - -SORT_GROUP_NAMES = NO - -# If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be -# sorted by fully-qualified names, including namespaces. If set to -# NO (the default), the class list will be sorted only by class name, -# not including the namespace part. -# Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES. -# Note: This option applies only to the class list, not to the -# alphabetical list. - -SORT_BY_SCOPE_NAME = NO - -# If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to -# do proper type resolution of all parameters of a function it will reject a -# match between the prototype and the implementation of a member function even -# if there is only one candidate or it is obvious which candidate to choose -# by doing a simple string match. By disabling STRICT_PROTO_MATCHING doxygen -# will still accept a match between prototype and implementation in such cases. - -STRICT_PROTO_MATCHING = NO - -# The GENERATE_TODOLIST tag can be used to enable (YES) or -# disable (NO) the todo list. This list is created by putting \todo -# commands in the documentation. - -GENERATE_TODOLIST = YES - -# The GENERATE_TESTLIST tag can be used to enable (YES) or -# disable (NO) the test list. This list is created by putting \test -# commands in the documentation. - -GENERATE_TESTLIST = YES - -# The GENERATE_BUGLIST tag can be used to enable (YES) or -# disable (NO) the bug list. This list is created by putting \bug -# commands in the documentation. - -GENERATE_BUGLIST = YES - -# The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or -# disable (NO) the deprecated list. This list is created by putting -# \deprecated commands in the documentation. - -GENERATE_DEPRECATEDLIST= YES - -# The ENABLED_SECTIONS tag can be used to enable conditional -# documentation sections, marked by \if sectionname ... \endif. 
- -ENABLED_SECTIONS = - -# The MAX_INITIALIZER_LINES tag determines the maximum number of lines -# the initial value of a variable or macro consists of for it to appear in -# the documentation. If the initializer consists of more lines than specified -# here it will be hidden. Use a value of 0 to hide initializers completely. -# The appearance of the initializer of individual variables and macros in the -# documentation can be controlled using \showinitializer or \hideinitializer -# command in the documentation regardless of this setting. - -MAX_INITIALIZER_LINES = 30 - -# Set the SHOW_USED_FILES tag to NO to disable the list of files generated -# at the bottom of the documentation of classes and structs. If set to YES the -# list will mention the files that were used to generate the documentation. - -SHOW_USED_FILES = YES - -# Set the SHOW_FILES tag to NO to disable the generation of the Files page. -# This will remove the Files entry from the Quick Index and from the -# Folder Tree View (if specified). The default is YES. - -# We probably will want this, but we have no file documentation yet so it's simpler to remove -# it for now. -SHOW_FILES = NO - -# Set the SHOW_NAMESPACES tag to NO to disable the generation of the -# Namespaces page. -# This will remove the Namespaces entry from the Quick Index -# and from the Folder Tree View (if specified). The default is YES. - -SHOW_NAMESPACES = YES - -# The FILE_VERSION_FILTER tag can be used to specify a program or script that -# doxygen should invoke to get the current version for each file (typically from -# the version control system). Doxygen will invoke the program by executing (via -# popen()) the command <command> <input-file>, where <command> is the value of -# the FILE_VERSION_FILTER tag, and <input-file> is the name of an input file -# provided by doxygen. Whatever the program writes to standard output -# is used as the file version. See the manual for examples.
- -FILE_VERSION_FILTER = - -# The LAYOUT_FILE tag can be used to specify a layout file which will be parsed -# by doxygen. The layout file controls the global structure of the generated -# output files in an output format independent way. To create the layout file -# that represents doxygen's defaults, run doxygen with the -l option. -# You can optionally specify a file name after the option, if omitted -# DoxygenLayout.xml will be used as the name of the layout file. - -LAYOUT_FILE = - -# The CITE_BIB_FILES tag can be used to specify one or more bib files -# containing the references data. This must be a list of .bib files. The -# .bib extension is automatically appended if omitted. Using this command -# requires the bibtex tool to be installed. See also -# http://en.wikipedia.org/wiki/BibTeX for more info. For LaTeX the style -# of the bibliography can be controlled using LATEX_BIB_STYLE. To use this -# feature you need bibtex and perl available in the search path. - -CITE_BIB_FILES = - -#--------------------------------------------------------------------------- -# configuration options related to warning and progress messages -#--------------------------------------------------------------------------- - -# The QUIET tag can be used to turn on/off the messages that are generated -# by doxygen. Possible values are YES and NO. If left blank NO is used. - -QUIET = NO - -# The WARNINGS tag can be used to turn on/off the warning messages that are -# generated by doxygen. Possible values are YES and NO. If left blank -# NO is used. - -WARNINGS = YES - -# If WARN_IF_UNDOCUMENTED is set to YES, then doxygen will generate warnings -# for undocumented members. If EXTRACT_ALL is set to YES then this flag will -# automatically be disabled. 
- -WARN_IF_UNDOCUMENTED = YES - -# If WARN_IF_DOC_ERROR is set to YES, doxygen will generate warnings for -# potential errors in the documentation, such as not documenting some -# parameters in a documented function, or documenting parameters that -# don't exist or using markup commands wrongly. - -WARN_IF_DOC_ERROR = YES - -# The WARN_NO_PARAMDOC option can be enabled to get warnings for -# functions that are documented, but have no documentation for their parameters -# or return value. If set to NO (the default) doxygen will only warn about -# wrong or incomplete parameter documentation, but not about the absence of -# documentation. - -WARN_NO_PARAMDOC = NO - -# The WARN_FORMAT tag determines the format of the warning messages that -# doxygen can produce. The string should contain the $file, $line, and $text -# tags, which will be replaced by the file and line number from which the -# warning originated and the warning text. Optionally the format may contain -# $version, which will be replaced by the version of the file (if it could -# be obtained via FILE_VERSION_FILTER) - -WARN_FORMAT = - -# The WARN_LOGFILE tag can be used to specify a file to which warning -# and error messages should be written. If left blank the output is written -# to stderr. - -WARN_LOGFILE = - -#--------------------------------------------------------------------------- -# configuration options related to the input files -#--------------------------------------------------------------------------- - -# The INPUT tag can be used to specify the files and/or directories that contain -# documented source files. You may enter file names like "myfile.cpp" or -# directories like "/usr/src/myproject". Separate the files or directories -# with spaces. - -INPUT = src doc/doxygen/libomp_interface.h -# The ittnotify code also has doxygen documentation, but if we include it here -# it takes over from us! 
-# src/thirdparty/ittnotify - -# This tag can be used to specify the character encoding of the source files -# that doxygen parses. Internally doxygen uses the UTF-8 encoding, which is -# also the default input encoding. Doxygen uses libiconv (or the iconv built -# into libc) for the transcoding. See http://www.gnu.org/software/libiconv for -# the list of possible encodings. - -INPUT_ENCODING = UTF-8 - -# If the value of the INPUT tag contains directories, you can use the -# FILE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp -# and *.h) to filter out the source-files in the directories. If left -# blank the following patterns are tested: -# *.c *.cc *.cxx *.cpp *.c++ *.d *.java *.ii *.ixx *.ipp *.i++ *.inl *.h *.hh -# *.hxx *.hpp *.h++ *.idl *.odl *.cs *.php *.php3 *.inc *.m *.mm *.dox *.py -# *.f90 *.f *.for *.vhd *.vhdl - -FILE_PATTERNS = *.c *.h *.cpp -# We may also want to include the asm files with appropriate ifdef to ensure -# doxygen doesn't see the content, just the documentation... - -# The RECURSIVE tag can be used to turn specify whether or not subdirectories -# should be searched for input files as well. Possible values are YES and NO. -# If left blank NO is used. - -# Only look in the one directory. -RECURSIVE = NO - -# The EXCLUDE tag can be used to specify files and/or directories that should be -# excluded from the INPUT source files. This way you can easily exclude a -# subdirectory from a directory tree whose root is specified with the INPUT tag. -# Note that relative paths are relative to the directory from which doxygen is -# run. - -EXCLUDE = src/test-touch.c - -# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or -# directories that are symbolic links (a Unix file system feature) are excluded -# from the input. 
- -EXCLUDE_SYMLINKS = NO - -# If the value of the INPUT tag contains directories, you can use the -# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude -# certain files from those directories. Note that the wildcards are matched -# against the file with absolute path, so to exclude all test directories -# for example use the pattern */test/* - -EXCLUDE_PATTERNS = - -# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names -# (namespaces, classes, functions, etc.) that should be excluded from the -# output. The symbol name can be a fully qualified name, a word, or if the -# wildcard * is used, a substring. Examples: ANamespace, AClass, -# AClass::ANamespace, ANamespace::*Test - -EXCLUDE_SYMBOLS = - -# The EXAMPLE_PATH tag can be used to specify one or more files or -# directories that contain example code fragments that are included (see -# the \include command). - -EXAMPLE_PATH = - -# If the value of the EXAMPLE_PATH tag contains directories, you can use the -# EXAMPLE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp -# and *.h) to filter out the source-files in the directories. If left -# blank all files are included. - -EXAMPLE_PATTERNS = - -# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be -# searched for input files to be used with the \include or \dontinclude -# commands irrespective of the value of the RECURSIVE tag. -# Possible values are YES and NO. If left blank NO is used. - -EXAMPLE_RECURSIVE = NO - -# The IMAGE_PATH tag can be used to specify one or more files or -# directories that contain image that are included in the documentation (see -# the \image command). - -IMAGE_PATH = - -# The INPUT_FILTER tag can be used to specify a program that doxygen should -# invoke to filter for each input file. Doxygen will invoke the filter program -# by executing (via popen()) the command <filter> <input-file>, where <filter> -# is the value of the INPUT_FILTER tag, and <input-file> is the name of an -# input file.
Doxygen will then use the output that the filter program writes -# to standard output. -# If FILTER_PATTERNS is specified, this tag will be -# ignored. - -INPUT_FILTER = - -# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern -# basis. -# Doxygen will compare the file name with each pattern and apply the -# filter if there is a match. -# The filters are a list of the form: -# pattern=filter (like *.cpp=my_cpp_filter). See INPUT_FILTER for further -# info on how filters are used. If FILTER_PATTERNS is empty or if -# non of the patterns match the file name, INPUT_FILTER is applied. - -FILTER_PATTERNS = - -# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using -# INPUT_FILTER) will be used to filter the input files when producing source -# files to browse (i.e. when SOURCE_BROWSER is set to YES). - -FILTER_SOURCE_FILES = NO - -# The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file -# pattern. A pattern will override the setting for FILTER_PATTERN (if any) -# and it is also possible to disable source filtering for a specific pattern -# using *.ext= (so without naming a filter). This option only has effect when -# FILTER_SOURCE_FILES is enabled. - -FILTER_SOURCE_PATTERNS = - -#--------------------------------------------------------------------------- -# configuration options related to source browsing -#--------------------------------------------------------------------------- - -# If the SOURCE_BROWSER tag is set to YES then a list of source files will -# be generated. Documented entities will be cross-referenced with these sources. -# Note: To get rid of all source code in the generated output, make sure also -# VERBATIM_HEADERS is set to NO. - -SOURCE_BROWSER = YES - -# Setting the INLINE_SOURCES tag to YES will include the body -# of functions and classes directly in the documentation. 
- -INLINE_SOURCES = NO - -# Setting the STRIP_CODE_COMMENTS tag to YES (the default) will instruct -# doxygen to hide any special comment blocks from generated source code -# fragments. Normal C, C++ and Fortran comments will always remain visible. - -STRIP_CODE_COMMENTS = YES - -# If the REFERENCED_BY_RELATION tag is set to YES -# then for each documented function all documented -# functions referencing it will be listed. - -REFERENCED_BY_RELATION = YES - -# If the REFERENCES_RELATION tag is set to YES -# then for each documented function all documented entities -# called/used by that function will be listed. - -REFERENCES_RELATION = NO - -# If the REFERENCES_LINK_SOURCE tag is set to YES (the default) -# and SOURCE_BROWSER tag is set to YES, then the hyperlinks from -# functions in REFERENCES_RELATION and REFERENCED_BY_RELATION lists will -# link to the source code. -# Otherwise they will link to the documentation. - -REFERENCES_LINK_SOURCE = YES - -# If the USE_HTAGS tag is set to YES then the references to source code -# will point to the HTML generated by the htags(1) tool instead of doxygen -# built-in source browser. The htags tool is part of GNU's global source -# tagging system (see http://www.gnu.org/software/global/global.html). You -# will need version 4.8.6 or higher. - -USE_HTAGS = NO - -# If the VERBATIM_HEADERS tag is set to YES (the default) then Doxygen -# will generate a verbatim copy of the header file for each class for -# which an include is specified. Set to NO to disable this. - -VERBATIM_HEADERS = YES - -#--------------------------------------------------------------------------- -# configuration options related to the alphabetical class index -#--------------------------------------------------------------------------- - -# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index -# of all compounds will be generated. Enable this if the project -# contains a lot of classes, structs, unions or interfaces. 
-
-ALPHABETICAL_INDEX = YES
-
-# If the alphabetical index is enabled (see ALPHABETICAL_INDEX) then
-# the COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns
-# in which this list will be split (can be a number in the range [1..20])
-
-COLS_IN_ALPHA_INDEX = 5
-
-# In case all classes in a project start with a common prefix, all
-# classes will be put under the same header in the alphabetical index.
-# The IGNORE_PREFIX tag can be used to specify one or more prefixes that
-# should be ignored while generating the index headers.
-
-IGNORE_PREFIX =
-
-#---------------------------------------------------------------------------
-# configuration options related to the HTML output
-#---------------------------------------------------------------------------
-
-# If the GENERATE_HTML tag is set to YES (the default) Doxygen will
-# generate HTML output.
-
-GENERATE_HTML = YES
-
-# The HTML_OUTPUT tag is used to specify where the HTML docs will be put.
-# If a relative path is entered the value of OUTPUT_DIRECTORY will be
-# put in front of it. If left blank `html' will be used as the default path.
-
-HTML_OUTPUT =
-
-# The HTML_FILE_EXTENSION tag can be used to specify the file extension for
-# each generated HTML page (for example: .htm,.php,.asp). If it is left blank
-# doxygen will generate files with .html extension.
-
-HTML_FILE_EXTENSION = .html
-
-# The HTML_HEADER tag can be used to specify a personal HTML header for
-# each generated HTML page. If it is left blank doxygen will generate a
-# standard header. Note that when using a custom header you are responsible
-# for the proper inclusion of any scripts and style sheets that doxygen
-# needs, which is dependent on the configuration options used.
-# It is advised to generate a default header using "doxygen -w html
-# header.html footer.html stylesheet.css YourConfigFile" and then modify
-# that header. Note that the header is subject to change so you typically
-# have to redo this when upgrading to a newer version of doxygen or when
-# changing the value of configuration settings such as GENERATE_TREEVIEW!
-
-HTML_HEADER =
-
-# The HTML_FOOTER tag can be used to specify a personal HTML footer for
-# each generated HTML page. If it is left blank doxygen will generate a
-# standard footer.
-
-HTML_FOOTER =
-
-# The HTML_STYLESHEET tag can be used to specify a user-defined cascading
-# style sheet that is used by each HTML page. It can be used to
-# fine-tune the look of the HTML output. If left blank doxygen will
-# generate a default style sheet. Note that it is recommended to use
-# HTML_EXTRA_STYLESHEET instead of this one, as it is more robust and this
-# tag will in the future become obsolete.
-
-HTML_STYLESHEET =
-
-# The HTML_EXTRA_STYLESHEET tag can be used to specify an additional
-# user-defined cascading style sheet that is included after the standard
-# style sheets created by doxygen. Using this option one can overrule
-# certain style aspects. This is preferred over using HTML_STYLESHEET
-# since it does not replace the standard style sheet and is therefor more
-# robust against future updates. Doxygen will copy the style sheet file to
-# the output directory.
-
-HTML_EXTRA_STYLESHEET =
-
-# The HTML_EXTRA_FILES tag can be used to specify one or more extra images or
-# other source files which should be copied to the HTML output directory. Note
-# that these files will be copied to the base HTML output directory. Use the
-# $relpath$ marker in the HTML_HEADER and/or HTML_FOOTER files to load these
-# files. In the HTML_STYLESHEET file, use the file name only. Also note that
-# the files will be copied as-is; there are no commands or markers available.
-
-HTML_EXTRA_FILES =
-
-# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output.
-# Doxygen will adjust the colors in the style sheet and background images
-# according to this color. Hue is specified as an angle on a colorwheel,
-# see http://en.wikipedia.org/wiki/Hue for more information.
-# For instance the value 0 represents red, 60 is yellow, 120 is green,
-# 180 is cyan, 240 is blue, 300 purple, and 360 is red again.
-# The allowed range is 0 to 359.
-
-HTML_COLORSTYLE_HUE = 220
-
-# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of
-# the colors in the HTML output. For a value of 0 the output will use
-# grayscales only. A value of 255 will produce the most vivid colors.
-
-HTML_COLORSTYLE_SAT = 100
-
-# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to
-# the luminance component of the colors in the HTML output. Values below
-# 100 gradually make the output lighter, whereas values above 100 make
-# the output darker. The value divided by 100 is the actual gamma applied,
-# so 80 represents a gamma of 0.8, The value 220 represents a gamma of 2.2,
-# and 100 does not change the gamma.
-
-HTML_COLORSTYLE_GAMMA = 80
-
-# If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML
-# page will contain the date and time when the page was generated. Setting
-# this to NO can help when comparing the output of multiple runs.
-
-HTML_TIMESTAMP = NO
-
-# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML
-# documentation will contain sections that can be hidden and shown after the
-# page has loaded.
-
-HTML_DYNAMIC_SECTIONS = NO
-
-# With HTML_INDEX_NUM_ENTRIES one can control the preferred number of
-# entries shown in the various tree structured indices initially; the user
-# can expand and collapse entries dynamically later on. Doxygen will expand
-# the tree to such a level that at most the specified number of entries are
-# visible (unless a fully collapsed tree already exceeds this amount).
-# So setting the number of entries 1 will produce a full collapsed tree by
-# default. 0 is a special value representing an infinite number of entries
-# and will result in a full expanded tree by default.
-
-HTML_INDEX_NUM_ENTRIES = 100
-
-# If the GENERATE_DOCSET tag is set to YES, additional index files
-# will be generated that can be used as input for Apple's Xcode 3
-# integrated development environment, introduced with OSX 10.5 (Leopard).
-# To create a documentation set, doxygen will generate a Makefile in the
-# HTML output directory. Running make will produce the docset in that
-# directory and running "make install" will install the docset in
-# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find
-# it at startup.
-# See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html
-# for more information.
-
-GENERATE_DOCSET = NO
-
-# When GENERATE_DOCSET tag is set to YES, this tag determines the name of the
-# feed. A documentation feed provides an umbrella under which multiple
-# documentation sets from a single provider (such as a company or product suite)
-# can be grouped.
-
-DOCSET_FEEDNAME = "Doxygen generated docs"
-
-# When GENERATE_DOCSET tag is set to YES, this tag specifies a string that
-# should uniquely identify the documentation set bundle. This should be a
-# reverse domain-name style string, e.g. com.mycompany.MyDocSet. Doxygen
-# will append .docset to the name.
-
-DOCSET_BUNDLE_ID = org.doxygen.Project
-
-# When GENERATE_PUBLISHER_ID tag specifies a string that should uniquely
-# identify the documentation publisher. This should be a reverse domain-name
-# style string, e.g. com.mycompany.MyDocSet.documentation.
-
-DOCSET_PUBLISHER_ID = org.doxygen.Publisher
-
-# The GENERATE_PUBLISHER_NAME tag identifies the documentation publisher.
-
-DOCSET_PUBLISHER_NAME = Publisher
-
-# If the GENERATE_HTMLHELP tag is set to YES, additional index files
-# will be generated that can be used as input for tools like the
-# Microsoft HTML help workshop to generate a compiled HTML help file (.chm)
-# of the generated HTML documentation.
-
-GENERATE_HTMLHELP = NO
-
-# If the GENERATE_HTMLHELP tag is set to YES, the CHM_FILE tag can
-# be used to specify the file name of the resulting .chm file. You
-# can add a path in front of the file if the result should not be
-# written to the html output directory.
-
-CHM_FILE =
-
-# If the GENERATE_HTMLHELP tag is set to YES, the HHC_LOCATION tag can
-# be used to specify the location (absolute path including file name) of
-# the HTML help compiler (hhc.exe). If non-empty doxygen will try to run
-# the HTML help compiler on the generated index.hhp.
-
-HHC_LOCATION =
-
-# If the GENERATE_HTMLHELP tag is set to YES, the GENERATE_CHI flag
-# controls if a separate .chi index file is generated (YES) or that
-# it should be included in the main .chm file (NO).
-
-GENERATE_CHI = NO
-
-# If the GENERATE_HTMLHELP tag is set to YES, the CHM_INDEX_ENCODING
-# is used to encode HtmlHelp index (hhk), content (hhc) and project file
-# content.
-
-CHM_INDEX_ENCODING =
-
-# If the GENERATE_HTMLHELP tag is set to YES, the BINARY_TOC flag
-# controls whether a binary table of contents is generated (YES) or a
-# normal table of contents (NO) in the .chm file.
-
-BINARY_TOC = NO
-
-# The TOC_EXPAND flag can be set to YES to add extra items for group members
-# to the contents of the HTML help documentation and to the tree view.
-
-TOC_EXPAND = NO
-
-# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and
-# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated
-# that can be used as input for Qt's qhelpgenerator to generate a
-# Qt Compressed Help (.qch) of the generated HTML documentation.
-
-GENERATE_QHP = NO
-
-# If the QHG_LOCATION tag is specified, the QCH_FILE tag can
-# be used to specify the file name of the resulting .qch file.
-# The path specified is relative to the HTML output folder.
-
-QCH_FILE =
-
-# The QHP_NAMESPACE tag specifies the namespace to use when generating
-# Qt Help Project output. For more information please see
-# http://doc.trolltech.com/qthelpproject.html#namespace
-
-QHP_NAMESPACE = org.doxygen.Project
-
-# The QHP_VIRTUAL_FOLDER tag specifies the namespace to use when generating
-# Qt Help Project output. For more information please see
-# http://doc.trolltech.com/qthelpproject.html#virtual-folders
-
-QHP_VIRTUAL_FOLDER = doc
-
-# If QHP_CUST_FILTER_NAME is set, it specifies the name of a custom filter to
-# add. For more information please see
-# http://doc.trolltech.com/qthelpproject.html#custom-filters
-
-QHP_CUST_FILTER_NAME =
-
-# The QHP_CUST_FILT_ATTRS tag specifies the list of the attributes of the
-# custom filter to add. For more information please see
-#
-# Qt Help Project / Custom Filters.
-
-QHP_CUST_FILTER_ATTRS =
-
-# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this
-# project's
-# filter section matches.
-#
-# Qt Help Project / Filter Attributes.
-
-QHP_SECT_FILTER_ATTRS =
-
-# If the GENERATE_QHP tag is set to YES, the QHG_LOCATION tag can
-# be used to specify the location of Qt's qhelpgenerator.
-# If non-empty doxygen will try to run qhelpgenerator on the generated
-# .qhp file.
-
-QHG_LOCATION =
-
-# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files
-# will be generated, which together with the HTML files, form an Eclipse help
-# plugin. To install this plugin and make it available under the help contents
-# menu in Eclipse, the contents of the directory containing the HTML and XML
-# files needs to be copied into the plugins directory of eclipse. The name of
-# the directory within the plugins directory should be the same as
-# the ECLIPSE_DOC_ID value. After copying Eclipse needs to be restarted before
-# the help appears.
-
-GENERATE_ECLIPSEHELP = NO
-
-# A unique identifier for the eclipse help plugin. When installing the plugin
-# the directory name containing the HTML and XML files should also have
-# this name.
-
-ECLIPSE_DOC_ID = org.doxygen.Project
-
-# The DISABLE_INDEX tag can be used to turn on/off the condensed index (tabs)
-# at top of each HTML page. The value NO (the default) enables the index and
-# the value YES disables it. Since the tabs have the same information as the
-# navigation tree you can set this option to NO if you already set
-# GENERATE_TREEVIEW to YES.
-
-DISABLE_INDEX = NO
-
-# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index
-# structure should be generated to display hierarchical information.
-# If the tag value is set to YES, a side panel will be generated
-# containing a tree-like index structure (just like the one that
-# is generated for HTML Help). For this to work a browser that supports
-# JavaScript, DHTML, CSS and frames is required (i.e. any modern browser).
-# Windows users are probably better off using the HTML help feature.
-# Since the tree basically has the same information as the tab index you
-# could consider to set DISABLE_INDEX to NO when enabling this option.
-
-GENERATE_TREEVIEW = NO
-
-# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values
-# (range [0,1..20]) that doxygen will group on one line in the generated HTML
-# documentation. Note that a value of 0 will completely suppress the enum
-# values from appearing in the overview section.
-
-ENUM_VALUES_PER_LINE = 4
-
-# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be
-# used to set the initial width (in pixels) of the frame in which the tree
-# is shown.
-
-TREEVIEW_WIDTH = 250
-
-# When the EXT_LINKS_IN_WINDOW option is set to YES doxygen will open
-# links to external symbols imported via tag files in a separate window.
-
-EXT_LINKS_IN_WINDOW = NO
-
-# Use this tag to change the font size of Latex formulas included
-# as images in the HTML documentation. The default is 10. Note that
-# when you change the font size after a successful doxygen run you need
-# to manually remove any form_*.png images from the HTML output directory
-# to force them to be regenerated.
-
-FORMULA_FONTSIZE = 10
-
-# Use the FORMULA_TRANPARENT tag to determine whether or not the images
-# generated for formulas are transparent PNGs. Transparent PNGs are
-# not supported properly for IE 6.0, but are supported on all modern browsers.
-# Note that when changing this option you need to delete any form_*.png files
-# in the HTML output before the changes have effect.
-
-FORMULA_TRANSPARENT = YES
-
-# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax
-# (see http://www.mathjax.org) which uses client side Javascript for the
-# rendering instead of using prerendered bitmaps. Use this if you do not
-# have LaTeX installed or if you want to formulas look prettier in the HTML
-# output. When enabled you may also need to install MathJax separately and
-# configure the path to it using the MATHJAX_RELPATH option.
-
-USE_MATHJAX = NO
-
-# When MathJax is enabled you need to specify the location relative to the
-# HTML output directory using the MATHJAX_RELPATH option. The destination
-# directory should contain the MathJax.js script. For instance, if the mathjax
-# directory is located at the same level as the HTML output directory, then
-# MATHJAX_RELPATH should be ../mathjax. The default value points to
-# the MathJax Content Delivery Network so you can quickly see the result without
-# installing MathJax.
-# However, it is strongly recommended to install a local
-# copy of MathJax from http://www.mathjax.org before deployment.
-
-MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest
-
-# The MATHJAX_EXTENSIONS tag can be used to specify one or MathJax extension
-# names that should be enabled during MathJax rendering.
-
-MATHJAX_EXTENSIONS =
-
-# When the SEARCHENGINE tag is enabled doxygen will generate a search box
-# for the HTML output. The underlying search engine uses javascript
-# and DHTML and should work on any modern browser. Note that when using
-# HTML help (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets
-# (GENERATE_DOCSET) there is already a search function so this one should
-# typically be disabled. For large projects the javascript based search engine
-# can be slow, then enabling SERVER_BASED_SEARCH may provide a better solution.
-
-SEARCHENGINE = YES
-
-# When the SERVER_BASED_SEARCH tag is enabled the search engine will be
-# implemented using a PHP enabled web server instead of at the web client
-# using Javascript. Doxygen will generate the search PHP script and index
-# file to put on the web server. The advantage of the server
-# based approach is that it scales better to large projects and allows
-# full text search. The disadvantages are that it is more difficult to setup
-# and does not have live searching capabilities.
-
-SERVER_BASED_SEARCH = NO
-
-#---------------------------------------------------------------------------
-# configuration options related to the LaTeX output
-#---------------------------------------------------------------------------
-
-# If the GENERATE_LATEX tag is set to YES (the default) Doxygen will
-# generate Latex output.
-
-GENERATE_LATEX = YES
-
-# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put.
-# If a relative path is entered the value of OUTPUT_DIRECTORY will be
-# put in front of it. If left blank `latex' will be used as the default path.
-
-LATEX_OUTPUT =
-
-# The LATEX_CMD_NAME tag can be used to specify the LaTeX command name to be
-# invoked. If left blank `latex' will be used as the default command name.
-# Note that when enabling USE_PDFLATEX this option is only used for
-# generating bitmaps for formulas in the HTML output, but not in the
-# Makefile that is written to the output directory.
-
-LATEX_CMD_NAME = latex
-
-# The MAKEINDEX_CMD_NAME tag can be used to specify the command name to
-# generate index for LaTeX. If left blank `makeindex' will be used as the
-# default command name.
-
-MAKEINDEX_CMD_NAME = makeindex
-
-# If the COMPACT_LATEX tag is set to YES Doxygen generates more compact
-# LaTeX documents. This may be useful for small projects and may help to
-# save some trees in general.
-
-COMPACT_LATEX = NO
-
-# The PAPER_TYPE tag can be used to set the paper type that is used
-# by the printer. Possible values are: a4, letter, legal and
-# executive. If left blank a4wide will be used.
-
-PAPER_TYPE = a4wide
-
-# The EXTRA_PACKAGES tag can be to specify one or more names of LaTeX
-# packages that should be included in the LaTeX output.
-
-EXTRA_PACKAGES =
-
-# The LATEX_HEADER tag can be used to specify a personal LaTeX header for
-# the generated latex document. The header should contain everything until
-# the first chapter. If it is left blank doxygen will generate a
-# standard header. Notice: only use this tag if you know what you are doing!
-
-LATEX_HEADER = doc/doxygen/header.tex
-
-# The LATEX_FOOTER tag can be used to specify a personal LaTeX footer for
-# the generated latex document. The footer should contain everything after
-# the last chapter. If it is left blank doxygen will generate a
-# standard footer. Notice: only use this tag if you know what you are doing!
-
-LATEX_FOOTER =
-
-# If the PDF_HYPERLINKS tag is set to YES, the LaTeX that is generated
-# is prepared for conversion to pdf (using ps2pdf). The pdf file will
-# contain links (just like the HTML output) instead of page references
-# This makes the output suitable for online browsing using a pdf viewer.
-
-PDF_HYPERLINKS = YES
-
-# If the USE_PDFLATEX tag is set to YES, pdflatex will be used instead of
-# plain latex in the generated Makefile. Set this option to YES to get a
-# higher quality PDF documentation.
-
-USE_PDFLATEX = YES
-
-# If the LATEX_BATCHMODE tag is set to YES, doxygen will add the \\batchmode.
-# command to the generated LaTeX files. This will instruct LaTeX to keep
-# running if errors occur, instead of asking the user for help.
-# This option is also used when generating formulas in HTML.
-
-LATEX_BATCHMODE = NO
-
-# If LATEX_HIDE_INDICES is set to YES then doxygen will not
-# include the index chapters (such as File Index, Compound Index, etc.)
-# in the output.
-
-LATEX_HIDE_INDICES = NO
-
-# If LATEX_SOURCE_CODE is set to YES then doxygen will include
-# source code with syntax highlighting in the LaTeX output.
-# Note that which sources are shown also depends on other settings
-# such as SOURCE_BROWSER.
-
-LATEX_SOURCE_CODE = NO
-
-# The LATEX_BIB_STYLE tag can be used to specify the style to use for the
-# bibliography, e.g. plainnat, or ieeetr. The default style is "plain". See
-# http://en.wikipedia.org/wiki/BibTeX for more info.
-
-LATEX_BIB_STYLE = plain
-
-#---------------------------------------------------------------------------
-# configuration options related to the RTF output
-#---------------------------------------------------------------------------
-
-# If the GENERATE_RTF tag is set to YES Doxygen will generate RTF output
-# The RTF output is optimized for Word 97 and may not look very pretty with
-# other RTF readers or editors.
-
-GENERATE_RTF = NO
-
-# The RTF_OUTPUT tag is used to specify where the RTF docs will be put.
-# If a relative path is entered the value of OUTPUT_DIRECTORY will be
-# put in front of it. If left blank `rtf' will be used as the default path.
-
-RTF_OUTPUT =
-
-# If the COMPACT_RTF tag is set to YES Doxygen generates more compact
-# RTF documents. This may be useful for small projects and may help to
-# save some trees in general.
-
-COMPACT_RTF = NO
-
-# If the RTF_HYPERLINKS tag is set to YES, the RTF that is generated
-# will contain hyperlink fields. The RTF file will
-# contain links (just like the HTML output) instead of page references.
-# This makes the output suitable for online browsing using WORD or other
-# programs which support those fields.
-# Note: wordpad (write) and others do not support links.
-
-RTF_HYPERLINKS = NO
-
-# Load style sheet definitions from file. Syntax is similar to doxygen's
-# config file, i.e. a series of assignments. You only have to provide
-# replacements, missing definitions are set to their default value.
-
-RTF_STYLESHEET_FILE =
-
-# Set optional variables used in the generation of an rtf document.
-# Syntax is similar to doxygen's config file.
-
-RTF_EXTENSIONS_FILE =
-
-#---------------------------------------------------------------------------
-# configuration options related to the man page output
-#---------------------------------------------------------------------------
-
-# If the GENERATE_MAN tag is set to YES (the default) Doxygen will
-# generate man pages
-
-GENERATE_MAN = NO
-
-# The MAN_OUTPUT tag is used to specify where the man pages will be put.
-# If a relative path is entered the value of OUTPUT_DIRECTORY will be
-# put in front of it. If left blank `man' will be used as the default path.
-
-MAN_OUTPUT =
-
-# The MAN_EXTENSION tag determines the extension that is added to
-# the generated man pages (default is the subroutine's section .3)
-
-MAN_EXTENSION =
-
-# If the MAN_LINKS tag is set to YES and Doxygen generates man output,
-# then it will generate one additional man file for each entity
-# documented in the real man page(s). These additional files
-# only source the real man page, but without them the man command
-# would be unable to find the correct page. The default is NO.
-
-MAN_LINKS = NO
-
-#---------------------------------------------------------------------------
-# configuration options related to the XML output
-#---------------------------------------------------------------------------
-
-# If the GENERATE_XML tag is set to YES Doxygen will
-# generate an XML file that captures the structure of
-# the code including all documentation.
-
-GENERATE_XML = NO
-
-# The XML_OUTPUT tag is used to specify where the XML pages will be put.
-# If a relative path is entered the value of OUTPUT_DIRECTORY will be
-# put in front of it. If left blank `xml' will be used as the default path.
-
-XML_OUTPUT = xml
-
-# The XML_SCHEMA tag can be used to specify an XML schema,
-# which can be used by a validating XML parser to check the
-# syntax of the XML files.
-
-XML_SCHEMA =
-
-# The XML_DTD tag can be used to specify an XML DTD,
-# which can be used by a validating XML parser to check the
-# syntax of the XML files.
-
-XML_DTD =
-
-# If the XML_PROGRAMLISTING tag is set to YES Doxygen will
-# dump the program listings (including syntax highlighting
-# and cross-referencing information) to the XML output. Note that
-# enabling this will significantly increase the size of the XML output.
-
-XML_PROGRAMLISTING = YES
-
-#---------------------------------------------------------------------------
-# configuration options for the AutoGen Definitions output
-#---------------------------------------------------------------------------
-
-# If the GENERATE_AUTOGEN_DEF tag is set to YES Doxygen will
-# generate an AutoGen Definitions (see autogen.sf.net) file
-# that captures the structure of the code including all
-# documentation. Note that this feature is still experimental
-# and incomplete at the moment.
-
-GENERATE_AUTOGEN_DEF = NO
-
-#---------------------------------------------------------------------------
-# configuration options related to the Perl module output
-#---------------------------------------------------------------------------
-
-# If the GENERATE_PERLMOD tag is set to YES Doxygen will
-# generate a Perl module file that captures the structure of
-# the code including all documentation. Note that this
-# feature is still experimental and incomplete at the
-# moment.
-
-GENERATE_PERLMOD = NO
-
-# If the PERLMOD_LATEX tag is set to YES Doxygen will generate
-# the necessary Makefile rules, Perl scripts and LaTeX code to be able
-# to generate PDF and DVI output from the Perl module output.
-
-PERLMOD_LATEX = NO
-
-# If the PERLMOD_PRETTY tag is set to YES the Perl module output will be
-# nicely formatted so it can be parsed by a human reader.
-# This is useful
-# if you want to understand what is going on.
-# On the other hand, if this
-# tag is set to NO the size of the Perl module output will be much smaller
-# and Perl will parse it just the same.
-
-PERLMOD_PRETTY = YES
-
-# The names of the make variables in the generated doxyrules.make file
-# are prefixed with the string contained in PERLMOD_MAKEVAR_PREFIX.
-# This is useful so different doxyrules.make files included by the same
-# Makefile don't overwrite each other's variables.
-
-PERLMOD_MAKEVAR_PREFIX =
-
-#---------------------------------------------------------------------------
-# Configuration options related to the preprocessor
-#---------------------------------------------------------------------------
-
-# If the ENABLE_PREPROCESSING tag is set to YES (the default) Doxygen will
-# evaluate all C-preprocessor directives found in the sources and include
-# files.
-
-ENABLE_PREPROCESSING = YES
-
-# If the MACRO_EXPANSION tag is set to YES Doxygen will expand all macro
-# names in the source code. If set to NO (the default) only conditional
-# compilation will be performed. Macro expansion can be done in a controlled
-# way by setting EXPAND_ONLY_PREDEF to YES.
-
-MACRO_EXPANSION = YES
-
-# If the EXPAND_ONLY_PREDEF and MACRO_EXPANSION tags are both set to YES
-# then the macro expansion is limited to the macros specified with the
-# PREDEFINED and EXPAND_AS_DEFINED tags.
-
-EXPAND_ONLY_PREDEF = YES
-
-# If the SEARCH_INCLUDES tag is set to YES (the default) the includes files
-# pointed to by INCLUDE_PATH will be searched when a #include is found.
-
-SEARCH_INCLUDES = YES
-
-# The INCLUDE_PATH tag can be used to specify one or more directories that
-# contain include files that are not input files but should be processed by
-# the preprocessor.
-
-INCLUDE_PATH =
-
-# You can use the INCLUDE_FILE_PATTERNS tag to specify one or more wildcard
-# patterns (like *.h and *.hpp) to filter out the header-files in the
-# directories. If left blank, the patterns specified with FILE_PATTERNS will
-# be used.
-
-INCLUDE_FILE_PATTERNS =
-
-# The PREDEFINED tag can be used to specify one or more macro names that
-# are defined before the preprocessor is started (similar to the -D option of
-# gcc). The argument of the tag is a list of macros of the form: name
-# or name=definition (no spaces). If the definition and the = are
-# omitted =1 is assumed. To prevent a macro definition from being
-# undefined via #undef or recursively expanded use the := operator
-# instead of the = operator.
-
-PREDEFINED = OMP_30_ENABLED=1, OMP_40_ENABLED=1, KMP_STATS_ENABLED=1
-
-# If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then
-# this tag can be used to specify a list of macro names that should be expanded.
-# The macro definition that is found in the sources will be used.
-# Use the PREDEFINED tag if you want to use a different macro definition that
-# overrules the definition found in the source code.
-
-EXPAND_AS_DEFINED =
-
-# If the SKIP_FUNCTION_MACROS tag is set to YES (the default) then
-# doxygen's preprocessor will remove all references to function-like macros
-# that are alone on a line, have an all uppercase name, and do not end with a
-# semicolon, because these will confuse the parser if not removed.
-
-SKIP_FUNCTION_MACROS = YES
-
-#---------------------------------------------------------------------------
-# Configuration::additions related to external references
-#---------------------------------------------------------------------------
-
-# The TAGFILES option can be used to specify one or more tagfiles. For each
-# tag file the location of the external documentation should be added. The
-# format of a tag file without this location is as follows:
-#
-# TAGFILES = file1 file2 ...
-# Adding location for the tag files is done as follows:
-#
-# TAGFILES = file1=loc1 "file2 = loc2" ...
-# where "loc1" and "loc2" can be relative or absolute paths
-# or URLs. Note that each tag file must have a unique name (where the name does
-# NOT include the path). If a tag file is not located in the directory in which
-# doxygen is run, you must also specify the path to the tagfile here.
-
-TAGFILES =
-
-# When a file name is specified after GENERATE_TAGFILE, doxygen will create
-# a tag file that is based on the input files it reads.
-
-GENERATE_TAGFILE =
-
-# If the ALLEXTERNALS tag is set to YES all external classes will be listed
-# in the class index. If set to NO only the inherited external classes
-# will be listed.
-
-ALLEXTERNALS = NO
-
-# If the EXTERNAL_GROUPS tag is set to YES all external groups will be listed
-# in the modules index. If set to NO, only the current project's groups will
-# be listed.
-
-EXTERNAL_GROUPS = YES
-
-# The PERL_PATH should be the absolute path and name of the perl script
-# interpreter (i.e. the result of `which perl').
-
-PERL_PATH =
-
-#---------------------------------------------------------------------------
-# Configuration options related to the dot tool
-#---------------------------------------------------------------------------
-
-# If the CLASS_DIAGRAMS tag is set to YES (the default) Doxygen will
-# generate a inheritance diagram (in HTML, RTF and LaTeX) for classes with base
-# or super classes. Setting the tag to NO turns the diagrams off. Note that
-# this option also works with HAVE_DOT disabled, but it is recommended to
-# install and use dot, since it yields more powerful graphs.
-
-CLASS_DIAGRAMS = YES
-
-# You can define message sequence charts within doxygen comments using the \msc
-# command. Doxygen will then run the mscgen tool (see
-# http://www.mcternan.me.uk/mscgen/) to produce the chart and insert it in the
-# documentation. The MSCGEN_PATH tag allows you to specify the directory where
-# the mscgen tool resides. If left empty the tool is assumed to be found in the
-# default search path.
-
-MSCGEN_PATH =
-
-# If set to YES, the inheritance and collaboration graphs will hide
-# inheritance and usage relations if the target is undocumented
-# or is not a class.
-
-HIDE_UNDOC_RELATIONS = YES
-
-# If you set the HAVE_DOT tag to YES then doxygen will assume the dot tool is
-# available from the path. This tool is part of Graphviz, a graph visualization
-# toolkit from AT&T and Lucent Bell Labs. The other options in this section
-# have no effect if this option is set to NO (the default)
-
-HAVE_DOT = NO
-
-# The DOT_NUM_THREADS specifies the number of dot invocations doxygen is
-# allowed to run in parallel. When set to 0 (the default) doxygen will
-# base this on the number of processors available in the system. You can set it
-# explicitly to a value larger than 0 to get control over the balance
-# between CPU load and processing speed.
-
-DOT_NUM_THREADS = 0
-
-# By default doxygen will use the Helvetica font for all dot files that
-# doxygen generates. When you want a differently looking font you can specify
-# the font name using DOT_FONTNAME. You need to make sure dot is able to find
-# the font, which can be done by putting it in a standard location or by setting
-# the DOTFONTPATH environment variable or by setting DOT_FONTPATH to the
-# directory containing the font.
-
-DOT_FONTNAME = Helvetica
-
-# The DOT_FONTSIZE tag can be used to set the size of the font of dot graphs.
-# The default size is 10pt.
-
-DOT_FONTSIZE = 10
-
-# By default doxygen will tell dot to use the Helvetica font.
-# If you specify a different font using DOT_FONTNAME you can use DOT_FONTPATH to
-# set the path where dot can find it.
-
-DOT_FONTPATH =
-
-# If the CLASS_GRAPH and HAVE_DOT tags are set to YES then doxygen
-# will generate a graph for each documented class showing the direct and
-# indirect inheritance relations. Setting this tag to YES will force the
-# CLASS_DIAGRAMS tag to NO.
-
-CLASS_GRAPH = YES
-
-# If the COLLABORATION_GRAPH and HAVE_DOT tags are set to YES then doxygen
-# will generate a graph for each documented class showing the direct and
-# indirect implementation dependencies (inheritance, containment, and
-# class references variables) of the class with other documented classes.
-
-COLLABORATION_GRAPH = NO
-
-# If the GROUP_GRAPHS and HAVE_DOT tags are set to YES then doxygen
-# will generate a graph for groups, showing the direct groups dependencies
-
-GROUP_GRAPHS = YES
-
-# If the UML_LOOK tag is set to YES doxygen will generate inheritance and
-# collaboration diagrams in a style similar to the OMG's Unified Modeling
-# Language.
-
-UML_LOOK = NO
-
-# If the UML_LOOK tag is enabled, the fields and methods are shown inside
-# the class node. If there are many fields or methods and many nodes the
-# graph may become too big to be useful. The UML_LIMIT_NUM_FIELDS
-# threshold limits the number of items for each type to make the size more
-# manageable. Set this to 0 for no limit. Note that the threshold may be
-# exceeded by 50% before the limit is enforced.
-
-UML_LIMIT_NUM_FIELDS = 10
-
-# If set to YES, the inheritance and collaboration graphs will show the
-# relations between templates and their instances.
-
-TEMPLATE_RELATIONS = YES
-
-# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDE_GRAPH, and HAVE_DOT
-# tags are set to YES then doxygen will generate a graph for each documented
-# file showing the direct and indirect include dependencies of the file with
-# other documented files.
-
-INCLUDE_GRAPH = NO
-
-# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDED_BY_GRAPH, and
-# HAVE_DOT tags are set to YES then doxygen will generate a graph for each
-# documented header file showing the documented files that directly or
-# indirectly include this file.
-
-INCLUDED_BY_GRAPH = NO
-
-# If the CALL_GRAPH and HAVE_DOT options are set to YES then
-# doxygen will generate a call dependency graph for every global function
-# or class method. Note that enabling this option will significantly increase
-# the time of a run. So in most cases it will be better to enable call graphs
-# for selected functions only using the \callgraph command.
-
-CALL_GRAPH = NO
-
-# If the CALLER_GRAPH and HAVE_DOT tags are set to YES then
-# doxygen will generate a caller dependency graph for every global function
-# or class method. Note that enabling this option will significantly increase
-# the time of a run. So in most cases it will be better to enable caller
-# graphs for selected functions only using the \callergraph command.
-
-CALLER_GRAPH = NO
-
-# If the GRAPHICAL_HIERARCHY and HAVE_DOT tags are set to YES then doxygen
-# will generate a graphical hierarchy of all classes instead of a textual one.
-
-GRAPHICAL_HIERARCHY = YES
-
-# If the DIRECTORY_GRAPH and HAVE_DOT tags are set to YES
-# then doxygen will show the dependencies a directory has on other directories
-# in a graphical way. The dependency relations are determined by the #include
-# relations between the files in the directories.
-
-DIRECTORY_GRAPH = YES
-
-# The DOT_IMAGE_FORMAT tag can be used to set the image format of the images
-# generated by dot. Possible values are svg, png, jpg, or gif.
-# If left blank png will be used. If you choose svg you need to set
-# HTML_FILE_EXTENSION to xhtml in order to make the SVG files
-# visible in IE 9+ (other browsers do not have this requirement).
-
-DOT_IMAGE_FORMAT = png
-
-# If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to
-# enable generation of interactive SVG images that allow zooming and panning.
-# Note that this requires a modern browser other than Internet Explorer.
-# Tested and working are Firefox, Chrome, Safari, and Opera. For IE 9+ you
-# need to set HTML_FILE_EXTENSION to xhtml in order to make the SVG files
-# visible. Older versions of IE do not have SVG support.
-
-INTERACTIVE_SVG = NO
-
-# The tag DOT_PATH can be used to specify the path where the dot tool can be
-# found. If left blank, it is assumed the dot tool can be found in the path.
-
-DOT_PATH =
-
-# The DOTFILE_DIRS tag can be used to specify one or more directories that
-# contain dot files that are included in the documentation (see the
-# \dotfile command).
-
-DOTFILE_DIRS =
-
-# The MSCFILE_DIRS tag can be used to specify one or more directories that
-# contain msc files that are included in the documentation (see the
-# \mscfile command).
-
-MSCFILE_DIRS =
-
-# The DOT_GRAPH_MAX_NODES tag can be used to set the maximum number of
-# nodes that will be shown in the graph. If the number of nodes in a graph
-# becomes larger than this value, doxygen will truncate the graph, which is
-# visualized by representing a node as a red box. Note that doxygen if the
-# number of direct children of the root node in a graph is already larger than
-# DOT_GRAPH_MAX_NODES then the graph will not be shown at all.
Also note -# that the size of a graph can be further restricted by MAX_DOT_GRAPH_DEPTH. - -DOT_GRAPH_MAX_NODES = 50 - -# The MAX_DOT_GRAPH_DEPTH tag can be used to set the maximum depth of the -# graphs generated by dot. A depth value of 3 means that only nodes reachable -# from the root by following a path via at most 3 edges will be shown. Nodes -# that lay further from the root node will be omitted. Note that setting this -# option to 1 or 2 may greatly reduce the computation time needed for large -# code bases. Also note that the size of a graph can be further restricted by -# DOT_GRAPH_MAX_NODES. Using a depth of 0 means no depth restriction. - -MAX_DOT_GRAPH_DEPTH = 0 - -# Set the DOT_TRANSPARENT tag to YES to generate images with a transparent -# background. This is disabled by default, because dot on Windows does not -# seem to support this out of the box. Warning: Depending on the platform used, -# enabling this option may lead to badly anti-aliased labels on the edges of -# a graph (i.e. they become hard to read). - -DOT_TRANSPARENT = NO - -# Set the DOT_MULTI_TARGETS tag to YES allow dot to generate multiple output -# files in one run (i.e. multiple -o and -T options on the command line). This -# makes dot run faster, but since only newer versions of dot (>1.8.10) -# support this, this feature is disabled by default. - -DOT_MULTI_TARGETS = NO - -# If the GENERATE_LEGEND tag is set to YES (the default) Doxygen will -# generate a legend page explaining the meaning of the various boxes and -# arrows in the dot generated graphs. - -GENERATE_LEGEND = YES - -# If the DOT_CLEANUP tag is set to YES (the default) Doxygen will -# remove the intermediate dot files that are used to generate -# the various graphs. - -DOT_CLEANUP = YES +# Doxyfile 1.o8.2 + +# This file describes the settings to be used by the documentation system +# doxygen (www.doxygen.org) for a project. +# +# All text after a hash (#) is considered a comment and will be ignored. 
+# The format is: +# TAG = value [value, ...] +# For lists items can also be appended using: +# TAG += value [value, ...] +# Values that contain spaces should be placed between quotes (" "). + +#--------------------------------------------------------------------------- +# Project related configuration options +#--------------------------------------------------------------------------- + +# This tag specifies the encoding used for all characters in the config file +# that follow. The default is UTF-8 which is also the encoding used for all +# text before the first occurrence of this tag. Doxygen uses libiconv (or the +# iconv built into libc) for the transcoding. See +# http://www.gnu.org/software/libiconv for the list of possible encodings. + +DOXYFILE_ENCODING = UTF-8 + +# The PROJECT_NAME tag is a single word (or sequence of words) that should +# identify the project. Note that if you do not use Doxywizard you need +# to put quotes around the project name if it contains spaces. + +PROJECT_NAME = "LLVM OpenMP* Runtime Library" + +# The PROJECT_NUMBER tag can be used to enter a project or revision number. +# This could be handy for archiving the generated documentation or +# if some version control system is used. + +PROJECT_NUMBER = + +# Using the PROJECT_BRIEF tag one can provide an optional one line description +# for a project that appears at the top of each page and should give viewer +# a quick idea about the purpose of the project. Keep the description short. + +PROJECT_BRIEF = + +# With the PROJECT_LOGO tag one can specify an logo or icon that is +# included in the documentation. The maximum height of the logo should not +# exceed 55 pixels and the maximum width should not exceed 200 pixels. +# Doxygen will copy the logo to the output directory. + +PROJECT_LOGO = + +# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) +# base path where the generated documentation will be put. 
+# If a relative path is entered, it will be relative to the location +# where doxygen was started. If left blank the current directory will be used. + +OUTPUT_DIRECTORY = doc/doxygen/generated + +# If the CREATE_SUBDIRS tag is set to YES, then doxygen will create +# 4096 sub-directories (in 2 levels) under the output directory of each output +# format and will distribute the generated files over these directories. +# Enabling this option can be useful when feeding doxygen a huge amount of +# source files, where putting all generated files in the same directory would +# otherwise cause performance problems for the file system. + +CREATE_SUBDIRS = NO + +# The OUTPUT_LANGUAGE tag is used to specify the language in which all +# documentation generated by doxygen is written. Doxygen will use this +# information to generate all constant output in the proper language. +# The default language is English, other supported languages are: +# Afrikaans, Arabic, Brazilian, Catalan, Chinese, Chinese-Traditional, +# Croatian, Czech, Danish, Dutch, Esperanto, Farsi, Finnish, French, German, +# Greek, Hungarian, Italian, Japanese, Japanese-en (Japanese with English +# messages), Korean, Korean-en, Lithuanian, Norwegian, Macedonian, Persian, +# Polish, Portuguese, Romanian, Russian, Serbian, Serbian-Cyrillic, Slovak, +# Slovene, Spanish, Swedish, Ukrainian, and Vietnamese. + +OUTPUT_LANGUAGE = English + +# If the BRIEF_MEMBER_DESC tag is set to YES (the default) Doxygen will +# include brief member descriptions after the members that are listed in +# the file and class documentation (similar to JavaDoc). +# Set to NO to disable this. + +BRIEF_MEMBER_DESC = YES + +# If the REPEAT_BRIEF tag is set to YES (the default) Doxygen will prepend +# the brief description of a member or function before the detailed description. +# Note: if both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the +# brief descriptions will be completely suppressed. 
+ +REPEAT_BRIEF = YES + +# This tag implements a quasi-intelligent brief description abbreviator +# that is used to form the text in various listings. Each string +# in this list, if found as the leading text of the brief description, will be +# stripped from the text and the result after processing the whole list, is +# used as the annotated text. Otherwise, the brief description is used as-is. +# If left blank, the following values are used ("$name" is automatically +# replaced with the name of the entity): "The $name class" "The $name widget" +# "The $name file" "is" "provides" "specifies" "contains" +# "represents" "a" "an" "the" + +ABBREVIATE_BRIEF = + +# If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then +# Doxygen will generate a detailed section even if there is only a brief +# description. + +ALWAYS_DETAILED_SEC = NO + +# If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all +# inherited members of a class in the documentation of that class as if those +# members were ordinary class members. Constructors, destructors and assignment +# operators of the base classes will not be shown. + +INLINE_INHERITED_MEMB = NO + +# If the FULL_PATH_NAMES tag is set to YES then Doxygen will prepend the full +# path before files name in the file list and in the header files. If set +# to NO the shortest path that makes the file name unique will be used. + +FULL_PATH_NAMES = NO + +# If the FULL_PATH_NAMES tag is set to YES then the STRIP_FROM_PATH tag +# can be used to strip a user-defined part of the path. Stripping is +# only done if one of the specified strings matches the left-hand part of +# the path. The tag can be used to show relative paths in the file list. +# If left blank the directory from which doxygen is run is used as the +# path to strip. Note that you specify absolute paths here, but also +# relative paths, which will be relative from the directory where doxygen is +# started. 
+ +STRIP_FROM_PATH = + +# The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of +# the path mentioned in the documentation of a class, which tells +# the reader which header file to include in order to use a class. +# If left blank only the name of the header file containing the class +# definition is used. Otherwise one should specify the include paths that +# are normally passed to the compiler using the -I flag. + +STRIP_FROM_INC_PATH = + +# If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter +# (but less readable) file names. This can be useful if your file system +# doesn't support long names like on DOS, Mac, or CD-ROM. + +SHORT_NAMES = NO + +# If the JAVADOC_AUTOBRIEF tag is set to YES then Doxygen +# will interpret the first line (until the first dot) of a JavaDoc-style +# comment as the brief description. If set to NO, the JavaDoc +# comments will behave just like regular Qt-style comments +# (thus requiring an explicit @brief command for a brief description.) + +JAVADOC_AUTOBRIEF = NO + +# If the QT_AUTOBRIEF tag is set to YES then Doxygen will +# interpret the first line (until the first dot) of a Qt-style +# comment as the brief description. If set to NO, the comments +# will behave just like regular Qt-style comments (thus requiring +# an explicit \brief command for a brief description.) + +QT_AUTOBRIEF = NO + +# The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make Doxygen +# treat a multi-line C++ special comment block (i.e. a block of //! or /// +# comments) as a brief description. This used to be the default behaviour. +# The new default is to treat a multi-line C++ comment block as a detailed +# description. Set this tag to YES if you prefer the old behaviour instead. + +MULTILINE_CPP_IS_BRIEF = NO + +# If the INHERIT_DOCS tag is set to YES (the default) then an undocumented +# member inherits the documentation from any documented member that it +# re-implements. 
+ +INHERIT_DOCS = YES + +# If the SEPARATE_MEMBER_PAGES tag is set to YES, then doxygen will produce +# a new page for each member. If set to NO, the documentation of a member will +# be part of the file/class/namespace that contains it. + +SEPARATE_MEMBER_PAGES = NO + +# The TAB_SIZE tag can be used to set the number of spaces in a tab. +# Doxygen uses this value to replace tabs by spaces in code fragments. + +TAB_SIZE = 8 + +# This tag can be used to specify a number of aliases that acts +# as commands in the documentation. An alias has the form "name=value". +# For example adding "sideeffect=\par Side Effects:\n" will allow you to +# put the command \sideeffect (or @sideeffect) in the documentation, which +# will result in a user-defined paragraph with heading "Side Effects:". +# You can put \n's in the value part of an alias to insert newlines. + +ALIASES = "other=*" + +# This tag can be used to specify a number of word-keyword mappings (TCL only). +# A mapping has the form "name=value". For example adding +# "class=itcl::class" will allow you to use the command class in the +# itcl::class meaning. + +TCL_SUBST = + +# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C +# sources only. Doxygen will then generate output that is more tailored for C. +# For instance, some of the names that are used will be different. The list +# of all members will be omitted, etc. + +OPTIMIZE_OUTPUT_FOR_C = NO + +# Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java +# sources only. Doxygen will then generate output that is more tailored for +# Java. For instance, namespaces will be presented as packages, qualified +# scopes will look different, etc. + +OPTIMIZE_OUTPUT_JAVA = NO + +# Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran +# sources only. Doxygen will then generate output that is more tailored for +# Fortran. 
+ +OPTIMIZE_FOR_FORTRAN = NO + +# Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL +# sources. Doxygen will then generate output that is tailored for +# VHDL. + +OPTIMIZE_OUTPUT_VHDL = NO + +# Doxygen selects the parser to use depending on the extension of the files it +# parses. With this tag you can assign which parser to use for a given +# extension. Doxygen has a built-in mapping, but you can override or extend it +# using this tag. The format is ext=language, where ext is a file extension, +# and language is one of the parsers supported by doxygen: IDL, Java, +# Javascript, CSharp, C, C++, D, PHP, Objective-C, Python, Fortran, VHDL, C, +# C++. For instance to make doxygen treat .inc files as Fortran files (default +# is PHP), and .f files as C (default is Fortran), use: inc=Fortran f=C. Note +# that for custom extensions you also need to set FILE_PATTERNS otherwise the +# files are not read by doxygen. + +EXTENSION_MAPPING = + +# If MARKDOWN_SUPPORT is enabled (the default) then doxygen pre-processes all +# comments according to the Markdown format, which allows for more readable +# documentation. See http://daringfireball.net/projects/markdown/ for details. +# The output of markdown processing is further processed by doxygen, so you +# can mix doxygen, HTML, and XML commands with Markdown formatting. +# Disable only in case of backward compatibilities issues. + +MARKDOWN_SUPPORT = YES + +# When enabled doxygen tries to link words that correspond to documented classes, +# or namespaces to their corresponding documentation. Such a link can be +# prevented in individual cases by by putting a % sign in front of the word or +# globally by setting AUTOLINK_SUPPORT to NO. + +AUTOLINK_SUPPORT = YES + +# If you use STL classes (i.e. std::string, std::vector, etc.) 
but do not want +# to include (a tag file for) the STL sources as input, then you should +# set this tag to YES in order to let doxygen match functions declarations and +# definitions whose arguments contain STL classes (e.g. func(std::string); v.s. +# func(std::string) {}). This also makes the inheritance and collaboration +# diagrams that involve STL classes more complete and accurate. + +BUILTIN_STL_SUPPORT = NO + +# If you use Microsoft's C++/CLI language, you should set this option to YES to +# enable parsing support. + +CPP_CLI_SUPPORT = NO + +# Set the SIP_SUPPORT tag to YES if your project consists of sip sources only. +# Doxygen will parse them like normal C++ but will assume all classes use public +# instead of private inheritance when no explicit protection keyword is present. + +SIP_SUPPORT = NO + +# For Microsoft's IDL there are propget and propput attributes to +# indicate getter and setter methods for a property. Setting this +# option to YES (the default) will make doxygen replace the get and +# set methods by a property in the documentation. This will only work +# if the methods are indeed getting or setting a simple type. If this +# is not the case, or you want to show the methods anyway, you should +# set this option to NO. + +IDL_PROPERTY_SUPPORT = YES + +# If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC +# tag is set to YES, then doxygen will reuse the documentation of the first +# member in the group (if any) for the other members of the group. By default +# all members of a group must be documented explicitly. + +DISTRIBUTE_GROUP_DOC = NO + +# Set the SUBGROUPING tag to YES (the default) to allow class member groups of +# the same type (for instance a group of public functions) to be put as a +# subgroup of that type (e.g. under the Public Functions section). Set it to +# NO to prevent subgrouping. Alternatively, this can be done per class using +# the \nosubgrouping command. 
+ +SUBGROUPING = YES + +# When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and +# unions are shown inside the group in which they are included (e.g. using +# @ingroup) instead of on a separate page (for HTML and Man pages) or +# section (for LaTeX and RTF). + +INLINE_GROUPED_CLASSES = NO + +# When the INLINE_SIMPLE_STRUCTS tag is set to YES, structs, classes, and +# unions with only public data fields will be shown inline in the documentation +# of the scope in which they are defined (i.e. file, namespace, or group +# documentation), provided this scope is documented. If set to NO (the default), +# structs, classes, and unions are shown on a separate page (for HTML and Man +# pages) or section (for LaTeX and RTF). + +INLINE_SIMPLE_STRUCTS = NO + +# When TYPEDEF_HIDES_STRUCT is enabled, a typedef of a struct, union, or enum +# is documented as struct, union, or enum with the name of the typedef. So +# typedef struct TypeS {} TypeT, will appear in the documentation as a struct +# with name TypeT. When disabled the typedef will appear as a member of a file, +# namespace, or class. And the struct will be named TypeS. This can typically +# be useful for C code in case the coding convention dictates that all compound +# types are typedef'ed and only the typedef is referenced, never the tag name. + +TYPEDEF_HIDES_STRUCT = NO + +# The SYMBOL_CACHE_SIZE determines the size of the internal cache use to +# determine which symbols to keep in memory and which to flush to disk. +# When the cache is full, less often used symbols will be written to disk. +# For small to medium size projects (<1000 input files) the default value is +# probably good enough. For larger projects a too small cache size can cause +# doxygen to be busy swapping symbols to and from disk most of the time +# causing a significant performance penalty. +# If the system has enough physical memory increasing the cache will improve the +# performance by keeping more symbols in memory. 
Note that the value works on +# a logarithmic scale so increasing the size by one will roughly double the +# memory usage. The cache size is given by this formula: +# 2^(16+SYMBOL_CACHE_SIZE). The valid range is 0..9, the default is 0, +# corresponding to a cache size of 2^16 = 65536 symbols. + +SYMBOL_CACHE_SIZE = 0 + +# Similar to the SYMBOL_CACHE_SIZE the size of the symbol lookup cache can be +# set using LOOKUP_CACHE_SIZE. This cache is used to resolve symbols given +# their name and scope. Since this can be an expensive process and often the +# same symbol appear multiple times in the code, doxygen keeps a cache of +# pre-resolved symbols. If the cache is too small doxygen will become slower. +# If the cache is too large, memory is wasted. The cache size is given by this +# formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range is 0..9, the default is 0, +# corresponding to a cache size of 2^16 = 65536 symbols. + +LOOKUP_CACHE_SIZE = 0 + +#--------------------------------------------------------------------------- +# Build related configuration options +#--------------------------------------------------------------------------- + +# If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in +# documentation are documented, even if no documentation was available. +# Private class members and static file members will be hidden unless +# the EXTRACT_PRIVATE and EXTRACT_STATIC tags are set to YES + +EXTRACT_ALL = NO + +# If the EXTRACT_PRIVATE tag is set to YES all private members of a class +# will be included in the documentation. + +EXTRACT_PRIVATE = YES + +# If the EXTRACT_PACKAGE tag is set to YES all members with package or internal +# scope will be included in the documentation. + +EXTRACT_PACKAGE = NO + +# If the EXTRACT_STATIC tag is set to YES all static members of a file +# will be included in the documentation. 
+ +EXTRACT_STATIC = YES + +# If the EXTRACT_LOCAL_CLASSES tag is set to YES classes (and structs) +# defined locally in source files will be included in the documentation. +# If set to NO only classes defined in header files are included. + +EXTRACT_LOCAL_CLASSES = YES + +# This flag is only useful for Objective-C code. When set to YES local +# methods, which are defined in the implementation section but not in +# the interface are included in the documentation. +# If set to NO (the default) only methods in the interface are included. + +EXTRACT_LOCAL_METHODS = NO + +# If this flag is set to YES, the members of anonymous namespaces will be +# extracted and appear in the documentation as a namespace called +# 'anonymous_namespace{file}', where file will be replaced with the base +# name of the file that contains the anonymous namespace. By default +# anonymous namespaces are hidden. + +EXTRACT_ANON_NSPACES = NO + +# If the HIDE_UNDOC_MEMBERS tag is set to YES, Doxygen will hide all +# undocumented members of documented classes, files or namespaces. +# If set to NO (the default) these members will be included in the +# various overviews, but no documentation section is generated. +# This option has no effect if EXTRACT_ALL is enabled. + +HIDE_UNDOC_MEMBERS = YES + +# If the HIDE_UNDOC_CLASSES tag is set to YES, Doxygen will hide all +# undocumented classes that are normally visible in the class hierarchy. +# If set to NO (the default) these classes will be included in the various +# overviews. This option has no effect if EXTRACT_ALL is enabled. + +HIDE_UNDOC_CLASSES = YES + +# If the HIDE_FRIEND_COMPOUNDS tag is set to YES, Doxygen will hide all +# friend (class|struct|union) declarations. +# If set to NO (the default) these declarations will be included in the +# documentation. + +HIDE_FRIEND_COMPOUNDS = NO + +# If the HIDE_IN_BODY_DOCS tag is set to YES, Doxygen will hide any +# documentation blocks found inside the body of a function. 
+# If set to NO (the default) these blocks will be appended to the +# function's detailed documentation block. + +HIDE_IN_BODY_DOCS = NO + +# The INTERNAL_DOCS tag determines if documentation +# that is typed after a \internal command is included. If the tag is set +# to NO (the default) then the documentation will be excluded. +# Set it to YES to include the internal documentation. + +INTERNAL_DOCS = NO + +# If the CASE_SENSE_NAMES tag is set to NO then Doxygen will only generate +# file names in lower-case letters. If set to YES upper-case letters are also +# allowed. This is useful if you have classes or files whose names only differ +# in case and if your file system supports case sensitive file names. Windows +# and Mac users are advised to set this option to NO. + +CASE_SENSE_NAMES = YES + +# If the HIDE_SCOPE_NAMES tag is set to NO (the default) then Doxygen +# will show members with their full class and namespace scopes in the +# documentation. If set to YES the scope will be hidden. + +HIDE_SCOPE_NAMES = NO + +# If the SHOW_INCLUDE_FILES tag is set to YES (the default) then Doxygen +# will put a list of the files that are included by a file in the documentation +# of that file. + +SHOW_INCLUDE_FILES = YES + +# If the FORCE_LOCAL_INCLUDES tag is set to YES then Doxygen +# will list include files with double quotes in the documentation +# rather than with sharp brackets. + +FORCE_LOCAL_INCLUDES = NO + +# If the INLINE_INFO tag is set to YES (the default) then a tag [inline] +# is inserted in the documentation for inline members. + +INLINE_INFO = YES + +# If the SORT_MEMBER_DOCS tag is set to YES (the default) then doxygen +# will sort the (detailed) documentation of file and class members +# alphabetically by member name. If set to NO the members will appear in +# declaration order. 
+ +SORT_MEMBER_DOCS = YES + +# If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the +# brief documentation of file, namespace and class members alphabetically +# by member name. If set to NO (the default) the members will appear in +# declaration order. + +SORT_BRIEF_DOCS = NO + +# If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen +# will sort the (brief and detailed) documentation of class members so that +# constructors and destructors are listed first. If set to NO (the default) +# the constructors will appear in the respective orders defined by +# SORT_MEMBER_DOCS and SORT_BRIEF_DOCS. +# This tag will be ignored for brief docs if SORT_BRIEF_DOCS is set to NO +# and ignored for detailed docs if SORT_MEMBER_DOCS is set to NO. + +SORT_MEMBERS_CTORS_1ST = NO + +# If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the +# hierarchy of group names into alphabetical order. If set to NO (the default) +# the group names will appear in their defined order. + +SORT_GROUP_NAMES = NO + +# If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be +# sorted by fully-qualified names, including namespaces. If set to +# NO (the default), the class list will be sorted only by class name, +# not including the namespace part. +# Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES. +# Note: This option applies only to the class list, not to the +# alphabetical list. + +SORT_BY_SCOPE_NAME = NO + +# If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to +# do proper type resolution of all parameters of a function it will reject a +# match between the prototype and the implementation of a member function even +# if there is only one candidate or it is obvious which candidate to choose +# by doing a simple string match. By disabling STRICT_PROTO_MATCHING doxygen +# will still accept a match between prototype and implementation in such cases. 
+ +STRICT_PROTO_MATCHING = NO + +# The GENERATE_TODOLIST tag can be used to enable (YES) or +# disable (NO) the todo list. This list is created by putting \todo +# commands in the documentation. + +GENERATE_TODOLIST = YES + +# The GENERATE_TESTLIST tag can be used to enable (YES) or +# disable (NO) the test list. This list is created by putting \test +# commands in the documentation. + +GENERATE_TESTLIST = YES + +# The GENERATE_BUGLIST tag can be used to enable (YES) or +# disable (NO) the bug list. This list is created by putting \bug +# commands in the documentation. + +GENERATE_BUGLIST = YES + +# The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or +# disable (NO) the deprecated list. This list is created by putting +# \deprecated commands in the documentation. + +GENERATE_DEPRECATEDLIST= YES + +# The ENABLED_SECTIONS tag can be used to enable conditional +# documentation sections, marked by \if sectionname ... \endif. + +ENABLED_SECTIONS = + +# The MAX_INITIALIZER_LINES tag determines the maximum number of lines +# the initial value of a variable or macro consists of for it to appear in +# the documentation. If the initializer consists of more lines than specified +# here it will be hidden. Use a value of 0 to hide initializers completely. +# The appearance of the initializer of individual variables and macros in the +# documentation can be controlled using \showinitializer or \hideinitializer +# command in the documentation regardless of this setting. + +MAX_INITIALIZER_LINES = 30 + +# Set the SHOW_USED_FILES tag to NO to disable the list of files generated +# at the bottom of the documentation of classes and structs. If set to YES the +# list will mention the files that were used to generate the documentation. + +SHOW_USED_FILES = YES + +# Set the SHOW_FILES tag to NO to disable the generation of the Files page. +# This will remove the Files entry from the Quick Index and from the +# Folder Tree View (if specified). The default is YES. 
+ +# We probably will want this, but we have no file documentation yet so it's simpler to remove +# it for now. +SHOW_FILES = NO + +# Set the SHOW_NAMESPACES tag to NO to disable the generation of the +# Namespaces page. +# This will remove the Namespaces entry from the Quick Index +# and from the Folder Tree View (if specified). The default is YES. + +SHOW_NAMESPACES = YES + +# The FILE_VERSION_FILTER tag can be used to specify a program or script that +# doxygen should invoke to get the current version for each file (typically from +# the version control system). Doxygen will invoke the program by executing (via +# popen()) the command <command> <input-file>, where <command> is the value of +# the FILE_VERSION_FILTER tag, and <input-file> is the name of an input file +# provided by doxygen. Whatever the program writes to standard output +# is used as the file version. See the manual for examples. + +FILE_VERSION_FILTER = + +# The LAYOUT_FILE tag can be used to specify a layout file which will be parsed +# by doxygen. The layout file controls the global structure of the generated +# output files in an output format independent way. To create the layout file +# that represents doxygen's defaults, run doxygen with the -l option. +# You can optionally specify a file name after the option, if omitted +# DoxygenLayout.xml will be used as the name of the layout file. + +LAYOUT_FILE = + +# The CITE_BIB_FILES tag can be used to specify one or more bib files +# containing the references data. This must be a list of .bib files. The +# .bib extension is automatically appended if omitted. Using this command +# requires the bibtex tool to be installed. See also +# http://en.wikipedia.org/wiki/BibTeX for more info. For LaTeX the style +# of the bibliography can be controlled using LATEX_BIB_STYLE. To use this +# feature you need bibtex and perl available in the search path. 
+ +CITE_BIB_FILES = + +#--------------------------------------------------------------------------- +# configuration options related to warning and progress messages +#--------------------------------------------------------------------------- + +# The QUIET tag can be used to turn on/off the messages that are generated +# by doxygen. Possible values are YES and NO. If left blank NO is used. + +QUIET = NO + +# The WARNINGS tag can be used to turn on/off the warning messages that are +# generated by doxygen. Possible values are YES and NO. If left blank +# NO is used. + +WARNINGS = YES + +# If WARN_IF_UNDOCUMENTED is set to YES, then doxygen will generate warnings +# for undocumented members. If EXTRACT_ALL is set to YES then this flag will +# automatically be disabled. + +WARN_IF_UNDOCUMENTED = YES + +# If WARN_IF_DOC_ERROR is set to YES, doxygen will generate warnings for +# potential errors in the documentation, such as not documenting some +# parameters in a documented function, or documenting parameters that +# don't exist or using markup commands wrongly. + +WARN_IF_DOC_ERROR = YES + +# The WARN_NO_PARAMDOC option can be enabled to get warnings for +# functions that are documented, but have no documentation for their parameters +# or return value. If set to NO (the default) doxygen will only warn about +# wrong or incomplete parameter documentation, but not about the absence of +# documentation. + +WARN_NO_PARAMDOC = NO + +# The WARN_FORMAT tag determines the format of the warning messages that +# doxygen can produce. The string should contain the $file, $line, and $text +# tags, which will be replaced by the file and line number from which the +# warning originated and the warning text. 
Optionally the format may contain +# $version, which will be replaced by the version of the file (if it could +# be obtained via FILE_VERSION_FILTER) + +WARN_FORMAT = + +# The WARN_LOGFILE tag can be used to specify a file to which warning +# and error messages should be written. If left blank the output is written +# to stderr. + +WARN_LOGFILE = + +#--------------------------------------------------------------------------- +# configuration options related to the input files +#--------------------------------------------------------------------------- + +# The INPUT tag can be used to specify the files and/or directories that contain +# documented source files. You may enter file names like "myfile.cpp" or +# directories like "/usr/src/myproject". Separate the files or directories +# with spaces. + +INPUT = src doc/doxygen/libomp_interface.h +# The ittnotify code also has doxygen documentation, but if we include it here +# it takes over from us! +# src/thirdparty/ittnotify + +# This tag can be used to specify the character encoding of the source files +# that doxygen parses. Internally doxygen uses the UTF-8 encoding, which is +# also the default input encoding. Doxygen uses libiconv (or the iconv built +# into libc) for the transcoding. See http://www.gnu.org/software/libiconv for +# the list of possible encodings. + +INPUT_ENCODING = UTF-8 + +# If the value of the INPUT tag contains directories, you can use the +# FILE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp +# and *.h) to filter out the source-files in the directories. If left +# blank the following patterns are tested: +# *.c *.cc *.cxx *.cpp *.c++ *.d *.java *.ii *.ixx *.ipp *.i++ *.inl *.h *.hh +# *.hxx *.hpp *.h++ *.idl *.odl *.cs *.php *.php3 *.inc *.m *.mm *.dox *.py +# *.f90 *.f *.for *.vhd *.vhdl + +FILE_PATTERNS = *.c *.h *.cpp +# We may also want to include the asm files with appropriate ifdef to ensure +# doxygen doesn't see the content, just the documentation... 
+ +# The RECURSIVE tag can be used to turn specify whether or not subdirectories +# should be searched for input files as well. Possible values are YES and NO. +# If left blank NO is used. + +# Only look in the one directory. +RECURSIVE = NO + +# The EXCLUDE tag can be used to specify files and/or directories that should be +# excluded from the INPUT source files. This way you can easily exclude a +# subdirectory from a directory tree whose root is specified with the INPUT tag. +# Note that relative paths are relative to the directory from which doxygen is +# run. + +EXCLUDE = src/test-touch.c + +# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or +# directories that are symbolic links (a Unix file system feature) are excluded +# from the input. + +EXCLUDE_SYMLINKS = NO + +# If the value of the INPUT tag contains directories, you can use the +# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude +# certain files from those directories. Note that the wildcards are matched +# against the file with absolute path, so to exclude all test directories +# for example use the pattern */test/* + +EXCLUDE_PATTERNS = + +# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names +# (namespaces, classes, functions, etc.) that should be excluded from the +# output. The symbol name can be a fully qualified name, a word, or if the +# wildcard * is used, a substring. Examples: ANamespace, AClass, +# AClass::ANamespace, ANamespace::*Test + +EXCLUDE_SYMBOLS = + +# The EXAMPLE_PATH tag can be used to specify one or more files or +# directories that contain example code fragments that are included (see +# the \include command). + +EXAMPLE_PATH = + +# If the value of the EXAMPLE_PATH tag contains directories, you can use the +# EXAMPLE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp +# and *.h) to filter out the source-files in the directories. If left +# blank all files are included. 
+ +EXAMPLE_PATTERNS = + +# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be +# searched for input files to be used with the \include or \dontinclude +# commands irrespective of the value of the RECURSIVE tag. +# Possible values are YES and NO. If left blank NO is used. + +EXAMPLE_RECURSIVE = NO + +# The IMAGE_PATH tag can be used to specify one or more files or +# directories that contain image that are included in the documentation (see +# the \image command). + +IMAGE_PATH = + +# The INPUT_FILTER tag can be used to specify a program that doxygen should +# invoke to filter for each input file. Doxygen will invoke the filter program +# by executing (via popen()) the command <filter> <input-file>, where <filter> +# is the value of the INPUT_FILTER tag, and <input-file> is the name of an +# input file. Doxygen will then use the output that the filter program writes +# to standard output. +# If FILTER_PATTERNS is specified, this tag will be +# ignored. + +INPUT_FILTER = + +# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern +# basis. +# Doxygen will compare the file name with each pattern and apply the +# filter if there is a match. +# The filters are a list of the form: +# pattern=filter (like *.cpp=my_cpp_filter). See INPUT_FILTER for further +# info on how filters are used. If FILTER_PATTERNS is empty or if +# non of the patterns match the file name, INPUT_FILTER is applied. + +FILTER_PATTERNS = + +# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using +# INPUT_FILTER) will be used to filter the input files when producing source +# files to browse (i.e. when SOURCE_BROWSER is set to YES). + +FILTER_SOURCE_FILES = NO + +# The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file +# pattern. A pattern will override the setting for FILTER_PATTERN (if any) +# and it is also possible to disable source filtering for a specific pattern +# using *.ext= (so without naming a filter).
This option only has effect when +# FILTER_SOURCE_FILES is enabled. + +FILTER_SOURCE_PATTERNS = + +#--------------------------------------------------------------------------- +# configuration options related to source browsing +#--------------------------------------------------------------------------- + +# If the SOURCE_BROWSER tag is set to YES then a list of source files will +# be generated. Documented entities will be cross-referenced with these sources. +# Note: To get rid of all source code in the generated output, make sure also +# VERBATIM_HEADERS is set to NO. + +SOURCE_BROWSER = YES + +# Setting the INLINE_SOURCES tag to YES will include the body +# of functions and classes directly in the documentation. + +INLINE_SOURCES = NO + +# Setting the STRIP_CODE_COMMENTS tag to YES (the default) will instruct +# doxygen to hide any special comment blocks from generated source code +# fragments. Normal C, C++ and Fortran comments will always remain visible. + +STRIP_CODE_COMMENTS = YES + +# If the REFERENCED_BY_RELATION tag is set to YES +# then for each documented function all documented +# functions referencing it will be listed. + +REFERENCED_BY_RELATION = YES + +# If the REFERENCES_RELATION tag is set to YES +# then for each documented function all documented entities +# called/used by that function will be listed. + +REFERENCES_RELATION = NO + +# If the REFERENCES_LINK_SOURCE tag is set to YES (the default) +# and SOURCE_BROWSER tag is set to YES, then the hyperlinks from +# functions in REFERENCES_RELATION and REFERENCED_BY_RELATION lists will +# link to the source code. +# Otherwise they will link to the documentation. + +REFERENCES_LINK_SOURCE = YES + +# If the USE_HTAGS tag is set to YES then the references to source code +# will point to the HTML generated by the htags(1) tool instead of doxygen +# built-in source browser. The htags tool is part of GNU's global source +# tagging system (see http://www.gnu.org/software/global/global.html). 
You +# will need version 4.8.6 or higher. + +USE_HTAGS = NO + +# If the VERBATIM_HEADERS tag is set to YES (the default) then Doxygen +# will generate a verbatim copy of the header file for each class for +# which an include is specified. Set to NO to disable this. + +VERBATIM_HEADERS = YES + +#--------------------------------------------------------------------------- +# configuration options related to the alphabetical class index +#--------------------------------------------------------------------------- + +# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index +# of all compounds will be generated. Enable this if the project +# contains a lot of classes, structs, unions or interfaces. + +ALPHABETICAL_INDEX = YES + +# If the alphabetical index is enabled (see ALPHABETICAL_INDEX) then +# the COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns +# in which this list will be split (can be a number in the range [1..20]) + +COLS_IN_ALPHA_INDEX = 5 + +# In case all classes in a project start with a common prefix, all +# classes will be put under the same header in the alphabetical index. +# The IGNORE_PREFIX tag can be used to specify one or more prefixes that +# should be ignored while generating the index headers. + +IGNORE_PREFIX = + +#--------------------------------------------------------------------------- +# configuration options related to the HTML output +#--------------------------------------------------------------------------- + +# If the GENERATE_HTML tag is set to YES (the default) Doxygen will +# generate HTML output. + +GENERATE_HTML = YES + +# The HTML_OUTPUT tag is used to specify where the HTML docs will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `html' will be used as the default path. + +HTML_OUTPUT = + +# The HTML_FILE_EXTENSION tag can be used to specify the file extension for +# each generated HTML page (for example: .htm,.php,.asp). 
If it is left blank +# doxygen will generate files with .html extension. + +HTML_FILE_EXTENSION = .html + +# The HTML_HEADER tag can be used to specify a personal HTML header for +# each generated HTML page. If it is left blank doxygen will generate a +# standard header. Note that when using a custom header you are responsible +# for the proper inclusion of any scripts and style sheets that doxygen +# needs, which is dependent on the configuration options used. +# It is advised to generate a default header using "doxygen -w html +# header.html footer.html stylesheet.css YourConfigFile" and then modify +# that header. Note that the header is subject to change so you typically +# have to redo this when upgrading to a newer version of doxygen or when +# changing the value of configuration settings such as GENERATE_TREEVIEW! + +HTML_HEADER = + +# The HTML_FOOTER tag can be used to specify a personal HTML footer for +# each generated HTML page. If it is left blank doxygen will generate a +# standard footer. + +HTML_FOOTER = + +# The HTML_STYLESHEET tag can be used to specify a user-defined cascading +# style sheet that is used by each HTML page. It can be used to +# fine-tune the look of the HTML output. If left blank doxygen will +# generate a default style sheet. Note that it is recommended to use +# HTML_EXTRA_STYLESHEET instead of this one, as it is more robust and this +# tag will in the future become obsolete. + +HTML_STYLESHEET = + +# The HTML_EXTRA_STYLESHEET tag can be used to specify an additional +# user-defined cascading style sheet that is included after the standard +# style sheets created by doxygen. Using this option one can overrule +# certain style aspects. This is preferred over using HTML_STYLESHEET +# since it does not replace the standard style sheet and is therefor more +# robust against future updates. Doxygen will copy the style sheet file to +# the output directory. 
+ +HTML_EXTRA_STYLESHEET = + +# The HTML_EXTRA_FILES tag can be used to specify one or more extra images or +# other source files which should be copied to the HTML output directory. Note +# that these files will be copied to the base HTML output directory. Use the +# $relpath$ marker in the HTML_HEADER and/or HTML_FOOTER files to load these +# files. In the HTML_STYLESHEET file, use the file name only. Also note that +# the files will be copied as-is; there are no commands or markers available. + +HTML_EXTRA_FILES = + +# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. +# Doxygen will adjust the colors in the style sheet and background images +# according to this color. Hue is specified as an angle on a colorwheel, +# see http://en.wikipedia.org/wiki/Hue for more information. +# For instance the value 0 represents red, 60 is yellow, 120 is green, +# 180 is cyan, 240 is blue, 300 purple, and 360 is red again. +# The allowed range is 0 to 359. + +HTML_COLORSTYLE_HUE = 220 + +# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of +# the colors in the HTML output. For a value of 0 the output will use +# grayscales only. A value of 255 will produce the most vivid colors. + +HTML_COLORSTYLE_SAT = 100 + +# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to +# the luminance component of the colors in the HTML output. Values below +# 100 gradually make the output lighter, whereas values above 100 make +# the output darker. The value divided by 100 is the actual gamma applied, +# so 80 represents a gamma of 0.8, The value 220 represents a gamma of 2.2, +# and 100 does not change the gamma. + +HTML_COLORSTYLE_GAMMA = 80 + +# If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML +# page will contain the date and time when the page was generated. Setting +# this to NO can help when comparing the output of multiple runs. 
+ +HTML_TIMESTAMP = NO + +# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML +# documentation will contain sections that can be hidden and shown after the +# page has loaded. + +HTML_DYNAMIC_SECTIONS = NO + +# With HTML_INDEX_NUM_ENTRIES one can control the preferred number of +# entries shown in the various tree structured indices initially; the user +# can expand and collapse entries dynamically later on. Doxygen will expand +# the tree to such a level that at most the specified number of entries are +# visible (unless a fully collapsed tree already exceeds this amount). +# So setting the number of entries 1 will produce a full collapsed tree by +# default. 0 is a special value representing an infinite number of entries +# and will result in a full expanded tree by default. + +HTML_INDEX_NUM_ENTRIES = 100 + +# If the GENERATE_DOCSET tag is set to YES, additional index files +# will be generated that can be used as input for Apple's Xcode 3 +# integrated development environment, introduced with OSX 10.5 (Leopard). +# To create a documentation set, doxygen will generate a Makefile in the +# HTML output directory. Running make will produce the docset in that +# directory and running "make install" will install the docset in +# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find +# it at startup. +# See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html +# for more information. + +GENERATE_DOCSET = NO + +# When GENERATE_DOCSET tag is set to YES, this tag determines the name of the +# feed. A documentation feed provides an umbrella under which multiple +# documentation sets from a single provider (such as a company or product suite) +# can be grouped. + +DOCSET_FEEDNAME = "Doxygen generated docs" + +# When GENERATE_DOCSET tag is set to YES, this tag specifies a string that +# should uniquely identify the documentation set bundle. This should be a +# reverse domain-name style string, e.g. com.mycompany.MyDocSet. 
Doxygen +# will append .docset to the name. + +DOCSET_BUNDLE_ID = org.doxygen.Project + +# When GENERATE_PUBLISHER_ID tag specifies a string that should uniquely +# identify the documentation publisher. This should be a reverse domain-name +# style string, e.g. com.mycompany.MyDocSet.documentation. + +DOCSET_PUBLISHER_ID = org.doxygen.Publisher + +# The GENERATE_PUBLISHER_NAME tag identifies the documentation publisher. + +DOCSET_PUBLISHER_NAME = Publisher + +# If the GENERATE_HTMLHELP tag is set to YES, additional index files +# will be generated that can be used as input for tools like the +# Microsoft HTML help workshop to generate a compiled HTML help file (.chm) +# of the generated HTML documentation. + +GENERATE_HTMLHELP = NO + +# If the GENERATE_HTMLHELP tag is set to YES, the CHM_FILE tag can +# be used to specify the file name of the resulting .chm file. You +# can add a path in front of the file if the result should not be +# written to the html output directory. + +CHM_FILE = + +# If the GENERATE_HTMLHELP tag is set to YES, the HHC_LOCATION tag can +# be used to specify the location (absolute path including file name) of +# the HTML help compiler (hhc.exe). If non-empty doxygen will try to run +# the HTML help compiler on the generated index.hhp. + +HHC_LOCATION = + +# If the GENERATE_HTMLHELP tag is set to YES, the GENERATE_CHI flag +# controls if a separate .chi index file is generated (YES) or that +# it should be included in the main .chm file (NO). + +GENERATE_CHI = NO + +# If the GENERATE_HTMLHELP tag is set to YES, the CHM_INDEX_ENCODING +# is used to encode HtmlHelp index (hhk), content (hhc) and project file +# content. + +CHM_INDEX_ENCODING = + +# If the GENERATE_HTMLHELP tag is set to YES, the BINARY_TOC flag +# controls whether a binary table of contents is generated (YES) or a +# normal table of contents (NO) in the .chm file. 
+ +BINARY_TOC = NO + +# The TOC_EXPAND flag can be set to YES to add extra items for group members +# to the contents of the HTML help documentation and to the tree view. + +TOC_EXPAND = NO + +# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and +# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated +# that can be used as input for Qt's qhelpgenerator to generate a +# Qt Compressed Help (.qch) of the generated HTML documentation. + +GENERATE_QHP = NO + +# If the QHG_LOCATION tag is specified, the QCH_FILE tag can +# be used to specify the file name of the resulting .qch file. +# The path specified is relative to the HTML output folder. + +QCH_FILE = + +# The QHP_NAMESPACE tag specifies the namespace to use when generating +# Qt Help Project output. For more information please see +# http://doc.trolltech.com/qthelpproject.html#namespace + +QHP_NAMESPACE = org.doxygen.Project + +# The QHP_VIRTUAL_FOLDER tag specifies the namespace to use when generating +# Qt Help Project output. For more information please see +# http://doc.trolltech.com/qthelpproject.html#virtual-folders + +QHP_VIRTUAL_FOLDER = doc + +# If QHP_CUST_FILTER_NAME is set, it specifies the name of a custom filter to +# add. For more information please see +# http://doc.trolltech.com/qthelpproject.html#custom-filters + +QHP_CUST_FILTER_NAME = + +# The QHP_CUST_FILT_ATTRS tag specifies the list of the attributes of the +# custom filter to add. For more information please see +# <a href="http://doc.trolltech.com/qthelpproject.html#custom-filters"> +# Qt Help Project / Custom Filters</a>. + +QHP_CUST_FILTER_ATTRS = + +# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this +# project's +# filter section matches. +# <a href="http://doc.trolltech.com/qthelpproject.html#filter-attributes"> +# Qt Help Project / Filter Attributes</a>. + +QHP_SECT_FILTER_ATTRS = + +# If the GENERATE_QHP tag is set to YES, the QHG_LOCATION tag can +# be used to specify the location of Qt's qhelpgenerator. +# If non-empty doxygen will try to run qhelpgenerator on the generated +# .qhp file.
+ +QHG_LOCATION = + +# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files +# will be generated, which together with the HTML files, form an Eclipse help +# plugin. To install this plugin and make it available under the help contents +# menu in Eclipse, the contents of the directory containing the HTML and XML +# files needs to be copied into the plugins directory of eclipse. The name of +# the directory within the plugins directory should be the same as +# the ECLIPSE_DOC_ID value. After copying Eclipse needs to be restarted before +# the help appears. + +GENERATE_ECLIPSEHELP = NO + +# A unique identifier for the eclipse help plugin. When installing the plugin +# the directory name containing the HTML and XML files should also have +# this name. + +ECLIPSE_DOC_ID = org.doxygen.Project + +# The DISABLE_INDEX tag can be used to turn on/off the condensed index (tabs) +# at top of each HTML page. The value NO (the default) enables the index and +# the value YES disables it. Since the tabs have the same information as the +# navigation tree you can set this option to NO if you already set +# GENERATE_TREEVIEW to YES. + +DISABLE_INDEX = NO + +# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index +# structure should be generated to display hierarchical information. +# If the tag value is set to YES, a side panel will be generated +# containing a tree-like index structure (just like the one that +# is generated for HTML Help). For this to work a browser that supports +# JavaScript, DHTML, CSS and frames is required (i.e. any modern browser). +# Windows users are probably better off using the HTML help feature. +# Since the tree basically has the same information as the tab index you +# could consider to set DISABLE_INDEX to NO when enabling this option. 
+ +GENERATE_TREEVIEW = NO + +# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values +# (range [0,1..20]) that doxygen will group on one line in the generated HTML +# documentation. Note that a value of 0 will completely suppress the enum +# values from appearing in the overview section. + +ENUM_VALUES_PER_LINE = 4 + +# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be +# used to set the initial width (in pixels) of the frame in which the tree +# is shown. + +TREEVIEW_WIDTH = 250 + +# When the EXT_LINKS_IN_WINDOW option is set to YES doxygen will open +# links to external symbols imported via tag files in a separate window. + +EXT_LINKS_IN_WINDOW = NO + +# Use this tag to change the font size of Latex formulas included +# as images in the HTML documentation. The default is 10. Note that +# when you change the font size after a successful doxygen run you need +# to manually remove any form_*.png images from the HTML output directory +# to force them to be regenerated. + +FORMULA_FONTSIZE = 10 + +# Use the FORMULA_TRANPARENT tag to determine whether or not the images +# generated for formulas are transparent PNGs. Transparent PNGs are +# not supported properly for IE 6.0, but are supported on all modern browsers. +# Note that when changing this option you need to delete any form_*.png files +# in the HTML output before the changes have effect. + +FORMULA_TRANSPARENT = YES + +# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax +# (see http://www.mathjax.org) which uses client side Javascript for the +# rendering instead of using prerendered bitmaps. Use this if you do not +# have LaTeX installed or if you want to formulas look prettier in the HTML +# output. When enabled you may also need to install MathJax separately and +# configure the path to it using the MATHJAX_RELPATH option. 
+ +USE_MATHJAX = NO + +# When MathJax is enabled you need to specify the location relative to the +# HTML output directory using the MATHJAX_RELPATH option. The destination +# directory should contain the MathJax.js script. For instance, if the mathjax +# directory is located at the same level as the HTML output directory, then +# MATHJAX_RELPATH should be ../mathjax. The default value points to +# the MathJax Content Delivery Network so you can quickly see the result without +# installing MathJax. +# However, it is strongly recommended to install a local +# copy of MathJax from http://www.mathjax.org before deployment. + +MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest + +# The MATHJAX_EXTENSIONS tag can be used to specify one or MathJax extension +# names that should be enabled during MathJax rendering. + +MATHJAX_EXTENSIONS = + +# When the SEARCHENGINE tag is enabled doxygen will generate a search box +# for the HTML output. The underlying search engine uses javascript +# and DHTML and should work on any modern browser. Note that when using +# HTML help (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets +# (GENERATE_DOCSET) there is already a search function so this one should +# typically be disabled. For large projects the javascript based search engine +# can be slow, then enabling SERVER_BASED_SEARCH may provide a better solution. + +SEARCHENGINE = YES + +# When the SERVER_BASED_SEARCH tag is enabled the search engine will be +# implemented using a PHP enabled web server instead of at the web client +# using Javascript. Doxygen will generate the search PHP script and index +# file to put on the web server. The advantage of the server +# based approach is that it scales better to large projects and allows +# full text search. The disadvantages are that it is more difficult to setup +# and does not have live searching capabilities. 
+ +SERVER_BASED_SEARCH = NO + +#--------------------------------------------------------------------------- +# configuration options related to the LaTeX output +#--------------------------------------------------------------------------- + +# If the GENERATE_LATEX tag is set to YES (the default) Doxygen will +# generate Latex output. + +GENERATE_LATEX = YES + +# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `latex' will be used as the default path. + +LATEX_OUTPUT = + +# The LATEX_CMD_NAME tag can be used to specify the LaTeX command name to be +# invoked. If left blank `latex' will be used as the default command name. +# Note that when enabling USE_PDFLATEX this option is only used for +# generating bitmaps for formulas in the HTML output, but not in the +# Makefile that is written to the output directory. + +LATEX_CMD_NAME = latex + +# The MAKEINDEX_CMD_NAME tag can be used to specify the command name to +# generate index for LaTeX. If left blank `makeindex' will be used as the +# default command name. + +MAKEINDEX_CMD_NAME = makeindex + +# If the COMPACT_LATEX tag is set to YES Doxygen generates more compact +# LaTeX documents. This may be useful for small projects and may help to +# save some trees in general. + +COMPACT_LATEX = NO + +# The PAPER_TYPE tag can be used to set the paper type that is used +# by the printer. Possible values are: a4, letter, legal and +# executive. If left blank a4wide will be used. + +PAPER_TYPE = a4wide + +# The EXTRA_PACKAGES tag can be to specify one or more names of LaTeX +# packages that should be included in the LaTeX output. + +EXTRA_PACKAGES = + +# The LATEX_HEADER tag can be used to specify a personal LaTeX header for +# the generated latex document. The header should contain everything until +# the first chapter. If it is left blank doxygen will generate a +# standard header. 
Notice: only use this tag if you know what you are doing! + +LATEX_HEADER = doc/doxygen/header.tex + +# The LATEX_FOOTER tag can be used to specify a personal LaTeX footer for +# the generated latex document. The footer should contain everything after +# the last chapter. If it is left blank doxygen will generate a +# standard footer. Notice: only use this tag if you know what you are doing! + +LATEX_FOOTER = + +# If the PDF_HYPERLINKS tag is set to YES, the LaTeX that is generated +# is prepared for conversion to pdf (using ps2pdf). The pdf file will +# contain links (just like the HTML output) instead of page references +# This makes the output suitable for online browsing using a pdf viewer. + +PDF_HYPERLINKS = YES + +# If the USE_PDFLATEX tag is set to YES, pdflatex will be used instead of +# plain latex in the generated Makefile. Set this option to YES to get a +# higher quality PDF documentation. + +USE_PDFLATEX = YES + +# If the LATEX_BATCHMODE tag is set to YES, doxygen will add the \\batchmode. +# command to the generated LaTeX files. This will instruct LaTeX to keep +# running if errors occur, instead of asking the user for help. +# This option is also used when generating formulas in HTML. + +LATEX_BATCHMODE = NO + +# If LATEX_HIDE_INDICES is set to YES then doxygen will not +# include the index chapters (such as File Index, Compound Index, etc.) +# in the output. + +LATEX_HIDE_INDICES = NO + +# If LATEX_SOURCE_CODE is set to YES then doxygen will include +# source code with syntax highlighting in the LaTeX output. +# Note that which sources are shown also depends on other settings +# such as SOURCE_BROWSER. + +LATEX_SOURCE_CODE = NO + +# The LATEX_BIB_STYLE tag can be used to specify the style to use for the +# bibliography, e.g. plainnat, or ieeetr. The default style is "plain". See +# http://en.wikipedia.org/wiki/BibTeX for more info. 
+ +LATEX_BIB_STYLE = plain + +#--------------------------------------------------------------------------- +# configuration options related to the RTF output +#--------------------------------------------------------------------------- + +# If the GENERATE_RTF tag is set to YES Doxygen will generate RTF output +# The RTF output is optimized for Word 97 and may not look very pretty with +# other RTF readers or editors. + +GENERATE_RTF = NO + +# The RTF_OUTPUT tag is used to specify where the RTF docs will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `rtf' will be used as the default path. + +RTF_OUTPUT = + +# If the COMPACT_RTF tag is set to YES Doxygen generates more compact +# RTF documents. This may be useful for small projects and may help to +# save some trees in general. + +COMPACT_RTF = NO + +# If the RTF_HYPERLINKS tag is set to YES, the RTF that is generated +# will contain hyperlink fields. The RTF file will +# contain links (just like the HTML output) instead of page references. +# This makes the output suitable for online browsing using WORD or other +# programs which support those fields. +# Note: wordpad (write) and others do not support links. + +RTF_HYPERLINKS = NO + +# Load style sheet definitions from file. Syntax is similar to doxygen's +# config file, i.e. a series of assignments. You only have to provide +# replacements, missing definitions are set to their default value. + +RTF_STYLESHEET_FILE = + +# Set optional variables used in the generation of an rtf document. +# Syntax is similar to doxygen's config file. 
+ +RTF_EXTENSIONS_FILE = + +#--------------------------------------------------------------------------- +# configuration options related to the man page output +#--------------------------------------------------------------------------- + +# If the GENERATE_MAN tag is set to YES (the default) Doxygen will +# generate man pages + +GENERATE_MAN = NO + +# The MAN_OUTPUT tag is used to specify where the man pages will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `man' will be used as the default path. + +MAN_OUTPUT = + +# The MAN_EXTENSION tag determines the extension that is added to +# the generated man pages (default is the subroutine's section .3) + +MAN_EXTENSION = + +# If the MAN_LINKS tag is set to YES and Doxygen generates man output, +# then it will generate one additional man file for each entity +# documented in the real man page(s). These additional files +# only source the real man page, but without them the man command +# would be unable to find the correct page. The default is NO. + +MAN_LINKS = NO + +#--------------------------------------------------------------------------- +# configuration options related to the XML output +#--------------------------------------------------------------------------- + +# If the GENERATE_XML tag is set to YES Doxygen will +# generate an XML file that captures the structure of +# the code including all documentation. + +GENERATE_XML = NO + +# The XML_OUTPUT tag is used to specify where the XML pages will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `xml' will be used as the default path. + +XML_OUTPUT = xml + +# The XML_SCHEMA tag can be used to specify an XML schema, +# which can be used by a validating XML parser to check the +# syntax of the XML files. 
+ +XML_SCHEMA = + +# The XML_DTD tag can be used to specify an XML DTD, +# which can be used by a validating XML parser to check the +# syntax of the XML files. + +XML_DTD = + +# If the XML_PROGRAMLISTING tag is set to YES Doxygen will +# dump the program listings (including syntax highlighting +# and cross-referencing information) to the XML output. Note that +# enabling this will significantly increase the size of the XML output. + +XML_PROGRAMLISTING = YES + +#--------------------------------------------------------------------------- +# configuration options for the AutoGen Definitions output +#--------------------------------------------------------------------------- + +# If the GENERATE_AUTOGEN_DEF tag is set to YES Doxygen will +# generate an AutoGen Definitions (see autogen.sf.net) file +# that captures the structure of the code including all +# documentation. Note that this feature is still experimental +# and incomplete at the moment. + +GENERATE_AUTOGEN_DEF = NO + +#--------------------------------------------------------------------------- +# configuration options related to the Perl module output +#--------------------------------------------------------------------------- + +# If the GENERATE_PERLMOD tag is set to YES Doxygen will +# generate a Perl module file that captures the structure of +# the code including all documentation. Note that this +# feature is still experimental and incomplete at the +# moment. + +GENERATE_PERLMOD = NO + +# If the PERLMOD_LATEX tag is set to YES Doxygen will generate +# the necessary Makefile rules, Perl scripts and LaTeX code to be able +# to generate PDF and DVI output from the Perl module output. + +PERLMOD_LATEX = NO + +# If the PERLMOD_PRETTY tag is set to YES the Perl module output will be +# nicely formatted so it can be parsed by a human reader. +# This is useful +# if you want to understand what is going on. 
+# On the other hand, if this +# tag is set to NO the size of the Perl module output will be much smaller +# and Perl will parse it just the same. + +PERLMOD_PRETTY = YES + +# The names of the make variables in the generated doxyrules.make file +# are prefixed with the string contained in PERLMOD_MAKEVAR_PREFIX. +# This is useful so different doxyrules.make files included by the same +# Makefile don't overwrite each other's variables. + +PERLMOD_MAKEVAR_PREFIX = + +#--------------------------------------------------------------------------- +# Configuration options related to the preprocessor +#--------------------------------------------------------------------------- + +# If the ENABLE_PREPROCESSING tag is set to YES (the default) Doxygen will +# evaluate all C-preprocessor directives found in the sources and include +# files. + +ENABLE_PREPROCESSING = YES + +# If the MACRO_EXPANSION tag is set to YES Doxygen will expand all macro +# names in the source code. If set to NO (the default) only conditional +# compilation will be performed. Macro expansion can be done in a controlled +# way by setting EXPAND_ONLY_PREDEF to YES. + +MACRO_EXPANSION = YES + +# If the EXPAND_ONLY_PREDEF and MACRO_EXPANSION tags are both set to YES +# then the macro expansion is limited to the macros specified with the +# PREDEFINED and EXPAND_AS_DEFINED tags. + +EXPAND_ONLY_PREDEF = YES + +# If the SEARCH_INCLUDES tag is set to YES (the default) the includes files +# pointed to by INCLUDE_PATH will be searched when a #include is found. + +SEARCH_INCLUDES = YES + +# The INCLUDE_PATH tag can be used to specify one or more directories that +# contain include files that are not input files but should be processed by +# the preprocessor. + +INCLUDE_PATH = + +# You can use the INCLUDE_FILE_PATTERNS tag to specify one or more wildcard +# patterns (like *.h and *.hpp) to filter out the header-files in the +# directories. If left blank, the patterns specified with FILE_PATTERNS will +# be used. 
+ +INCLUDE_FILE_PATTERNS = + +# The PREDEFINED tag can be used to specify one or more macro names that +# are defined before the preprocessor is started (similar to the -D option of +# gcc). The argument of the tag is a list of macros of the form: name +# or name=definition (no spaces). If the definition and the = are +# omitted =1 is assumed. To prevent a macro definition from being +# undefined via #undef or recursively expanded use the := operator +# instead of the = operator. + +PREDEFINED = OMP_30_ENABLED=1, OMP_40_ENABLED=1, KMP_STATS_ENABLED=1 + +# If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then +# this tag can be used to specify a list of macro names that should be expanded. +# The macro definition that is found in the sources will be used. +# Use the PREDEFINED tag if you want to use a different macro definition that +# overrules the definition found in the source code. + +EXPAND_AS_DEFINED = + +# If the SKIP_FUNCTION_MACROS tag is set to YES (the default) then +# doxygen's preprocessor will remove all references to function-like macros +# that are alone on a line, have an all uppercase name, and do not end with a +# semicolon, because these will confuse the parser if not removed. + +SKIP_FUNCTION_MACROS = YES + +#--------------------------------------------------------------------------- +# Configuration::additions related to external references +#--------------------------------------------------------------------------- + +# The TAGFILES option can be used to specify one or more tagfiles. For each +# tag file the location of the external documentation should be added. The +# format of a tag file without this location is as follows: +# +# TAGFILES = file1 file2 ... +# Adding location for the tag files is done as follows: +# +# TAGFILES = file1=loc1 "file2 = loc2" ... +# where "loc1" and "loc2" can be relative or absolute paths +# or URLs. Note that each tag file must have a unique name (where the name does +# NOT include the path). 
If a tag file is not located in the directory in which +# doxygen is run, you must also specify the path to the tagfile here. + +TAGFILES = + +# When a file name is specified after GENERATE_TAGFILE, doxygen will create +# a tag file that is based on the input files it reads. + +GENERATE_TAGFILE = + +# If the ALLEXTERNALS tag is set to YES all external classes will be listed +# in the class index. If set to NO only the inherited external classes +# will be listed. + +ALLEXTERNALS = NO + +# If the EXTERNAL_GROUPS tag is set to YES all external groups will be listed +# in the modules index. If set to NO, only the current project's groups will +# be listed. + +EXTERNAL_GROUPS = YES + +# The PERL_PATH should be the absolute path and name of the perl script +# interpreter (i.e. the result of `which perl'). + +PERL_PATH = + +#--------------------------------------------------------------------------- +# Configuration options related to the dot tool +#--------------------------------------------------------------------------- + +# If the CLASS_DIAGRAMS tag is set to YES (the default) Doxygen will +# generate a inheritance diagram (in HTML, RTF and LaTeX) for classes with base +# or super classes. Setting the tag to NO turns the diagrams off. Note that +# this option also works with HAVE_DOT disabled, but it is recommended to +# install and use dot, since it yields more powerful graphs. + +CLASS_DIAGRAMS = YES + +# You can define message sequence charts within doxygen comments using the \msc +# command. Doxygen will then run the mscgen tool (see +# http://www.mcternan.me.uk/mscgen/) to produce the chart and insert it in the +# documentation. The MSCGEN_PATH tag allows you to specify the directory where +# the mscgen tool resides. If left empty the tool is assumed to be found in the +# default search path. 
+ +MSCGEN_PATH = + +# If set to YES, the inheritance and collaboration graphs will hide +# inheritance and usage relations if the target is undocumented +# or is not a class. + +HIDE_UNDOC_RELATIONS = YES + +# If you set the HAVE_DOT tag to YES then doxygen will assume the dot tool is +# available from the path. This tool is part of Graphviz, a graph visualization +# toolkit from AT&T and Lucent Bell Labs. The other options in this section +# have no effect if this option is set to NO (the default) + +HAVE_DOT = NO + +# The DOT_NUM_THREADS specifies the number of dot invocations doxygen is +# allowed to run in parallel. When set to 0 (the default) doxygen will +# base this on the number of processors available in the system. You can set it +# explicitly to a value larger than 0 to get control over the balance +# between CPU load and processing speed. + +DOT_NUM_THREADS = 0 + +# By default doxygen will use the Helvetica font for all dot files that +# doxygen generates. When you want a differently looking font you can specify +# the font name using DOT_FONTNAME. You need to make sure dot is able to find +# the font, which can be done by putting it in a standard location or by setting +# the DOTFONTPATH environment variable or by setting DOT_FONTPATH to the +# directory containing the font. + +DOT_FONTNAME = Helvetica + +# The DOT_FONTSIZE tag can be used to set the size of the font of dot graphs. +# The default size is 10pt. + +DOT_FONTSIZE = 10 + +# By default doxygen will tell dot to use the Helvetica font. +# If you specify a different font using DOT_FONTNAME you can use DOT_FONTPATH to +# set the path where dot can find it. + +DOT_FONTPATH = + +# If the CLASS_GRAPH and HAVE_DOT tags are set to YES then doxygen +# will generate a graph for each documented class showing the direct and +# indirect inheritance relations. Setting this tag to YES will force the +# CLASS_DIAGRAMS tag to NO. 
+ +CLASS_GRAPH = YES + +# If the COLLABORATION_GRAPH and HAVE_DOT tags are set to YES then doxygen +# will generate a graph for each documented class showing the direct and +# indirect implementation dependencies (inheritance, containment, and +# class references variables) of the class with other documented classes. + +COLLABORATION_GRAPH = NO + +# If the GROUP_GRAPHS and HAVE_DOT tags are set to YES then doxygen +# will generate a graph for groups, showing the direct groups dependencies + +GROUP_GRAPHS = YES + +# If the UML_LOOK tag is set to YES doxygen will generate inheritance and +# collaboration diagrams in a style similar to the OMG's Unified Modeling +# Language. + +UML_LOOK = NO + +# If the UML_LOOK tag is enabled, the fields and methods are shown inside +# the class node. If there are many fields or methods and many nodes the +# graph may become too big to be useful. The UML_LIMIT_NUM_FIELDS +# threshold limits the number of items for each type to make the size more +# manageable. Set this to 0 for no limit. Note that the threshold may be +# exceeded by 50% before the limit is enforced. + +UML_LIMIT_NUM_FIELDS = 10 + +# If set to YES, the inheritance and collaboration graphs will show the +# relations between templates and their instances. + +TEMPLATE_RELATIONS = YES + +# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDE_GRAPH, and HAVE_DOT +# tags are set to YES then doxygen will generate a graph for each documented +# file showing the direct and indirect include dependencies of the file with +# other documented files. + +INCLUDE_GRAPH = NO + +# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDED_BY_GRAPH, and +# HAVE_DOT tags are set to YES then doxygen will generate a graph for each +# documented header file showing the documented files that directly or +# indirectly include this file. 
+ +INCLUDED_BY_GRAPH = NO + +# If the CALL_GRAPH and HAVE_DOT options are set to YES then +# doxygen will generate a call dependency graph for every global function +# or class method. Note that enabling this option will significantly increase +# the time of a run. So in most cases it will be better to enable call graphs +# for selected functions only using the \callgraph command. + +CALL_GRAPH = NO + +# If the CALLER_GRAPH and HAVE_DOT tags are set to YES then +# doxygen will generate a caller dependency graph for every global function +# or class method. Note that enabling this option will significantly increase +# the time of a run. So in most cases it will be better to enable caller +# graphs for selected functions only using the \callergraph command. + +CALLER_GRAPH = NO + +# If the GRAPHICAL_HIERARCHY and HAVE_DOT tags are set to YES then doxygen +# will generate a graphical hierarchy of all classes instead of a textual one. + +GRAPHICAL_HIERARCHY = YES + +# If the DIRECTORY_GRAPH and HAVE_DOT tags are set to YES +# then doxygen will show the dependencies a directory has on other directories +# in a graphical way. The dependency relations are determined by the #include +# relations between the files in the directories. + +DIRECTORY_GRAPH = YES + +# The DOT_IMAGE_FORMAT tag can be used to set the image format of the images +# generated by dot. Possible values are svg, png, jpg, or gif. +# If left blank png will be used. If you choose svg you need to set +# HTML_FILE_EXTENSION to xhtml in order to make the SVG files +# visible in IE 9+ (other browsers do not have this requirement). + +DOT_IMAGE_FORMAT = png + +# If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to +# enable generation of interactive SVG images that allow zooming and panning. +# Note that this requires a modern browser other than Internet Explorer. +# Tested and working are Firefox, Chrome, Safari, and Opera. 
For IE 9+ you +# need to set HTML_FILE_EXTENSION to xhtml in order to make the SVG files +# visible. Older versions of IE do not have SVG support. + +INTERACTIVE_SVG = NO + +# The tag DOT_PATH can be used to specify the path where the dot tool can be +# found. If left blank, it is assumed the dot tool can be found in the path. + +DOT_PATH = + +# The DOTFILE_DIRS tag can be used to specify one or more directories that +# contain dot files that are included in the documentation (see the +# \dotfile command). + +DOTFILE_DIRS = + +# The MSCFILE_DIRS tag can be used to specify one or more directories that +# contain msc files that are included in the documentation (see the +# \mscfile command). + +MSCFILE_DIRS = + +# The DOT_GRAPH_MAX_NODES tag can be used to set the maximum number of +# nodes that will be shown in the graph. If the number of nodes in a graph +# becomes larger than this value, doxygen will truncate the graph, which is +# visualized by representing a node as a red box. Note that doxygen if the +# number of direct children of the root node in a graph is already larger than +# DOT_GRAPH_MAX_NODES then the graph will not be shown at all. Also note +# that the size of a graph can be further restricted by MAX_DOT_GRAPH_DEPTH. + +DOT_GRAPH_MAX_NODES = 50 + +# The MAX_DOT_GRAPH_DEPTH tag can be used to set the maximum depth of the +# graphs generated by dot. A depth value of 3 means that only nodes reachable +# from the root by following a path via at most 3 edges will be shown. Nodes +# that lay further from the root node will be omitted. Note that setting this +# option to 1 or 2 may greatly reduce the computation time needed for large +# code bases. Also note that the size of a graph can be further restricted by +# DOT_GRAPH_MAX_NODES. Using a depth of 0 means no depth restriction. + +MAX_DOT_GRAPH_DEPTH = 0 + +# Set the DOT_TRANSPARENT tag to YES to generate images with a transparent +# background. 
This is disabled by default, because dot on Windows does not +# seem to support this out of the box. Warning: Depending on the platform used, +# enabling this option may lead to badly anti-aliased labels on the edges of +# a graph (i.e. they become hard to read). + +DOT_TRANSPARENT = NO + +# Set the DOT_MULTI_TARGETS tag to YES allow dot to generate multiple output +# files in one run (i.e. multiple -o and -T options on the command line). This +# makes dot run faster, but since only newer versions of dot (>1.8.10) +# support this, this feature is disabled by default. + +DOT_MULTI_TARGETS = NO + +# If the GENERATE_LEGEND tag is set to YES (the default) Doxygen will +# generate a legend page explaining the meaning of the various boxes and +# arrows in the dot generated graphs. + +GENERATE_LEGEND = YES + +# If the DOT_CLEANUP tag is set to YES (the default) Doxygen will +# remove the intermediate dot files that are used to generate +# the various graphs. + +DOT_CLEANUP = YES diff --git a/pstl/CREDITS.txt b/pstl/CREDITS.txt index 4945fd5ad308be..174722510fdea4 100644 --- a/pstl/CREDITS.txt +++ b/pstl/CREDITS.txt @@ -1,21 +1,21 @@ -This file is a partial list of people who have contributed to the LLVM/pstl -(Parallel STL) project. If you have contributed a patch or made some other -contribution to LLVM/pstl, please submit a patch to this file to add yourself, -and it will be done! - -The list is sorted by surname and formatted to allow easy grepping and -beautification by scripts. The fields are: name (N), email (E), web-address -(W), PGP key ID and fingerprint (P), description (D), and snail-mail address -(S). - -N: Intel Corporation -W: http://www.intel.com -D: Created the initial implementation. - -N: Thomas Rodgers -E: trodgers at redhat.com -D: Identifier name transformation for inclusion in a Standard C++ library. - -N: Christopher Nelson -E: nadiasvertex at gmail.com -D: Add support for an OpenMP backend. 
+This file is a partial list of people who have contributed to the LLVM/pstl +(Parallel STL) project. If you have contributed a patch or made some other +contribution to LLVM/pstl, please submit a patch to this file to add yourself, +and it will be done! + +The list is sorted by surname and formatted to allow easy grepping and +beautification by scripts. The fields are: name (N), email (E), web-address +(W), PGP key ID and fingerprint (P), description (D), and snail-mail address +(S). + +N: Intel Corporation +W: http://www.intel.com +D: Created the initial implementation. + +N: Thomas Rodgers +E: trodgers at redhat.com +D: Identifier name transformation for inclusion in a Standard C++ library. + +N: Christopher Nelson +E: nadiasvertex at gmail.com +D: Add support for an OpenMP backend. From openmp-commits at lists.llvm.org Thu Oct 17 06:49:48 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Thu, 17 Oct 2024 06:49:48 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <671115fc.a70a0220.25c3a5.37d3@mx.google.com> https://github.com/ldrumm closed https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Thu Oct 17 06:49:46 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Thu, 17 Oct 2024 06:49:46 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <671115fa.a70a0220.2d9f9b.35ed@mx.google.com> https://github.com/ldrumm updated https://github.com/llvm/llvm-project/pull/86318 >From dccebddb3b802c4c1fe287222e454b63f850f012 Mon Sep 17 00:00:00 2001 From: Luke Drummond Date: Fri, 22 Mar 2024 17:09:54 +0000 Subject: [PATCH 1/2] Finally formalise our defacto line-ending policy Historically, we've not automatically enforced how git 
tracks line endings, but there are many, many commits that "undo" unintended CRLFs getting into history. `git log --pretty=oneline --grep=CRLF` shows nearly 100 commits that revert CRLFs which made their way into the index and then into history. As far as I can tell, there are none the other way round, except for specific cases like `.bat` files or tests for parsers that need to accept such sequences.

Of note, one of the earliest of those listed in that output is:

```
commit 9795860250734e5c2a879546c534e35d9edd5944
Author: NAKAMURA Takumi
Date: Thu Feb 3 11:41:27 2011 +0000

    cmake/*: Add svn:eol-style=native and fix CRLF.

    llvm-svn: 124793
```

...which introduced such a de facto policy for subversion.

With old versions of git, it's been a bit of a crap-shoot whether enforcing how line endings are stored in the history will upset checkouts on machines where such line endings are the norm. Indeed, many users have enforced that git checks out the working copy according to a global or per-user config via core crlf, or core autocrlf.

For ~8 years now[1], however, git has supported the ability to "do as the Romans do" on checkout, but internally store subsets of text files with line endings specified via a system of patterns in the `.gitattributes` file. Since we now have this ability, and we've been specifying attributes for various binary files, I think it makes sense to rid ourselves of all that work converting things "back", and just let git handle the local checkout. Thus the new top-level policy here is

    * text=auto

In simple terms this means "unless otherwise specified, convert all files considered 'text' files to LF in the project history, but check them out as expected on the local machine". What is "expected on the local machine" depends on configuration and defaults.
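As a rough illustration of the `* text=auto` policy described above, the following Python sketch models the two conversions involved: normalizing to LF when content enters history, and applying a chosen line ending on checkout. The function names are hypothetical and this is not how git implements it. The sketch deliberately ignores git's text/binary detection and its safety checks around files with mixed endings.

```python
# Hypothetical model of line-ending normalization; not git's implementation.


def to_history(data: bytes) -> bytes:
    """Store text with LF endings by collapsing every CRLF to LF."""
    return data.replace(b"\r\n", b"\n")


def to_working_tree(data: bytes, eol: bytes = b"\n") -> bytes:
    """Check out history content with the requested line ending."""
    # Normalize first so input that already has CRLF is not turned into CR CR LF.
    return to_history(data).replace(b"\n", eol)


if __name__ == "__main__":
    committed = to_history(b"one\r\ntwo\n")
    print(committed)
    print(to_working_tree(committed, eol=b"\r\n"))
```

Under this model, a checkout on a machine that expects CRLF corresponds to calling `to_working_tree` with `eol=b"\r\n"`, and committing the result again yields the same LF-only content, so no spurious whitespace changes enter history.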
For those files in the repository that *do* need CRLF endings, I've adopted a policy of `eol=crlf` which means that git will store them in history with LF, but regardless of user config, they'll be checked out in tree with CRLF. Finally, existing files have been "corrected" in history via `git add --renormalize .` End users should *not* need to adjust their local git config or workflow. [1]: git 2.10 was released with fixed support for fine-grained line-ending tracking that respects user-config *and* repo policy. This can be considered the point at which git will respect both the user's local working tree preference *and* the history as specified by the maintainers. See https://github.com/git/git/blob/master/Documentation/RelNotes/2.10.0.txt#L248 for the release note. --- .gitattributes | 7 +++++++ clang-tools-extra/clangd/test/.gitattributes | 3 +++ clang/test/.gitattributes | 4 ++++ llvm/docs/TestingGuide.rst | 6 ++++++ llvm/test/FileCheck/.gitattributes | 1 + llvm/test/tools/llvm-ar/Inputs/.gitattributes | 1 + llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes | 1 + 7 files changed, 23 insertions(+) create mode 100644 clang-tools-extra/clangd/test/.gitattributes create mode 100644 clang/test/.gitattributes create mode 100644 llvm/test/FileCheck/.gitattributes create mode 100644 llvm/test/tools/llvm-ar/Inputs/.gitattributes create mode 100644 llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes diff --git a/.gitattributes b/.gitattributes index 6b281f33f737db..aced01d485c181 100644 --- a/.gitattributes +++ b/.gitattributes @@ -1,3 +1,10 @@ +# Checkout as native, commit as LF except in specific circumstances +* text=auto +*.bat text eol=crlf +*.rc text eol=crlf +*.sln text eol=crlf +*.natvis text eol=crlf + libcxx/src/**/*.cpp merge=libcxx-reformat libcxx/include/**/*.h merge=libcxx-reformat diff --git a/clang-tools-extra/clangd/test/.gitattributes b/clang-tools-extra/clangd/test/.gitattributes new file mode 100644 index 00000000000000..20971adc2b5d03 --- 
/dev/null +++ b/clang-tools-extra/clangd/test/.gitattributes @@ -0,0 +1,3 @@ +input-mirror.test text eol=crlf +too_large.test text eol=crlf +protocol.test text eol=crlf diff --git a/clang/test/.gitattributes b/clang/test/.gitattributes new file mode 100644 index 00000000000000..160fc6cf561751 --- /dev/null +++ b/clang/test/.gitattributes @@ -0,0 +1,4 @@ +FixIt/fixit-newline-style.c text eol=crlf +Frontend/system-header-line-directive-ms-lineendings.c text eol=crlf +Frontend/rewrite-includes-mixed-eol-crlf.* text eol=crlf +Frontend/rewrite-includes-mixed-eol-lf.h text eol=lf diff --git a/llvm/docs/TestingGuide.rst b/llvm/docs/TestingGuide.rst index 08617933519fdb..344a295226f6ae 100644 --- a/llvm/docs/TestingGuide.rst +++ b/llvm/docs/TestingGuide.rst @@ -360,6 +360,12 @@ Best practices for regression tests - Try to give values (including variables, blocks and functions) meaningful names, and avoid retaining complex names generated by the optimization pipeline (such as ``%foo.0.0.0.0.0.0``). +- If your tests depend on specific input file encodings, beware of line-ending + issues across different platforms, and in the project's history. Before you + commit tests that depend on explicit encodings, consider adding filetype or + specific line-ending annotations to a `.gitattributes + <https://git-scm.com/docs/gitattributes#_effects>`_ file in the appropriate + directory in the repository.
Extra files ----------- diff --git a/llvm/test/FileCheck/.gitattributes b/llvm/test/FileCheck/.gitattributes new file mode 100644 index 00000000000000..ba27d7fad76d50 --- /dev/null +++ b/llvm/test/FileCheck/.gitattributes @@ -0,0 +1 @@ +dos-style-eol.txt text eol=crlf diff --git a/llvm/test/tools/llvm-ar/Inputs/.gitattributes b/llvm/test/tools/llvm-ar/Inputs/.gitattributes new file mode 100644 index 00000000000000..6c8a26285daf7f --- /dev/null +++ b/llvm/test/tools/llvm-ar/Inputs/.gitattributes @@ -0,0 +1 @@ +mri-crlf.mri text eol=crlf diff --git a/llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes b/llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes new file mode 100644 index 00000000000000..2df17345df5b87 --- /dev/null +++ b/llvm/utils/lit/tests/Inputs/shtest-shell/.gitattributes @@ -0,0 +1 @@ +*.dos text eol=crlf >From 9d98acb196a40fee5229afeb08f95fd36d41c10a Mon Sep 17 00:00:00 2001 From: Luke Drummond Date: Thu, 17 Oct 2024 14:49:26 +0100 Subject: [PATCH 2/2] Renormalize line endings whitespace only after dccebddb3b80 Line ending policies were changed in the parent, dccebddb3b80. To make it easier to resolve downstream merge conflicts after line-ending policies are adjusted this is a separate whitespace-only commit. If you have merge conflicts as a result, you can simply `git add --renormalize -u && git merge --continue` or `git add --renormalize -u && git rebase --continue` - depending on your workflow. 
--- .../clangd/test/input-mirror.test | 34 +- clang-tools-extra/clangd/test/protocol.test | 226 +- clang-tools-extra/clangd/test/too_large.test | 14 +- clang/test/AST/HLSL/StructuredBuffer-AST.hlsl | 128 +- clang/test/C/C2y/n3262.c | 40 +- clang/test/C/C2y/n3274.c | 36 +- .../StructuredBuffer-annotations.hlsl | 44 +- .../StructuredBuffer-constructor.hlsl | 38 +- .../StructuredBuffer-elementtype.hlsl | 140 +- .../builtins/StructuredBuffer-subscript.hlsl | 34 +- clang/test/CodeGenHLSL/builtins/atan2.hlsl | 118 +- clang/test/CodeGenHLSL/builtins/cross.hlsl | 74 +- clang/test/CodeGenHLSL/builtins/length.hlsl | 146 +- .../test/CodeGenHLSL/builtins/normalize.hlsl | 170 +- clang/test/CodeGenHLSL/builtins/step.hlsl | 168 +- clang/test/Driver/flang/msvc-link.f90 | 10 +- clang/test/FixIt/fixit-newline-style.c | 22 +- .../rewrite-includes-mixed-eol-crlf.c | 16 +- .../rewrite-includes-mixed-eol-crlf.h | 22 +- ...tem-header-line-directive-ms-lineendings.c | 42 +- clang/test/ParserHLSL/bitfields.hlsl | 60 +- .../hlsl_annotations_on_struct_members.hlsl | 42 +- .../ParserHLSL/hlsl_contained_type_attr.hlsl | 50 +- .../hlsl_contained_type_attr_error.hlsl | 56 +- clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl | 44 +- .../ParserHLSL/hlsl_is_rov_attr_error.hlsl | 40 +- .../test/ParserHLSL/hlsl_raw_buffer_attr.hlsl | 44 +- .../hlsl_raw_buffer_attr_error.hlsl | 34 +- .../ParserHLSL/hlsl_resource_class_attr.hlsl | 74 +- .../hlsl_resource_class_attr_error.hlsl | 44 +- .../hlsl_resource_handle_attrs.hlsl | 42 +- clang/test/Sema/aarch64-sve-vector-trig-ops.c | 130 +- clang/test/Sema/riscv-rvv-vector-trig-ops.c | 134 +- .../avail-diag-default-compute.hlsl | 238 +- .../Availability/avail-diag-default-lib.hlsl | 360 +- .../avail-diag-relaxed-compute.hlsl | 238 +- .../Availability/avail-diag-relaxed-lib.hlsl | 324 +- .../avail-diag-strict-compute.hlsl | 256 +- .../Availability/avail-diag-strict-lib.hlsl | 384 +- .../avail-lib-multiple-stages.hlsl | 114 +- 
.../SemaHLSL/BuiltIns/StructuredBuffers.hlsl | 38 +- .../test/SemaHLSL/BuiltIns/cross-errors.hlsl | 86 +- .../BuiltIns/half-float-only-errors2.hlsl | 26 +- .../test/SemaHLSL/BuiltIns/length-errors.hlsl | 64 +- .../SemaHLSL/BuiltIns/normalize-errors.hlsl | 62 +- clang/test/SemaHLSL/BuiltIns/step-errors.hlsl | 62 +- .../Types/Traits/IsIntangibleType.hlsl | 162 +- .../Types/Traits/IsIntangibleTypeErrors.hlsl | 24 +- .../resource_binding_attr_error_basic.hlsl | 84 +- .../resource_binding_attr_error_other.hlsl | 18 +- .../resource_binding_attr_error_resource.hlsl | 98 +- ...urce_binding_attr_error_silence_diags.hlsl | 54 +- .../resource_binding_attr_error_space.hlsl | 124 +- .../resource_binding_attr_error_udt.hlsl | 270 +- clang/tools/scan-build/bin/scan-build.bat | 2 +- .../tools/scan-build/libexec/c++-analyzer.bat | 2 +- .../tools/scan-build/libexec/ccc-analyzer.bat | 2 +- clang/utils/ClangVisualizers/clang.natvis | 2178 ++--- .../test/Driver/msvc-dependent-lib-flags.f90 | 72 +- .../ir-interpreter-phi-nodes/Makefile | 8 +- .../postmortem/minidump/fizzbuzz.syms | 4 +- .../target-new-solib-notifications/Makefile | 46 +- .../target-new-solib-notifications/a.cpp | 6 +- .../target-new-solib-notifications/b.cpp | 2 +- .../target-new-solib-notifications/c.cpp | 2 +- .../target-new-solib-notifications/d.cpp | 2 +- .../target-new-solib-notifications/main.cpp | 32 +- .../unwind/zeroth_frame/Makefile | 6 +- .../unwind/zeroth_frame/TestZerothFrame.py | 176 +- lldb/test/API/python_api/debugger/Makefile | 6 +- lldb/test/Shell/BuildScript/modes.test | 70 +- lldb/test/Shell/BuildScript/script-args.test | 64 +- .../Shell/BuildScript/toolchain-clang-cl.test | 98 +- .../Windows/Sigsegv/Inputs/sigsegv.cpp | 80 +- .../NativePDB/Inputs/inline_sites.s | 1244 +-- .../Inputs/inline_sites_live.lldbinit | 14 +- .../Inputs/local-variables-registers.lldbinit | 70 +- .../NativePDB/Inputs/lookup-by-types.lldbinit | 6 +- .../subfield_register_simple_type.lldbinit | 4 +- 
.../NativePDB/function-types-classes.cpp | 12 +- .../NativePDB/inline_sites_live.cpp | 68 +- .../SymbolFile/NativePDB/lookup-by-types.cpp | 92 +- lldb/unittests/Breakpoint/CMakeLists.txt | 20 +- llvm/benchmarks/FormatVariadicBM.cpp | 126 +- .../GetIntrinsicForClangBuiltin.cpp | 100 +- .../GetIntrinsicInfoTableEntriesBM.cpp | 60 +- llvm/docs/_static/LoopOptWG_invite.ics | 160 +- llvm/lib/Support/rpmalloc/CACHE.md | 38 +- llvm/lib/Support/rpmalloc/README.md | 440 +- llvm/lib/Support/rpmalloc/malloc.c | 1448 +-- llvm/lib/Support/rpmalloc/rpmalloc.c | 7984 ++++++++--------- llvm/lib/Support/rpmalloc/rpmalloc.h | 856 +- llvm/lib/Support/rpmalloc/rpnew.h | 226 +- .../Target/DirectX/DXILFinalizeLinkage.cpp | 130 +- .../DirectX/DirectXTargetTransformInfo.cpp | 76 +- llvm/test/CodeGen/DirectX/atan2.ll | 174 +- llvm/test/CodeGen/DirectX/atan2_error.ll | 22 +- llvm/test/CodeGen/DirectX/cross.ll | 112 +- llvm/test/CodeGen/DirectX/finalize_linkage.ll | 128 +- llvm/test/CodeGen/DirectX/normalize.ll | 224 +- llvm/test/CodeGen/DirectX/normalize_error.ll | 20 +- llvm/test/CodeGen/DirectX/step.ll | 156 +- .../CodeGen/SPIRV/hlsl-intrinsics/atan2.ll | 98 +- .../CodeGen/SPIRV/hlsl-intrinsics/cross.ll | 66 +- .../CodeGen/SPIRV/hlsl-intrinsics/length.ll | 58 +- .../SPIRV/hlsl-intrinsics/normalize.ll | 62 +- .../CodeGen/SPIRV/hlsl-intrinsics/step.ll | 66 +- .../Demangle/ms-placeholder-return-type.test | 36 +- llvm/test/FileCheck/dos-style-eol.txt | 20 +- llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri | 8 +- .../tools/llvm-cvtres/Inputs/languages.rc | 72 +- .../tools/llvm-cvtres/Inputs/test_resource.rc | 98 +- .../tools/llvm-rc/Inputs/dialog-with-menu.rc | 32 +- .../COFF/Inputs/resources/test_resource.rc | 88 +- llvm/unittests/Support/ModRefTest.cpp | 54 +- llvm/utils/LLVMVisualizers/llvm.natvis | 816 +- .../lit/tests/Inputs/shtest-shell/diff-in.dos | 6 +- llvm/utils/release/build_llvm_release.bat | 1030 +-- openmp/runtime/doc/doxygen/config | 3644 ++++---- pstl/CREDITS.txt | 42 +- 120 files 
changed, 14283 insertions(+), 14283 deletions(-) diff --git a/clang-tools-extra/clangd/test/input-mirror.test b/clang-tools-extra/clangd/test/input-mirror.test index a34a4a08cf60cf..bce3f9923a3b90 100644 --- a/clang-tools-extra/clangd/test/input-mirror.test +++ b/clang-tools-extra/clangd/test/input-mirror.test @@ -1,17 +1,17 @@ -# RUN: clangd -pretty -sync -input-mirror-file %t < %s -# Note that we have to use '-b' as -input-mirror-file does not have a newline at the end of file. -# RUN: diff -b %t %s -# It is absolutely vital that this file has CRLF line endings. -# -Content-Length: 125 - -{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":123,"rootPath":"clangd","capabilities":{},"trace":"off"}} -Content-Length: 172 - -{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"test:///main.cpp","languageId":"cpp","version":1,"text":"int main() {\nint a;\na;\n}\n"}}} -Content-Length: 44 - -{"jsonrpc":"2.0","id":3,"method":"shutdown"} -Content-Length: 33 - -{"jsonrpc":"2.0","method":"exit"} +# RUN: clangd -pretty -sync -input-mirror-file %t < %s +# Note that we have to use '-b' as -input-mirror-file does not have a newline at the end of file. +# RUN: diff -b %t %s +# It is absolutely vital that this file has CRLF line endings. 
+# +Content-Length: 125 + +{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":123,"rootPath":"clangd","capabilities":{},"trace":"off"}} +Content-Length: 172 + +{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"test:///main.cpp","languageId":"cpp","version":1,"text":"int main() {\nint a;\na;\n}\n"}}} +Content-Length: 44 + +{"jsonrpc":"2.0","id":3,"method":"shutdown"} +Content-Length: 33 + +{"jsonrpc":"2.0","method":"exit"} diff --git a/clang-tools-extra/clangd/test/protocol.test b/clang-tools-extra/clangd/test/protocol.test index 5e852d1d9deebc..64ccfaef189111 100644 --- a/clang-tools-extra/clangd/test/protocol.test +++ b/clang-tools-extra/clangd/test/protocol.test @@ -1,113 +1,113 @@ -# RUN: not clangd -pretty -sync -enable-test-uri-scheme < %s | FileCheck -strict-whitespace %s -# RUN: not clangd -pretty -sync -enable-test-uri-scheme < %s 2>&1 | FileCheck -check-prefix=STDERR %s -# vim: fileformat=dos -# It is absolutely vital that this file has CRLF line endings. -# -# Note that we invert the test because we intent to let clangd exit prematurely. 
-# -# Test protocol parsing -Content-Length: 125 -Content-Type: application/vscode-jsonrpc; charset-utf-8 - -{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":123,"rootPath":"clangd","capabilities":{},"trace":"off"}} -# Test message with Content-Type after Content-Length -# -# CHECK: "jsonrpc": "2.0", -# CHECK-NEXT: "result": { -# CHECK: } -Content-Length: 246 - -{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"test:///main.cpp","languageId":"cpp","version":1,"text":"struct fake { int a, bb, ccc; int f(int i, const float f) const; };\nint main() {\n fake f;\n f.\n}\n"}}} - -Content-Length: 104 - -{"jsonrpc":"2.0","method":"textDocument/didChange","params":{"textDocument":{"uri":"test:///main.cpp"}}} - -Content-Type: application/vscode-jsonrpc; charset-utf-8 -Content-Length: 146 - -{"jsonrpc":"2.0","id":1,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message with Content-Type before Content-Length -# -# CHECK: "id": 1, -# CHECK-NEXT: "jsonrpc": "2.0", -# CHECK-NEXT: "result": { -# CHECK-NEXT: "isIncomplete": false, -# CHECK-NEXT: "items": [ -# CHECK: "filterText": "a", -# CHECK-NEXT: "insertText": "a", -# CHECK-NEXT: "insertTextFormat": 1, -# CHECK-NEXT: "kind": 5, -# CHECK-NEXT: "label": " a", -# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, -# CHECK-NEXT: "sortText": "{{.*}}" -# CHECK: ] -# CHECK-NEXT: } - -X-Test: Testing -Content-Type: application/vscode-jsonrpc; charset-utf-8 -Content-Length: 146 -Content-Type: application/vscode-jsonrpc; charset-utf-8 -X-Testing: Test - -{"jsonrpc":"2.0","id":2,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} - -Content-Type: application/vscode-jsonrpc; charset-utf-8 -Content-Length: 10 -Content-Length: 146 - 
-{"jsonrpc":"2.0","id":3,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message with duplicate Content-Length headers -# -# CHECK: "id": 3, -# CHECK-NEXT: "jsonrpc": "2.0", -# CHECK-NEXT: "result": { -# CHECK-NEXT: "isIncomplete": false, -# CHECK-NEXT: "items": [ -# CHECK: "filterText": "a", -# CHECK-NEXT: "insertText": "a", -# CHECK-NEXT: "insertTextFormat": 1, -# CHECK-NEXT: "kind": 5, -# CHECK-NEXT: "label": " a", -# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, -# CHECK-NEXT: "sortText": "{{.*}}" -# CHECK: ] -# CHECK-NEXT: } -# STDERR: Warning: Duplicate Content-Length header received. The previous value for this message (10) was ignored. - -Content-Type: application/vscode-jsonrpc; charset-utf-8 -Content-Length: 10 - -{"jsonrpc":"2.0","id":4,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message with malformed Content-Length -# -# STDERR: JSON parse error -# Ensure we recover by sending another (valid) message - -Content-Length: 146 - -{"jsonrpc":"2.0","id":5,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message with Content-Type before Content-Length -# -# CHECK: "id": 5, -# CHECK-NEXT: "jsonrpc": "2.0", -# CHECK-NEXT: "result": { -# CHECK-NEXT: "isIncomplete": false, -# CHECK-NEXT: "items": [ -# CHECK: "filterText": "a", -# CHECK-NEXT: "insertText": "a", -# CHECK-NEXT: "insertTextFormat": 1, -# CHECK-NEXT: "kind": 5, -# CHECK-NEXT: "label": " a", -# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, -# CHECK-NEXT: "sortText": "{{.*}}" -# CHECK: ] -# CHECK-NEXT: } -Content-Length: 1024 - -{"jsonrpc":"2.0","id":5,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} -# Test message which reads beyond the end of the stream. 
-# -# Ensure this is the last test in the file! -# STDERR: Input was aborted. Read only {{[0-9]+}} bytes of expected {{[0-9]+}}. - +# RUN: not clangd -pretty -sync -enable-test-uri-scheme < %s | FileCheck -strict-whitespace %s +# RUN: not clangd -pretty -sync -enable-test-uri-scheme < %s 2>&1 | FileCheck -check-prefix=STDERR %s +# vim: fileformat=dos +# It is absolutely vital that this file has CRLF line endings. +# +# Note that we invert the test because we intent to let clangd exit prematurely. +# +# Test protocol parsing +Content-Length: 125 +Content-Type: application/vscode-jsonrpc; charset-utf-8 + +{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":123,"rootPath":"clangd","capabilities":{},"trace":"off"}} +# Test message with Content-Type after Content-Length +# +# CHECK: "jsonrpc": "2.0", +# CHECK-NEXT: "result": { +# CHECK: } +Content-Length: 246 + +{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"test:///main.cpp","languageId":"cpp","version":1,"text":"struct fake { int a, bb, ccc; int f(int i, const float f) const; };\nint main() {\n fake f;\n f.\n}\n"}}} + +Content-Length: 104 + +{"jsonrpc":"2.0","method":"textDocument/didChange","params":{"textDocument":{"uri":"test:///main.cpp"}}} + +Content-Type: application/vscode-jsonrpc; charset-utf-8 +Content-Length: 146 + +{"jsonrpc":"2.0","id":1,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message with Content-Type before Content-Length +# +# CHECK: "id": 1, +# CHECK-NEXT: "jsonrpc": "2.0", +# CHECK-NEXT: "result": { +# CHECK-NEXT: "isIncomplete": false, +# CHECK-NEXT: "items": [ +# CHECK: "filterText": "a", +# CHECK-NEXT: "insertText": "a", +# CHECK-NEXT: "insertTextFormat": 1, +# CHECK-NEXT: "kind": 5, +# CHECK-NEXT: "label": " a", +# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, +# CHECK-NEXT: "sortText": "{{.*}}" +# CHECK: ] +# CHECK-NEXT: } + +X-Test: Testing +Content-Type: 
application/vscode-jsonrpc; charset-utf-8 +Content-Length: 146 +Content-Type: application/vscode-jsonrpc; charset-utf-8 +X-Testing: Test + +{"jsonrpc":"2.0","id":2,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} + +Content-Type: application/vscode-jsonrpc; charset-utf-8 +Content-Length: 10 +Content-Length: 146 + +{"jsonrpc":"2.0","id":3,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message with duplicate Content-Length headers +# +# CHECK: "id": 3, +# CHECK-NEXT: "jsonrpc": "2.0", +# CHECK-NEXT: "result": { +# CHECK-NEXT: "isIncomplete": false, +# CHECK-NEXT: "items": [ +# CHECK: "filterText": "a", +# CHECK-NEXT: "insertText": "a", +# CHECK-NEXT: "insertTextFormat": 1, +# CHECK-NEXT: "kind": 5, +# CHECK-NEXT: "label": " a", +# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, +# CHECK-NEXT: "sortText": "{{.*}}" +# CHECK: ] +# CHECK-NEXT: } +# STDERR: Warning: Duplicate Content-Length header received. The previous value for this message (10) was ignored. 
+ +Content-Type: application/vscode-jsonrpc; charset-utf-8 +Content-Length: 10 + +{"jsonrpc":"2.0","id":4,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message with malformed Content-Length +# +# STDERR: JSON parse error +# Ensure we recover by sending another (valid) message + +Content-Length: 146 + +{"jsonrpc":"2.0","id":5,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message with Content-Type before Content-Length +# +# CHECK: "id": 5, +# CHECK-NEXT: "jsonrpc": "2.0", +# CHECK-NEXT: "result": { +# CHECK-NEXT: "isIncomplete": false, +# CHECK-NEXT: "items": [ +# CHECK: "filterText": "a", +# CHECK-NEXT: "insertText": "a", +# CHECK-NEXT: "insertTextFormat": 1, +# CHECK-NEXT: "kind": 5, +# CHECK-NEXT: "label": " a", +# CHECK-NEXT: "score": {{[0-9]+.[0-9]+}}, +# CHECK-NEXT: "sortText": "{{.*}}" +# CHECK: ] +# CHECK-NEXT: } +Content-Length: 1024 + +{"jsonrpc":"2.0","id":5,"method":"textDocument/completion","params":{"textDocument":{"uri":"test:/main.cpp"},"position":{"line":3,"character":5}}} +# Test message which reads beyond the end of the stream. +# +# Ensure this is the last test in the file! +# STDERR: Input was aborted. Read only {{[0-9]+}} bytes of expected {{[0-9]+}}. + diff --git a/clang-tools-extra/clangd/test/too_large.test b/clang-tools-extra/clangd/test/too_large.test index 7df981e7942073..6986bd5e258e87 100644 --- a/clang-tools-extra/clangd/test/too_large.test +++ b/clang-tools-extra/clangd/test/too_large.test @@ -1,7 +1,7 @@ -# RUN: not clangd -sync < %s 2>&1 | FileCheck -check-prefix=STDERR %s -# vim: fileformat=dos -# It is absolutely vital that this file has CRLF line endings. 
-# -Content-Length: 2147483648 - -# STDERR: Refusing to read message +# RUN: not clangd -sync < %s 2>&1 | FileCheck -check-prefix=STDERR %s +# vim: fileformat=dos +# It is absolutely vital that this file has CRLF line endings. +# +Content-Length: 2147483648 + +# STDERR: Refusing to read message diff --git a/clang/test/AST/HLSL/StructuredBuffer-AST.hlsl b/clang/test/AST/HLSL/StructuredBuffer-AST.hlsl index 030fcfc31691dc..9c1630f6f570aa 100644 --- a/clang/test/AST/HLSL/StructuredBuffer-AST.hlsl +++ b/clang/test/AST/HLSL/StructuredBuffer-AST.hlsl @@ -1,64 +1,64 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library -x hlsl -ast-dump -DEMPTY %s | FileCheck -check-prefix=EMPTY %s -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library -x hlsl -ast-dump %s | FileCheck %s - - -// This test tests two different AST generations. The "EMPTY" test mode verifies -// the AST generated by forward declaration of the HLSL types which happens on -// initializing the HLSL external AST with an AST Context. - -// The non-empty mode has a use that requires the StructuredBuffer type be complete, -// which results in the AST being populated by the external AST source. That -// case covers the full implementation of the template declaration and the -// instantiated specialization. 
- -// EMPTY: ClassTemplateDecl 0x{{[0-9A-Fa-f]+}} <> implicit StructuredBuffer -// EMPTY-NEXT: TemplateTypeParmDecl 0x{{[0-9A-Fa-f]+}} <> class depth 0 index 0 element_type -// EMPTY-NEXT: CXXRecordDecl 0x{{[0-9A-Fa-f]+}} <> implicit class StructuredBuffer -// EMPTY-NEXT: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final - -// There should be no more occurrances of StructuredBuffer -// EMPTY-NOT: StructuredBuffer - -#ifndef EMPTY - -StructuredBuffer Buffer; - -#endif - -// CHECK: ClassTemplateDecl 0x{{[0-9A-Fa-f]+}} <> implicit StructuredBuffer -// CHECK-NEXT: TemplateTypeParmDecl 0x{{[0-9A-Fa-f]+}} <> class depth 0 index 0 element_type -// CHECK-NEXT: CXXRecordDecl 0x{{[0-9A-Fa-f]+}} <> implicit class StructuredBuffer definition - -// CHECK: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final -// CHECK-NEXT: FieldDecl 0x{{[0-9A-Fa-f]+}} <> implicit h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(element_type)]] -// CHECK-NEXT: HLSLResourceAttr 0x{{[0-9A-Fa-f]+}} <> Implicit TypedBuffer - -// CHECK: CXXMethodDecl 0x{{[0-9A-Fa-f]+}} <> operator[] 'element_type &const (unsigned int) const' -// CHECK-NEXT: ParmVarDecl 0x{{[0-9A-Fa-f]+}} <> Idx 'unsigned int' -// CHECK-NEXT: CompoundStmt 0x{{[0-9A-Fa-f]+}} <> -// CHECK-NEXT: ReturnStmt 0x{{[0-9A-Fa-f]+}} <> -// CHECK-NEXT: MemberExpr 0x{{[0-9A-Fa-f]+}} <> 'element_type' lvalue .e 0x{{[0-9A-Fa-f]+}} -// CHECK-NEXT: CXXThisExpr 0x{{[0-9A-Fa-f]+}} <> 'const StructuredBuffer' lvalue implicit this -// CHECK-NEXT: AlwaysInlineAttr 0x{{[0-9A-Fa-f]+}} <> Implicit always_inline - -// CHECK-NEXT: CXXMethodDecl 0x{{[0-9A-Fa-f]+}} <> operator[] 'element_type &(unsigned int)' -// CHECK-NEXT: ParmVarDecl 0x{{[0-9A-Fa-f]+}} <> Idx 'unsigned int' -// CHECK-NEXT: CompoundStmt 0x{{[0-9A-Fa-f]+}} <> -// CHECK-NEXT: ReturnStmt 0x{{[0-9A-Fa-f]+}} <> -// CHECK-NEXT: MemberExpr 0x{{[0-9A-Fa-f]+}} <> 'element_type' lvalue .e 
0x{{[0-9A-Fa-f]+}} -// CHECK-NEXT: CXXThisExpr 0x{{[0-9A-Fa-f]+}} <> 'StructuredBuffer' lvalue implicit this -// CHECK-NEXT: AlwaysInlineAttr 0x{{[0-9A-Fa-f]+}} <> Implicit always_inline - -// CHECK: ClassTemplateSpecializationDecl 0x{{[0-9A-Fa-f]+}} <> class StructuredBuffer definition - -// CHECK: TemplateArgument type 'float' -// CHECK-NEXT: BuiltinType 0x{{[0-9A-Fa-f]+}} 'float' -// CHECK-NEXT: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final -// CHECK-NEXT: FieldDecl 0x{{[0-9A-Fa-f]+}} <> implicit h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float)]] -// CHECK-NEXT: HLSLResourceAttr 0x{{[0-9A-Fa-f]+}} <> Implicit TypedBuffer +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library -x hlsl -ast-dump -DEMPTY %s | FileCheck -check-prefix=EMPTY %s +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library -x hlsl -ast-dump %s | FileCheck %s + + +// This test tests two different AST generations. The "EMPTY" test mode verifies +// the AST generated by forward declaration of the HLSL types which happens on +// initializing the HLSL external AST with an AST Context. + +// The non-empty mode has a use that requires the StructuredBuffer type be complete, +// which results in the AST being populated by the external AST source. That +// case covers the full implementation of the template declaration and the +// instantiated specialization. 
+ +// EMPTY: ClassTemplateDecl 0x{{[0-9A-Fa-f]+}} <> implicit StructuredBuffer +// EMPTY-NEXT: TemplateTypeParmDecl 0x{{[0-9A-Fa-f]+}} <> class depth 0 index 0 element_type +// EMPTY-NEXT: CXXRecordDecl 0x{{[0-9A-Fa-f]+}} <> implicit class StructuredBuffer +// EMPTY-NEXT: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final + +// There should be no more occurrances of StructuredBuffer +// EMPTY-NOT: StructuredBuffer + +#ifndef EMPTY + +StructuredBuffer Buffer; + +#endif + +// CHECK: ClassTemplateDecl 0x{{[0-9A-Fa-f]+}} <> implicit StructuredBuffer +// CHECK-NEXT: TemplateTypeParmDecl 0x{{[0-9A-Fa-f]+}} <> class depth 0 index 0 element_type +// CHECK-NEXT: CXXRecordDecl 0x{{[0-9A-Fa-f]+}} <> implicit class StructuredBuffer definition + +// CHECK: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final +// CHECK-NEXT: FieldDecl 0x{{[0-9A-Fa-f]+}} <> implicit h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(element_type)]] +// CHECK-NEXT: HLSLResourceAttr 0x{{[0-9A-Fa-f]+}} <> Implicit TypedBuffer + +// CHECK: CXXMethodDecl 0x{{[0-9A-Fa-f]+}} <> operator[] 'element_type &const (unsigned int) const' +// CHECK-NEXT: ParmVarDecl 0x{{[0-9A-Fa-f]+}} <> Idx 'unsigned int' +// CHECK-NEXT: CompoundStmt 0x{{[0-9A-Fa-f]+}} <> +// CHECK-NEXT: ReturnStmt 0x{{[0-9A-Fa-f]+}} <> +// CHECK-NEXT: MemberExpr 0x{{[0-9A-Fa-f]+}} <> 'element_type' lvalue .e 0x{{[0-9A-Fa-f]+}} +// CHECK-NEXT: CXXThisExpr 0x{{[0-9A-Fa-f]+}} <> 'const StructuredBuffer' lvalue implicit this +// CHECK-NEXT: AlwaysInlineAttr 0x{{[0-9A-Fa-f]+}} <> Implicit always_inline + +// CHECK-NEXT: CXXMethodDecl 0x{{[0-9A-Fa-f]+}} <> operator[] 'element_type &(unsigned int)' +// CHECK-NEXT: ParmVarDecl 0x{{[0-9A-Fa-f]+}} <> Idx 'unsigned int' +// CHECK-NEXT: CompoundStmt 0x{{[0-9A-Fa-f]+}} <> +// CHECK-NEXT: ReturnStmt 0x{{[0-9A-Fa-f]+}} <> +// CHECK-NEXT: MemberExpr 0x{{[0-9A-Fa-f]+}} <> 'element_type' lvalue .e 
0x{{[0-9A-Fa-f]+}} +// CHECK-NEXT: CXXThisExpr 0x{{[0-9A-Fa-f]+}} <> 'StructuredBuffer' lvalue implicit this +// CHECK-NEXT: AlwaysInlineAttr 0x{{[0-9A-Fa-f]+}} <> Implicit always_inline + +// CHECK: ClassTemplateSpecializationDecl 0x{{[0-9A-Fa-f]+}} <> class StructuredBuffer definition + +// CHECK: TemplateArgument type 'float' +// CHECK-NEXT: BuiltinType 0x{{[0-9A-Fa-f]+}} 'float' +// CHECK-NEXT: FinalAttr 0x{{[0-9A-Fa-f]+}} <> Implicit final +// CHECK-NEXT: FieldDecl 0x{{[0-9A-Fa-f]+}} <> implicit h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float)]] +// CHECK-NEXT: HLSLResourceAttr 0x{{[0-9A-Fa-f]+}} <> Implicit TypedBuffer diff --git a/clang/test/C/C2y/n3262.c b/clang/test/C/C2y/n3262.c index 3ff2062d88dde8..864ab351bdbc23 100644 --- a/clang/test/C/C2y/n3262.c +++ b/clang/test/C/C2y/n3262.c @@ -1,20 +1,20 @@ -// RUN: %clang_cc1 -verify -std=c2y -Wall -pedantic %s -// expected-no-diagnostics - -/* WG14 N3262: Yes - * Usability of a byte-wise copy of va_list - * - * NB: Clang explicitly documents this as being undefined behavior. A - * diagnostic is produced for some targets but not for others for assignment or - * initialization, but no diagnostic is possible to produce for use with memcpy - * in the general case, nor with a manual bytewise copy via a for loop. - * - * Therefore, nothing is tested in this file; it serves as a reminder that we - * validated our documentation against the paper. See - * clang/docs/LanguageExtensions.rst for more details. - * - * FIXME: it would be nice to add ubsan support for recognizing when an invalid - * copy is made and diagnosing on copy (or on use of the copied va_list). 
- */ - -int main() {} +// RUN: %clang_cc1 -verify -std=c2y -Wall -pedantic %s +// expected-no-diagnostics + +/* WG14 N3262: Yes + * Usability of a byte-wise copy of va_list + * + * NB: Clang explicitly documents this as being undefined behavior. A + * diagnostic is produced for some targets but not for others for assignment or + * initialization, but no diagnostic is possible to produce for use with memcpy + * in the general case, nor with a manual bytewise copy via a for loop. + * + * Therefore, nothing is tested in this file; it serves as a reminder that we + * validated our documentation against the paper. See + * clang/docs/LanguageExtensions.rst for more details. + * + * FIXME: it would be nice to add ubsan support for recognizing when an invalid + * copy is made and diagnosing on copy (or on use of the copied va_list). + */ + +int main() {} diff --git a/clang/test/C/C2y/n3274.c b/clang/test/C/C2y/n3274.c index ccdb89f4069ded..6bf8d72d0f3319 100644 --- a/clang/test/C/C2y/n3274.c +++ b/clang/test/C/C2y/n3274.c @@ -1,18 +1,18 @@ -// RUN: %clang_cc1 -verify -std=c23 -Wall -pedantic %s -// RUN: %clang_cc1 -verify -std=c2y -Wall -pedantic %s - -/* WG14 N3274: Yes - * Remove imaginary types - */ - -// Clang has never supported _Imaginary. -#ifdef __STDC_IEC_559_COMPLEX__ -#error "When did this happen?" -#endif - -_Imaginary float i; // expected-error {{imaginary types are not supported}} - -// _Imaginary is a keyword in older language modes, but doesn't need to be one -// in C2y or later. However, to improve diagnostic behavior, we retain it as a -// keyword in all language modes -- it is not available as an identifier. -static_assert(!__is_identifier(_Imaginary)); +// RUN: %clang_cc1 -verify -std=c23 -Wall -pedantic %s +// RUN: %clang_cc1 -verify -std=c2y -Wall -pedantic %s + +/* WG14 N3274: Yes + * Remove imaginary types + */ + +// Clang has never supported _Imaginary. +#ifdef __STDC_IEC_559_COMPLEX__ +#error "When did this happen?" 
+#endif + +_Imaginary float i; // expected-error {{imaginary types are not supported}} + +// _Imaginary is a keyword in older language modes, but doesn't need to be one +// in C2y or later. However, to improve diagnostic behavior, we retain it as a +// keyword in all language modes -- it is not available as an identifier. +static_assert(!__is_identifier(_Imaginary)); diff --git a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-annotations.hlsl b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-annotations.hlsl index 4d3d4908c396e6..81c5837d8f2077 100644 --- a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-annotations.hlsl +++ b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-annotations.hlsl @@ -1,22 +1,22 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s - -StructuredBuffer Buffer1; -StructuredBuffer > BufferArray[4]; - -StructuredBuffer Buffer2 : register(u3); -StructuredBuffer > BufferArray2[4] : register(u4); - -StructuredBuffer Buffer3 : register(u3, space1); -StructuredBuffer > BufferArray3[4] : register(u4, space1); - -[numthreads(1,1,1)] -void main() { -} - -// CHECK: !hlsl.uavs = !{![[Single:[0-9]+]], ![[Array:[0-9]+]], ![[SingleAllocated:[0-9]+]], ![[ArrayAllocated:[0-9]+]], ![[SingleSpace:[0-9]+]], ![[ArraySpace:[0-9]+]]} -// CHECK-DAG: ![[Single]] = !{ptr @Buffer1, i32 10, i32 9, i1 false, i32 -1, i32 0} -// CHECK-DAG: ![[Array]] = !{ptr @BufferArray, i32 10, i32 9, i1 false, i32 -1, i32 0} -// CHECK-DAG: ![[SingleAllocated]] = !{ptr @Buffer2, i32 10, i32 9, i1 false, i32 3, i32 0} -// CHECK-DAG: ![[ArrayAllocated]] = !{ptr @BufferArray2, i32 10, i32 9, i1 false, i32 4, i32 0} -// CHECK-DAG: ![[SingleSpace]] = !{ptr @Buffer3, i32 10, i32 9, i1 false, i32 3, i32 1} -// CHECK-DAG: ![[ArraySpace]] = !{ptr @BufferArray3, i32 10, i32 9, i1 false, i32 4, i32 1} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s 
+ +StructuredBuffer Buffer1; +StructuredBuffer > BufferArray[4]; + +StructuredBuffer Buffer2 : register(u3); +StructuredBuffer > BufferArray2[4] : register(u4); + +StructuredBuffer Buffer3 : register(u3, space1); +StructuredBuffer > BufferArray3[4] : register(u4, space1); + +[numthreads(1,1,1)] +void main() { +} + +// CHECK: !hlsl.uavs = !{![[Single:[0-9]+]], ![[Array:[0-9]+]], ![[SingleAllocated:[0-9]+]], ![[ArrayAllocated:[0-9]+]], ![[SingleSpace:[0-9]+]], ![[ArraySpace:[0-9]+]]} +// CHECK-DAG: ![[Single]] = !{ptr @Buffer1, i32 10, i32 9, i1 false, i32 -1, i32 0} +// CHECK-DAG: ![[Array]] = !{ptr @BufferArray, i32 10, i32 9, i1 false, i32 -1, i32 0} +// CHECK-DAG: ![[SingleAllocated]] = !{ptr @Buffer2, i32 10, i32 9, i1 false, i32 3, i32 0} +// CHECK-DAG: ![[ArrayAllocated]] = !{ptr @BufferArray2, i32 10, i32 9, i1 false, i32 4, i32 0} +// CHECK-DAG: ![[SingleSpace]] = !{ptr @Buffer3, i32 10, i32 9, i1 false, i32 3, i32 1} +// CHECK-DAG: ![[ArraySpace]] = !{ptr @BufferArray3, i32 10, i32 9, i1 false, i32 4, i32 1} diff --git a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-constructor.hlsl b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-constructor.hlsl index 178332d03e6404..f65090410ce66f 100644 --- a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-constructor.hlsl +++ b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-constructor.hlsl @@ -1,19 +1,19 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s -// RUN: %clang_cc1 -triple spirv-vulkan-library -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s --check-prefix=CHECK-SPIRV - -// XFAIL: * -// This expectedly fails because create.handle is no longer invoked -// from StructuredBuffer constructor and the replacement has not been -// implemented yet. This test should be updated to expect -// dx.create.handleFromBinding as part of issue #105076. 
- -StructuredBuffer Buf; - -// CHECK: define linkonce_odr noundef ptr @"??0?$StructuredBuffer@M@hlsl@@QAA@XZ" -// CHECK-NEXT: entry: - -// CHECK: %[[HandleRes:[0-9]+]] = call ptr @llvm.dx.create.handle(i8 1) -// CHECK: store ptr %[[HandleRes]], ptr %h, align 4 - -// CHECK-SPIRV: %[[HandleRes:[0-9]+]] = call ptr @llvm.spv.create.handle(i8 1) -// CHECK-SPIRV: store ptr %[[HandleRes]], ptr %h, align 8 +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple spirv-vulkan-library -x hlsl -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s --check-prefix=CHECK-SPIRV + +// XFAIL: * +// This expectedly fails because create.handle is no longer invoked +// from StructuredBuffer constructor and the replacement has not been +// implemented yet. This test should be updated to expect +// dx.create.handleFromBinding as part of issue #105076. + +StructuredBuffer Buf; + +// CHECK: define linkonce_odr noundef ptr @"??0?$StructuredBuffer@M@hlsl@@QAA@XZ" +// CHECK-NEXT: entry: + +// CHECK: %[[HandleRes:[0-9]+]] = call ptr @llvm.dx.create.handle(i8 1) +// CHECK: store ptr %[[HandleRes]], ptr %h, align 4 + +// CHECK-SPIRV: %[[HandleRes:[0-9]+]] = call ptr @llvm.spv.create.handle(i8 1) +// CHECK-SPIRV: store ptr %[[HandleRes]], ptr %h, align 8 diff --git a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-elementtype.hlsl b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-elementtype.hlsl index a99c7f98a1afb6..435a904327a26a 100644 --- a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-elementtype.hlsl +++ b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-elementtype.hlsl @@ -1,70 +1,70 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.2-compute -finclude-default-header -fnative-half-type -emit-llvm -o - %s | FileCheck %s - -// NOTE: The number in type name and whether the struct is packed or not will mostly -// likely change once subscript operators are properly implemented 
(llvm/llvm-project#95956) -// and theinterim field of the contained type is removed. - -// CHECK: %"class.hlsl::StructuredBuffer" = type <{ target("dx.RawBuffer", i16, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.0" = type <{ target("dx.RawBuffer", i16, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.2" = type { target("dx.RawBuffer", i32, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.3" = type { target("dx.RawBuffer", i32, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.4" = type { target("dx.RawBuffer", i64, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.5" = type { target("dx.RawBuffer", i64, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.6" = type <{ target("dx.RawBuffer", half, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.8" = type { target("dx.RawBuffer", float, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.9" = type { target("dx.RawBuffer", double, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.10" = type { target("dx.RawBuffer", <4 x i16>, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.11" = type { target("dx.RawBuffer", <3 x i32>, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.12" = type { target("dx.RawBuffer", <2 x half>, 1, 0) -// CHECK: %"class.hlsl::StructuredBuffer.13" = type { target("dx.RawBuffer", <3 x float>, 1, 0) - -StructuredBuffer BufI16; -StructuredBuffer BufU16; -StructuredBuffer BufI32; -StructuredBuffer BufU32; -StructuredBuffer BufI64; -StructuredBuffer BufU64; -StructuredBuffer BufF16; -StructuredBuffer BufF32; -StructuredBuffer BufF64; -StructuredBuffer< vector > BufI16x4; -StructuredBuffer< vector > BufU32x3; -StructuredBuffer BufF16x2; -StructuredBuffer BufF32x3; -// TODO: StructuredBuffer BufSNormF16; -> 11 -// TODO: StructuredBuffer BufUNormF16; -> 12 -// TODO: StructuredBuffer BufSNormF32; -> 13 -// TODO: StructuredBuffer BufUNormF32; -> 14 -// TODO: StructuredBuffer BufSNormF64; -> 15 -// TODO: StructuredBuffer BufUNormF64; -> 16 - -[numthreads(1,1,1)] -void main(int GI : SV_GroupIndex) { - BufI16[GI] = 0; - 
BufU16[GI] = 0; - BufI32[GI] = 0; - BufU32[GI] = 0; - BufI64[GI] = 0; - BufU64[GI] = 0; - BufF16[GI] = 0; - BufF32[GI] = 0; - BufF64[GI] = 0; - BufI16x4[GI] = 0; - BufU32x3[GI] = 0; - BufF16x2[GI] = 0; - BufF32x3[GI] = 0; -} - -// CHECK: !{{[0-9]+}} = !{ptr @BufI16, i32 10, i32 2, -// CHECK: !{{[0-9]+}} = !{ptr @BufU16, i32 10, i32 3, -// CHECK: !{{[0-9]+}} = !{ptr @BufI32, i32 10, i32 4, -// CHECK: !{{[0-9]+}} = !{ptr @BufU32, i32 10, i32 5, -// CHECK: !{{[0-9]+}} = !{ptr @BufI64, i32 10, i32 6, -// CHECK: !{{[0-9]+}} = !{ptr @BufU64, i32 10, i32 7, -// CHECK: !{{[0-9]+}} = !{ptr @BufF16, i32 10, i32 8, -// CHECK: !{{[0-9]+}} = !{ptr @BufF32, i32 10, i32 9, -// CHECK: !{{[0-9]+}} = !{ptr @BufF64, i32 10, i32 10, -// CHECK: !{{[0-9]+}} = !{ptr @BufI16x4, i32 10, i32 2, -// CHECK: !{{[0-9]+}} = !{ptr @BufU32x3, i32 10, i32 5, -// CHECK: !{{[0-9]+}} = !{ptr @BufF16x2, i32 10, i32 8, -// CHECK: !{{[0-9]+}} = !{ptr @BufF32x3, i32 10, i32 9, +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.2-compute -finclude-default-header -fnative-half-type -emit-llvm -o - %s | FileCheck %s + +// NOTE: The number in type name and whether the struct is packed or not will mostly +// likely change once subscript operators are properly implemented (llvm/llvm-project#95956) +// and theinterim field of the contained type is removed. 
+ +// CHECK: %"class.hlsl::StructuredBuffer" = type <{ target("dx.RawBuffer", i16, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.0" = type <{ target("dx.RawBuffer", i16, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.2" = type { target("dx.RawBuffer", i32, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.3" = type { target("dx.RawBuffer", i32, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.4" = type { target("dx.RawBuffer", i64, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.5" = type { target("dx.RawBuffer", i64, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.6" = type <{ target("dx.RawBuffer", half, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.8" = type { target("dx.RawBuffer", float, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.9" = type { target("dx.RawBuffer", double, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.10" = type { target("dx.RawBuffer", <4 x i16>, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.11" = type { target("dx.RawBuffer", <3 x i32>, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.12" = type { target("dx.RawBuffer", <2 x half>, 1, 0) +// CHECK: %"class.hlsl::StructuredBuffer.13" = type { target("dx.RawBuffer", <3 x float>, 1, 0) + +StructuredBuffer BufI16; +StructuredBuffer BufU16; +StructuredBuffer BufI32; +StructuredBuffer BufU32; +StructuredBuffer BufI64; +StructuredBuffer BufU64; +StructuredBuffer BufF16; +StructuredBuffer BufF32; +StructuredBuffer BufF64; +StructuredBuffer< vector > BufI16x4; +StructuredBuffer< vector > BufU32x3; +StructuredBuffer BufF16x2; +StructuredBuffer BufF32x3; +// TODO: StructuredBuffer BufSNormF16; -> 11 +// TODO: StructuredBuffer BufUNormF16; -> 12 +// TODO: StructuredBuffer BufSNormF32; -> 13 +// TODO: StructuredBuffer BufUNormF32; -> 14 +// TODO: StructuredBuffer BufSNormF64; -> 15 +// TODO: StructuredBuffer BufUNormF64; -> 16 + +[numthreads(1,1,1)] +void main(int GI : SV_GroupIndex) { + BufI16[GI] = 0; + BufU16[GI] = 0; + BufI32[GI] = 0; + BufU32[GI] = 0; + BufI64[GI] = 0; + BufU64[GI] = 0; + 
BufF16[GI] = 0; + BufF32[GI] = 0; + BufF64[GI] = 0; + BufI16x4[GI] = 0; + BufU32x3[GI] = 0; + BufF16x2[GI] = 0; + BufF32x3[GI] = 0; +} + +// CHECK: !{{[0-9]+}} = !{ptr @BufI16, i32 10, i32 2, +// CHECK: !{{[0-9]+}} = !{ptr @BufU16, i32 10, i32 3, +// CHECK: !{{[0-9]+}} = !{ptr @BufI32, i32 10, i32 4, +// CHECK: !{{[0-9]+}} = !{ptr @BufU32, i32 10, i32 5, +// CHECK: !{{[0-9]+}} = !{ptr @BufI64, i32 10, i32 6, +// CHECK: !{{[0-9]+}} = !{ptr @BufU64, i32 10, i32 7, +// CHECK: !{{[0-9]+}} = !{ptr @BufF16, i32 10, i32 8, +// CHECK: !{{[0-9]+}} = !{ptr @BufF32, i32 10, i32 9, +// CHECK: !{{[0-9]+}} = !{ptr @BufF64, i32 10, i32 10, +// CHECK: !{{[0-9]+}} = !{ptr @BufI16x4, i32 10, i32 2, +// CHECK: !{{[0-9]+}} = !{ptr @BufU32x3, i32 10, i32 5, +// CHECK: !{{[0-9]+}} = !{ptr @BufF16x2, i32 10, i32 8, +// CHECK: !{{[0-9]+}} = !{ptr @BufF32x3, i32 10, i32 9, diff --git a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-subscript.hlsl b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-subscript.hlsl index 155749ec4f94a9..89bde9236288fc 100644 --- a/clang/test/CodeGenHLSL/builtins/StructuredBuffer-subscript.hlsl +++ b/clang/test/CodeGenHLSL/builtins/StructuredBuffer-subscript.hlsl @@ -1,17 +1,17 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -emit-llvm -o - -O0 %s | FileCheck %s - -StructuredBuffer In; -StructuredBuffer Out; - -[numthreads(1,1,1)] -void main(unsigned GI : SV_GroupIndex) { - Out[GI] = In[GI]; -} - -// Even at -O0 the subscript operators get inlined. The -O0 IR is a bit messy -// and confusing to follow so the match here is pretty weak. - -// CHECK: define void @main() -// Verify inlining leaves only calls to "llvm." 
intrinsics -// CHECK-NOT: call {{[^@]*}} @{{[^l][^l][^v][^m][^\.]}} -// CHECK: ret void +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -emit-llvm -o - -O0 %s | FileCheck %s + +StructuredBuffer In; +StructuredBuffer Out; + +[numthreads(1,1,1)] +void main(unsigned GI : SV_GroupIndex) { + Out[GI] = In[GI]; +} + +// Even at -O0 the subscript operators get inlined. The -O0 IR is a bit messy +// and confusing to follow so the match here is pretty weak. + +// CHECK: define void @main() +// Verify inlining leaves only calls to "llvm." intrinsics +// CHECK-NOT: call {{[^@]*}} @{{[^l][^l][^v][^m][^\.]}} +// CHECK: ret void diff --git a/clang/test/CodeGenHLSL/builtins/atan2.hlsl b/clang/test/CodeGenHLSL/builtins/atan2.hlsl index 40796052e608fe..ada269db2f00d3 100644 --- a/clang/test/CodeGenHLSL/builtins/atan2.hlsl +++ b/clang/test/CodeGenHLSL/builtins/atan2.hlsl @@ -1,59 +1,59 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF - -// CHECK-LABEL: test_atan2_half -// NATIVE_HALF: call half @llvm.atan2.f16 -// NO_HALF: call float @llvm.atan2.f32 -half test_atan2_half (half p0, half p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_half2 -// NATIVE_HALF: call <2 x half> @llvm.atan2.v2f16 -// NO_HALF: call <2 x float> @llvm.atan2.v2f32 -half2 test_atan2_half2 (half2 p0, half2 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_half3 -// NATIVE_HALF: call <3 x half> @llvm.atan2.v3f16 -// NO_HALF: call <3 x float> @llvm.atan2.v3f32 -half3 test_atan2_half3 (half3 p0, half3 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_half4 -// NATIVE_HALF: call <4 x 
half> @llvm.atan2.v4f16 -// NO_HALF: call <4 x float> @llvm.atan2.v4f32 -half4 test_atan2_half4 (half4 p0, half4 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_float -// CHECK: call float @llvm.atan2.f32 -float test_atan2_float (float p0, float p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_float2 -// CHECK: call <2 x float> @llvm.atan2.v2f32 -float2 test_atan2_float2 (float2 p0, float2 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_float3 -// CHECK: call <3 x float> @llvm.atan2.v3f32 -float3 test_atan2_float3 (float3 p0, float3 p1) { - return atan2(p0, p1); -} - -// CHECK-LABEL: test_atan2_float4 -// CHECK: call <4 x float> @llvm.atan2.v4f32 -float4 test_atan2_float4 (float4 p0, float4 p1) { - return atan2(p0, p1); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF + +// CHECK-LABEL: test_atan2_half +// NATIVE_HALF: call half @llvm.atan2.f16 +// NO_HALF: call float @llvm.atan2.f32 +half test_atan2_half (half p0, half p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_half2 +// NATIVE_HALF: call <2 x half> @llvm.atan2.v2f16 +// NO_HALF: call <2 x float> @llvm.atan2.v2f32 +half2 test_atan2_half2 (half2 p0, half2 p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_half3 +// NATIVE_HALF: call <3 x half> @llvm.atan2.v3f16 +// NO_HALF: call <3 x float> @llvm.atan2.v3f32 +half3 test_atan2_half3 (half3 p0, half3 p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_half4 +// NATIVE_HALF: call <4 x half> @llvm.atan2.v4f16 +// NO_HALF: call <4 x float> @llvm.atan2.v4f32 +half4 test_atan2_half4 (half4 p0, half4 p1) { + 
return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_float +// CHECK: call float @llvm.atan2.f32 +float test_atan2_float (float p0, float p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_float2 +// CHECK: call <2 x float> @llvm.atan2.v2f32 +float2 test_atan2_float2 (float2 p0, float2 p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_float3 +// CHECK: call <3 x float> @llvm.atan2.v3f32 +float3 test_atan2_float3 (float3 p0, float3 p1) { + return atan2(p0, p1); +} + +// CHECK-LABEL: test_atan2_float4 +// CHECK: call <4 x float> @llvm.atan2.v4f32 +float4 test_atan2_float4 (float4 p0, float4 p1) { + return atan2(p0, p1); +} diff --git a/clang/test/CodeGenHLSL/builtins/cross.hlsl b/clang/test/CodeGenHLSL/builtins/cross.hlsl index 514e57d36b2016..eba710c905bf46 100644 --- a/clang/test/CodeGenHLSL/builtins/cross.hlsl +++ b/clang/test/CodeGenHLSL/builtins/cross.hlsl @@ -1,37 +1,37 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv - -// NATIVE_HALF: define 
[[FNATTRS]] <3 x half> @ -// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].cross.v3f16(<3 x half> -// NATIVE_HALF: ret <3 x half> %hlsl.cross -// NO_HALF: define [[FNATTRS]] <3 x float> @ -// NO_HALF: call <3 x float> @llvm.[[TARGET]].cross.v3f32(<3 x float> -// NO_HALF: ret <3 x float> %hlsl.cross -half3 test_cross_half3(half3 p0, half3 p1) -{ - return cross(p0, p1); -} - -// CHECK: define [[FNATTRS]] <3 x float> @ -// CHECK: %hlsl.cross = call <3 x float> @llvm.[[TARGET]].cross.v3f32( -// CHECK: ret <3 x float> %hlsl.cross -float3 test_cross_float3(float3 p0, float3 p1) -{ - return cross(p0, p1); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv + +// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ +// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].cross.v3f16(<3 x half> +// NATIVE_HALF: ret <3 x half> %hlsl.cross +// NO_HALF: define [[FNATTRS]] <3 x float> @ +// NO_HALF: call <3 x float> @llvm.[[TARGET]].cross.v3f32(<3 x float> +// NO_HALF: ret <3 x float> %hlsl.cross +half3 
test_cross_half3(half3 p0, half3 p1) +{ + return cross(p0, p1); +} + +// CHECK: define [[FNATTRS]] <3 x float> @ +// CHECK: %hlsl.cross = call <3 x float> @llvm.[[TARGET]].cross.v3f32( +// CHECK: ret <3 x float> %hlsl.cross +float3 test_cross_float3(float3 p0, float3 p1) +{ + return cross(p0, p1); +} diff --git a/clang/test/CodeGenHLSL/builtins/length.hlsl b/clang/test/CodeGenHLSL/builtins/length.hlsl index 1c23b0df04df98..9b0293c218a5de 100644 --- a/clang/test/CodeGenHLSL/builtins/length.hlsl +++ b/clang/test/CodeGenHLSL/builtins/length.hlsl @@ -1,73 +1,73 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF - -// NATIVE_HALF: define noundef half @ -// NATIVE_HALF: call half @llvm.fabs.f16(half -// NO_HALF: call float @llvm.fabs.f32(float -// NATIVE_HALF: ret half -// NO_HALF: ret float -half test_length_half(half p0) -{ - return length(p0); -} -// NATIVE_HALF: define noundef half @ -// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v2f16 -// NO_HALF: %hlsl.length = call float @llvm.dx.length.v2f32( -// NATIVE_HALF: ret half %hlsl.length -// NO_HALF: ret float %hlsl.length -half test_length_half2(half2 p0) -{ - return length(p0); -} -// NATIVE_HALF: define noundef half @ -// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v3f16 -// NO_HALF: %hlsl.length = call float @llvm.dx.length.v3f32( -// NATIVE_HALF: ret half %hlsl.length -// NO_HALF: ret float %hlsl.length -half test_length_half3(half3 p0) -{ - return length(p0); -} -// NATIVE_HALF: define noundef half @ -// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v4f16 -// NO_HALF: %hlsl.length = call float 
@llvm.dx.length.v4f32( -// NATIVE_HALF: ret half %hlsl.length -// NO_HALF: ret float %hlsl.length -half test_length_half4(half4 p0) -{ - return length(p0); -} - -// CHECK: define noundef float @ -// CHECK: call float @llvm.fabs.f32(float -// CHECK: ret float -float test_length_float(float p0) -{ - return length(p0); -} -// CHECK: define noundef float @ -// CHECK: %hlsl.length = call float @llvm.dx.length.v2f32( -// CHECK: ret float %hlsl.length -float test_length_float2(float2 p0) -{ - return length(p0); -} -// CHECK: define noundef float @ -// CHECK: %hlsl.length = call float @llvm.dx.length.v3f32( -// CHECK: ret float %hlsl.length -float test_length_float3(float3 p0) -{ - return length(p0); -} -// CHECK: define noundef float @ -// CHECK: %hlsl.length = call float @llvm.dx.length.v4f32( -// CHECK: ret float %hlsl.length -float test_length_float4(float4 p0) -{ - return length(p0); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF + +// NATIVE_HALF: define noundef half @ +// NATIVE_HALF: call half @llvm.fabs.f16(half +// NO_HALF: call float @llvm.fabs.f32(float +// NATIVE_HALF: ret half +// NO_HALF: ret float +half test_length_half(half p0) +{ + return length(p0); +} +// NATIVE_HALF: define noundef half @ +// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v2f16 +// NO_HALF: %hlsl.length = call float @llvm.dx.length.v2f32( +// NATIVE_HALF: ret half %hlsl.length +// NO_HALF: ret float %hlsl.length +half test_length_half2(half2 p0) +{ + return length(p0); +} +// NATIVE_HALF: define noundef half @ +// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v3f16 +// NO_HALF: 
%hlsl.length = call float @llvm.dx.length.v3f32( +// NATIVE_HALF: ret half %hlsl.length +// NO_HALF: ret float %hlsl.length +half test_length_half3(half3 p0) +{ + return length(p0); +} +// NATIVE_HALF: define noundef half @ +// NATIVE_HALF: %hlsl.length = call half @llvm.dx.length.v4f16 +// NO_HALF: %hlsl.length = call float @llvm.dx.length.v4f32( +// NATIVE_HALF: ret half %hlsl.length +// NO_HALF: ret float %hlsl.length +half test_length_half4(half4 p0) +{ + return length(p0); +} + +// CHECK: define noundef float @ +// CHECK: call float @llvm.fabs.f32(float +// CHECK: ret float +float test_length_float(float p0) +{ + return length(p0); +} +// CHECK: define noundef float @ +// CHECK: %hlsl.length = call float @llvm.dx.length.v2f32( +// CHECK: ret float %hlsl.length +float test_length_float2(float2 p0) +{ + return length(p0); +} +// CHECK: define noundef float @ +// CHECK: %hlsl.length = call float @llvm.dx.length.v3f32( +// CHECK: ret float %hlsl.length +float test_length_float3(float3 p0) +{ + return length(p0); +} +// CHECK: define noundef float @ +// CHECK: %hlsl.length = call float @llvm.dx.length.v4f32( +// CHECK: ret float %hlsl.length +float test_length_float4(float4 p0) +{ + return length(p0); +} diff --git a/clang/test/CodeGenHLSL/builtins/normalize.hlsl b/clang/test/CodeGenHLSL/builtins/normalize.hlsl index 83ad607c14a607..d14e7c70ce0653 100644 --- a/clang/test/CodeGenHLSL/builtins/normalize.hlsl +++ b/clang/test/CodeGenHLSL/builtins/normalize.hlsl @@ -1,85 +1,85 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: 
-DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv - -// NATIVE_HALF: define [[FNATTRS]] half @ -// NATIVE_HALF: call half @llvm.[[TARGET]].normalize.f16(half -// NO_HALF: call float @llvm.[[TARGET]].normalize.f32(float -// NATIVE_HALF: ret half -// NO_HALF: ret float -half test_normalize_half(half p0) -{ - return normalize(p0); -} -// NATIVE_HALF: define [[FNATTRS]] <2 x half> @ -// NATIVE_HALF: call <2 x half> @llvm.[[TARGET]].normalize.v2f16(<2 x half> -// NO_HALF: call <2 x float> @llvm.[[TARGET]].normalize.v2f32(<2 x float> -// NATIVE_HALF: ret <2 x half> %hlsl.normalize -// NO_HALF: ret <2 x float> %hlsl.normalize -half2 test_normalize_half2(half2 p0) -{ - return normalize(p0); -} -// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ -// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].normalize.v3f16(<3 x half> -// NO_HALF: call <3 x float> @llvm.[[TARGET]].normalize.v3f32(<3 x float> -// NATIVE_HALF: ret <3 x half> %hlsl.normalize -// NO_HALF: ret <3 x float> %hlsl.normalize -half3 test_normalize_half3(half3 p0) -{ - return normalize(p0); -} -// NATIVE_HALF: define [[FNATTRS]] <4 x half> @ -// NATIVE_HALF: call <4 x half> @llvm.[[TARGET]].normalize.v4f16(<4 x half> -// NO_HALF: call <4 x float> @llvm.[[TARGET]].normalize.v4f32(<4 x float> -// NATIVE_HALF: ret <4 x half> %hlsl.normalize -// NO_HALF: ret <4 x float> %hlsl.normalize -half4 test_normalize_half4(half4 p0) -{ - return normalize(p0); -} - -// CHECK: define [[FNATTRS]] float @ -// CHECK: call float 
@llvm.[[TARGET]].normalize.f32(float -// CHECK: ret float -float test_normalize_float(float p0) -{ - return normalize(p0); -} -// CHECK: define [[FNATTRS]] <2 x float> @ -// CHECK: %hlsl.normalize = call <2 x float> @llvm.[[TARGET]].normalize.v2f32(<2 x float> - -// CHECK: ret <2 x float> %hlsl.normalize -float2 test_normalize_float2(float2 p0) -{ - return normalize(p0); -} -// CHECK: define [[FNATTRS]] <3 x float> @ -// CHECK: %hlsl.normalize = call <3 x float> @llvm.[[TARGET]].normalize.v3f32( -// CHECK: ret <3 x float> %hlsl.normalize -float3 test_normalize_float3(float3 p0) -{ - return normalize(p0); -} -// CHECK: define [[FNATTRS]] <4 x float> @ -// CHECK: %hlsl.normalize = call <4 x float> @llvm.[[TARGET]].normalize.v4f32( -// CHECK: ret <4 x float> %hlsl.normalize -float4 test_length_float4(float4 p0) -{ - return normalize(p0); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv + +// NATIVE_HALF: define [[FNATTRS]] half @ +// NATIVE_HALF: call half 
@llvm.[[TARGET]].normalize.f16(half +// NO_HALF: call float @llvm.[[TARGET]].normalize.f32(float +// NATIVE_HALF: ret half +// NO_HALF: ret float +half test_normalize_half(half p0) +{ + return normalize(p0); +} +// NATIVE_HALF: define [[FNATTRS]] <2 x half> @ +// NATIVE_HALF: call <2 x half> @llvm.[[TARGET]].normalize.v2f16(<2 x half> +// NO_HALF: call <2 x float> @llvm.[[TARGET]].normalize.v2f32(<2 x float> +// NATIVE_HALF: ret <2 x half> %hlsl.normalize +// NO_HALF: ret <2 x float> %hlsl.normalize +half2 test_normalize_half2(half2 p0) +{ + return normalize(p0); +} +// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ +// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].normalize.v3f16(<3 x half> +// NO_HALF: call <3 x float> @llvm.[[TARGET]].normalize.v3f32(<3 x float> +// NATIVE_HALF: ret <3 x half> %hlsl.normalize +// NO_HALF: ret <3 x float> %hlsl.normalize +half3 test_normalize_half3(half3 p0) +{ + return normalize(p0); +} +// NATIVE_HALF: define [[FNATTRS]] <4 x half> @ +// NATIVE_HALF: call <4 x half> @llvm.[[TARGET]].normalize.v4f16(<4 x half> +// NO_HALF: call <4 x float> @llvm.[[TARGET]].normalize.v4f32(<4 x float> +// NATIVE_HALF: ret <4 x half> %hlsl.normalize +// NO_HALF: ret <4 x float> %hlsl.normalize +half4 test_normalize_half4(half4 p0) +{ + return normalize(p0); +} + +// CHECK: define [[FNATTRS]] float @ +// CHECK: call float @llvm.[[TARGET]].normalize.f32(float +// CHECK: ret float +float test_normalize_float(float p0) +{ + return normalize(p0); +} +// CHECK: define [[FNATTRS]] <2 x float> @ +// CHECK: %hlsl.normalize = call <2 x float> @llvm.[[TARGET]].normalize.v2f32(<2 x float> + +// CHECK: ret <2 x float> %hlsl.normalize +float2 test_normalize_float2(float2 p0) +{ + return normalize(p0); +} +// CHECK: define [[FNATTRS]] <3 x float> @ +// CHECK: %hlsl.normalize = call <3 x float> @llvm.[[TARGET]].normalize.v3f32( +// CHECK: ret <3 x float> %hlsl.normalize +float3 test_normalize_float3(float3 p0) +{ + return normalize(p0); +} +// CHECK: define 
[[FNATTRS]] <4 x float> @ +// CHECK: %hlsl.normalize = call <4 x float> @llvm.[[TARGET]].normalize.v4f32( +// CHECK: ret <4 x float> %hlsl.normalize +float4 test_length_float4(float4 p0) +{ + return normalize(p0); +} diff --git a/clang/test/CodeGenHLSL/builtins/step.hlsl b/clang/test/CodeGenHLSL/builtins/step.hlsl index 442f4930ca579c..8ef52794a3be5d 100644 --- a/clang/test/CodeGenHLSL/builtins/step.hlsl +++ b/clang/test/CodeGenHLSL/builtins/step.hlsl @@ -1,84 +1,84 @@ -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS=noundef -DTARGET=dx -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ -// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ -// RUN: --check-prefixes=CHECK,NATIVE_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv -// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ -// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ -// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ -// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv - -// NATIVE_HALF: define [[FNATTRS]] half @ -// NATIVE_HALF: call half @llvm.[[TARGET]].step.f16(half -// NO_HALF: call float @llvm.[[TARGET]].step.f32(float -// NATIVE_HALF: ret half -// NO_HALF: ret float -half test_step_half(half p0, half p1) -{ - return step(p0, p1); -} -// NATIVE_HALF: define [[FNATTRS]] <2 x half> @ -// NATIVE_HALF: call <2 x half> @llvm.[[TARGET]].step.v2f16(<2 x half> -// NO_HALF: call <2 x float> @llvm.[[TARGET]].step.v2f32(<2 x float> -// 
NATIVE_HALF: ret <2 x half> %hlsl.step -// NO_HALF: ret <2 x float> %hlsl.step -half2 test_step_half2(half2 p0, half2 p1) -{ - return step(p0, p1); -} -// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ -// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].step.v3f16(<3 x half> -// NO_HALF: call <3 x float> @llvm.[[TARGET]].step.v3f32(<3 x float> -// NATIVE_HALF: ret <3 x half> %hlsl.step -// NO_HALF: ret <3 x float> %hlsl.step -half3 test_step_half3(half3 p0, half3 p1) -{ - return step(p0, p1); -} -// NATIVE_HALF: define [[FNATTRS]] <4 x half> @ -// NATIVE_HALF: call <4 x half> @llvm.[[TARGET]].step.v4f16(<4 x half> -// NO_HALF: call <4 x float> @llvm.[[TARGET]].step.v4f32(<4 x float> -// NATIVE_HALF: ret <4 x half> %hlsl.step -// NO_HALF: ret <4 x float> %hlsl.step -half4 test_step_half4(half4 p0, half4 p1) -{ - return step(p0, p1); -} - -// CHECK: define [[FNATTRS]] float @ -// CHECK: call float @llvm.[[TARGET]].step.f32(float -// CHECK: ret float -float test_step_float(float p0, float p1) -{ - return step(p0, p1); -} -// CHECK: define [[FNATTRS]] <2 x float> @ -// CHECK: %hlsl.step = call <2 x float> @llvm.[[TARGET]].step.v2f32( -// CHECK: ret <2 x float> %hlsl.step -float2 test_step_float2(float2 p0, float2 p1) -{ - return step(p0, p1); -} -// CHECK: define [[FNATTRS]] <3 x float> @ -// CHECK: %hlsl.step = call <3 x float> @llvm.[[TARGET]].step.v3f32( -// CHECK: ret <3 x float> %hlsl.step -float3 test_step_float3(float3 p0, float3 p1) -{ - return step(p0, p1); -} -// CHECK: define [[FNATTRS]] <4 x float> @ -// CHECK: %hlsl.step = call <4 x float> @llvm.[[TARGET]].step.v4f32( -// CHECK: ret <4 x float> %hlsl.step -float4 test_step_float4(float4 p0, float4 p1) -{ - return step(p0, p1); -} +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: 
%clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: dxil-pc-shadermodel6.3-library %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS=noundef -DTARGET=dx +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -fnative-half-type \ +// RUN: -emit-llvm -disable-llvm-passes -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,NATIVE_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv +// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple \ +// RUN: spirv-unknown-vulkan-compute %s -emit-llvm -disable-llvm-passes \ +// RUN: -o - | FileCheck %s --check-prefixes=CHECK,NO_HALF \ +// RUN: -DFNATTRS="spir_func noundef" -DTARGET=spv + +// NATIVE_HALF: define [[FNATTRS]] half @ +// NATIVE_HALF: call half @llvm.[[TARGET]].step.f16(half +// NO_HALF: call float @llvm.[[TARGET]].step.f32(float +// NATIVE_HALF: ret half +// NO_HALF: ret float +half test_step_half(half p0, half p1) +{ + return step(p0, p1); +} +// NATIVE_HALF: define [[FNATTRS]] <2 x half> @ +// NATIVE_HALF: call <2 x half> @llvm.[[TARGET]].step.v2f16(<2 x half> +// NO_HALF: call <2 x float> @llvm.[[TARGET]].step.v2f32(<2 x float> +// NATIVE_HALF: ret <2 x half> %hlsl.step +// NO_HALF: ret <2 x float> %hlsl.step +half2 test_step_half2(half2 p0, half2 p1) +{ + return step(p0, p1); +} +// NATIVE_HALF: define [[FNATTRS]] <3 x half> @ +// NATIVE_HALF: call <3 x half> @llvm.[[TARGET]].step.v3f16(<3 x half> +// NO_HALF: call <3 x float> @llvm.[[TARGET]].step.v3f32(<3 x float> +// NATIVE_HALF: ret <3 x half> %hlsl.step +// NO_HALF: ret <3 x float> %hlsl.step +half3 test_step_half3(half3 p0, half3 p1) +{ + return step(p0, p1); +} +// NATIVE_HALF: define [[FNATTRS]] <4 x half> @ +// NATIVE_HALF: call <4 x half> @llvm.[[TARGET]].step.v4f16(<4 x half> +// NO_HALF: call <4 x float> @llvm.[[TARGET]].step.v4f32(<4 x float> +// NATIVE_HALF: ret <4 x half> %hlsl.step +// NO_HALF: ret <4 x 
float> %hlsl.step +half4 test_step_half4(half4 p0, half4 p1) +{ + return step(p0, p1); +} + +// CHECK: define [[FNATTRS]] float @ +// CHECK: call float @llvm.[[TARGET]].step.f32(float +// CHECK: ret float +float test_step_float(float p0, float p1) +{ + return step(p0, p1); +} +// CHECK: define [[FNATTRS]] <2 x float> @ +// CHECK: %hlsl.step = call <2 x float> @llvm.[[TARGET]].step.v2f32( +// CHECK: ret <2 x float> %hlsl.step +float2 test_step_float2(float2 p0, float2 p1) +{ + return step(p0, p1); +} +// CHECK: define [[FNATTRS]] <3 x float> @ +// CHECK: %hlsl.step = call <3 x float> @llvm.[[TARGET]].step.v3f32( +// CHECK: ret <3 x float> %hlsl.step +float3 test_step_float3(float3 p0, float3 p1) +{ + return step(p0, p1); +} +// CHECK: define [[FNATTRS]] <4 x float> @ +// CHECK: %hlsl.step = call <4 x float> @llvm.[[TARGET]].step.v4f32( +// CHECK: ret <4 x float> %hlsl.step +float4 test_step_float4(float4 p0, float4 p1) +{ + return step(p0, p1); +} diff --git a/clang/test/Driver/flang/msvc-link.f90 b/clang/test/Driver/flang/msvc-link.f90 index 463749510eb5f8..3f7e162a9a6116 100644 --- a/clang/test/Driver/flang/msvc-link.f90 +++ b/clang/test/Driver/flang/msvc-link.f90 @@ -1,5 +1,5 @@ -! RUN: %clang --driver-mode=flang --target=x86_64-pc-windows-msvc -### %s -Ltest 2>&1 | FileCheck %s -! -! Test that user provided paths come before the Flang runtimes -! CHECK: "-libpath:test" -! CHECK: "-libpath:{{.*(\\|/)}}lib" +! RUN: %clang --driver-mode=flang --target=x86_64-pc-windows-msvc -### %s -Ltest 2>&1 | FileCheck %s +! +! Test that user provided paths come before the Flang runtimes +! CHECK: "-libpath:test" +! 
CHECK: "-libpath:{{.*(\\|/)}}lib" diff --git a/clang/test/FixIt/fixit-newline-style.c b/clang/test/FixIt/fixit-newline-style.c index 61e4df67e85bac..2aac143d4d753e 100644 --- a/clang/test/FixIt/fixit-newline-style.c +++ b/clang/test/FixIt/fixit-newline-style.c @@ -1,11 +1,11 @@ -// RUN: %clang_cc1 -pedantic -Wunused-label -fno-diagnostics-show-line-numbers -x c %s 2>&1 | FileCheck %s -strict-whitespace - -// This file intentionally uses a CRLF newline style -// CHECK: warning: unused label 'ddd' -// CHECK-NEXT: {{^ ddd:}} -// CHECK-NEXT: {{^ \^~~~$}} -// CHECK-NOT: {{^ ;}} -void f(void) { - ddd: - ; -} +// RUN: %clang_cc1 -pedantic -Wunused-label -fno-diagnostics-show-line-numbers -x c %s 2>&1 | FileCheck %s -strict-whitespace + +// This file intentionally uses a CRLF newline style +// CHECK: warning: unused label 'ddd' +// CHECK-NEXT: {{^ ddd:}} +// CHECK-NEXT: {{^ \^~~~$}} +// CHECK-NOT: {{^ ;}} +void f(void) { + ddd: + ; +} diff --git a/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.c b/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.c index d6724444c06676..2faeaba3229218 100644 --- a/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.c +++ b/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.c @@ -1,8 +1,8 @@ -// RUN: %clang_cc1 -E -frewrite-includes %s | %clang_cc1 - -// expected-no-diagnostics -// Note: This source file has CRLF line endings. -// This test validates that -frewrite-includes translates the end of line (EOL) -// form used in header files to the EOL form used in the the primary source -// file when the files use different EOL forms. -#include "rewrite-includes-mixed-eol-crlf.h" -#include "rewrite-includes-mixed-eol-lf.h" +// RUN: %clang_cc1 -E -frewrite-includes %s | %clang_cc1 - +// expected-no-diagnostics +// Note: This source file has CRLF line endings. 
+// This test validates that -frewrite-includes translates the end of line (EOL) +// form used in header files to the EOL form used in the the primary source +// file when the files use different EOL forms. +#include "rewrite-includes-mixed-eol-crlf.h" +#include "rewrite-includes-mixed-eol-lf.h" diff --git a/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.h b/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.h index 0439b88b75e2cf..baedc282296bd7 100644 --- a/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.h +++ b/clang/test/Frontend/rewrite-includes-mixed-eol-crlf.h @@ -1,11 +1,11 @@ -// Note: This header file has CRLF line endings. -// The indentation in some of the conditional inclusion directives below is -// intentional and is required for this test to function as a regression test -// for GH59736. -_Static_assert(__LINE__ == 5, ""); -#if 1 -_Static_assert(__LINE__ == 7, ""); - #if 1 - _Static_assert(__LINE__ == 9, ""); - #endif -#endif +// Note: This header file has CRLF line endings. +// The indentation in some of the conditional inclusion directives below is +// intentional and is required for this test to function as a regression test +// for GH59736. +_Static_assert(__LINE__ == 5, ""); +#if 1 +_Static_assert(__LINE__ == 7, ""); + #if 1 + _Static_assert(__LINE__ == 9, ""); + #endif +#endif diff --git a/clang/test/Frontend/system-header-line-directive-ms-lineendings.c b/clang/test/Frontend/system-header-line-directive-ms-lineendings.c index 92fc07f65e0d4d..dffdd5cf1959ae 100644 --- a/clang/test/Frontend/system-header-line-directive-ms-lineendings.c +++ b/clang/test/Frontend/system-header-line-directive-ms-lineendings.c @@ -1,21 +1,21 @@ -// RUN: %clang_cc1 %s -E -o - -I %S/Inputs -isystem %S/Inputs/SystemHeaderPrefix | FileCheck %s -#include -#include - -#include "line-directive.h" - -// This tests that the line numbers for the current file are correctly outputted -// for the include-file-completed test case. This file should be CRLF. 
-
-// CHECK: # 1 "{{.*}}system-header-line-directive-ms-lineendings.c" 2
-// CHECK: # 1 "{{.*}}noline.h" 1 3
-// CHECK: foo(void);
-// CHECK: # 3 "{{.*}}system-header-line-directive-ms-lineendings.c" 2
-// CHECK: # 1 "{{.*}}line-directive-in-system.h" 1 3
-// The "3" below indicates that "foo.h" is considered a system header.
-// CHECK: # 1 "foo.h" 3
-// CHECK: foo(void);
-// CHECK: # 4 "{{.*}}system-header-line-directive-ms-lineendings.c" 2
-// CHECK: # 1 "{{.*}}line-directive.h" 1
-// CHECK: # 10 "foo.h"{{$}}
-// CHECK: # 6 "{{.*}}system-header-line-directive-ms-lineendings.c" 2
+// RUN: %clang_cc1 %s -E -o - -I %S/Inputs -isystem %S/Inputs/SystemHeaderPrefix | FileCheck %s
+#include
+#include
+
+#include "line-directive.h"
+
+// This tests that the line numbers for the current file are correctly outputted
+// for the include-file-completed test case. This file should be CRLF.
+
+// CHECK: # 1 "{{.*}}system-header-line-directive-ms-lineendings.c" 2
+// CHECK: # 1 "{{.*}}noline.h" 1 3
+// CHECK: foo(void);
+// CHECK: # 3 "{{.*}}system-header-line-directive-ms-lineendings.c" 2
+// CHECK: # 1 "{{.*}}line-directive-in-system.h" 1 3
+// The "3" below indicates that "foo.h" is considered a system header.
+// CHECK: # 1 "foo.h" 3 +// CHECK: foo(void); +// CHECK: # 4 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 +// CHECK: # 1 "{{.*}}line-directive.h" 1 +// CHECK: # 10 "foo.h"{{$}} +// CHECK: # 6 "{{.*}}system-header-line-directive-ms-lineendings.c" 2 diff --git a/clang/test/ParserHLSL/bitfields.hlsl b/clang/test/ParserHLSL/bitfields.hlsl index 307d1143a068e2..57b6705babdc12 100644 --- a/clang/test/ParserHLSL/bitfields.hlsl +++ b/clang/test/ParserHLSL/bitfields.hlsl @@ -1,31 +1,31 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -ast-dump -x hlsl -o - %s | FileCheck %s - - -struct MyBitFields { - // CHECK: FieldDecl 0x{{[0-9a-f]+}} col:16 referenced field1 'unsigned int' - // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' - // CHECK:-value: Int 3 - // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 3 - unsigned int field1 : 3; // 3 bits for field1 - - // CHECK:FieldDecl 0x{{[0-9a-f]+}} col:16 referenced field2 'unsigned int' - // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' - // CHECK:-value: Int 4 - // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 4 - unsigned int field2 : 4; // 4 bits for field2 - - // CHECK:FieldDecl 0x{{[0-9a-f]+}} col:7 field3 'int' - // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' - // CHECK:-value: Int 5 - // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 5 - int field3 : 5; // 5 bits for field3 (signed) -}; - - - -[numthreads(1,1,1)] -void main() { - MyBitFields m; - m.field1 = 4; - m.field2 = m.field1*2; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -ast-dump -x hlsl -o - %s | FileCheck %s + + +struct MyBitFields { + // CHECK: FieldDecl 0x{{[0-9a-f]+}} col:16 referenced field1 'unsigned int' + // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' + // CHECK:-value: Int 3 + // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 3 + unsigned int field1 : 3; // 3 bits for field1 + + // CHECK:FieldDecl 0x{{[0-9a-f]+}} col:16 referenced field2 'unsigned int' + // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' + // CHECK:-value: Int 4 + // 
CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 4 + unsigned int field2 : 4; // 4 bits for field2 + + // CHECK:FieldDecl 0x{{[0-9a-f]+}} col:7 field3 'int' + // CHECK:-ConstantExpr 0x{{[0-9a-f]+}} 'int' + // CHECK:-value: Int 5 + // CHECK:-IntegerLiteral 0x{{[0-9a-f]+}} 'int' 5 + int field3 : 5; // 5 bits for field3 (signed) +}; + + + +[numthreads(1,1,1)] +void main() { + MyBitFields m; + m.field1 = 4; + m.field2 = m.field1*2; } \ No newline at end of file diff --git a/clang/test/ParserHLSL/hlsl_annotations_on_struct_members.hlsl b/clang/test/ParserHLSL/hlsl_annotations_on_struct_members.hlsl index 2eebc920388b5b..5b228d039345e1 100644 --- a/clang/test/ParserHLSL/hlsl_annotations_on_struct_members.hlsl +++ b/clang/test/ParserHLSL/hlsl_annotations_on_struct_members.hlsl @@ -1,21 +1,21 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s - -// tests that hlsl annotations are properly parsed when applied on field decls, -// and that the annotation gets properly placed on the AST. - -struct Eg9{ - // CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:8 implicit struct Eg9 - // CHECK: FieldDecl 0x{{[0-9a-f]+}} col:16 referenced a 'unsigned int' - // CHECK: -HLSLSV_DispatchThreadIDAttr 0x{{[0-9a-f]+}} - unsigned int a : SV_DispatchThreadID; -}; -Eg9 e9; - - -RWBuffer In : register(u1); - - -[numthreads(1,1,1)] -void main() { - In[0] = e9.a; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s + +// tests that hlsl annotations are properly parsed when applied on field decls, +// and that the annotation gets properly placed on the AST. 
+ +struct Eg9{ + // CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:8 implicit struct Eg9 + // CHECK: FieldDecl 0x{{[0-9a-f]+}} col:16 referenced a 'unsigned int' + // CHECK: -HLSLSV_DispatchThreadIDAttr 0x{{[0-9a-f]+}} + unsigned int a : SV_DispatchThreadID; +}; +Eg9 e9; + + +RWBuffer In : register(u1); + + +[numthreads(1,1,1)] +void main() { + In[0] = e9.a; +} diff --git a/clang/test/ParserHLSL/hlsl_contained_type_attr.hlsl b/clang/test/ParserHLSL/hlsl_contained_type_attr.hlsl index 5a72aa242e581d..476ec39e14da98 100644 --- a/clang/test/ParserHLSL/hlsl_contained_type_attr.hlsl +++ b/clang/test/ParserHLSL/hlsl_contained_type_attr.hlsl @@ -1,25 +1,25 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -std=hlsl202x -x hlsl -ast-dump -o - %s | FileCheck %s - -typedef vector float4; - -// CHECK: -TypeAliasDecl 0x{{[0-9a-f]+}} -// CHECK: -HLSLAttributedResourceType 0x{{[0-9a-f]+}} '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(int)]] -using ResourceIntAliasT = __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(int)]]; -ResourceIntAliasT h1; - -// CHECK: -VarDecl 0x{{[0-9a-f]+}} col:82 h2 '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float4)]] -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float4)]] h2; - -// CHECK: ClassTemplateDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 6]]:30 S -// CHECK: TemplateTypeParmDecl 0x{{[0-9a-f]+}} col:20 referenced typename depth 0 index 0 T -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:30 struct S definition -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:79 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(T)]] -template struct S { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(T)]] h; -}; +// RUN: %clang_cc1 -triple 
dxil-pc-shadermodel6.0-compute -std=hlsl202x -x hlsl -ast-dump -o - %s | FileCheck %s + +typedef vector float4; + +// CHECK: -TypeAliasDecl 0x{{[0-9a-f]+}} +// CHECK: -HLSLAttributedResourceType 0x{{[0-9a-f]+}} '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(int)]] +using ResourceIntAliasT = __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(int)]]; +ResourceIntAliasT h1; + +// CHECK: -VarDecl 0x{{[0-9a-f]+}} col:82 h2 '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float4)]] +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float4)]] h2; + +// CHECK: ClassTemplateDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 6]]:30 S +// CHECK: TemplateTypeParmDecl 0x{{[0-9a-f]+}} col:20 referenced typename depth 0 index 0 T +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:30 struct S definition +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:79 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(T)]] +template struct S { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(T)]] h; +}; diff --git a/clang/test/ParserHLSL/hlsl_contained_type_attr_error.hlsl b/clang/test/ParserHLSL/hlsl_contained_type_attr_error.hlsl index b2d492d95945c1..673ff8693b83b8 100644 --- a/clang/test/ParserHLSL/hlsl_contained_type_attr_error.hlsl +++ b/clang/test/ParserHLSL/hlsl_contained_type_attr_error.hlsl @@ -1,28 +1,28 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -std=hlsl202x -x hlsl -o - %s -verify - -typedef vector float4; - -// expected-error at +1{{'contained_type' attribute cannot be applied to a declaration}} -[[hlsl::contained_type(float4)]] __hlsl_resource_t h1; - -// expected-error at +1{{'contained_type' attribute takes one argument}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] 
[[hlsl::contained_type()]] h3; - -// expected-error at +1{{expected a type}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(0)]] h4; - -// expected-error at +1{{unknown type name 'a'}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(a)]] h5; - -// expected-error at +1{{expected a type}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type("b", c)]] h6; - -// expected-warning at +1{{attribute 'contained_type' is already applied}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] [[hlsl::contained_type(float)]] h7; - -// expected-warning at +1{{attribute 'contained_type' is already applied with different arguments}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] [[hlsl::contained_type(int)]] h8; - -// expected-error at +2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} -// expected-error at +1{{attribute 'contained_type' can be used only on HLSL intangible type '__hlsl_resource_t'}} -float [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] res5; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -std=hlsl202x -x hlsl -o - %s -verify + +typedef vector float4; + +// expected-error at +1{{'contained_type' attribute cannot be applied to a declaration}} +[[hlsl::contained_type(float4)]] __hlsl_resource_t h1; + +// expected-error at +1{{'contained_type' attribute takes one argument}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type()]] h3; + +// expected-error at +1{{expected a type}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(0)]] h4; + +// expected-error at +1{{unknown type name 'a'}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(a)]] h5; + +// expected-error at +1{{expected a type}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type("b", c)]] h6; + +// expected-warning at 
+1{{attribute 'contained_type' is already applied}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] [[hlsl::contained_type(float)]] h7; + +// expected-warning at +1{{attribute 'contained_type' is already applied with different arguments}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] [[hlsl::contained_type(int)]] h8; + +// expected-error at +2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} +// expected-error at +1{{attribute 'contained_type' can be used only on HLSL intangible type '__hlsl_resource_t'}} +float [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]] res5; diff --git a/clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl b/clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl index 836d129c8d0002..487dc32413032d 100644 --- a/clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl +++ b/clang/test/ParserHLSL/hlsl_is_rov_attr.hlsl @@ -1,22 +1,22 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s - -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:68 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] -struct MyBuffer { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] h; -}; - -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:66 res '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] -// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] -__hlsl_resource_t [[hlsl::is_rov]] [[hlsl::resource_class(SRV)]] res; - -// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:6 f 'void () -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:72 r '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(Sampler)]] -// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] -void f() { - __hlsl_resource_t [[hlsl::resource_class(Sampler)]] [[hlsl::is_rov]] r; -} +// RUN: %clang_cc1 -triple 
dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s + +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:68 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] +struct MyBuffer { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] h; +}; + +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:66 res '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] +// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] +__hlsl_resource_t [[hlsl::is_rov]] [[hlsl::resource_class(SRV)]] res; + +// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:6 f 'void () +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:72 r '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(Sampler)]] +// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] +void f() { + __hlsl_resource_t [[hlsl::resource_class(Sampler)]] [[hlsl::is_rov]] r; +} diff --git a/clang/test/ParserHLSL/hlsl_is_rov_attr_error.hlsl b/clang/test/ParserHLSL/hlsl_is_rov_attr_error.hlsl index 3b2c12e7a96c5c..9bb64ea990e284 100644 --- a/clang/test/ParserHLSL/hlsl_is_rov_attr_error.hlsl +++ b/clang/test/ParserHLSL/hlsl_is_rov_attr_error.hlsl @@ -1,20 +1,20 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify - -// expected-error at +1{{'is_rov' attribute cannot be applied to a declaration}} -[[hlsl::is_rov]] __hlsl_resource_t res0; - -// expected-error at +1{{HLSL resource needs to have [[hlsl::resource_class()]] attribute}} -__hlsl_resource_t [[hlsl::is_rov]] res1; - -// expected-error at +1{{'is_rov' attribute takes no arguments}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov(3)]] res2; - -// expected-error at +1{{use of undeclared identifier 'gibberish'}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov(gibberish)]] res3; - -// expected-warning at +1{{attribute 'is_rov' is already applied}} -__hlsl_resource_t 
[[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] [[hlsl::is_rov]] res4; - -// expected-error at +2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} -// expected-error at +1{{attribute 'is_rov' can be used only on HLSL intangible type '__hlsl_resource_t'}} -float [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] res5; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify + +// expected-error at +1{{'is_rov' attribute cannot be applied to a declaration}} +[[hlsl::is_rov]] __hlsl_resource_t res0; + +// expected-error at +1{{HLSL resource needs to have [[hlsl::resource_class()]] attribute}} +__hlsl_resource_t [[hlsl::is_rov]] res1; + +// expected-error at +1{{'is_rov' attribute takes no arguments}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov(3)]] res2; + +// expected-error at +1{{use of undeclared identifier 'gibberish'}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov(gibberish)]] res3; + +// expected-warning at +1{{attribute 'is_rov' is already applied}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] [[hlsl::is_rov]] res4; + +// expected-error at +2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} +// expected-error at +1{{attribute 'is_rov' can be used only on HLSL intangible type '__hlsl_resource_t'}} +float [[hlsl::resource_class(UAV)]] [[hlsl::is_rov]] res5; diff --git a/clang/test/ParserHLSL/hlsl_raw_buffer_attr.hlsl b/clang/test/ParserHLSL/hlsl_raw_buffer_attr.hlsl index 84c924eec24efc..e09ed5586c1025 100644 --- a/clang/test/ParserHLSL/hlsl_raw_buffer_attr.hlsl +++ b/clang/test/ParserHLSL/hlsl_raw_buffer_attr.hlsl @@ -1,22 +1,22 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s - -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:72 h1 '__hlsl_resource_t -// CHECK-SAME{LITERAL}: 
[[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -struct MyBuffer { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] h1; -}; - -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:70 h2 '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -__hlsl_resource_t [[hlsl::raw_buffer]] [[hlsl::resource_class(SRV)]] h2; - -// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:6 f 'void () -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:72 h3 '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] -void f() { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] h3; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s + +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:72 h1 '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +struct MyBuffer { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] h1; +}; + +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:70 h2 '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +__hlsl_resource_t [[hlsl::raw_buffer]] [[hlsl::resource_class(SRV)]] h2; + +// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:6 f 'void () +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:72 h3 '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::raw_buffer]] +void f() { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] h3; +} diff --git a/clang/test/ParserHLSL/hlsl_raw_buffer_attr_error.hlsl b/clang/test/ParserHLSL/hlsl_raw_buffer_attr_error.hlsl index 77530cbf9e4d92..a10aca4e96fc53 100644 --- a/clang/test/ParserHLSL/hlsl_raw_buffer_attr_error.hlsl +++ 
b/clang/test/ParserHLSL/hlsl_raw_buffer_attr_error.hlsl @@ -1,17 +1,17 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify - -// expected-error at +1{{'raw_buffer' attribute cannot be applied to a declaration}} -[[hlsl::raw_buffer]] __hlsl_resource_t res0; - -// expected-error at +1{{'raw_buffer' attribute takes no arguments}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer(3)]] res2; - -// expected-error at +1{{use of undeclared identifier 'gibberish'}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer(gibberish)]] res3; - -// expected-warning at +1{{attribute 'raw_buffer' is already applied}} -__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] [[hlsl::raw_buffer]] res4; - -// expected-error at +2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} -// expected-error at +1{{attribute 'raw_buffer' can be used only on HLSL intangible type '__hlsl_resource_t'}} -float [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] res5; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify + +// expected-error at +1{{'raw_buffer' attribute cannot be applied to a declaration}} +[[hlsl::raw_buffer]] __hlsl_resource_t res0; + +// expected-error at +1{{'raw_buffer' attribute takes no arguments}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer(3)]] res2; + +// expected-error at +1{{use of undeclared identifier 'gibberish'}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer(gibberish)]] res3; + +// expected-warning at +1{{attribute 'raw_buffer' is already applied}} +__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] [[hlsl::raw_buffer]] res4; + +// expected-error at +2{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} +// expected-error at +1{{attribute 'raw_buffer' can be used only on HLSL intangible type '__hlsl_resource_t'}} +float 
[[hlsl::resource_class(UAV)]] [[hlsl::raw_buffer]] res5; diff --git a/clang/test/ParserHLSL/hlsl_resource_class_attr.hlsl b/clang/test/ParserHLSL/hlsl_resource_class_attr.hlsl index fbada8b4b99f75..9fee9edddf619a 100644 --- a/clang/test/ParserHLSL/hlsl_resource_class_attr.hlsl +++ b/clang/test/ParserHLSL/hlsl_resource_class_attr.hlsl @@ -1,37 +1,37 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s - -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -struct MyBuffer { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] h; -}; - -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:49 res '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] -__hlsl_resource_t [[hlsl::resource_class(SRV)]] res; - -// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 3]]:6 f 'void () -// CHECK: VarDecl 0x{{[0-9a-f]+}} col:55 r '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(Sampler)]] -void f() { - __hlsl_resource_t [[hlsl::resource_class(Sampler)]] r; -} - -// CHECK: ClassTemplateDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 6]]:29 MyBuffer2 -// CHECK: TemplateTypeParmDecl 0x{{[0-9a-f]+}} col:19 typename depth 0 index 0 T -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:29 struct MyBuffer2 definition -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:29 implicit struct MyBuffer2 -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -template struct MyBuffer2 { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] h; -}; - -// CHECK: ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} line:[[# @LINE - 4]]:29 struct MyBuffer2 definition implicit_instantiation -// CHECK: TemplateArgument type 'float' -// CHECK: BuiltinType 0x{{[0-9a-f]+}} 'float' -// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:29 implicit struct 
MyBuffer2 -// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -MyBuffer2 myBuffer2; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -ast-dump -o - %s | FileCheck %s + +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} {{.*}} struct MyBuffer definition +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +struct MyBuffer { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] h; +}; + +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:49 res '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(SRV)]] +__hlsl_resource_t [[hlsl::resource_class(SRV)]] res; + +// CHECK: FunctionDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 3]]:6 f 'void () +// CHECK: VarDecl 0x{{[0-9a-f]+}} col:55 r '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(Sampler)]] +void f() { + __hlsl_resource_t [[hlsl::resource_class(Sampler)]] r; +} + +// CHECK: ClassTemplateDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 6]]:29 MyBuffer2 +// CHECK: TemplateTypeParmDecl 0x{{[0-9a-f]+}} col:19 typename depth 0 index 0 T +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} line:[[# @LINE + 4]]:29 struct MyBuffer2 definition +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:29 implicit struct MyBuffer2 +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +template struct MyBuffer2 { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] h; +}; + +// CHECK: ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} line:[[# @LINE - 4]]:29 struct MyBuffer2 definition implicit_instantiation +// CHECK: TemplateArgument type 'float' +// CHECK: BuiltinType 0x{{[0-9a-f]+}} 'float' +// CHECK: CXXRecordDecl 0x{{[0-9a-f]+}} col:29 implicit struct MyBuffer2 +// CHECK: FieldDecl 0x{{[0-9a-f]+}} col:51 h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +MyBuffer2 myBuffer2; diff --git 
a/clang/test/ParserHLSL/hlsl_resource_class_attr_error.hlsl b/clang/test/ParserHLSL/hlsl_resource_class_attr_error.hlsl index 63e39daff949b4..a0a4da1dc2bf44 100644 --- a/clang/test/ParserHLSL/hlsl_resource_class_attr_error.hlsl +++ b/clang/test/ParserHLSL/hlsl_resource_class_attr_error.hlsl @@ -1,22 +1,22 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify - -// expected-error at +1{{'resource_class' attribute cannot be applied to a declaration}} -[[hlsl::resource_class(UAV)]] __hlsl_resource_t e0; - -// expected-error at +1{{'resource_class' attribute takes one argument}} -__hlsl_resource_t [[hlsl::resource_class()]] e1; - -// expected-warning at +1{{ResourceClass attribute argument not supported: gibberish}} -__hlsl_resource_t [[hlsl::resource_class(gibberish)]] e2; - -// expected-warning at +1{{attribute 'resource_class' is already applied with different arguments}} -__hlsl_resource_t [[hlsl::resource_class(SRV)]] [[hlsl::resource_class(UAV)]] e3; - -// expected-warning at +1{{attribute 'resource_class' is already applied}} -__hlsl_resource_t [[hlsl::resource_class(SRV)]] [[hlsl::resource_class(SRV)]] e4; - -// expected-error at +1{{'resource_class' attribute takes one argument}} -__hlsl_resource_t [[hlsl::resource_class(SRV, "aa")]] e5; - -// expected-error at +1{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} -float [[hlsl::resource_class(UAV)]] e6; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -o - %s -verify + +// expected-error at +1{{'resource_class' attribute cannot be applied to a declaration}} +[[hlsl::resource_class(UAV)]] __hlsl_resource_t e0; + +// expected-error at +1{{'resource_class' attribute takes one argument}} +__hlsl_resource_t [[hlsl::resource_class()]] e1; + +// expected-warning at +1{{ResourceClass attribute argument not supported: gibberish}} +__hlsl_resource_t [[hlsl::resource_class(gibberish)]] e2; + +// expected-warning at +1{{attribute 
'resource_class' is already applied with different arguments}} +__hlsl_resource_t [[hlsl::resource_class(SRV)]] [[hlsl::resource_class(UAV)]] e3; + +// expected-warning at +1{{attribute 'resource_class' is already applied}} +__hlsl_resource_t [[hlsl::resource_class(SRV)]] [[hlsl::resource_class(SRV)]] e4; + +// expected-error at +1{{'resource_class' attribute takes one argument}} +__hlsl_resource_t [[hlsl::resource_class(SRV, "aa")]] e5; + +// expected-error at +1{{attribute 'resource_class' can be used only on HLSL intangible type '__hlsl_resource_t'}} +float [[hlsl::resource_class(UAV)]] e6; diff --git a/clang/test/ParserHLSL/hlsl_resource_handle_attrs.hlsl b/clang/test/ParserHLSL/hlsl_resource_handle_attrs.hlsl index 38d27bc21e4aa8..8885e39237357d 100644 --- a/clang/test/ParserHLSL/hlsl_resource_handle_attrs.hlsl +++ b/clang/test/ParserHLSL/hlsl_resource_handle_attrs.hlsl @@ -1,21 +1,21 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -ast-dump -o - %s | FileCheck %s - -// CHECK: -ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} <> class RWBuffer definition implicit_instantiation -// CHECK: -TemplateArgument type 'float' -// CHECK: `-BuiltinType 0x{{[0-9a-f]+}} 'float' -// CHECK: -FieldDecl 0x{{[0-9a-f]+}} <> implicit h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float)]] -// CHECK: -HLSLResourceAttr 0x{{[0-9a-f]+}} <> Implicit TypedBuffer -RWBuffer Buffer1; - -// CHECK: -ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} <> class RasterizerOrderedBuffer definition implicit_instantiation -// CHECK: -TemplateArgument type 'vector' -// CHECK: `-ExtVectorType 0x{{[0-9a-f]+}} 'vector' 4 -// CHECK: `-BuiltinType 0x{{[0-9a-f]+}} 'float' -// CHECK: -FieldDecl 0x{{[0-9a-f]+}} <> implicit h '__hlsl_resource_t -// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)] -// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] -// CHECK-SAME{LITERAL}: [[hlsl::contained_type(vector)]] -// CHECK: 
-HLSLResourceAttr 0x{{[0-9a-f]+}} <> Implicit TypedBuffer -RasterizerOrderedBuffer > BufferArray3[4]; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -ast-dump -o - %s | FileCheck %s + +// CHECK: -ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} <> class RWBuffer definition implicit_instantiation +// CHECK: -TemplateArgument type 'float' +// CHECK: `-BuiltinType 0x{{[0-9a-f]+}} 'float' +// CHECK: -FieldDecl 0x{{[0-9a-f]+}} <> implicit h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(float)]] +// CHECK: -HLSLResourceAttr 0x{{[0-9a-f]+}} <> Implicit TypedBuffer +RWBuffer Buffer1; + +// CHECK: -ClassTemplateSpecializationDecl 0x{{[0-9a-f]+}} <> class RasterizerOrderedBuffer definition implicit_instantiation +// CHECK: -TemplateArgument type 'vector' +// CHECK: `-ExtVectorType 0x{{[0-9a-f]+}} 'vector' 4 +// CHECK: `-BuiltinType 0x{{[0-9a-f]+}} 'float' +// CHECK: -FieldDecl 0x{{[0-9a-f]+}} <> implicit h '__hlsl_resource_t +// CHECK-SAME{LITERAL}: [[hlsl::resource_class(UAV)] +// CHECK-SAME{LITERAL}: [[hlsl::is_rov]] +// CHECK-SAME{LITERAL}: [[hlsl::contained_type(vector)]] +// CHECK: -HLSLResourceAttr 0x{{[0-9a-f]+}} <> Implicit TypedBuffer +RasterizerOrderedBuffer > BufferArray3[4]; diff --git a/clang/test/Sema/aarch64-sve-vector-trig-ops.c b/clang/test/Sema/aarch64-sve-vector-trig-ops.c index 3fe6834be2e0b7..f853abcd3379fa 100644 --- a/clang/test/Sema/aarch64-sve-vector-trig-ops.c +++ b/clang/test/Sema/aarch64-sve-vector-trig-ops.c @@ -1,65 +1,65 @@ -// RUN: %clang_cc1 -triple aarch64 -target-feature +sve \ -// RUN: -disable-O0-optnone -o - -fsyntax-only %s -verify -// REQUIRES: aarch64-registered-target - -#include - -svfloat32_t test_asin_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_asin(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_acos_vv_i8mf8(svfloat32_t v) { - - return 
__builtin_elementwise_acos(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_atan_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_atan(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_atan2_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_atan2(v, v); - // expected-error at -1 {{1st argument must be a floating point type}} -} - -svfloat32_t test_sin_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_sin(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_cos_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_cos(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_tan_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_tan(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_sinh_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_sinh(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_cosh_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_cosh(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} - -svfloat32_t test_tanh_vv_i8mf8(svfloat32_t v) { - - return __builtin_elementwise_tanh(v); - // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} -} +// RUN: %clang_cc1 -triple aarch64 -target-feature +sve \ +// RUN: -disable-O0-optnone -o - -fsyntax-only %s -verify +// REQUIRES: aarch64-registered-target + +#include + +svfloat32_t test_asin_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_asin(v); + // expected-error at -1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t 
test_acos_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_acos(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_atan_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_atan(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_atan2_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_atan2(v, v); + // expected-error@-1 {{1st argument must be a floating point type}} +} + +svfloat32_t test_sin_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_sin(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_cos_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_cos(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_tan_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_tan(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_sinh_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_sinh(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_cosh_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_cosh(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} +} + +svfloat32_t test_tanh_vv_i8mf8(svfloat32_t v) { + + return __builtin_elementwise_tanh(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} +} diff --git a/clang/test/Sema/riscv-rvv-vector-trig-ops.c b/clang/test/Sema/riscv-rvv-vector-trig-ops.c index 0aed1b2a099865..006c136f80332c 100644 --- a/clang/test/Sema/riscv-rvv-vector-trig-ops.c +++ b/clang/test/Sema/riscv-rvv-vector-trig-ops.c @@ -1,67 +1,67 @@ -// RUN: %clang_cc1 -triple riscv64 -target-feature +f -target-feature +d \ 
-// RUN: -target-feature +v -target-feature +zfh -target-feature +zvfh \ -// RUN: -disable-O0-optnone -o - -fsyntax-only %s -verify -// REQUIRES: riscv-registered-target - -#include - -vfloat32mf2_t test_asin_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_asin(v); - // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} - } - - vfloat32mf2_t test_acos_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_acos(v); - // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} - } - - vfloat32mf2_t test_atan_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_atan(v); - // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} - } - -vfloat32mf2_t test_atan2_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_atan2(v, v); - // expected-error@-1 {{1st argument must be a floating point type}} -} - -vfloat32mf2_t test_sin_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_sin(v); - // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} -} - -vfloat32mf2_t test_cos_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_cos(v); - // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} -} - -vfloat32mf2_t test_tan_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_tan(v); - // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} -} - -vfloat32mf2_t test_sinh_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_sinh(v); - // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} - } - - vfloat32mf2_t test_cosh_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_cosh(v); - // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} - } - - vfloat32mf2_t test_tanh_vv_i8mf8(vfloat32mf2_t v) { - - return __builtin_elementwise_tanh(v); - 
// expected-error@-1 {{1st argument must be a vector, integer or floating point type}} - } - +// RUN: %clang_cc1 -triple riscv64 -target-feature +f -target-feature +d \ +// RUN: -target-feature +v -target-feature +zfh -target-feature +zvfh \ +// RUN: -disable-O0-optnone -o - -fsyntax-only %s -verify +// REQUIRES: riscv-registered-target + +#include + +vfloat32mf2_t test_asin_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_asin(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} + } + + vfloat32mf2_t test_acos_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_acos(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} + } + + vfloat32mf2_t test_atan_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_atan(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} + } + +vfloat32mf2_t test_atan2_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_atan2(v, v); + // expected-error@-1 {{1st argument must be a floating point type}} +} + +vfloat32mf2_t test_sin_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_sin(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} +} + +vfloat32mf2_t test_cos_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_cos(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} +} + +vfloat32mf2_t test_tan_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_tan(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} +} + +vfloat32mf2_t test_sinh_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_sinh(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} + } + + vfloat32mf2_t test_cosh_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_cosh(v); + // expected-error@-1 
{{1st argument must be a vector, integer or floating point type}} + } + + vfloat32mf2_t test_tanh_vv_i8mf8(vfloat32mf2_t v) { + + return __builtin_elementwise_tanh(v); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type}} + } + diff --git a/clang/test/SemaHLSL/Availability/avail-diag-default-compute.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-default-compute.hlsl index 764b9e843f7f1c..b60fba62bdb000 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-default-compute.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-default-compute.hlsl @@ -1,119 +1,119 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ -// RUN: -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - // expected-error@#also_alive_fz_call {{'fz' is unavailable}} - 
// expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - return 0; -} - -float alive(float f) { - // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - // expected-error@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return 0; -} - -float dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute 
environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); -} - -float test(float x) { - return aliveTemp2(x); -} - -class MyClass -{ - float F; - float makeF() { - // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute 
environment}} - float C = fz(F); // #MyClass_makeF_fz_call - return 0; - } -}; - -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ +// RUN: -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + // expected-error@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + return 0; +} + +float alive(float f) { + // 
expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + // expected-error@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return 0; +} + +float dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + + return also_dead(f); +} + +template +T aliveTemp(T f) { + // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 
in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template T aliveTemp2(T f) { + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); +} + +float test(float x) { + return aliveTemp2(x); +} + +class MyClass +{ + float F; + float makeF() { + // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + return 0; + } +}; + +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + 
float e = test((half)1.0); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-diag-default-lib.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-default-lib.hlsl index 6bfc8577670cc7..35b7c384f26cdd 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-default-lib.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-default-lib.hlsl @@ -1,180 +1,180 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ -// RUN: -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - - // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - - // expected-error@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - - 
return 0; -} - -float alive(float f) { - // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - - // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - - // expected-error@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return 0; -} - -float dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been 
marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); -} - -float test(float x) { - return aliveTemp2(x); -} - -class MyClass -{ - float F; - float makeF() { - // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - return 0; - } -}; - -// Exported function without body, not used -export void exportedFunctionUnused(float f); - -// Exported function with body, without export, not used 
-void exportedFunctionUnused(float f) { - // expected-error@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUnused_fx_call - - // API with shader-stage-specific availability in unused exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(f); - float C = fz(f); -} - -// Exported function with body - called from main() which is a compute shader entry point -export void exportedFunctionUsed(float f) { - // expected-error@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUsed_fx_call - - // expected-error@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #exportedFunctionUsed_fy_call - - // expected-error@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #exportedFunctionUsed_fz_call -} - -namespace A { - namespace B { - export { - void exportedFunctionInNS(float x) { - // expected-error@#exportedFunctionInNS_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment 
target is Shader Model 6.0}} - float A = fx(x); // #exportedFunctionInNS_fx_call - - // API with shader-stage-specific availability in exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(x); - float C = fz(x); - } - } - } -} - -// Shader entry point without body -[shader("compute")] -[numthreads(4,1,1)] -float main(); - -// Shader entry point with body -[shader("compute")] -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - exportedFunctionUsed(1.0f); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ +// RUN: -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + + // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is 
Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + + // expected-error@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + + return 0; +} + +float alive(float f) { + // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + + // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + + // expected-error@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return 0; +} + +float dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return also_dead(f); +} + +template +T aliveTemp(T f) { + // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-error@#aliveTemp_fy_call {{'fy' is only available in 
compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template T aliveTemp2(T f) { + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); +} + +float test(float x) { + return aliveTemp2(x); +} + +class MyClass +{ + float F; + float makeF() { + // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz 
{{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + return 0; + } +}; + +// Exported function without body, not used +export void exportedFunctionUnused(float f); + +// Exported function with body, without export, not used +void exportedFunctionUnused(float f) { + // expected-error@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUnused_fx_call + + // API with shader-stage-specific availability in unused exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(f); + float C = fz(f); +} + +// Exported function with body - called from main() which is a compute shader entry point +export void exportedFunctionUsed(float f) { + // expected-error@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUsed_fx_call + + // expected-error@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #exportedFunctionUsed_fy_call + + // expected-error@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 
compute environment}} + float C = fz(f); // #exportedFunctionUsed_fz_call +} + +namespace A { + namespace B { + export { + void exportedFunctionInNS(float x) { + // expected-error@#exportedFunctionInNS_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(x); // #exportedFunctionInNS_fx_call + + // API with shader-stage-specific availability in exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(x); + float C = fz(x); + } + } + } +} + +// Shader entry point without body +[shader("compute")] +[numthreads(4,1,1)] +float main(); + +// Shader entry point with body +[shader("compute")] +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + exportedFunctionUsed(1.0f); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-diag-relaxed-compute.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-relaxed-compute.hlsl index 65836c55821d77..40687983839303 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-relaxed-compute.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-relaxed-compute.hlsl @@ -1,119 +1,119 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ -// RUN: -fsyntax-only -Wno-error=hlsl-availability -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - 
-__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-warning@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - // expected-warning@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - // expected-warning@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - return 0; -} - -float alive(float f) { - // expected-warning@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - // expected-warning@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - // expected-warning@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 
6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return 0; -} - -float dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-warning@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-warning@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-warning@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); -} - -float test(float x) { - return aliveTemp2(x); 
-} - -class MyClass -{ - float F; - float makeF() { - // expected-warning@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-warning@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-warning@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - return 0; - } -}; - -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ +// RUN: -fsyntax-only -Wno-error=hlsl-availability -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // 
expected-warning@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + // expected-warning@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + // expected-warning@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + return 0; +} + +float alive(float f) { + // expected-warning@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + // expected-warning@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + // expected-warning@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + 
return 0; +} + +float dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + + return also_dead(f); +} + +template +T aliveTemp(T f) { + // expected-warning@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-warning@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-warning@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template T aliveTemp2(T f) { + // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); +} + +float test(float x) { + return aliveTemp2(x); +} + +class MyClass +{ + float F; + float makeF() { + // expected-warning@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in 
Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-warning@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-warning@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + return 0; + } +}; + +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-diag-relaxed-lib.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-relaxed-lib.hlsl index 4c9783138f6701..a23e91a546b167 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-relaxed-lib.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-relaxed-lib.hlsl @@ -1,162 +1,162 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ -// RUN: -fsyntax-only -Wno-error=hlsl-availability -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, 
introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-warning@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - - // expected-warning@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - - // expected-warning@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - - return 0; -} - -float alive(float f) { - // expected-warning@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - - // expected-warning@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - - // expected-warning@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float 
f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return 0; -} - -float dead(float f) { - // unreachable code - no errors expected - float A = fx(f); - float B = fy(f); - float C = fz(f); - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-warning@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-warning@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-warning@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); -} - -float test(float x) { - return aliveTemp2(x); -} - -class MyClass -{ - float F; - float makeF() { - // expected-warning@#MyClass_makeF_fx_call {{'fx' is only 
available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-warning@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-warning@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - return 0; - } -}; - -// Exported function without body, not used -export void exportedFunctionUnused(float f); - -// Exported function with body, without export, not used -void exportedFunctionUnused(float f) { - // expected-warning@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUnused_fx_call - - // API with shader-stage-specific availability in unused exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(f); - float C = fz(f); -} - -// Exported function with body - called from main() which is a compute shader entry point -export void exportedFunctionUsed(float f) { - // expected-warning@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is 
Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUsed_fx_call - - // expected-warning@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #exportedFunctionUsed_fy_call - - // expected-warning@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #exportedFunctionUsed_fz_call -} - -// Shader entry point without body -[shader("compute")] -[numthreads(4,1,1)] -float main(); - -// Shader entry point with body -[shader("compute")] -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - exportedFunctionUsed(1.0f); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ +// RUN: -fsyntax-only -Wno-error=hlsl-availability -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // expected-warning@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or 
newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + + // expected-warning@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + + // expected-warning@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + + return 0; +} + +float alive(float f) { + // expected-warning@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + + // expected-warning@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + + // expected-warning@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // unreachable code - no errors expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return 0; +} + +float dead(float f) { + // unreachable code - no errors 
expected + float A = fx(f); + float B = fy(f); + float C = fz(f); + return also_dead(f); +} + +template +T aliveTemp(T f) { + // expected-warning@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-warning@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-warning@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template T aliveTemp2(T f) { + // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-warning@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); +} + +float test(float x) { + return aliveTemp2(x); +} + +class MyClass +{ + float F; + float makeF() { + // expected-warning@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + 
float A = fx(F); // #MyClass_makeF_fx_call + // expected-warning@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-warning@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + return 0; + } +}; + +// Exported function without body, not used +export void exportedFunctionUnused(float f); + +// Exported function with body, without export, not used +void exportedFunctionUnused(float f) { + // expected-warning@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUnused_fx_call + + // API with shader-stage-specific availability in unused exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(f); + float C = fz(f); +} + +// Exported function with body - called from main() which is a compute shader entry point +export void exportedFunctionUsed(float f) { + // expected-warning@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUsed_fx_call + + // expected-warning@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader 
Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #exportedFunctionUsed_fy_call + + // expected-warning@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #exportedFunctionUsed_fz_call +} + +// Shader entry point without body +[shader("compute")] +[numthreads(4,1,1)] +float main(); + +// Shader entry point with body +[shader("compute")] +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + exportedFunctionUsed(1.0f); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-diag-strict-compute.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-strict-compute.hlsl index b67e10c9a9017a..a8783c10cbabca 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-strict-compute.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-strict-compute.hlsl @@ -1,129 +1,129 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ -// RUN: -fhlsl-strict-availability -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment 
= mesh))) -float fz(float); // #fz - -float also_alive(float f) { - // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - // expected-error@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - return 0; -} - -float alive(float f) { - // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - // expected-error@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // expected-error@#also_dead_fx_call {{'fx' 
is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_dead_fx_call - // expected-error@#also_dead_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_dead_fy_call - // expected-error@#also_dead_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_dead_fz_call - return 0; -} - -float dead(float f) { - // expected-error@#dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #dead_fx_call - // expected-error@#dead_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #dead_fy_call - // expected-error@#dead_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #dead_fz_call - - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#aliveTemp_inst {{in instantiation of function template 
specialization 'aliveTemp' requested here}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} -} - -float test(float x) { - return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} -} - -class MyClass -{ - float F; - float makeF() { - // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader 
Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - } -}; - -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f); - float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - return a * b * c; +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute \ +// RUN: -fhlsl-strict-availability -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +float also_alive(float f) { + // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + // expected-error@#also_alive_fy_call {{'fy' 
is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + // expected-error@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + return 0; +} + +float alive(float f) { + // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + // expected-error@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // expected-error@#also_dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_dead_fx_call + // expected-error@#also_dead_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been 
marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_dead_fy_call + // expected-error@#also_dead_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_dead_fz_call + return 0; +} + +float dead(float f) { + // expected-error@#dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #dead_fx_call + // expected-error@#dead_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #dead_fy_call + // expected-error@#dead_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #dead_fz_call + + return also_dead(f); +} + +template +T aliveTemp(T f) { + // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#aliveTemp_inst {{in instantiation of function template specialization 'aliveTemp' requested here}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked 
as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template T aliveTemp2(T f) { + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} +} + +float test(float x) { + return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} +} + +class MyClass +{ + float F; + float makeF() { + // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // 
#MyClass_makeF_fy_call + // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + } +}; + +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f); + float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + return a * b * c; } \ No newline at end of file diff --git a/clang/test/SemaHLSL/Availability/avail-diag-strict-lib.hlsl b/clang/test/SemaHLSL/Availability/avail-diag-strict-lib.hlsl index c7be5afbc2d22f..0fffbc96dac194 100644 --- a/clang/test/SemaHLSL/Availability/avail-diag-strict-lib.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-diag-strict-lib.hlsl @@ -1,192 +1,192 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ -// RUN: -fhlsl-strict-availability -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 6.6))) -half fx(half); // #fx_half - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) -float fz(float); // #fz - -// FIXME: all diagnostics marked as FUTURE will come alive when HLSL default -// diagnostic mode is implemented in a future PR which will verify calls in -// all functions that are reachable from the shader library entry points - -float also_alive(float f) { - // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // 
expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #also_alive_fx_call - - // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #also_alive_fy_call - - // expected-error@#also_alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #also_alive_fz_call - - return 0; -} - -float alive(float f) { - // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #alive_fx_call - - // expected-error@#alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #alive_fy_call - - // expected-error@#alive_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #alive_fz_call - - return also_alive(f); -} - -float also_dead(float f) { - // expected-error@#also_dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the 
deployment target is Shader Model 6.0}} - float A = fx(f); // #also_dead_fx_call - - // Call to environment-specific function from an unreachable function - // in a shader library - no diagnostic expected. - float B = fy(f); // #also_dead_fy_call - - // Call to environment-specific function from an unreachable function - // in a shader library - no diagnostic expected. - float C = fz(f); // #also_dead_fz_call - return 0; -} - -float dead(float f) { - // expected-error@#dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #dead_fx_call - - // Call to environment-specific function from an unreachable function - // in a shader library - no diagnostic expected. - float B = fy(f); // #dead_fy_call - - // Call to environment-specific function from an unreachable function - // in a shader library - no diagnostic expected. 
- float C = fz(f); // #dead_fz_call - - return also_dead(f); -} - -template -T aliveTemp(T f) { - // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#aliveTemp_inst {{in instantiation of function template specialization 'aliveTemp' requested here}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #aliveTemp_fx_call - // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #aliveTemp_fy_call - // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #aliveTemp_fz_call - return 0; -} - -template T aliveTemp2(T f) { - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} - // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} - // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - return fx(f); // #aliveTemp2_fx_call -} - -half test(half x) { - return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} -} - -float test(float x) { - return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} -} - -class 
MyClass -{ - float F; - float makeF() { - // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(F); // #MyClass_makeF_fx_call - // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(F); // #MyClass_makeF_fy_call - // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(F); // #MyClass_makeF_fz_call - } -}; - -// Exported function without body, not used -export void exportedFunctionUnused(float f); - -// Exported function with body, without export, not used -void exportedFunctionUnused(float f) { - // expected-error@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUnused_fx_call - - // API with shader-stage-specific availability in unused exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(f); - float C = fz(f); -} - -// Exported function with body - called from main() which is a compute shader entry point -export void exportedFunctionUsed(float f) { - // expected-error@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been 
marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #exportedFunctionUsed_fx_call - - // expected-error@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #exportedFunctionUsed_fy_call - - // expected-error@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} - float C = fz(f); // #exportedFunctionUsed_fz_call -} - -namespace A { - namespace B { - export { - void exportedFunctionInNS(float x) { - // expected-error@#exportedFunctionInNS_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(x); // #exportedFunctionInNS_fx_call - - // API with shader-stage-specific availability in exported library function - // - no errors expected because the actual shader stage this function - // will be used in not known at this time - float B = fy(x); - float C = fz(x); - } - } - } -} - -[shader("compute")] -[numthreads(4,1,1)] -float main() { - float f = 3; - MyClass C = { 1.0f }; - float a = alive(f);float b = aliveTemp(f); // #aliveTemp_inst - float c = C.makeF(); - float d = test((float)1.0); - float e = test((half)1.0); - exportedFunctionUsed(1.0f); - return a * b * c; -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ +// RUN: -fhlsl-strict-availability -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + 
+__attribute__((availability(shadermodel, introduced = 6.6))) +half fx(half); // #fx_half + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = mesh))) +float fz(float); // #fz + +// FIXME: all diagnostics marked as FUTURE will come alive when HLSL default +// diagnostic mode is implemented in a future PR which will verify calls in +// all functions that are reachable from the shader library entry points + +float also_alive(float f) { + // expected-error@#also_alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_alive_fx_call + + // expected-error@#also_alive_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #also_alive_fy_call + + // expected-error@#also_alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #also_alive_fz_call + + return 0; +} + +float alive(float f) { + // expected-error@#alive_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #alive_fx_call + + // expected-error@#alive_fy_call 
{{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #alive_fy_call + + // expected-error@#alive_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #alive_fz_call + + return also_alive(f); +} + +float also_dead(float f) { + // expected-error@#also_dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #also_dead_fx_call + + // Call to environment-specific function from an unreachable function + // in a shader library - no diagnostic expected. + float B = fy(f); // #also_dead_fy_call + + // Call to environment-specific function from an unreachable function + // in a shader library - no diagnostic expected. + float C = fz(f); // #also_dead_fz_call + return 0; +} + +float dead(float f) { + // expected-error@#dead_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #dead_fx_call + + // Call to environment-specific function from an unreachable function + // in a shader library - no diagnostic expected. + float B = fy(f); // #dead_fy_call + + // Call to environment-specific function from an unreachable function + // in a shader library - no diagnostic expected. 
+ float C = fz(f); // #dead_fz_call + + return also_dead(f); +} + +template +T aliveTemp(T f) { + // expected-error@#aliveTemp_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#aliveTemp_inst {{in instantiation of function template specialization 'aliveTemp' requested here}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #aliveTemp_fx_call + // expected-error@#aliveTemp_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #aliveTemp_fy_call + // expected-error@#aliveTemp_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #aliveTemp_fz_call + return 0; +} + +template T aliveTemp2(T f) { + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.6 or newer}} + // expected-note@#fx_half {{'fx' has been marked as being introduced in Shader Model 6.6 here, but the deployment target is Shader Model 6.0}} + // expected-error@#aliveTemp2_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + return fx(f); // #aliveTemp2_fx_call +} + +half test(half x) { + return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} +} + +float test(float x) { + return aliveTemp2(x); // expected-note {{in instantiation of function template specialization 'aliveTemp2' requested here}} +} + +class 
MyClass +{ + float F; + float makeF() { + // expected-error@#MyClass_makeF_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(F); // #MyClass_makeF_fx_call + // expected-error@#MyClass_makeF_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(F); // #MyClass_makeF_fy_call + // expected-error@#MyClass_makeF_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(F); // #MyClass_makeF_fz_call + } +}; + +// Exported function without body, not used +export void exportedFunctionUnused(float f); + +// Exported function with body, without export, not used +void exportedFunctionUnused(float f) { + // expected-error@#exportedFunctionUnused_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUnused_fx_call + + // API with shader-stage-specific availability in unused exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(f); + float C = fz(f); +} + +// Exported function with body - called from main() which is a compute shader entry point +export void exportedFunctionUsed(float f) { + // expected-error@#exportedFunctionUsed_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been 
marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #exportedFunctionUsed_fx_call + + // expected-error@#exportedFunctionUsed_fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #exportedFunctionUsed_fy_call + + // expected-error@#exportedFunctionUsed_fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 6.5 in mesh environment here, but the deployment target is Shader Model 6.0 compute environment}} + float C = fz(f); // #exportedFunctionUsed_fz_call +} + +namespace A { + namespace B { + export { + void exportedFunctionInNS(float x) { + // expected-error@#exportedFunctionInNS_fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{'fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(x); // #exportedFunctionInNS_fx_call + + // API with shader-stage-specific availability in exported library function + // - no errors expected because the actual shader stage this function + // will be used in not known at this time + float B = fy(x); + float C = fz(x); + } + } + } +} + +[shader("compute")] +[numthreads(4,1,1)] +float main() { + float f = 3; + MyClass C = { 1.0f }; + float a = alive(f);float b = aliveTemp(f); // #aliveTemp_inst + float c = C.makeF(); + float d = test((float)1.0); + float e = test((half)1.0); + exportedFunctionUsed(1.0f); + return a * b * c; +} diff --git a/clang/test/SemaHLSL/Availability/avail-lib-multiple-stages.hlsl b/clang/test/SemaHLSL/Availability/avail-lib-multiple-stages.hlsl index b56ab8fe4526ba..bfefc9b116a64f 100644 --- 
a/clang/test/SemaHLSL/Availability/avail-lib-multiple-stages.hlsl +++ b/clang/test/SemaHLSL/Availability/avail-lib-multiple-stages.hlsl @@ -1,57 +1,57 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ -// RUN: -fsyntax-only -verify %s - -__attribute__((availability(shadermodel, introduced = 6.5))) -float fx(float); // #fx - -__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) -__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) -float fy(float); // #fy - -__attribute__((availability(shadermodel, introduced = 5.0, environment = compute))) -float fz(float); // #fz - - -void F(float f) { - // Make sure we only get this error once, even though this function is scanned twice - once - // in compute shader context and once in pixel shader context. - // expected-error@#fx_call {{'fx' is only available on Shader Model 6.5 or newer}} - // expected-note@#fx {{fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} - float A = fx(f); // #fx_call - - // expected-error@#fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} - // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} - float B = fy(f); // #fy_call - - // expected-error@#fz_call {{'fz' is unavailable}} - // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 5.0 in compute environment here, but the deployment target is Shader Model 6.0 pixel environment}} - float X = fz(f); // #fz_call -} - -void deadCode(float f) { - // no diagnostics expected under default diagnostic mode - float A = fx(f); - float B = fy(f); - float X = fz(f); -} - -// Pixel shader -[shader("pixel")] -void mainPixel() { - F(1.0); -} - -// First Compute shader -[shader("compute")] -[numthreads(4,1,1)] -void mainCompute1() { - F(2.0); -} 
- -// Second compute shader to make sure we do not get duplicate messages if F is called -// from multiple entry points. -[shader("compute")] -[numthreads(4,1,1)] -void mainCompute2() { - F(3.0); -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-library \ +// RUN: -fsyntax-only -verify %s + +__attribute__((availability(shadermodel, introduced = 6.5))) +float fx(float); // #fx + +__attribute__((availability(shadermodel, introduced = 5.0, environment = pixel))) +__attribute__((availability(shadermodel, introduced = 6.5, environment = compute))) +float fy(float); // #fy + +__attribute__((availability(shadermodel, introduced = 5.0, environment = compute))) +float fz(float); // #fz + + +void F(float f) { + // Make sure we only get this error once, even though this function is scanned twice - once + // in compute shader context and once in pixel shader context. + // expected-error@#fx_call {{'fx' is only available on Shader Model 6.5 or newer}} + // expected-note@#fx {{fx' has been marked as being introduced in Shader Model 6.5 here, but the deployment target is Shader Model 6.0}} + float A = fx(f); // #fx_call + + // expected-error@#fy_call {{'fy' is only available in compute environment on Shader Model 6.5 or newer}} + // expected-note@#fy {{'fy' has been marked as being introduced in Shader Model 6.5 in compute environment here, but the deployment target is Shader Model 6.0 compute environment}} + float B = fy(f); // #fy_call + + // expected-error@#fz_call {{'fz' is unavailable}} + // expected-note@#fz {{'fz' has been marked as being introduced in Shader Model 5.0 in compute environment here, but the deployment target is Shader Model 6.0 pixel environment}} + float X = fz(f); // #fz_call +} + +void deadCode(float f) { + // no diagnostics expected under default diagnostic mode + float A = fx(f); + float B = fy(f); + float X = fz(f); +} + +// Pixel shader +[shader("pixel")] +void mainPixel() { + F(1.0); +} + +// First Compute shader +[shader("compute")] 
+[numthreads(4,1,1)] +void mainCompute1() { + F(2.0); +} + +// Second compute shader to make sure we do not get duplicate messages if F is called +// from multiple entry points. +[shader("compute")] +[numthreads(4,1,1)] +void mainCompute2() { + F(3.0); +} diff --git a/clang/test/SemaHLSL/BuiltIns/StructuredBuffers.hlsl b/clang/test/SemaHLSL/BuiltIns/StructuredBuffers.hlsl index a472d5519dc51f..1ec56542113d90 100644 --- a/clang/test/SemaHLSL/BuiltIns/StructuredBuffers.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/StructuredBuffers.hlsl @@ -1,19 +1,19 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -fsyntax-only -verify %s - -typedef vector float3; - -StructuredBuffer Buffer; - -// expected-error@+2 {{class template 'StructuredBuffer' requires template arguments}} -// expected-note@*:* {{template declaration from hidden source: template class StructuredBuffer}} -StructuredBuffer BufferErr1; - -// expected-error@+2 {{too few template arguments for class template 'StructuredBuffer'}} -// expected-note@*:* {{template declaration from hidden source: template class StructuredBuffer}} -StructuredBuffer<> BufferErr2; - -[numthreads(1,1,1)] -void main() { - (void)Buffer.h; // expected-error {{'h' is a private member of 'hlsl::StructuredBuffer>'}} - // expected-note@* {{implicitly declared private here}} -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -x hlsl -fsyntax-only -verify %s + +typedef vector float3; + +StructuredBuffer Buffer; + +// expected-error@+2 {{class template 'StructuredBuffer' requires template arguments}} +// expected-note@*:* {{template declaration from hidden source: template class StructuredBuffer}} +StructuredBuffer BufferErr1; + +// expected-error@+2 {{too few template arguments for class template 'StructuredBuffer'}} +// expected-note@*:* {{template declaration from hidden source: template class StructuredBuffer}} +StructuredBuffer<> BufferErr2; + +[numthreads(1,1,1)] +void main() { + (void)Buffer.h; //
expected-error {{'h' is a private member of 'hlsl::StructuredBuffer>'}} + // expected-note@* {{implicitly declared private here}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/cross-errors.hlsl b/clang/test/SemaHLSL/BuiltIns/cross-errors.hlsl index 423f5bac9471f4..354e7abb8a31eb 100644 --- a/clang/test/SemaHLSL/BuiltIns/cross-errors.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/cross-errors.hlsl @@ -1,43 +1,43 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify - -void test_too_few_arg() -{ - return __builtin_hlsl_cross(); - // expected-error@-1 {{too few arguments to function call, expected 2, have 0}} -} - -void test_too_many_arg(float3 p0) -{ - return __builtin_hlsl_cross(p0, p0, p0); - // expected-error@-1 {{too many arguments to function call, expected 2, have 3}} -} - -bool builtin_bool_to_float_type_promotion(bool p1) -{ - return __builtin_hlsl_cross(p1, p1); - // expected-error@-1 {{passing 'bool' to parameter of incompatible type 'float'}} -} - -bool builtin_cross_int_to_float_promotion(int p1) -{ - return __builtin_hlsl_cross(p1, p1); - // expected-error@-1 {{passing 'int' to parameter of incompatible type 'float'}} -} - -bool2 builtin_cross_int2_to_float2_promotion(int2 p1) -{ - return __builtin_hlsl_cross(p1, p1); - // expected-error@-1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} - -float2 builtin_cross_float2(float2 p1, float2 p2) -{ - return __builtin_hlsl_cross(p1, p2); - // expected-error@-1 {{too many elements in vector operand (expected 3 elements, have 2)}} -} - -float3 builtin_cross_float3_int3(float3 p1, int3 p2) -{ - return __builtin_hlsl_cross(p1, p2); - // expected-error@-1 {{all arguments to '__builtin_hlsl_cross' must have the same type}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s
-fnative-half-type -disable-llvm-passes -verify + +void test_too_few_arg() +{ + return __builtin_hlsl_cross(); + // expected-error@-1 {{too few arguments to function call, expected 2, have 0}} +} + +void test_too_many_arg(float3 p0) +{ + return __builtin_hlsl_cross(p0, p0, p0); + // expected-error@-1 {{too many arguments to function call, expected 2, have 3}} +} + +bool builtin_bool_to_float_type_promotion(bool p1) +{ + return __builtin_hlsl_cross(p1, p1); + // expected-error@-1 {{passing 'bool' to parameter of incompatible type 'float'}} +} + +bool builtin_cross_int_to_float_promotion(int p1) +{ + return __builtin_hlsl_cross(p1, p1); + // expected-error@-1 {{passing 'int' to parameter of incompatible type 'float'}} +} + +bool2 builtin_cross_int2_to_float2_promotion(int2 p1) +{ + return __builtin_hlsl_cross(p1, p1); + // expected-error@-1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} + +float2 builtin_cross_float2(float2 p1, float2 p2) +{ + return __builtin_hlsl_cross(p1, p2); + // expected-error@-1 {{too many elements in vector operand (expected 3 elements, have 2)}} +} + +float3 builtin_cross_float3_int3(float3 p1, int3 p2) +{ + return __builtin_hlsl_cross(p1, p2); + // expected-error@-1 {{all arguments to '__builtin_hlsl_cross' must have the same type}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/half-float-only-errors2.hlsl b/clang/test/SemaHLSL/BuiltIns/half-float-only-errors2.hlsl index bfbd8b28257a3b..b876a8e84cb3ac 100644 --- a/clang/test/SemaHLSL/BuiltIns/half-float-only-errors2.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/half-float-only-errors2.hlsl @@ -1,13 +1,13 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_atan2 -// RUN: %clang_cc1 -finclude-default-header -triple
dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_fmod -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_pow - -double test_double_builtin(double p0, double p1) { - return TEST_FUNC(p0, p1); - // expected-error@-1 {{passing 'double' to parameter of incompatible type 'float'}} -} - -double2 test_vec_double_builtin(double2 p0, double2 p1) { - return TEST_FUNC(p0, p1); - // expected-error@-1 {{passing 'double2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_atan2 +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_fmod +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify -DTEST_FUNC=__builtin_elementwise_pow + +double test_double_builtin(double p0, double p1) { + return TEST_FUNC(p0, p1); + // expected-error@-1 {{passing 'double' to parameter of incompatible type 'float'}} +} + +double2 test_vec_double_builtin(double2 p0, double2 p1) { + return TEST_FUNC(p0, p1); + // expected-error@-1 {{passing 'double2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/length-errors.hlsl b/clang/test/SemaHLSL/BuiltIns/length-errors.hlsl index 281faada6f5e94..c5e2ac0b502dc4 100644 ---
a/clang/test/SemaHLSL/BuiltIns/length-errors.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/length-errors.hlsl @@ -1,32 +1,32 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected - - -void test_too_few_arg() -{ - return __builtin_hlsl_length(); - // expected-error@-1 {{too few arguments to function call, expected 1, have 0}} -} - -void test_too_many_arg(float2 p0) -{ - return __builtin_hlsl_length(p0, p0); - // expected-error@-1 {{too many arguments to function call, expected 1, have 2}} -} - -bool builtin_bool_to_float_type_promotion(bool p1) -{ - return __builtin_hlsl_length(p1); - // expected-error@-1 {passing 'bool' to parameter of incompatible type 'float'}} -} - -bool builtin_length_int_to_float_promotion(int p1) -{ - return __builtin_hlsl_length(p1); - // expected-error@-1 {{passing 'int' to parameter of incompatible type 'float'}} -} - -bool2 builtin_length_int2_to_float2_promotion(int2 p1) -{ - return __builtin_hlsl_length(p1); - // expected-error@-1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected + + +void test_too_few_arg() +{ + return __builtin_hlsl_length(); + // expected-error@-1 {{too few arguments to function call, expected 1, have 0}} +} + +void test_too_many_arg(float2 p0) +{ + return __builtin_hlsl_length(p0, p0); + // expected-error@-1 {{too many arguments to function call, expected 1, have 2}} +} + +bool builtin_bool_to_float_type_promotion(bool p1) +{ + return __builtin_hlsl_length(p1); + // expected-error@-1 {passing 'bool' to parameter of incompatible type 'float'}} +} + +bool builtin_length_int_to_float_promotion(int p1) +{ + return
__builtin_hlsl_length(p1); + // expected-error@-1 {{passing 'int' to parameter of incompatible type 'float'}} +} + +bool2 builtin_length_int2_to_float2_promotion(int2 p1) +{ + return __builtin_hlsl_length(p1); + // expected-error@-1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/normalize-errors.hlsl b/clang/test/SemaHLSL/BuiltIns/normalize-errors.hlsl index fc48c9b2589f7e..3720dca9b88a12 100644 --- a/clang/test/SemaHLSL/BuiltIns/normalize-errors.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/normalize-errors.hlsl @@ -1,31 +1,31 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected - -void test_too_few_arg() -{ - return __builtin_hlsl_normalize(); - // expected-error@-1 {{too few arguments to function call, expected 1, have 0}} -} - -void test_too_many_arg(float2 p0) -{ - return __builtin_hlsl_normalize(p0, p0); - // expected-error@-1 {{too many arguments to function call, expected 1, have 2}} -} - -bool builtin_bool_to_float_type_promotion(bool p1) -{ - return __builtin_hlsl_normalize(p1); - // expected-error@-1 {passing 'bool' to parameter of incompatible type 'float'}} -} - -bool builtin_normalize_int_to_float_promotion(int p1) -{ - return __builtin_hlsl_normalize(p1); - // expected-error@-1 {{passing 'int' to parameter of incompatible type 'float'}} -} - -bool2 builtin_normalize_int2_to_float2_promotion(int2 p1) -{ - return __builtin_hlsl_normalize(p1); - // expected-error@-1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify
-verify-ignore-unexpected + +void test_too_few_arg() +{ + return __builtin_hlsl_normalize(); + // expected-error@-1 {{too few arguments to function call, expected 1, have 0}} +} + +void test_too_many_arg(float2 p0) +{ + return __builtin_hlsl_normalize(p0, p0); + // expected-error@-1 {{too many arguments to function call, expected 1, have 2}} +} + +bool builtin_bool_to_float_type_promotion(bool p1) +{ + return __builtin_hlsl_normalize(p1); + // expected-error@-1 {passing 'bool' to parameter of incompatible type 'float'}} +} + +bool builtin_normalize_int_to_float_promotion(int p1) +{ + return __builtin_hlsl_normalize(p1); + // expected-error@-1 {{passing 'int' to parameter of incompatible type 'float'}} +} + +bool2 builtin_normalize_int2_to_float2_promotion(int2 p1) +{ + return __builtin_hlsl_normalize(p1); + // expected-error@-1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} diff --git a/clang/test/SemaHLSL/BuiltIns/step-errors.hlsl b/clang/test/SemaHLSL/BuiltIns/step-errors.hlsl index 823585201ca62d..a76c5ff5dbd2ba 100644 --- a/clang/test/SemaHLSL/BuiltIns/step-errors.hlsl +++ b/clang/test/SemaHLSL/BuiltIns/step-errors.hlsl @@ -1,31 +1,31 @@ -// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected - -void test_too_few_arg() -{ - return __builtin_hlsl_step(); - // expected-error@-1 {{too few arguments to function call, expected 2, have 0}} -} - -void test_too_many_arg(float2 p0) -{ - return __builtin_hlsl_step(p0, p0, p0); - // expected-error@-1 {{too many arguments to function call, expected 2, have 3}} -} - -bool builtin_bool_to_float_type_promotion(bool p1) -{ - return __builtin_hlsl_step(p1, p1); - // expected-error@-1 {passing 'bool' to parameter of incompatible type 'float'}} -} - -bool
builtin_step_int_to_float_promotion(int p1) -{ - return __builtin_hlsl_step(p1, p1); - // expected-error@-1 {{passing 'int' to parameter of incompatible type 'float'}} -} - -bool2 builtin_step_int2_to_float2_promotion(int2 p1) -{ - return __builtin_hlsl_step(p1, p1); - // expected-error@-1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} -} +// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -disable-llvm-passes -verify -verify-ignore-unexpected + +void test_too_few_arg() +{ + return __builtin_hlsl_step(); + // expected-error@-1 {{too few arguments to function call, expected 2, have 0}} +} + +void test_too_many_arg(float2 p0) +{ + return __builtin_hlsl_step(p0, p0, p0); + // expected-error@-1 {{too many arguments to function call, expected 2, have 3}} +} + +bool builtin_bool_to_float_type_promotion(bool p1) +{ + return __builtin_hlsl_step(p1, p1); + // expected-error@-1 {passing 'bool' to parameter of incompatible type 'float'}} +} + +bool builtin_step_int_to_float_promotion(int p1) +{ + return __builtin_hlsl_step(p1, p1); + // expected-error@-1 {{passing 'int' to parameter of incompatible type 'float'}} +} + +bool2 builtin_step_int2_to_float2_promotion(int2 p1) +{ + return __builtin_hlsl_step(p1, p1); + // expected-error@-1 {{passing 'int2' (aka 'vector') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(float)))) float' (vector of 2 'float' values)}} +} diff --git a/clang/test/SemaHLSL/Types/Traits/IsIntangibleType.hlsl b/clang/test/SemaHLSL/Types/Traits/IsIntangibleType.hlsl index 8c0f8d6f271dbd..1223a131af35c4 100644 --- a/clang/test/SemaHLSL/Types/Traits/IsIntangibleType.hlsl +++ b/clang/test/SemaHLSL/Types/Traits/IsIntangibleType.hlsl @@ -1,81 +1,81 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -verify %s -//
RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -fnative-half-type -verify %s -// expected-no-diagnostics - -_Static_assert(__builtin_hlsl_is_intangible(__hlsl_resource_t), ""); -// no need to check array of __hlsl_resource_t, arrays of sizeless types are not supported - -_Static_assert(!__builtin_hlsl_is_intangible(int), ""); -_Static_assert(!__builtin_hlsl_is_intangible(float3), ""); -_Static_assert(!__builtin_hlsl_is_intangible(half[4]), ""); - -typedef __hlsl_resource_t Res; -_Static_assert(__builtin_hlsl_is_intangible(const Res), ""); -// no need to check array of Res, arrays of sizeless types are not supported - -struct ABuffer { - const int i[10]; - __hlsl_resource_t h; -}; -_Static_assert(__builtin_hlsl_is_intangible(ABuffer), ""); -_Static_assert(__builtin_hlsl_is_intangible(ABuffer[10]), ""); - -struct MyStruct { - half2 h2; - int3 i3; -}; -_Static_assert(!__builtin_hlsl_is_intangible(MyStruct), ""); -_Static_assert(!__builtin_hlsl_is_intangible(MyStruct[10]), ""); - -class MyClass { - int3 ivec; - float farray[12]; - MyStruct ms; - ABuffer buf; -}; -_Static_assert(__builtin_hlsl_is_intangible(MyClass), ""); -_Static_assert(__builtin_hlsl_is_intangible(MyClass[2]), ""); - -union U { - double d[4]; - Res buf; -}; -_Static_assert(__builtin_hlsl_is_intangible(U), ""); -_Static_assert(__builtin_hlsl_is_intangible(U[100]), ""); - -class MyClass2 { - int3 ivec; - float farray[12]; - U u; -}; -_Static_assert(__builtin_hlsl_is_intangible(MyClass2), ""); -_Static_assert(__builtin_hlsl_is_intangible(MyClass2[5]), ""); - -class Simple { - int a; -}; - -template struct TemplatedBuffer { - T a; - __hlsl_resource_t h; -}; -_Static_assert(__builtin_hlsl_is_intangible(TemplatedBuffer), ""); - -struct MyStruct2 : TemplatedBuffer { - float x; -}; -_Static_assert(__builtin_hlsl_is_intangible(MyStruct2), ""); - -struct MyStruct3 { - const TemplatedBuffer TB[10]; -}; -_Static_assert(__builtin_hlsl_is_intangible(MyStruct3), ""); - -template 
struct SimpleTemplate { - T a; -}; -_Static_assert(__builtin_hlsl_is_intangible(SimpleTemplate<__hlsl_resource_t>), ""); -_Static_assert(!__builtin_hlsl_is_intangible(SimpleTemplate), ""); - -_Static_assert(__builtin_hlsl_is_intangible(RWBuffer), ""); -_Static_assert(__builtin_hlsl_is_intangible(StructuredBuffer), ""); +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -verify %s +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -fnative-half-type -verify %s +// expected-no-diagnostics + +_Static_assert(__builtin_hlsl_is_intangible(__hlsl_resource_t), ""); +// no need to check array of __hlsl_resource_t, arrays of sizeless types are not supported + +_Static_assert(!__builtin_hlsl_is_intangible(int), ""); +_Static_assert(!__builtin_hlsl_is_intangible(float3), ""); +_Static_assert(!__builtin_hlsl_is_intangible(half[4]), ""); + +typedef __hlsl_resource_t Res; +_Static_assert(__builtin_hlsl_is_intangible(const Res), ""); +// no need to check array of Res, arrays of sizeless types are not supported + +struct ABuffer { + const int i[10]; + __hlsl_resource_t h; +}; +_Static_assert(__builtin_hlsl_is_intangible(ABuffer), ""); +_Static_assert(__builtin_hlsl_is_intangible(ABuffer[10]), ""); + +struct MyStruct { + half2 h2; + int3 i3; +}; +_Static_assert(!__builtin_hlsl_is_intangible(MyStruct), ""); +_Static_assert(!__builtin_hlsl_is_intangible(MyStruct[10]), ""); + +class MyClass { + int3 ivec; + float farray[12]; + MyStruct ms; + ABuffer buf; +}; +_Static_assert(__builtin_hlsl_is_intangible(MyClass), ""); +_Static_assert(__builtin_hlsl_is_intangible(MyClass[2]), ""); + +union U { + double d[4]; + Res buf; +}; +_Static_assert(__builtin_hlsl_is_intangible(U), ""); +_Static_assert(__builtin_hlsl_is_intangible(U[100]), ""); + +class MyClass2 { + int3 ivec; + float farray[12]; + U u; +}; +_Static_assert(__builtin_hlsl_is_intangible(MyClass2), ""); +_Static_assert(__builtin_hlsl_is_intangible(MyClass2[5]), ""); + 
+class Simple { + int a; +}; + +template struct TemplatedBuffer { + T a; + __hlsl_resource_t h; +}; +_Static_assert(__builtin_hlsl_is_intangible(TemplatedBuffer), ""); + +struct MyStruct2 : TemplatedBuffer { + float x; +}; +_Static_assert(__builtin_hlsl_is_intangible(MyStruct2), ""); + +struct MyStruct3 { + const TemplatedBuffer TB[10]; +}; +_Static_assert(__builtin_hlsl_is_intangible(MyStruct3), ""); + +template struct SimpleTemplate { + T a; +}; +_Static_assert(__builtin_hlsl_is_intangible(SimpleTemplate<__hlsl_resource_t>), ""); +_Static_assert(!__builtin_hlsl_is_intangible(SimpleTemplate), ""); + +_Static_assert(__builtin_hlsl_is_intangible(RWBuffer), ""); +_Static_assert(__builtin_hlsl_is_intangible(StructuredBuffer), ""); diff --git a/clang/test/SemaHLSL/Types/Traits/IsIntangibleTypeErrors.hlsl b/clang/test/SemaHLSL/Types/Traits/IsIntangibleTypeErrors.hlsl index de9ac90b895fc6..33614e87640dad 100644 --- a/clang/test/SemaHLSL/Types/Traits/IsIntangibleTypeErrors.hlsl +++ b/clang/test/SemaHLSL/Types/Traits/IsIntangibleTypeErrors.hlsl @@ -1,12 +1,12 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -verify %s - -struct Undefined; // expected-note {{forward declaration of 'Undefined'}} -_Static_assert(!__builtin_hlsl_is_intangible(Undefined), ""); // expected-error{{incomplete type 'Undefined' used in type trait expression}} - -void fn(int X) { // expected-note {{declared here}} - // expected-error@#vla {{variable length arrays are not supported for the current target}} - // expected-error@#vla {{variable length arrays are not supported in '__builtin_hlsl_is_intangible'}} - // expected-warning@#vla {{variable length arrays in C++ are a Clang extension}} - // expected-note@#vla {{function parameter 'X' with unknown value cannot be used in a constant expression}} - _Static_assert(!__builtin_hlsl_is_intangible(int[X]), ""); // #vla -} +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-library -finclude-default-header -verify %s + 
+struct Undefined; // expected-note {{forward declaration of 'Undefined'}} +_Static_assert(!__builtin_hlsl_is_intangible(Undefined), ""); // expected-error{{incomplete type 'Undefined' used in type trait expression}} + +void fn(int X) { // expected-note {{declared here}} + // expected-error@#vla {{variable length arrays are not supported for the current target}} + // expected-error@#vla {{variable length arrays are not supported in '__builtin_hlsl_is_intangible'}} + // expected-warning@#vla {{variable length arrays in C++ are a Clang extension}} + // expected-note@#vla {{function parameter 'X' with unknown value cannot be used in a constant expression}} + _Static_assert(!__builtin_hlsl_is_intangible(int[X]), ""); // #vla +} diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_basic.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_basic.hlsl index 760c057630a7fa..4e50f70952ad13 100644 --- a/clang/test/SemaHLSL/resource_binding_attr_error_basic.hlsl +++ b/clang/test/SemaHLSL/resource_binding_attr_error_basic.hlsl @@ -1,42 +1,42 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify - -// expected-error@+1{{binding type 't' only applies to SRV resources}} -float f1 : register(t0); - -// expected-error@+1 {{binding type 'u' only applies to UAV resources}} -float f2 : register(u0); - -// expected-error@+1{{binding type 'b' only applies to constant buffers. The 'bool constant' binding type is no longer supported}} -float f3 : register(b9); - -// expected-error@+1 {{binding type 's' only applies to sampler state}} -float f4 : register(s0); - -// expected-error@+1{{binding type 'i' ignored. The 'integer constant' binding type is no longer supported}} -float f5 : register(i9); - -// expected-error@+1{{binding type 'x' is invalid}} -float f6 : register(x9); - -cbuffer g_cbuffer1 { -// expected-error@+1{{binding type 'c' ignored in buffer declaration.
Did you mean 'packoffset'?}} - float f7 : register(c2); -}; - -tbuffer g_tbuffer1 { -// expected-error@+1{{binding type 'c' ignored in buffer declaration. Did you mean 'packoffset'?}} - float f8 : register(c2); -}; - -cbuffer g_cbuffer2 { -// expected-error@+1{{binding type 'b' only applies to constant buffer resources}} - float f9 : register(b2); -}; - -tbuffer g_tbuffer2 { -// expected-error@+1{{binding type 'i' ignored. The 'integer constant' binding type is no longer supported}} - float f10 : register(i2); -}; - -// expected-error@+1{{binding type 'c' only applies to numeric variables in the global scope}} -RWBuffer f11 : register(c3); +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify + +// expected-error@+1{{binding type 't' only applies to SRV resources}} +float f1 : register(t0); + +// expected-error@+1 {{binding type 'u' only applies to UAV resources}} +float f2 : register(u0); + +// expected-error@+1{{binding type 'b' only applies to constant buffers. The 'bool constant' binding type is no longer supported}} +float f3 : register(b9); + +// expected-error@+1 {{binding type 's' only applies to sampler state}} +float f4 : register(s0); + +// expected-error@+1{{binding type 'i' ignored. The 'integer constant' binding type is no longer supported}} +float f5 : register(i9); + +// expected-error@+1{{binding type 'x' is invalid}} +float f6 : register(x9); + +cbuffer g_cbuffer1 { +// expected-error@+1{{binding type 'c' ignored in buffer declaration. Did you mean 'packoffset'?}} + float f7 : register(c2); +}; + +tbuffer g_tbuffer1 { +// expected-error@+1{{binding type 'c' ignored in buffer declaration. Did you mean 'packoffset'?}} + float f8 : register(c2); +}; + +cbuffer g_cbuffer2 { +// expected-error@+1{{binding type 'b' only applies to constant buffer resources}} + float f9 : register(b2); +}; + +tbuffer g_tbuffer2 { +// expected-error@+1{{binding type 'i' ignored.
The 'integer constant' binding type is no longer supported}} + float f10 : register(i2); +}; + +// expected-error@+1{{binding type 'c' only applies to numeric variables in the global scope}} +RWBuffer f11 : register(c3); diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_other.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_other.hlsl index 4c9e9a6b44c928..503c8469666f3b 100644 --- a/clang/test/SemaHLSL/resource_binding_attr_error_other.hlsl +++ b/clang/test/SemaHLSL/resource_binding_attr_error_other.hlsl @@ -1,9 +1,9 @@ -// RUN: not %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s | FileCheck %s - -// XFAIL: * -// This expectedly fails because RayQuery is an unsupported type. -// When it becomes supported, we should expect an error due to -// the variable type being classified as "other", and according -// to the spec, err_hlsl_unsupported_register_type_and_variable_type -// should be emitted. -RayQuery<0> r1: register(t0); +// RUN: not %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s | FileCheck %s + +// XFAIL: * +// This expectedly fails because RayQuery is an unsupported type. +// When it becomes supported, we should expect an error due to +// the variable type being classified as "other", and according +// to the spec, err_hlsl_unsupported_register_type_and_variable_type +// should be emitted.
+RayQuery<0> r1: register(t0); diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_resource.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_resource.hlsl index 4b6af47c0ab725..ea43e27b5b5ac1 100644 --- a/clang/test/SemaHLSL/resource_binding_attr_error_resource.hlsl +++ b/clang/test/SemaHLSL/resource_binding_attr_error_resource.hlsl @@ -1,49 +1,49 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify - -// This test validates the diagnostics that are emitted when a variable with a "resource" type -// is bound to a register using the register annotation - - -template -struct MyTemplatedSRV { - __hlsl_resource_t [[hlsl::resource_class(SRV)]] x; -}; - -struct MySRV { - __hlsl_resource_t [[hlsl::resource_class(SRV)]] x; -}; - -struct MySampler { - __hlsl_resource_t [[hlsl::resource_class(Sampler)]] x; -}; - -struct MyUAV { - __hlsl_resource_t [[hlsl::resource_class(UAV)]] x; -}; - -struct MyCBuffer { - __hlsl_resource_t [[hlsl::resource_class(CBuffer)]] x; -}; - - -// expected-error@+1 {{binding type 'i' ignored.
The 'integer constant' binding type is no longer supported}} -MySRV invalid : register(i2); - -// expected-error@+1 {{binding type 't' only applies to SRV resources}} -MyUAV a : register(t2, space1); - -// expected-error@+1 {{binding type 'u' only applies to UAV resources}} -MySampler b : register(u2, space1); - -// expected-error@+1 {{binding type 'b' only applies to constant buffer resources}} -MyTemplatedSRV c : register(b2); - -// expected-error@+1 {{binding type 's' only applies to sampler state}} -MyUAV d : register(s2, space1); - -// empty binding prefix cases: -// expected-error@+1 {{expected identifier}} -MyTemplatedSRV e: register(); - -// expected-error@+1 {{expected identifier}} -MyTemplatedSRV f: register(""); +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify + +// This test validates the diagnostics that are emitted when a variable with a "resource" type +// is bound to a register using the register annotation + + +template +struct MyTemplatedSRV { + __hlsl_resource_t [[hlsl::resource_class(SRV)]] x; +}; + +struct MySRV { + __hlsl_resource_t [[hlsl::resource_class(SRV)]] x; +}; + +struct MySampler { + __hlsl_resource_t [[hlsl::resource_class(Sampler)]] x; +}; + +struct MyUAV { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] x; +}; + +struct MyCBuffer { + __hlsl_resource_t [[hlsl::resource_class(CBuffer)]] x; +}; + + +// expected-error@+1 {{binding type 'i' ignored.
The 'integer constant' binding type is no longer supported}} +MySRV invalid : register(i2); + +// expected-error@+1 {{binding type 't' only applies to SRV resources}} +MyUAV a : register(t2, space1); + +// expected-error@+1 {{binding type 'u' only applies to UAV resources}} +MySampler b : register(u2, space1); + +// expected-error@+1 {{binding type 'b' only applies to constant buffer resources}} +MyTemplatedSRV c : register(b2); + +// expected-error@+1 {{binding type 's' only applies to sampler state}} +MyUAV d : register(s2, space1); + +// empty binding prefix cases: +// expected-error@+1 {{expected identifier}} +MyTemplatedSRV e: register(); + +// expected-error@+1 {{expected identifier}} +MyTemplatedSRV f: register(""); diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_silence_diags.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_silence_diags.hlsl index e63f264452da79..7f248e30c07096 100644 --- a/clang/test/SemaHLSL/resource_binding_attr_error_silence_diags.hlsl +++ b/clang/test/SemaHLSL/resource_binding_attr_error_silence_diags.hlsl @@ -1,27 +1,27 @@ -// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only -Wno-legacy-constant-register-binding %s -verify - -// expected-no-diagnostics -float f2 : register(b9); - -float f3 : register(i9); - -cbuffer g_cbuffer1 { - float f4 : register(c2); -}; - - -struct Eg12{ - RWBuffer a; -}; - -Eg12 e12 : register(c9); - -Eg12 bar : register(i1); - -struct Eg7 { - struct Bar { - float f; - }; - Bar b; -}; -Eg7 e7 : register(t0); +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only -Wno-legacy-constant-register-binding %s -verify + +// expected-no-diagnostics +float f2 : register(b9); + +float f3 : register(i9); + +cbuffer g_cbuffer1 { + float f4 : register(c2); +}; + + +struct Eg12{ + RWBuffer a; +}; + +Eg12 e12 : register(c9); + +Eg12 bar : register(i1); + +struct Eg7 { + struct Bar { + float f; + }; + Bar b; +}; +Eg7 e7 :
register(t0);
diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_space.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_space.hlsl
index 70e64e6ca75280..3001dbb1e3ec96 100644
--- a/clang/test/SemaHLSL/resource_binding_attr_error_space.hlsl
+++ b/clang/test/SemaHLSL/resource_binding_attr_error_space.hlsl
@@ -1,62 +1,62 @@
-// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify
-
-// valid
-cbuffer cbuf {
-  RWBuffer r : register(u0, space0);
-}
-
-cbuffer cbuf2 {
-  struct x {
-    // this test validates that no diagnostic is emitted on the space parameter, because
-    // this register annotation is not in the global scope.
-    // expected-error@+1 {{'register' attribute only applies to cbuffer/tbuffer and external global variables}}
-    RWBuffer E : register(u2, space3);
-  };
-}
-
-struct MyStruct {
-  RWBuffer E;
-};
-
-cbuffer cbuf3 {
-  // valid
-  MyStruct E : register(u2, space3);
-}
-
-// valid
-MyStruct F : register(u3, space4);
-
-cbuffer cbuf4 {
-  // this test validates that no diagnostic is emitted on the space parameter, because
-  // this register annotation is not in the global scope.
-  // expected-error@+1 {{binding type 'u' only applies to UAV resources}}
-  float a : register(u2, space3);
-}
-
-// expected-error@+1 {{invalid space specifier 's2' used; expected 'space' followed by an integer, like space1}}
-cbuffer a : register(b0, s2) {
-
-}
-
-// expected-error@+1 {{invalid space specifier 'spaces' used; expected 'space' followed by an integer, like space1}}
-cbuffer b : register(b2, spaces) {
-
-}
-
-// expected-error@+1 {{wrong argument format for hlsl attribute, use space3 instead}}
-cbuffer c : register(b2, space 3) {}
-
-// expected-error@+1 {{register space cannot be specified on global constants}}
-int d : register(c2, space3);
-
-// expected-error@+1 {{register space cannot be specified on global constants}}
-int e : register(c2, space0);
-
-// expected-error@+1 {{register space cannot be specified on global constants}}
-int f : register(c2, space00);
-
-// valid
-RWBuffer g : register(u2, space0);
-
-// valid
-RWBuffer h : register(u2, space0);
+// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify
+
+// valid
+cbuffer cbuf {
+  RWBuffer r : register(u0, space0);
+}
+
+cbuffer cbuf2 {
+  struct x {
+    // this test validates that no diagnostic is emitted on the space parameter, because
+    // this register annotation is not in the global scope.
+    // expected-error@+1 {{'register' attribute only applies to cbuffer/tbuffer and external global variables}}
+    RWBuffer E : register(u2, space3);
+  };
+}
+
+struct MyStruct {
+  RWBuffer E;
+};
+
+cbuffer cbuf3 {
+  // valid
+  MyStruct E : register(u2, space3);
+}
+
+// valid
+MyStruct F : register(u3, space4);
+
+cbuffer cbuf4 {
+  // this test validates that no diagnostic is emitted on the space parameter, because
+  // this register annotation is not in the global scope.
+  // expected-error@+1 {{binding type 'u' only applies to UAV resources}}
+  float a : register(u2, space3);
+}
+
+// expected-error@+1 {{invalid space specifier 's2' used; expected 'space' followed by an integer, like space1}}
+cbuffer a : register(b0, s2) {
+
+}
+
+// expected-error@+1 {{invalid space specifier 'spaces' used; expected 'space' followed by an integer, like space1}}
+cbuffer b : register(b2, spaces) {
+
+}
+
+// expected-error@+1 {{wrong argument format for hlsl attribute, use space3 instead}}
+cbuffer c : register(b2, space 3) {}
+
+// expected-error@+1 {{register space cannot be specified on global constants}}
+int d : register(c2, space3);
+
+// expected-error@+1 {{register space cannot be specified on global constants}}
+int e : register(c2, space0);
+
+// expected-error@+1 {{register space cannot be specified on global constants}}
+int f : register(c2, space00);
+
+// valid
+RWBuffer g : register(u2, space0);
+
+// valid
+RWBuffer h : register(u2, space0);
diff --git a/clang/test/SemaHLSL/resource_binding_attr_error_udt.hlsl b/clang/test/SemaHLSL/resource_binding_attr_error_udt.hlsl
index 40517f393e1284..235004102a539b 100644
--- a/clang/test/SemaHLSL/resource_binding_attr_error_udt.hlsl
+++ b/clang/test/SemaHLSL/resource_binding_attr_error_udt.hlsl
@@ -1,135 +1,135 @@
-// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify
-
-template
-struct MyTemplatedUAV {
-  __hlsl_resource_t [[hlsl::resource_class(UAV)]] x;
-};
-
-struct MySRV {
-  __hlsl_resource_t [[hlsl::resource_class(SRV)]] x;
-};
-
-struct MySampler {
-  __hlsl_resource_t [[hlsl::resource_class(Sampler)]] x;
-};
-
-struct MyUAV {
-  __hlsl_resource_t [[hlsl::resource_class(UAV)]] x;
-};
-
-struct MyCBuffer {
-  __hlsl_resource_t [[hlsl::resource_class(CBuffer)]] x;
-};
-
-// Valid: f is skipped, SRVBuf is bound to t0, UAVBuf is bound to u0
-struct Eg1 {
-  float f;
-  MySRV SRVBuf;
-  MyUAV UAVBuf;
-  };
-Eg1 e1 : register(t0) :
register(u0); - -// Valid: f is skipped, SRVBuf is bound to t0, UAVBuf is bound to u0. -// UAVBuf2 gets automatically assigned to u1 even though there is no explicit binding for u1. -struct Eg2 { - float f; - MySRV SRVBuf; - MyUAV UAVBuf; - MyUAV UAVBuf2; - }; -Eg2 e2 : register(t0) : register(u0); - -// Valid: Bar, the struct within Eg3, has a valid resource that can be bound to t0. -struct Eg3 { - struct Bar { - MyUAV a; - }; - Bar b; -}; -Eg3 e3 : register(u0); - -// Valid: the first sampler state object within 's' is bound to slot 5 -struct Eg4 { - MySampler s[3]; -}; - -Eg4 e4 : register(s5); - - -struct Eg5 { - float f; -}; -// expected-warning at +1{{binding type 't' only applies to types containing SRV resources}} -Eg5 e5 : register(t0); - -struct Eg6 { - float f; -}; -// expected-warning at +1{{binding type 'u' only applies to types containing UAV resources}} -Eg6 e6 : register(u0); - -struct Eg7 { - float f; -}; -// expected-warning at +1{{binding type 'b' only applies to types containing constant buffer resources}} -Eg7 e7 : register(b0); - -struct Eg8 { - float f; -}; -// expected-warning at +1{{binding type 's' only applies to types containing sampler state}} -Eg8 e8 : register(s0); - -struct Eg9 { - MySRV s; -}; -// expected-warning at +1{{binding type 'c' only applies to types containing numeric types}} -Eg9 e9 : register(c0); - -struct Eg10{ - // expected-error at +1{{'register' attribute only applies to cbuffer/tbuffer and external global variables}} - MyTemplatedUAV a : register(u9); -}; -Eg10 e10; - - -template -struct Eg11 { - R b; -}; -// expected-warning at +1{{binding type 'u' only applies to types containing UAV resources}} -Eg11 e11 : register(u0); -// invalid because after template expansion, there are no valid resources inside Eg11 to bind as a UAV, only an SRV - - -struct Eg12{ - MySRV s1; - MySRV s2; -}; -// expected-warning at +2{{binding type 'u' only applies to types containing UAV resources}} -// expected-error at +1{{binding type 
'u' cannot be applied more than once}} -Eg12 e12 : register(u9) : register(u10); - -struct Eg13{ - MySRV s1; - MySRV s2; -}; -// expected-warning at +3{{binding type 'u' only applies to types containing UAV resources}} -// expected-error at +2{{binding type 'u' cannot be applied more than once}} -// expected-error at +1{{binding type 'u' cannot be applied more than once}} -Eg13 e13 : register(u9) : register(u10) : register(u11); - -// expected-error at +1{{binding type 't' cannot be applied more than once}} -Eg13 e13_2 : register(t11) : register(t12); - -struct Eg14{ - MyTemplatedUAV r1; -}; -// expected-warning at +1{{binding type 't' only applies to types containing SRV resources}} -Eg14 e14 : register(t9); - -struct Eg15 { - float f[4]; -}; -// expected no error -Eg15 e15 : register(c0); +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -x hlsl -o - -fsyntax-only %s -verify + +template +struct MyTemplatedUAV { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] x; +}; + +struct MySRV { + __hlsl_resource_t [[hlsl::resource_class(SRV)]] x; +}; + +struct MySampler { + __hlsl_resource_t [[hlsl::resource_class(Sampler)]] x; +}; + +struct MyUAV { + __hlsl_resource_t [[hlsl::resource_class(UAV)]] x; +}; + +struct MyCBuffer { + __hlsl_resource_t [[hlsl::resource_class(CBuffer)]] x; +}; + +// Valid: f is skipped, SRVBuf is bound to t0, UAVBuf is bound to u0 +struct Eg1 { + float f; + MySRV SRVBuf; + MyUAV UAVBuf; + }; +Eg1 e1 : register(t0) : register(u0); + +// Valid: f is skipped, SRVBuf is bound to t0, UAVBuf is bound to u0. +// UAVBuf2 gets automatically assigned to u1 even though there is no explicit binding for u1. +struct Eg2 { + float f; + MySRV SRVBuf; + MyUAV UAVBuf; + MyUAV UAVBuf2; + }; +Eg2 e2 : register(t0) : register(u0); + +// Valid: Bar, the struct within Eg3, has a valid resource that can be bound to t0. 
+struct Eg3 { + struct Bar { + MyUAV a; + }; + Bar b; +}; +Eg3 e3 : register(u0); + +// Valid: the first sampler state object within 's' is bound to slot 5 +struct Eg4 { + MySampler s[3]; +}; + +Eg4 e4 : register(s5); + + +struct Eg5 { + float f; +}; +// expected-warning at +1{{binding type 't' only applies to types containing SRV resources}} +Eg5 e5 : register(t0); + +struct Eg6 { + float f; +}; +// expected-warning at +1{{binding type 'u' only applies to types containing UAV resources}} +Eg6 e6 : register(u0); + +struct Eg7 { + float f; +}; +// expected-warning at +1{{binding type 'b' only applies to types containing constant buffer resources}} +Eg7 e7 : register(b0); + +struct Eg8 { + float f; +}; +// expected-warning at +1{{binding type 's' only applies to types containing sampler state}} +Eg8 e8 : register(s0); + +struct Eg9 { + MySRV s; +}; +// expected-warning at +1{{binding type 'c' only applies to types containing numeric types}} +Eg9 e9 : register(c0); + +struct Eg10{ + // expected-error at +1{{'register' attribute only applies to cbuffer/tbuffer and external global variables}} + MyTemplatedUAV a : register(u9); +}; +Eg10 e10; + + +template +struct Eg11 { + R b; +}; +// expected-warning at +1{{binding type 'u' only applies to types containing UAV resources}} +Eg11 e11 : register(u0); +// invalid because after template expansion, there are no valid resources inside Eg11 to bind as a UAV, only an SRV + + +struct Eg12{ + MySRV s1; + MySRV s2; +}; +// expected-warning at +2{{binding type 'u' only applies to types containing UAV resources}} +// expected-error at +1{{binding type 'u' cannot be applied more than once}} +Eg12 e12 : register(u9) : register(u10); + +struct Eg13{ + MySRV s1; + MySRV s2; +}; +// expected-warning at +3{{binding type 'u' only applies to types containing UAV resources}} +// expected-error at +2{{binding type 'u' cannot be applied more than once}} +// expected-error at +1{{binding type 'u' cannot be applied more than once}} +Eg13 e13 : 
register(u9) : register(u10) : register(u11); + +// expected-error at +1{{binding type 't' cannot be applied more than once}} +Eg13 e13_2 : register(t11) : register(t12); + +struct Eg14{ + MyTemplatedUAV r1; +}; +// expected-warning at +1{{binding type 't' only applies to types containing SRV resources}} +Eg14 e14 : register(t9); + +struct Eg15 { + float f[4]; +}; +// expected no error +Eg15 e15 : register(c0); diff --git a/clang/tools/scan-build/bin/scan-build.bat b/clang/tools/scan-build/bin/scan-build.bat index 77be6746318f11..f765f205b8ec50 100644 --- a/clang/tools/scan-build/bin/scan-build.bat +++ b/clang/tools/scan-build/bin/scan-build.bat @@ -1 +1 @@ -perl -S scan-build %* +perl -S scan-build %* diff --git a/clang/tools/scan-build/libexec/c++-analyzer.bat b/clang/tools/scan-build/libexec/c++-analyzer.bat index 69f048a91671f0..83c7172456a51a 100644 --- a/clang/tools/scan-build/libexec/c++-analyzer.bat +++ b/clang/tools/scan-build/libexec/c++-analyzer.bat @@ -1 +1 @@ -perl -S c++-analyzer %* +perl -S c++-analyzer %* diff --git a/clang/tools/scan-build/libexec/ccc-analyzer.bat b/clang/tools/scan-build/libexec/ccc-analyzer.bat index 2a85376eb82b16..fdd36f3bdd0437 100644 --- a/clang/tools/scan-build/libexec/ccc-analyzer.bat +++ b/clang/tools/scan-build/libexec/ccc-analyzer.bat @@ -1 +1 @@ -perl -S ccc-analyzer %* +perl -S ccc-analyzer %* diff --git a/clang/utils/ClangVisualizers/clang.natvis b/clang/utils/ClangVisualizers/clang.natvis index a7c70186bc46de..611c20dacce176 100644 --- a/clang/utils/ClangVisualizers/clang.natvis +++ b/clang/utils/ClangVisualizers/clang.natvis @@ -1,1089 +1,1089 @@ - - - - - - - LocInfoType - {(clang::Type::TypeClass)TypeBits.TC, en}Type - - {*(clang::BuiltinType *)this} - {*(clang::PointerType *)this} - {*(clang::ParenType *)this} - {(clang::BitIntType *)this} - {*(clang::LValueReferenceType *)this} - {*(clang::RValueReferenceType *)this} - {(clang::ConstantArrayType *)this,na} - {(clang::ConstantArrayType *)this,view(left)na} - 
{(clang::ConstantArrayType *)this,view(right)na} - {(clang::VariableArrayType *)this,na} - {(clang::VariableArrayType *)this,view(left)na} - {(clang::VariableArrayType *)this,view(right)na} - {(clang::IncompleteArrayType *)this,na} - {(clang::IncompleteArrayType *)this,view(left)na} - {(clang::IncompleteArrayType *)this,view(right)na} - {(clang::TypedefType *)this,na} - {(clang::TypedefType *)this,view(cpp)na} - {*(clang::AttributedType *)this} - {(clang::DecayedType *)this,na} - {(clang::DecayedType *)this,view(left)na} - {(clang::DecayedType *)this,view(right)na} - {(clang::ElaboratedType *)this,na} - {(clang::ElaboratedType *)this,view(left)na} - {(clang::ElaboratedType *)this,view(right)na} - {*(clang::TemplateTypeParmType *)this} - {*(clang::TemplateTypeParmType *)this,view(cpp)} - {*(clang::SubstTemplateTypeParmType *)this} - {*(clang::RecordType *)this} - {*(clang::RecordType *)this,view(cpp)} - {(clang::FunctionProtoType *)this,na} - {(clang::FunctionProtoType *)this,view(left)na} - {(clang::FunctionProtoType *)this,view(right)na} - {*(clang::TemplateSpecializationType *)this} - {*(clang::DeducedTemplateSpecializationType *)this} - {*(clang::DeducedTemplateSpecializationType *)this,view(cpp)} - {*(clang::InjectedClassNameType *)this} - {*(clang::DependentNameType *)this} - {*(clang::PackExpansionType *)this} - {(clang::LocInfoType *)this,na} - {(clang::LocInfoType *)this,view(cpp)na} - {this,view(poly)na} - {*this,view(cpp)} - - No visualizer yet for {(clang::Type::TypeClass)TypeBits.TC,en}Type - Dependence{" ",en} - - CachedLinkage: {(clang::Linkage)TypeBits.CachedLinkage,en} CachedLocalOrUnnamed - CachedLinkage: {(clang::Linkage)TypeBits.CachedLinkage,en}{" ",sb} - - FromAST - - - No TypeBits set beyond TypeClass - - {*this, view(Dependence)}{*this, view(Cache)}{*this, view(FromAST)} - {*this,view(cmn)} {{{*this,view(poly)}}} - - (clang::Type::TypeClass)TypeBits.TC - this,view(flags)na - CanonicalType - *(clang::BuiltinType *)this - *(clang::PointerType 
*)this - *(clang::ParenType*)this - *(clang::BitIntType*)this - *(clang::LValueReferenceType *)this - *(clang::RValueReferenceType *)this - (clang::ConstantArrayType *)this - (clang::VariableArrayType *)this - (clang::IncompleteArrayType *)this - *(clang::AttributedType *)this - (clang::DecayedType *)this - (clang::ElaboratedType *)this - (clang::TemplateTypeParmType *)this - (clang::SubstTemplateTypeParmType *)this - (clang::RecordType *)this - (clang::FunctionProtoType *)this - (clang::TemplateSpecializationType *)this - (clang::DeducedTemplateSpecializationType *)this - (clang::InjectedClassNameType *)this - (clang::DependentNameType *)this - (clang::PackExpansionType *)this - (clang::LocInfoType *)this - - - - - ElementType - - - - {ElementType,view(cpp)} - [{Size}] - {ElementType,view(cpp)}[{Size}] - - Size - (clang::ArrayType *)this - - - - {ElementType,view(cpp)} - [] - {ElementType,view(cpp)}[] - - (clang::ArrayType *)this - - - - {ElementType,view(cpp)} - [*] - {ElementType,view(cpp)}[*] - - (clang::Expr *)SizeExpr - (clang::ArrayType *)this - - - - {Decl,view(name)nd} - {Decl} - - Decl - *(clang::Type *)this, view(cmn) - - - - {PointeeType, view(cpp)} * - - PointeeType - *(clang::Type *)this, view(cmn) - - - - {Inner, view(cpp)} - - Inner - *(clang::Type *)this, view(cmn) - - - - signed _BitInt({NumBits}) - unsigned _BitInt({NumBits})( - - NumBits - (clang::Type *)this, view(cmn) - - - - - {((clang::ReferenceType *)this)->PointeeType,view(cpp)} & - - *(clang::Type *)this, view(cmn) - PointeeType - - - - {((clang::ReferenceType *)this)->PointeeType,view(cpp)} && - - *(clang::Type *)this, view(cmn) - PointeeType - - - - {ModifiedType} Attribute={(clang::AttributedType::Kind)AttributedTypeBits.AttrKind} - - - - - {(clang::Decl::Kind)DeclContextBits.DeclKind,en}Decl - - (clang::Decl::Kind)DeclContextBits.DeclKind,en - - - - - FirstDecl - (clang::Decl *)(*(intptr_t *)NextInContextAndBits.Value.Data & ~3) - *this - - - - - - - Field {{{*(clang::DeclaratorDecl 
*)this,view(cpp)nd}}} - - - {*(clang::FunctionDecl *)this,nd} - Method {{{*this,view(cpp)}}} - - - Constructor {{{Name,view(cpp)}({*(clang::FunctionDecl *)this,view(parm0)nd})}} - - - Destructor {{~{Name,view(cpp)}()}} - - - typename - class - (not yet known if parameter pack) - ... - - {(TypeSourceInfo *)(*(uintptr_t *)DefaultArgument.ValueOrInherited.Val.Value.Data&~3LL),view(cpp)} - {{InheritedInitializer}} - = {this,view(DefaultArg)na} - - {*this,view(TorC)} {*this,view(MaybeEllipses)}{Name,view(cpp)} {this,view(Initializer)na} - - - {*TemplatedDecl,view(cpp)} - template{TemplateParams,na} {*TemplatedDecl}; - - TemplateParams,na - TemplatedDecl,na - - - - - {(clang::TypeSourceInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL),view(cpp)na} - {(clang::TypedefNameDecl::ModedTInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL),view(cpp)na} - {(TypeDecl *)this,view(cpp)nand} - typedef {this,view(type)na} {this,view(name)na}; - - "Not yet calculated",sb - (bool)(*(uintptr_t *)MaybeModedTInfo.Value.Data & 2) - (clang::TypeSourceInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL) - (clang::TypedefNameDecl::ModedTInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL) - (TypeDecl *)this,nd - - - - {(TypedefNameDecl *)this,view(name)nand} - using {(TypedefNameDecl *)this,view(name)nand} = {(TypedefNameDecl *)this,view(type)nand} - - - {Name} - - - Kind={(UncommonTemplateNameStorage::Kind)Kind,en}, Size={Size} - - (UncommonTemplateNameStorage::Kind)Kind - Size - - - - {Bits}, - {this,view(cmn)na},{(OverloadedTemplateStorage*)this,na} - {this,view(cmn)na},{(AssumedTemplateStorage*)this,na} - {this,view(cmn)na},{(SubstTemplateTemplateParmStorage*)this,na} - {this,view(cmn)na},{(SubstTemplateTemplateParmPackStorage*)this,na} - {this,view(cmn)na} - - Bits - (OverloadedTemplateStorage*)this - (AssumedTemplateStorage*)this - (SubstTemplateTemplateParmStorage*)this - (SubstTemplateTemplateParmPackStorage*)this - - - - - - - {(clang::TemplateDecl *)(Val.Value & 
~3LL),view(cpp)na} - - - {(clang::TemplateDecl *)(Val.Value & ~3LL),na} - - - {(clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL),view(cpp)na} - - - {(clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL),na} - - - {(clang::QualifiedTemplateName *)(Val.Value & ~3LL),view(cpp)na} - - - {(clang::QualifiedTemplateName *)(Val.Value & ~3LL),na} - - - {(clang::DependentTemplateName *)(Val.Value & ~3LL),view(cpp)na} - - - {(clang::DependentTemplateName *)(Val.Value & ~3LL),na} - - - "TemplateDecl",s8b - - (clang::TemplateDecl *)(Val.Value & ~3LL) - - "UncommonTemplateNameStorage",s8b - - (clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL) - - "QualifiedTemplateName",s8b - - (clang::QualifiedTemplateName *)(Val.Value & ~3LL) - - "DependentTemplateName",s8b - - (clang::DependentTemplateName *)(Val.Value & ~3LL) - - Val - - - - - {Storage,view(cpp)na} - {Storage,na} - - Storage - - - - {Name,view(cpp)} - {Name} - - - implicit{" ",sb} - - {*this,view(implicit)nd} - {*this,view(modifiers)}{Name,view(cpp)} - {*this,view(modifiers)nd}struct {Name,view(cpp)} - {*this,view(modifiers)nd}interface {Name,view(cpp)} - {*this,view(modifiers)nd}union {Name,view(cpp)} - {*this,view(modifiers)nd}class {Name,view(cpp)} - {*this,view(modifiers)nd}enum {Name,view(cpp)} - - (clang::DeclContext *)this - - - - {decl,view(cpp)na} - {*decl} - - *(clang::Type *)this, view(cmn) - decl - - - - {(clang::TagType *)this,view(cpp)na} - {(clang::TagType *)this,na} - - *(clang::TagType *)this - - - - {{{*Replaced,view(cpp)} <= {CanonicalType,view(cpp)}}} - - *(clang::Type *)this, view(cmn) - *Replaced - - - - - - {ResultType,view(cpp)} - - {*(clang::QualType *)(this+1),view(cpp)}{*this,view(parm1)} - - , {*((clang::QualType *)(this+1)+1),view(cpp)}{*this,view(parm2)} - - , {*((clang::QualType *)(this+1)+2),view(cpp)}{*this,view(parm3)} - - , {*((clang::QualType *)(this+1)+3),view(cpp)}{*this,view(parm4)} - - , {*((clang::QualType *)(this+1)+4),view(cpp)}{*this,view(parm5)} - - , /* expand for 
more params */ - ({*this,view(parm0)}) -> {ResultType,view(cpp)} - ({*this,view(parm0)}) - {this,view(left)na}{this,view(right)na} - - ResultType - - {*this,view(parm0)} - - - FunctionTypeBits.NumParams - (clang::QualType *)(this+1) - - - - *(clang::Type *)this, view(cmn) - - - - - {OriginalTy} adjusted to {AdjustedTy} - - OriginalTy - AdjustedTy - - - - {OriginalTy,view(left)} - {OriginalTy,view(right)} - {OriginalTy} - - (clang::AdjustedType *)this - - - - {NamedType,view(left)} - {NamedType,view(right)} - {NamedType} - - (clang::ElaboratedTypeKeyword)TypeWithKeywordBits.Keyword - NNS - NamedType,view(cmn) - - - - {TTPDecl->Name,view(cpp)} - Non-canonical: {*TTPDecl} - Canonical: {CanTTPTInfo} - - *(clang::Type *)this, view(cmn) - - - - {Decl,view(cpp)} - - Decl - InjectedType - *(clang::Type *)this, view(cmn) - - - - {NNS}{Name,view(cpp)na} - - NNS - Name - *(clang::Type *)this, view(cmn) - - - - - {(IdentifierInfo*)Specifier,view(cpp)na}:: - {(NamedDecl*)Specifier,view(cpp)na}:: - {(Type*)Specifier,view(cpp)na}:: - - (NestedNameSpecifier::StoredSpecifierKind)((*(uintptr_t *)Prefix.Value.Data>>1)&3) - - - - {Pattern} - - Pattern - NumExpansions - *(clang::Type *)this, view(cmn) - - - - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(poly)}{*this,view(fastQuals)} - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(cpp)}{*this,view(fastQuals)} - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(left)}{*this,view(fastQuals)} - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(right)}{*this,view(fastQuals)} - - - {" ",sb}const - {" ",sb}restrict - {" ",sb}const restrict - {" ",sb}volatile - {" 
",sb}const volatile - {" ",sb}volatile restrict - {" ",sb}const volatile restrict - Cannot visualize non-fast qualifiers - Null - {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,na}{*this,view(fastQuals)} - - *this,view(fastQuals) - ((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType - - - - - {DeclInfo,view(cpp)na} - {DeclInfo,na} - - DeclInfo - *(clang::Type *)this, view(cmn) - - - - {Ty,view(cpp)} - {Ty} - - Ty - - - - {(QualType *)&Ty,na} - - (QualType *)&Ty - Data - - - - Not building anything - Building a {LastTy} - - - {Argument,view(cpp)} - {Argument} - - - {*(clang::QualType *)&TypeOrValue.V,view(cpp)} - {(clang::TemplateArgument::ArgKind)TypeOrValue.Kind,en} template argument: {*(clang::QualType *)&TypeOrValue.V} - - {Args.Args[0]}{*this,view(arg1)} - - , {Args.Args[1]}{*this,view(arg2)} - - , {Args.Args[2]}, ... - - {Args.Args[0],view(cpp)}{*this,view(arg1cpp)} - - , {Args.Args[1],view(cpp)}{*this,view(arg2cpp)} - - , {Args.Args[2],view(cpp)}, ... - {*this,view(arg0cpp)} - {*this,view(arg0)} - {(clang::Expr *)TypeOrValue.V,view(cpp)na} - {(clang::TemplateArgument::ArgKind)TypeOrValue.Kind,en} - - *(clang::QualType *)&TypeOrValue.V - (clang::Expr *)TypeOrValue.V - - Args.NumArgs - Args.Args - - - - - - - {((TemplateArgumentLoc*)Arguments.BeginX)[0],view(cpp)}{*this,view(elt1)} - - , {((TemplateArgumentLoc*)Arguments.BeginX)[1],view(cpp)}{*this,view(elt2)} - - , {((TemplateArgumentLoc*)Arguments.BeginX)[2],view(cpp)}{*this,view(elt3)} - - , {((TemplateArgumentLoc*)Arguments.BeginX)[3],view(cpp)}{*this,view(elt4)} - - , ... - empty - <{*this,view(elt0)}> - Uninitialized - - - - {Arguments[0],view(cpp)}{*this,view(arg1)} - - , {Arguments[1],view(cpp)}{*this,view(arg2)} - - , {Arguments[1],view(cpp)}, ... 
- <{*this,view(arg0)}> - - NumArguments - - NumArguments - Arguments - - - - - - {Data[0],view(cpp)}{*this,view(arg1)} - - , {Data[1],view(cpp)}{*this,view(arg2)} - - , {Data[2],view(cpp)}, ... - <{*this,view(arg0)}> - - Length - - - - Length - Data - - - - - - - - {((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[0],view(cpp)}{*this,view(level1)} - - ::{((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[1],view(cpp)}{*this,view(level2)} - - ::{((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[2],view(cpp)}, ... - {*this,view(level0)} - - TemplateArgumentLists - - - - {(clang::QualType *)Arg,view(cpp)na} - Type template argument: {*(clang::QualType *)Arg} - Non-type template argument: {*(clang::Expr *)Arg} - Template template argument: {*(clang::TemplateName *)Arg - - Kind,en - (clang::QualType *)Arg - (clang::Expr *)Arg - (clang::TemplateName *)Arg - - - - - void - bool - char - unsigned char - wchar_t - char16_t - char32_t - unsigned short - unsigned int - unsigned long - unsigned long long - __uint128_t - char - signed char - wchar_t - short - int - long - long long - __int128_t - __fp16 - float - double - long double - nullptr_t - {(clang::BuiltinType::Kind)BuiltinTypeBits.Kind, en} - - (clang::BuiltinType::Kind)BuiltinTypeBits.Kind - - - - - - {((clang::TemplateArgument *)(this+1))[0],view(cpp)}{*this,view(arg1)} - - , {((clang::TemplateArgument *)(this+1))[1],view(cpp)}{*this,view(arg2)} - - , {((clang::TemplateArgument *)(this+1))[2],view(cpp)}{*this,view(arg3)} - - {*((clang::TemplateDecl *)(Template.Storage.Val.Value))->TemplatedDecl,view(cpp)}<{*this,view(arg0)}> - - Can't visualize this TemplateSpecializationType - - Template.Storage - - TemplateSpecializationTypeBits.NumArgs - (clang::TemplateArgument *)(this+1) - - *(clang::Type *)this, view(cmn) - - - - - (CanonicalType.Value.Value != this) || TypeBits.Dependent - *(clang::Type *)this,view(cmn) - - - - {CanonicalType,view(cpp)} - 
{Template,view(cpp)} - {Template} - - Template - CanonicalType,view(cpp) - (clang::DeducedType *)this - Template - - - - {*(CXXRecordDecl *)this,nd}{*TemplateArgs} - - (CXXRecordDecl *)this,nd - TemplateArgs - - - - {((llvm::StringMapEntry<clang::IdentifierInfo *>*)Entry)+1,sb} - - ((llvm::StringMapEntry<clang::IdentifierInfo *>*)Entry)+1,s - (clang::tok::TokenKind)TokenID - - - - - Empty - {*(clang::IdentifierInfo *)(Ptr & ~PtrMask)} - {{Identifier ({*(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} - {{ObjC Zero Arg Selector (*{(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} - {{ObjC One Arg Selector (*{(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} - {(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),view(cpp)na} - C++ Constructor {{{(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),view(cpp)na}}} - C++ Destructor {{*(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask)}} - C++ Conversion function {{*(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask)}} - C++ Operator {{*(clang::detail::CXXOperatorIdName *)(Ptr & ~PtrMask)}} - {*(clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask),view(cpp)} - {{Extra ({*(clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask)})}} - - StoredNameKind(Ptr & PtrMask),en - *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na - *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na - *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na - *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na - *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na - *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na - *(clang::detail::CXXOperatorIdName *)(Ptr & ~PtrMask),na - (clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask),na - - - - - {(CXXDeductionGuideNameExtra *)this,view(cpp)nand} - - - {(CXXDeductionGuideNameExtra *)this,nand} - - C++ Literal operator - C++ Using directive - Objective-C MultiArg selector - {(clang::detail::DeclarationNameExtra::ExtraKind)ExtraKindOrNumArgs,en}{" ",sb}{*this,view(cpp)} - - (CXXDeductionGuideNameExtra 
*)this - ExtraKindOrNumArgs - - - - {Template->TemplatedDecl,view(cpp)} - C++ Deduction guide for {Template->TemplatedDecl,view(cpp)na} - - - {Type,view(cpp)} - {Type} - - - {Name} - - - - {(ParsedTemplateArgument *)(this+1),view(cpp)na}{this,view(arg1)na} - - , {((ParsedTemplateArgument *)(this+1))+1,view(cpp)na}{this,view(arg2)na} - - , ... - {Name,na}<{this,view(arg0)na}> - - Name - - {this,view(arg0)na} - - - NumArgs - (ParsedTemplateArgument *)(this+1) - - - - Operator - - - - {{annot_template_id ({(clang::TemplateIdAnnotation *)(PtrData),na})}} - {{Identifier ({(clang::IdentifierInfo *)(PtrData),na})}} - {(clang::tok::TokenKind)Kind,en} - - - {BufferPtr,nasb} - - - {TheLexer._Mypair._Myval2,na} - Expanding Macro: {TheTokenLexer._Mypair._Myval2,na} - - - - - [{(Token *)(CachedTokens.BeginX) + CachedLexPos,na}] {IncludeMacroStack._Mypair._Myval2._Mylast - 1,na} - - {IncludeMacroStack._Mypair._Myval2._Mylast - 1,na} - {CurLexer._Mypair._Myval2,na} - Expanding Macro: {CurTokenLexer._Mypair._Myval2,na} - - - {this,view(cached)} - - CLK_LexAfterModuleImport - - - [{Tok}] {PP,na} - - - this - *this - {Id} - &{Id} - No visualizer for {Kind} - - - - =, - &, - - {(LambdaCapture *)(Captures.BeginX),na}{this,view(capture1)na} - - ,{(LambdaCapture *)(Captures.BeginX)+1,na}{this,view(capture2)na} - - ,{(LambdaCapture *)(Captures.BeginX)+2,na}{this,view(capture3)na} - - ,... 
- [{this,view(default)na}{this,view(capture0)na}] - - - - , [{TypeRep}] - - - , [{ExprRep}] - - - , [{DeclRep}] - - - [{(clang::DeclSpec::SCS)StorageClassSpec,en}], [{(clang::TypeSpecifierType)TypeSpecType,en}]{this,view(extra)na} - - (clang::DeclSpec::SCS)StorageClassSpec - (clang::TypeSpecifierType)TypeSpecType - - TypeRep - - - ExprRep - - - DeclRep - - - - - - {Name,s} - - - {RealPathName,s} - - - {Name,s} - - - - (clang::StorageClass)SClass - (clang::ThreadStorageClassSpecifier)TSCSpec - (clang::VarDecl::InitializationStyle)InitStyle - - - - {DeclType,view(left)} {Name,view(cpp)}{DeclType,view(right)} - - Name - DeclType - - - - {(DeclaratorDecl*)this,nand} - - (DeclaratorDecl*)this,nd - Init - VarDeclBits - - - - {*(VarDecl*)this,nd} - - ParmVarDeclBits - *(VarDecl*)this,nd - - - - {"explicit ",sb} - - explicit({ExplicitSpec,view(ptr)na}) - {ExplicitSpec,view(int)en} - {ExplicitSpec,view(int)en} : {ExplicitSpec,view(ptr)na} - - - {ExplicitSpec,view(cpp)}{Name,view(cpp)nd}({(FunctionDecl*)this,view(parm0)nand}) -> {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)(((uintptr_t)DeclType.Value.Value) & ~15))->BaseType)->ResultType,view(cpp)} - - ExplicitSpec - (bool)FunctionDeclBits.IsCopyDeductionCandidate - (FunctionDecl*)this,nd - - - - {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType,view(cpp)} - - {ParamInfo[0],na}{*this,view(parm1)nd} - - , {ParamInfo[1],na}{*this,view(parm2)nd} - - , {ParamInfo[2],na}{*this,view(parm3)nd} - - , {ParamInfo[3],na}{*this,view(parm4)nd} - - , {ParamInfo[4],na}{*this,view(parm5)nd} - - , /* expand for more params */ - - auto {Name,view(cpp)nd}({*this,view(parm0)nd}) -> {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType,view(cpp)} - - {this,view(retType)nand} {Name,view(cpp)nd}({*this,view(parm0)nd}) - - (clang::DeclaratorDecl *)this,nd - 
((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType - - {*this,view(parm0)nd} - - - ((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->FunctionTypeBits.NumParams - ParamInfo - - - - TemplateOrSpecialization - - - - {*($T1*)&Ptr} - - ($T1*)&Ptr - - - - {($T1 *)Ptr} - - ($T1 *)Ptr - - - - - {*((NamedDecl **)(this+1))[0],view(cpp)}{*this,view(parm1)} - - , {*((NamedDecl **)(this+1))[1],view(cpp)}{*this,view(parm2)} - - , {*((NamedDecl **)(this+1))[2],view(cpp)}{*this,view(parm3)} - - , {*((NamedDecl **)(this+1))[3],view(cpp)}{*this,view(parm4)} - - , {*((NamedDecl **)(this+1))[4],view(cpp)}{*this,view(parm5)} - - , /* Expand for more params */ - <{*this,view(parm0)}> - - - NumParams - (NamedDecl **)(this+1) - - - - - {(clang::Stmt::StmtClass)StmtBits.sClass,en} - - (clang::Stmt::StmtClass)StmtBits.sClass,en - - - - {*(clang::StringLiteral *)this} - Expression of class {(clang::Stmt::StmtClass)StmtBits.sClass,en} and type {TR,view(cpp)} - - - - *(unsigned *)(((clang::StringLiteral *)this)+1) - (const char *)(((clang::StringLiteral *)this)+1)+4+4,[*(unsigned *)(((clang::StringLiteral *)this)+1)]s8 - - - - public - protected - private - - {*(clang::NamedDecl *)(Ptr&~Mask)} - {*this,view(access)} {*this,view(decl)} - - (clang::AccessSpecifier)(Ptr&Mask),en - *(clang::NamedDecl *)(Ptr&~Mask) - - - - [IK_Identifier] {*Identifier} - [IK_OperatorFunctionId] {OperatorFunctionId} - [IK_ConversionFunctionId] {ConversionFunctionId} - [IK_ConstructorName] {ConstructorName} - [IK_DestructorName] {DestructorName} - [IK_DeductionGuideName] {TemplateName} - [IK_TemplateId] {TemplateId} - [IK_ConstructorTemplateId] {TemplateId} - Kind - - Identifier - OperatorFunctionId - ConversionFunctionId - ConstructorName - DestructorName - TemplateName - TemplateId - TemplateId - - - - NumDecls={NumDecls} - - - NumDecls - (Decl **)(this+1) 
- - - - - {*D} - {*(DeclGroup *)((uintptr_t)D&~1)} - - D - (DeclGroup *)((uintptr_t)D&~1) - - - - {DS} {Name} - - - {Decls} - - Decls - - - - {Ambiguity,en}: {Decls} - {ResultKind,en}: {Decls} - - - Invalid - Unset - {Val} - - - Invalid - Unset - {($T1)(Value&~1)} - - (bool)(Value&1) - ($T1)(Value&~1) - - - + + + + + + + LocInfoType + {(clang::Type::TypeClass)TypeBits.TC, en}Type + + {*(clang::BuiltinType *)this} + {*(clang::PointerType *)this} + {*(clang::ParenType *)this} + {(clang::BitIntType *)this} + {*(clang::LValueReferenceType *)this} + {*(clang::RValueReferenceType *)this} + {(clang::ConstantArrayType *)this,na} + {(clang::ConstantArrayType *)this,view(left)na} + {(clang::ConstantArrayType *)this,view(right)na} + {(clang::VariableArrayType *)this,na} + {(clang::VariableArrayType *)this,view(left)na} + {(clang::VariableArrayType *)this,view(right)na} + {(clang::IncompleteArrayType *)this,na} + {(clang::IncompleteArrayType *)this,view(left)na} + {(clang::IncompleteArrayType *)this,view(right)na} + {(clang::TypedefType *)this,na} + {(clang::TypedefType *)this,view(cpp)na} + {*(clang::AttributedType *)this} + {(clang::DecayedType *)this,na} + {(clang::DecayedType *)this,view(left)na} + {(clang::DecayedType *)this,view(right)na} + {(clang::ElaboratedType *)this,na} + {(clang::ElaboratedType *)this,view(left)na} + {(clang::ElaboratedType *)this,view(right)na} + {*(clang::TemplateTypeParmType *)this} + {*(clang::TemplateTypeParmType *)this,view(cpp)} + {*(clang::SubstTemplateTypeParmType *)this} + {*(clang::RecordType *)this} + {*(clang::RecordType *)this,view(cpp)} + {(clang::FunctionProtoType *)this,na} + {(clang::FunctionProtoType *)this,view(left)na} + {(clang::FunctionProtoType *)this,view(right)na} + {*(clang::TemplateSpecializationType *)this} + {*(clang::DeducedTemplateSpecializationType *)this} + {*(clang::DeducedTemplateSpecializationType *)this,view(cpp)} + {*(clang::InjectedClassNameType *)this} + {*(clang::DependentNameType *)this} + 
{*(clang::PackExpansionType *)this} + {(clang::LocInfoType *)this,na} + {(clang::LocInfoType *)this,view(cpp)na} + {this,view(poly)na} + {*this,view(cpp)} + + No visualizer yet for {(clang::Type::TypeClass)TypeBits.TC,en}Type + Dependence{" ",en} + + CachedLinkage: {(clang::Linkage)TypeBits.CachedLinkage,en} CachedLocalOrUnnamed + CachedLinkage: {(clang::Linkage)TypeBits.CachedLinkage,en}{" ",sb} + + FromAST + + + No TypeBits set beyond TypeClass + + {*this, view(Dependence)}{*this, view(Cache)}{*this, view(FromAST)} + {*this,view(cmn)} {{{*this,view(poly)}}} + + (clang::Type::TypeClass)TypeBits.TC + this,view(flags)na + CanonicalType + *(clang::BuiltinType *)this + *(clang::PointerType *)this + *(clang::ParenType*)this + *(clang::BitIntType*)this + *(clang::LValueReferenceType *)this + *(clang::RValueReferenceType *)this + (clang::ConstantArrayType *)this + (clang::VariableArrayType *)this + (clang::IncompleteArrayType *)this + *(clang::AttributedType *)this + (clang::DecayedType *)this + (clang::ElaboratedType *)this + (clang::TemplateTypeParmType *)this + (clang::SubstTemplateTypeParmType *)this + (clang::RecordType *)this + (clang::FunctionProtoType *)this + (clang::TemplateSpecializationType *)this + (clang::DeducedTemplateSpecializationType *)this + (clang::InjectedClassNameType *)this + (clang::DependentNameType *)this + (clang::PackExpansionType *)this + (clang::LocInfoType *)this + + + + + ElementType + + + + {ElementType,view(cpp)} + [{Size}] + {ElementType,view(cpp)}[{Size}] + + Size + (clang::ArrayType *)this + + + + {ElementType,view(cpp)} + [] + {ElementType,view(cpp)}[] + + (clang::ArrayType *)this + + + + {ElementType,view(cpp)} + [*] + {ElementType,view(cpp)}[*] + + (clang::Expr *)SizeExpr + (clang::ArrayType *)this + + + + {Decl,view(name)nd} + {Decl} + + Decl + *(clang::Type *)this, view(cmn) + + + + {PointeeType, view(cpp)} * + + PointeeType + *(clang::Type *)this, view(cmn) + + + + {Inner, view(cpp)} + + Inner + *(clang::Type *)this, view(cmn) 
+ + + + signed _BitInt({NumBits}) + unsigned _BitInt({NumBits})( + + NumBits + (clang::Type *)this, view(cmn) + + + + + {((clang::ReferenceType *)this)->PointeeType,view(cpp)} & + + *(clang::Type *)this, view(cmn) + PointeeType + + + + {((clang::ReferenceType *)this)->PointeeType,view(cpp)} && + + *(clang::Type *)this, view(cmn) + PointeeType + + + + {ModifiedType} Attribute={(clang::AttributedType::Kind)AttributedTypeBits.AttrKind} + + + + + {(clang::Decl::Kind)DeclContextBits.DeclKind,en}Decl + + (clang::Decl::Kind)DeclContextBits.DeclKind,en + + + + + FirstDecl + (clang::Decl *)(*(intptr_t *)NextInContextAndBits.Value.Data & ~3) + *this + + + + + + + Field {{{*(clang::DeclaratorDecl *)this,view(cpp)nd}}} + + + {*(clang::FunctionDecl *)this,nd} + Method {{{*this,view(cpp)}}} + + + Constructor {{{Name,view(cpp)}({*(clang::FunctionDecl *)this,view(parm0)nd})}} + + + Destructor {{~{Name,view(cpp)}()}} + + + typename + class + (not yet known if parameter pack) + ... + + {(TypeSourceInfo *)(*(uintptr_t *)DefaultArgument.ValueOrInherited.Val.Value.Data&~3LL),view(cpp)} + {{InheritedInitializer}} + = {this,view(DefaultArg)na} + + {*this,view(TorC)} {*this,view(MaybeEllipses)}{Name,view(cpp)} {this,view(Initializer)na} + + + {*TemplatedDecl,view(cpp)} + template{TemplateParams,na} {*TemplatedDecl}; + + TemplateParams,na + TemplatedDecl,na + + + + + {(clang::TypeSourceInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL),view(cpp)na} + {(clang::TypedefNameDecl::ModedTInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL),view(cpp)na} + {(TypeDecl *)this,view(cpp)nand} + typedef {this,view(type)na} {this,view(name)na}; + + "Not yet calculated",sb + (bool)(*(uintptr_t *)MaybeModedTInfo.Value.Data & 2) + (clang::TypeSourceInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL) + (clang::TypedefNameDecl::ModedTInfo *)(*(uintptr_t *)MaybeModedTInfo.Value.Data & ~7LL) + (TypeDecl *)this,nd + + + + {(TypedefNameDecl *)this,view(name)nand} + using {(TypedefNameDecl 
*)this,view(name)nand} = {(TypedefNameDecl *)this,view(type)nand} + + + {Name} + + + Kind={(UncommonTemplateNameStorage::Kind)Kind,en}, Size={Size} + + (UncommonTemplateNameStorage::Kind)Kind + Size + + + + {Bits}, + {this,view(cmn)na},{(OverloadedTemplateStorage*)this,na} + {this,view(cmn)na},{(AssumedTemplateStorage*)this,na} + {this,view(cmn)na},{(SubstTemplateTemplateParmStorage*)this,na} + {this,view(cmn)na},{(SubstTemplateTemplateParmPackStorage*)this,na} + {this,view(cmn)na} + + Bits + (OverloadedTemplateStorage*)this + (AssumedTemplateStorage*)this + (SubstTemplateTemplateParmStorage*)this + (SubstTemplateTemplateParmPackStorage*)this + + + + + + + {(clang::TemplateDecl *)(Val.Value & ~3LL),view(cpp)na} + + + {(clang::TemplateDecl *)(Val.Value & ~3LL),na} + + + {(clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL),view(cpp)na} + + + {(clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL),na} + + + {(clang::QualifiedTemplateName *)(Val.Value & ~3LL),view(cpp)na} + + + {(clang::QualifiedTemplateName *)(Val.Value & ~3LL),na} + + + {(clang::DependentTemplateName *)(Val.Value & ~3LL),view(cpp)na} + + + {(clang::DependentTemplateName *)(Val.Value & ~3LL),na} + + + "TemplateDecl",s8b + + (clang::TemplateDecl *)(Val.Value & ~3LL) + + "UncommonTemplateNameStorage",s8b + + (clang::UncommonTemplateNameStorage *)(Val.Value & ~3LL) + + "QualifiedTemplateName",s8b + + (clang::QualifiedTemplateName *)(Val.Value & ~3LL) + + "DependentTemplateName",s8b + + (clang::DependentTemplateName *)(Val.Value & ~3LL) + + Val + + + + + {Storage,view(cpp)na} + {Storage,na} + + Storage + + + + {Name,view(cpp)} + {Name} + + + implicit{" ",sb} + + {*this,view(implicit)nd} + {*this,view(modifiers)}{Name,view(cpp)} + {*this,view(modifiers)nd}struct {Name,view(cpp)} + {*this,view(modifiers)nd}interface {Name,view(cpp)} + {*this,view(modifiers)nd}union {Name,view(cpp)} + {*this,view(modifiers)nd}class {Name,view(cpp)} + {*this,view(modifiers)nd}enum {Name,view(cpp)} + + 
(clang::DeclContext *)this + + + + {decl,view(cpp)na} + {*decl} + + *(clang::Type *)this, view(cmn) + decl + + + + {(clang::TagType *)this,view(cpp)na} + {(clang::TagType *)this,na} + + *(clang::TagType *)this + + + + {{{*Replaced,view(cpp)} <= {CanonicalType,view(cpp)}}} + + *(clang::Type *)this, view(cmn) + *Replaced + + + + + + {ResultType,view(cpp)} + + {*(clang::QualType *)(this+1),view(cpp)}{*this,view(parm1)} + + , {*((clang::QualType *)(this+1)+1),view(cpp)}{*this,view(parm2)} + + , {*((clang::QualType *)(this+1)+2),view(cpp)}{*this,view(parm3)} + + , {*((clang::QualType *)(this+1)+3),view(cpp)}{*this,view(parm4)} + + , {*((clang::QualType *)(this+1)+4),view(cpp)}{*this,view(parm5)} + + , /* expand for more params */ + ({*this,view(parm0)}) -> {ResultType,view(cpp)} + ({*this,view(parm0)}) + {this,view(left)na}{this,view(right)na} + + ResultType + + {*this,view(parm0)} + + + FunctionTypeBits.NumParams + (clang::QualType *)(this+1) + + + + *(clang::Type *)this, view(cmn) + + + + + {OriginalTy} adjusted to {AdjustedTy} + + OriginalTy + AdjustedTy + + + + {OriginalTy,view(left)} + {OriginalTy,view(right)} + {OriginalTy} + + (clang::AdjustedType *)this + + + + {NamedType,view(left)} + {NamedType,view(right)} + {NamedType} + + (clang::ElaboratedTypeKeyword)TypeWithKeywordBits.Keyword + NNS + NamedType,view(cmn) + + + + {TTPDecl->Name,view(cpp)} + Non-canonical: {*TTPDecl} + Canonical: {CanTTPTInfo} + + *(clang::Type *)this, view(cmn) + + + + {Decl,view(cpp)} + + Decl + InjectedType + *(clang::Type *)this, view(cmn) + + + + {NNS}{Name,view(cpp)na} + + NNS + Name + *(clang::Type *)this, view(cmn) + + + + + {(IdentifierInfo*)Specifier,view(cpp)na}:: + {(NamedDecl*)Specifier,view(cpp)na}:: + {(Type*)Specifier,view(cpp)na}:: + + (NestedNameSpecifier::StoredSpecifierKind)((*(uintptr_t *)Prefix.Value.Data>>1)&3) + + + + {Pattern} + + Pattern + NumExpansions + *(clang::Type *)this, view(cmn) + + + + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & 
~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(poly)}{*this,view(fastQuals)} + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(cpp)}{*this,view(fastQuals)} + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(left)}{*this,view(fastQuals)} + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,view(right)}{*this,view(fastQuals)} + + + {" ",sb}const + {" ",sb}restrict + {" ",sb}const restrict + {" ",sb}volatile + {" ",sb}const volatile + {" ",sb}volatile restrict + {" ",sb}const volatile restrict + Cannot visualize non-fast qualifiers + Null + {((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType,na}{*this,view(fastQuals)} + + *this,view(fastQuals) + ((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)Value.Value.Data) & ~(uintptr_t)((1U << clang::TypeAlignmentInBits) - 1U)))->BaseType + + + + + {DeclInfo,view(cpp)na} + {DeclInfo,na} + + DeclInfo + *(clang::Type *)this, view(cmn) + + + + {Ty,view(cpp)} + {Ty} + + Ty + + + + {(QualType *)&Ty,na} + + (QualType *)&Ty + Data + + + + Not building anything + Building a {LastTy} + + + {Argument,view(cpp)} + {Argument} + + + {*(clang::QualType *)&TypeOrValue.V,view(cpp)} + {(clang::TemplateArgument::ArgKind)TypeOrValue.Kind,en} template argument: {*(clang::QualType *)&TypeOrValue.V} + + {Args.Args[0]}{*this,view(arg1)} + + , {Args.Args[1]}{*this,view(arg2)} + + , {Args.Args[2]}, ... + + {Args.Args[0],view(cpp)}{*this,view(arg1cpp)} + + , {Args.Args[1],view(cpp)}{*this,view(arg2cpp)} + + , {Args.Args[2],view(cpp)}, ... 
+ {*this,view(arg0cpp)} + {*this,view(arg0)} + {(clang::Expr *)TypeOrValue.V,view(cpp)na} + {(clang::TemplateArgument::ArgKind)TypeOrValue.Kind,en} + + *(clang::QualType *)&TypeOrValue.V + (clang::Expr *)TypeOrValue.V + + Args.NumArgs + Args.Args + + + + + + + {((TemplateArgumentLoc*)Arguments.BeginX)[0],view(cpp)}{*this,view(elt1)} + + , {((TemplateArgumentLoc*)Arguments.BeginX)[1],view(cpp)}{*this,view(elt2)} + + , {((TemplateArgumentLoc*)Arguments.BeginX)[2],view(cpp)}{*this,view(elt3)} + + , {((TemplateArgumentLoc*)Arguments.BeginX)[3],view(cpp)}{*this,view(elt4)} + + , ... + empty + <{*this,view(elt0)}> + Uninitialized + + + + {Arguments[0],view(cpp)}{*this,view(arg1)} + + , {Arguments[1],view(cpp)}{*this,view(arg2)} + + , {Arguments[1],view(cpp)}, ... + <{*this,view(arg0)}> + + NumArguments + + NumArguments + Arguments + + + + + + {Data[0],view(cpp)}{*this,view(arg1)} + + , {Data[1],view(cpp)}{*this,view(arg2)} + + , {Data[2],view(cpp)}, ... + <{*this,view(arg0)}> + + Length + + + + Length + Data + + + + + + + + {((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[0],view(cpp)}{*this,view(level1)} + + ::{((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[1],view(cpp)}{*this,view(level2)} + + ::{((llvm::ArrayRef<clang::TemplateArgument> *)TemplateArgumentLists.BeginX)[2],view(cpp)}, ... 
+ {*this,view(level0)} + + TemplateArgumentLists + + + + {(clang::QualType *)Arg,view(cpp)na} + Type template argument: {*(clang::QualType *)Arg} + Non-type template argument: {*(clang::Expr *)Arg} + Template template argument: {*(clang::TemplateName *)Arg + + Kind,en + (clang::QualType *)Arg + (clang::Expr *)Arg + (clang::TemplateName *)Arg + + + + + void + bool + char + unsigned char + wchar_t + char16_t + char32_t + unsigned short + unsigned int + unsigned long + unsigned long long + __uint128_t + char + signed char + wchar_t + short + int + long + long long + __int128_t + __fp16 + float + double + long double + nullptr_t + {(clang::BuiltinType::Kind)BuiltinTypeBits.Kind, en} + + (clang::BuiltinType::Kind)BuiltinTypeBits.Kind + + + + + + {((clang::TemplateArgument *)(this+1))[0],view(cpp)}{*this,view(arg1)} + + , {((clang::TemplateArgument *)(this+1))[1],view(cpp)}{*this,view(arg2)} + + , {((clang::TemplateArgument *)(this+1))[2],view(cpp)}{*this,view(arg3)} + + {*((clang::TemplateDecl *)(Template.Storage.Val.Value))->TemplatedDecl,view(cpp)}<{*this,view(arg0)}> + + Can't visualize this TemplateSpecializationType + + Template.Storage + + TemplateSpecializationTypeBits.NumArgs + (clang::TemplateArgument *)(this+1) + + *(clang::Type *)this, view(cmn) + + + + + (CanonicalType.Value.Value != this) || TypeBits.Dependent + *(clang::Type *)this,view(cmn) + + + + {CanonicalType,view(cpp)} + {Template,view(cpp)} + {Template} + + Template + CanonicalType,view(cpp) + (clang::DeducedType *)this + Template + + + + {*(CXXRecordDecl *)this,nd}{*TemplateArgs} + + (CXXRecordDecl *)this,nd + TemplateArgs + + + + {((llvm::StringMapEntry<clang::IdentifierInfo *>*)Entry)+1,sb} + + ((llvm::StringMapEntry<clang::IdentifierInfo *>*)Entry)+1,s + (clang::tok::TokenKind)TokenID + + + + + Empty + {*(clang::IdentifierInfo *)(Ptr & ~PtrMask)} + {{Identifier ({*(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} + {{ObjC Zero Arg Selector (*{(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} + {{ObjC 
One Arg Selector (*{(clang::IdentifierInfo *)(Ptr & ~PtrMask)})}} + {(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),view(cpp)na} + C++ Constructor {{{(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),view(cpp)na}}} + C++ Destructor {{*(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask)}} + C++ Conversion function {{*(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask)}} + C++ Operator {{*(clang::detail::CXXOperatorIdName *)(Ptr & ~PtrMask)}} + {*(clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask),view(cpp)} + {{Extra ({*(clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask)})}} + + StoredNameKind(Ptr & PtrMask),en + *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na + *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na + *(clang::IdentifierInfo *)(Ptr & ~PtrMask),na + *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na + *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na + *(clang::detail::CXXSpecialNameExtra *)(Ptr & ~PtrMask),na + *(clang::detail::CXXOperatorIdName *)(Ptr & ~PtrMask),na + (clang::detail::DeclarationNameExtra *)(Ptr & ~PtrMask),na + + + + + {(CXXDeductionGuideNameExtra *)this,view(cpp)nand} + + + {(CXXDeductionGuideNameExtra *)this,nand} + + C++ Literal operator + C++ Using directive + Objective-C MultiArg selector + {(clang::detail::DeclarationNameExtra::ExtraKind)ExtraKindOrNumArgs,en}{" ",sb}{*this,view(cpp)} + + (CXXDeductionGuideNameExtra *)this + ExtraKindOrNumArgs + + + + {Template->TemplatedDecl,view(cpp)} + C++ Deduction guide for {Template->TemplatedDecl,view(cpp)na} + + + {Type,view(cpp)} + {Type} + + + {Name} + + + + {(ParsedTemplateArgument *)(this+1),view(cpp)na}{this,view(arg1)na} + + , {((ParsedTemplateArgument *)(this+1))+1,view(cpp)na}{this,view(arg2)na} + + , ... 
+ {Name,na}<{this,view(arg0)na}> + + Name + + {this,view(arg0)na} + + + NumArgs + (ParsedTemplateArgument *)(this+1) + + + + Operator + + + + {{annot_template_id ({(clang::TemplateIdAnnotation *)(PtrData),na})}} + {{Identifier ({(clang::IdentifierInfo *)(PtrData),na})}} + {(clang::tok::TokenKind)Kind,en} + + + {BufferPtr,nasb} + + + {TheLexer._Mypair._Myval2,na} + Expanding Macro: {TheTokenLexer._Mypair._Myval2,na} + + + + + [{(Token *)(CachedTokens.BeginX) + CachedLexPos,na}] {IncludeMacroStack._Mypair._Myval2._Mylast - 1,na} + + {IncludeMacroStack._Mypair._Myval2._Mylast - 1,na} + {CurLexer._Mypair._Myval2,na} + Expanding Macro: {CurTokenLexer._Mypair._Myval2,na} + + + {this,view(cached)} + + CLK_LexAfterModuleImport + + + [{Tok}] {PP,na} + + + this + *this + {Id} + &{Id} + No visualizer for {Kind} + + + + =, + &, + + {(LambdaCapture *)(Captures.BeginX),na}{this,view(capture1)na} + + ,{(LambdaCapture *)(Captures.BeginX)+1,na}{this,view(capture2)na} + + ,{(LambdaCapture *)(Captures.BeginX)+2,na}{this,view(capture3)na} + + ,... 
+ [{this,view(default)na}{this,view(capture0)na}] + + + + , [{TypeRep}] + + + , [{ExprRep}] + + + , [{DeclRep}] + + + [{(clang::DeclSpec::SCS)StorageClassSpec,en}], [{(clang::TypeSpecifierType)TypeSpecType,en}]{this,view(extra)na} + + (clang::DeclSpec::SCS)StorageClassSpec + (clang::TypeSpecifierType)TypeSpecType + + TypeRep + + + ExprRep + + + DeclRep + + + + + + {Name,s} + + + {RealPathName,s} + + + {Name,s} + + + + (clang::StorageClass)SClass + (clang::ThreadStorageClassSpecifier)TSCSpec + (clang::VarDecl::InitializationStyle)InitStyle + + + + {DeclType,view(left)} {Name,view(cpp)}{DeclType,view(right)} + + Name + DeclType + + + + {(DeclaratorDecl*)this,nand} + + (DeclaratorDecl*)this,nd + Init + VarDeclBits + + + + {*(VarDecl*)this,nd} + + ParmVarDeclBits + *(VarDecl*)this,nd + + + + {"explicit ",sb} + + explicit({ExplicitSpec,view(ptr)na}) + {ExplicitSpec,view(int)en} + {ExplicitSpec,view(int)en} : {ExplicitSpec,view(ptr)na} + + + {ExplicitSpec,view(cpp)}{Name,view(cpp)nd}({(FunctionDecl*)this,view(parm0)nand}) -> {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)(((uintptr_t)DeclType.Value.Value) & ~15))->BaseType)->ResultType,view(cpp)} + + ExplicitSpec + (bool)FunctionDeclBits.IsCopyDeductionCandidate + (FunctionDecl*)this,nd + + + + {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType,view(cpp)} + + {ParamInfo[0],na}{*this,view(parm1)nd} + + , {ParamInfo[1],na}{*this,view(parm2)nd} + + , {ParamInfo[2],na}{*this,view(parm3)nd} + + , {ParamInfo[3],na}{*this,view(parm4)nd} + + , {ParamInfo[4],na}{*this,view(parm5)nd} + + , /* expand for more params */ + + auto {Name,view(cpp)nd}({*this,view(parm0)nd}) -> {((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType,view(cpp)} + + {this,view(retType)nand} {Name,view(cpp)nd}({*this,view(parm0)nd}) + + (clang::DeclaratorDecl *)this,nd + 
((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->ResultType + + {*this,view(parm0)nd} + + + ((clang::FunctionProtoType *)((clang::ExtQualsTypeCommonBase *)((*(uintptr_t *)DeclType.Value.Value.Data) & ~15))->BaseType)->FunctionTypeBits.NumParams + ParamInfo + + + + TemplateOrSpecialization + + + + {*($T1*)&Ptr} + + ($T1*)&Ptr + + + + {($T1 *)Ptr} + + ($T1 *)Ptr + + + + + {*((NamedDecl **)(this+1))[0],view(cpp)}{*this,view(parm1)} + + , {*((NamedDecl **)(this+1))[1],view(cpp)}{*this,view(parm2)} + + , {*((NamedDecl **)(this+1))[2],view(cpp)}{*this,view(parm3)} + + , {*((NamedDecl **)(this+1))[3],view(cpp)}{*this,view(parm4)} + + , {*((NamedDecl **)(this+1))[4],view(cpp)}{*this,view(parm5)} + + , /* Expand for more params */ + <{*this,view(parm0)}> + + + NumParams + (NamedDecl **)(this+1) + + + + + {(clang::Stmt::StmtClass)StmtBits.sClass,en} + + (clang::Stmt::StmtClass)StmtBits.sClass,en + + + + {*(clang::StringLiteral *)this} + Expression of class {(clang::Stmt::StmtClass)StmtBits.sClass,en} and type {TR,view(cpp)} + + + + *(unsigned *)(((clang::StringLiteral *)this)+1) + (const char *)(((clang::StringLiteral *)this)+1)+4+4,[*(unsigned *)(((clang::StringLiteral *)this)+1)]s8 + + + + public + protected + private + + {*(clang::NamedDecl *)(Ptr&~Mask)} + {*this,view(access)} {*this,view(decl)} + + (clang::AccessSpecifier)(Ptr&Mask),en + *(clang::NamedDecl *)(Ptr&~Mask) + + + + [IK_Identifier] {*Identifier} + [IK_OperatorFunctionId] {OperatorFunctionId} + [IK_ConversionFunctionId] {ConversionFunctionId} + [IK_ConstructorName] {ConstructorName} + [IK_DestructorName] {DestructorName} + [IK_DeductionGuideName] {TemplateName} + [IK_TemplateId] {TemplateId} + [IK_ConstructorTemplateId] {TemplateId} + Kind + + Identifier + OperatorFunctionId + ConversionFunctionId + ConstructorName + DestructorName + TemplateName + TemplateId + TemplateId + + + + NumDecls={NumDecls} + + + NumDecls + (Decl **)(this+1) 
+ + + + + {*D} + {*(DeclGroup *)((uintptr_t)D&~1)} + + D + (DeclGroup *)((uintptr_t)D&~1) + + + + {DS} {Name} + + + {Decls} + + Decls + + + + {Ambiguity,en}: {Decls} + {ResultKind,en}: {Decls} + + + Invalid + Unset + {Val} + + + Invalid + Unset + {($T1)(Value&~1)} + + (bool)(Value&1) + ($T1)(Value&~1) + + + diff --git a/flang/test/Driver/msvc-dependent-lib-flags.f90 b/flang/test/Driver/msvc-dependent-lib-flags.f90 index 765917f07d8e72..1b7ecb604ad67d 100644 --- a/flang/test/Driver/msvc-dependent-lib-flags.f90 +++ b/flang/test/Driver/msvc-dependent-lib-flags.f90 @@ -1,36 +1,36 @@ -! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC -! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=static_dbg %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DEBUG -! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=dll %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DLL -! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=dll_dbg %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DLL-DEBUG - -! MSVC: -fc1 -! MSVC-SAME: --dependent-lib=clang_rt.builtins.lib -! MSVC-SAME: -D_MT -! MSVC-SAME: --dependent-lib=libcmt -! MSVC-SAME: --dependent-lib=FortranRuntime.static.lib -! MSVC-SAME: --dependent-lib=FortranDecimal.static.lib - -! MSVC-DEBUG: -fc1 -! MSVC-DEBUG-SAME: --dependent-lib=clang_rt.builtins.lib -! MSVC-DEBUG-SAME: -D_MT -! MSVC-DEBUG-SAME: -D_DEBUG -! MSVC-DEBUG-SAME: --dependent-lib=libcmtd -! MSVC-DEBUG-SAME: --dependent-lib=FortranRuntime.static_dbg.lib -! MSVC-DEBUG-SAME: --dependent-lib=FortranDecimal.static_dbg.lib - -! MSVC-DLL: -fc1 -! MSVC-DLL-SAME: --dependent-lib=clang_rt.builtins.lib -! MSVC-DLL-SAME: -D_MT -! MSVC-DLL-SAME: -D_DLL -! 
MSVC-DLL-SAME: --dependent-lib=msvcrt -! MSVC-DLL-SAME: --dependent-lib=FortranRuntime.dynamic.lib -! MSVC-DLL-SAME: --dependent-lib=FortranDecimal.dynamic.lib - -! MSVC-DLL-DEBUG: -fc1 -! MSVC-DLL-DEBUG-SAME: --dependent-lib=clang_rt.builtins.lib -! MSVC-DLL-DEBUG-SAME: -D_MT -! MSVC-DLL-DEBUG-SAME: -D_DEBUG -! MSVC-DLL-DEBUG-SAME: -D_DLL -! MSVC-DLL-DEBUG-SAME: --dependent-lib=msvcrtd -! MSVC-DLL-DEBUG-SAME: --dependent-lib=FortranRuntime.dynamic_dbg.lib -! MSVC-DLL-DEBUG-SAME: --dependent-lib=FortranDecimal.dynamic_dbg.lib +! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC +! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=static_dbg %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DEBUG +! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=dll %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DLL +! RUN: %flang -### --target=aarch64-windows-msvc -resource-dir=%S/Inputs/resource_dir -fms-runtime-lib=dll_dbg %S/Inputs/hello.f90 -v 2>&1 | FileCheck %s --check-prefixes=MSVC-DLL-DEBUG + +! MSVC: -fc1 +! MSVC-SAME: --dependent-lib=clang_rt.builtins.lib +! MSVC-SAME: -D_MT +! MSVC-SAME: --dependent-lib=libcmt +! MSVC-SAME: --dependent-lib=FortranRuntime.static.lib +! MSVC-SAME: --dependent-lib=FortranDecimal.static.lib + +! MSVC-DEBUG: -fc1 +! MSVC-DEBUG-SAME: --dependent-lib=clang_rt.builtins.lib +! MSVC-DEBUG-SAME: -D_MT +! MSVC-DEBUG-SAME: -D_DEBUG +! MSVC-DEBUG-SAME: --dependent-lib=libcmtd +! MSVC-DEBUG-SAME: --dependent-lib=FortranRuntime.static_dbg.lib +! MSVC-DEBUG-SAME: --dependent-lib=FortranDecimal.static_dbg.lib + +! MSVC-DLL: -fc1 +! MSVC-DLL-SAME: --dependent-lib=clang_rt.builtins.lib +! MSVC-DLL-SAME: -D_MT +! MSVC-DLL-SAME: -D_DLL +! MSVC-DLL-SAME: --dependent-lib=msvcrt +! 
MSVC-DLL-SAME: --dependent-lib=FortranRuntime.dynamic.lib +! MSVC-DLL-SAME: --dependent-lib=FortranDecimal.dynamic.lib + +! MSVC-DLL-DEBUG: -fc1 +! MSVC-DLL-DEBUG-SAME: --dependent-lib=clang_rt.builtins.lib +! MSVC-DLL-DEBUG-SAME: -D_MT +! MSVC-DLL-DEBUG-SAME: -D_DEBUG +! MSVC-DLL-DEBUG-SAME: -D_DLL +! MSVC-DLL-DEBUG-SAME: --dependent-lib=msvcrtd +! MSVC-DLL-DEBUG-SAME: --dependent-lib=FortranRuntime.dynamic_dbg.lib +! MSVC-DLL-DEBUG-SAME: --dependent-lib=FortranDecimal.dynamic_dbg.lib diff --git a/lldb/test/API/commands/expression/ir-interpreter-phi-nodes/Makefile b/lldb/test/API/commands/expression/ir-interpreter-phi-nodes/Makefile index a1f689e07c77ff..d420a34c03e785 100644 --- a/lldb/test/API/commands/expression/ir-interpreter-phi-nodes/Makefile +++ b/lldb/test/API/commands/expression/ir-interpreter-phi-nodes/Makefile @@ -1,4 +1,4 @@ - -CXX_SOURCES := main.cpp - -include Makefile.rules + +CXX_SOURCES := main.cpp + +include Makefile.rules diff --git a/lldb/test/API/functionalities/postmortem/minidump/fizzbuzz.syms b/lldb/test/API/functionalities/postmortem/minidump/fizzbuzz.syms index cab06c1c9d50b1..e817a491af5750 100644 --- a/lldb/test/API/functionalities/postmortem/minidump/fizzbuzz.syms +++ b/lldb/test/API/functionalities/postmortem/minidump/fizzbuzz.syms @@ -1,2 +1,2 @@ -MODULE windows x86 0F45B7919A9646F9BF8F2D6076EA421A11 fizzbuzz.pdb -PUBLIC 1000 0 main +MODULE windows x86 0F45B7919A9646F9BF8F2D6076EA421A11 fizzbuzz.pdb +PUBLIC 1000 0 main diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/Makefile b/lldb/test/API/functionalities/target-new-solib-notifications/Makefile index e3b48697fd7837..745f6cc9d65ae3 100644 --- a/lldb/test/API/functionalities/target-new-solib-notifications/Makefile +++ b/lldb/test/API/functionalities/target-new-solib-notifications/Makefile @@ -1,23 +1,23 @@ -CXX_SOURCES := main.cpp -LD_EXTRAS := -L. 
-l_d -l_c -l_a -l_b - -a.out: lib_b lib_a lib_c lib_d - -include Makefile.rules - -lib_a: lib_b - "$(MAKE)" -f $(MAKEFILE_RULES) \ - DYLIB_ONLY=YES DYLIB_CXX_SOURCES=a.cpp DYLIB_NAME=_a \ - LD_EXTRAS="-L. -l_b" - -lib_b: - "$(MAKE)" -f $(MAKEFILE_RULES) \ - DYLIB_ONLY=YES DYLIB_CXX_SOURCES=b.cpp DYLIB_NAME=_b - -lib_c: - "$(MAKE)" -f $(MAKEFILE_RULES) \ - DYLIB_ONLY=YES DYLIB_CXX_SOURCES=c.cpp DYLIB_NAME=_c - -lib_d: - "$(MAKE)" -f $(MAKEFILE_RULES) \ - DYLIB_ONLY=YES DYLIB_CXX_SOURCES=d.cpp DYLIB_NAME=_d +CXX_SOURCES := main.cpp +LD_EXTRAS := -L. -l_d -l_c -l_a -l_b + +a.out: lib_b lib_a lib_c lib_d + +include Makefile.rules + +lib_a: lib_b + "$(MAKE)" -f $(MAKEFILE_RULES) \ + DYLIB_ONLY=YES DYLIB_CXX_SOURCES=a.cpp DYLIB_NAME=_a \ + LD_EXTRAS="-L. -l_b" + +lib_b: + "$(MAKE)" -f $(MAKEFILE_RULES) \ + DYLIB_ONLY=YES DYLIB_CXX_SOURCES=b.cpp DYLIB_NAME=_b + +lib_c: + "$(MAKE)" -f $(MAKEFILE_RULES) \ + DYLIB_ONLY=YES DYLIB_CXX_SOURCES=c.cpp DYLIB_NAME=_c + +lib_d: + "$(MAKE)" -f $(MAKEFILE_RULES) \ + DYLIB_ONLY=YES DYLIB_CXX_SOURCES=d.cpp DYLIB_NAME=_d diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/a.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/a.cpp index 778b46ed5cef1a..66633b70ee1e50 100644 --- a/lldb/test/API/functionalities/target-new-solib-notifications/a.cpp +++ b/lldb/test/API/functionalities/target-new-solib-notifications/a.cpp @@ -1,3 +1,3 @@ -extern "C" int b_function(); - -extern "C" int a_function() { return b_function(); } +extern "C" int b_function(); + +extern "C" int a_function() { return b_function(); } diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/b.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/b.cpp index 4f1a4032ee0eed..8b16fbdb5728cd 100644 --- a/lldb/test/API/functionalities/target-new-solib-notifications/b.cpp +++ b/lldb/test/API/functionalities/target-new-solib-notifications/b.cpp @@ -1 +1 @@ -extern "C" int b_function() { return 500; } +extern "C" 
int b_function() { return 500; } diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/c.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/c.cpp index 8abd1b155a7590..120c88f2bb609a 100644 --- a/lldb/test/API/functionalities/target-new-solib-notifications/c.cpp +++ b/lldb/test/API/functionalities/target-new-solib-notifications/c.cpp @@ -1 +1 @@ -extern "C" int c_function() { return 600; } +extern "C" int c_function() { return 600; } diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/d.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/d.cpp index 58888a29ba323a..d37ad2621ae4e9 100644 --- a/lldb/test/API/functionalities/target-new-solib-notifications/d.cpp +++ b/lldb/test/API/functionalities/target-new-solib-notifications/d.cpp @@ -1 +1 @@ -extern "C" int d_function() { return 700; } +extern "C" int d_function() { return 700; } diff --git a/lldb/test/API/functionalities/target-new-solib-notifications/main.cpp b/lldb/test/API/functionalities/target-new-solib-notifications/main.cpp index 77b38c5ccdc698..bd2c79cdab9daa 100644 --- a/lldb/test/API/functionalities/target-new-solib-notifications/main.cpp +++ b/lldb/test/API/functionalities/target-new-solib-notifications/main.cpp @@ -1,16 +1,16 @@ -#include <stdio.h> - -extern "C" int a_function(); -extern "C" int c_function(); -extern "C" int b_function(); -extern "C" int d_function(); - -int main() { - a_function(); - b_function(); - c_function(); - d_function(); - - puts("running"); // breakpoint here - return 0; -} +#include <stdio.h> + +extern "C" int a_function(); +extern "C" int c_function(); +extern "C" int b_function(); +extern "C" int d_function(); + +int main() { + a_function(); + b_function(); + c_function(); + d_function(); + + puts("running"); // breakpoint here + return 0; +} diff --git a/lldb/test/API/functionalities/unwind/zeroth_frame/Makefile b/lldb/test/API/functionalities/unwind/zeroth_frame/Makefile index 15a931850e17e5..10495940055b63 100644 ---
a/lldb/test/API/functionalities/unwind/zeroth_frame/Makefile +++ b/lldb/test/API/functionalities/unwind/zeroth_frame/Makefile @@ -1,3 +1,3 @@ -C_SOURCES := main.c - -include Makefile.rules +C_SOURCES := main.c + +include Makefile.rules diff --git a/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py b/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py index d660844405e137..70f72c72c8340e 100644 --- a/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py +++ b/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py @@ -1,88 +1,88 @@ -""" -Test that line information is recalculated properly for a frame when it moves -from the middle of the backtrace to a zero index. - -This is a regression test for a StackFrame bug, where whether frame is zero or -not depends on an internal field. When LLDB was updating its frame list value -of the field wasn't copied into existing StackFrame instances, so those -StackFrame instances, would use an incorrect line entry evaluation logic in -situations if it was in the middle of the stack frame list (not zeroth), and -then moved to the top position. The difference in logic is that for zeroth -frames line entry is returned for program counter, while for other frame -(except for those that "behave like zeroth") it is for the instruction -preceding PC, as PC points to the next instruction after function call. When -the bug is present, when execution stops at the second breakpoint -SBFrame.GetLineEntry() returns line entry for the previous line, rather than -the one with a breakpoint. Note that this is specific to -SBFrame.GetLineEntry(), SBFrame.GetPCAddress().GetLineEntry() would return -correct entry. - -This bug doesn't reproduce through an LLDB interpretator, however it happens -when using API directly, for example in LLDB-MI. 
-""" - -import lldb -from lldbsuite.test.decorators import * -from lldbsuite.test.lldbtest import * -from lldbsuite.test import lldbutil - - -class ZerothFrame(TestBase): - def test(self): - """ - Test that line information is recalculated properly for a frame when it moves - from the middle of the backtrace to a zero index. - """ - self.build() - self.setTearDownCleanup() - - exe = self.getBuildArtifact("a.out") - target = self.dbg.CreateTarget(exe) - self.assertTrue(target, VALID_TARGET) - - main_dot_c = lldb.SBFileSpec("main.c") - bp1 = target.BreakpointCreateBySourceRegex( - "// Set breakpoint 1 here", main_dot_c - ) - bp2 = target.BreakpointCreateBySourceRegex( - "// Set breakpoint 2 here", main_dot_c - ) - - process = target.LaunchSimple(None, None, self.get_process_working_directory()) - self.assertTrue(process, VALID_PROCESS) - - thread = self.thread() - - if self.TraceOn(): - print("Backtrace at the first breakpoint:") - for f in thread.frames: - print(f) - - # Check that we have stopped at correct breakpoint. - self.assertEqual( - thread.frame[0].GetLineEntry().GetLine(), - bp1.GetLocationAtIndex(0).GetAddress().GetLineEntry().GetLine(), - "LLDB reported incorrect line number.", - ) - - # Important to use SBProcess::Continue() instead of - # self.runCmd('continue'), because the problem doesn't reproduce with - # 'continue' command. 
- process.Continue() - - if self.TraceOn(): - print("Backtrace at the second breakpoint:") - for f in thread.frames: - print(f) - # Check that we have stopped at the breakpoint - self.assertEqual( - thread.frame[0].GetLineEntry().GetLine(), - bp2.GetLocationAtIndex(0).GetAddress().GetLineEntry().GetLine(), - "LLDB reported incorrect line number.", - ) - # Double-check with GetPCAddress() - self.assertEqual( - thread.frame[0].GetLineEntry().GetLine(), - thread.frame[0].GetPCAddress().GetLineEntry().GetLine(), - "LLDB reported incorrect line number.", - ) +""" +Test that line information is recalculated properly for a frame when it moves +from the middle of the backtrace to a zero index. + +This is a regression test for a StackFrame bug, where whether frame is zero or +not depends on an internal field. When LLDB was updating its frame list value +of the field wasn't copied into existing StackFrame instances, so those +StackFrame instances, would use an incorrect line entry evaluation logic in +situations if it was in the middle of the stack frame list (not zeroth), and +then moved to the top position. The difference in logic is that for zeroth +frames line entry is returned for program counter, while for other frame +(except for those that "behave like zeroth") it is for the instruction +preceding PC, as PC points to the next instruction after function call. When +the bug is present, when execution stops at the second breakpoint +SBFrame.GetLineEntry() returns line entry for the previous line, rather than +the one with a breakpoint. Note that this is specific to +SBFrame.GetLineEntry(), SBFrame.GetPCAddress().GetLineEntry() would return +correct entry. + +This bug doesn't reproduce through an LLDB interpretator, however it happens +when using API directly, for example in LLDB-MI. 
+""" + +import lldb +from lldbsuite.test.decorators import * +from lldbsuite.test.lldbtest import * +from lldbsuite.test import lldbutil + + +class ZerothFrame(TestBase): + def test(self): + """ + Test that line information is recalculated properly for a frame when it moves + from the middle of the backtrace to a zero index. + """ + self.build() + self.setTearDownCleanup() + + exe = self.getBuildArtifact("a.out") + target = self.dbg.CreateTarget(exe) + self.assertTrue(target, VALID_TARGET) + + main_dot_c = lldb.SBFileSpec("main.c") + bp1 = target.BreakpointCreateBySourceRegex( + "// Set breakpoint 1 here", main_dot_c + ) + bp2 = target.BreakpointCreateBySourceRegex( + "// Set breakpoint 2 here", main_dot_c + ) + + process = target.LaunchSimple(None, None, self.get_process_working_directory()) + self.assertTrue(process, VALID_PROCESS) + + thread = self.thread() + + if self.TraceOn(): + print("Backtrace at the first breakpoint:") + for f in thread.frames: + print(f) + + # Check that we have stopped at correct breakpoint. + self.assertEqual( + thread.frame[0].GetLineEntry().GetLine(), + bp1.GetLocationAtIndex(0).GetAddress().GetLineEntry().GetLine(), + "LLDB reported incorrect line number.", + ) + + # Important to use SBProcess::Continue() instead of + # self.runCmd('continue'), because the problem doesn't reproduce with + # 'continue' command. 
+ process.Continue() + + if self.TraceOn(): + print("Backtrace at the second breakpoint:") + for f in thread.frames: + print(f) + # Check that we have stopped at the breakpoint + self.assertEqual( + thread.frame[0].GetLineEntry().GetLine(), + bp2.GetLocationAtIndex(0).GetAddress().GetLineEntry().GetLine(), + "LLDB reported incorrect line number.", + ) + # Double-check with GetPCAddress() + self.assertEqual( + thread.frame[0].GetLineEntry().GetLine(), + thread.frame[0].GetPCAddress().GetLineEntry().GetLine(), + "LLDB reported incorrect line number.", + ) diff --git a/lldb/test/API/python_api/debugger/Makefile b/lldb/test/API/python_api/debugger/Makefile index bfad5f33e86753..99998b20bcb050 100644 --- a/lldb/test/API/python_api/debugger/Makefile +++ b/lldb/test/API/python_api/debugger/Makefile @@ -1,3 +1,3 @@ -CXX_SOURCES := main.cpp - -include Makefile.rules +CXX_SOURCES := main.cpp + +include Makefile.rules diff --git a/lldb/test/Shell/BuildScript/modes.test b/lldb/test/Shell/BuildScript/modes.test index 02311f712d770f..1ce50104855f46 100644 --- a/lldb/test/Shell/BuildScript/modes.test +++ b/lldb/test/Shell/BuildScript/modes.test @@ -1,35 +1,35 @@ -RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any -o %t/foo.out foobar.c \ -RUN: | FileCheck --check-prefix=COMPILE %s - -RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any --outdir %t foo.c bar.c \ -RUN: | FileCheck --check-prefix=COMPILE-MULTI %s - -RUN: %build -n --verbose --arch=32 --mode=link --compiler=any -o %t/foo.exe foobar.obj \ -RUN: | FileCheck --check-prefix=LINK %s - -RUN: %build -n --verbose --arch=32 --mode=link --compiler=any -o %t/foobar.exe foo.obj bar.obj \ -RUN: | FileCheck --check-prefix=LINK-MULTI %s - -RUN: %build -n --verbose --arch=32 --mode=compile-and-link --compiler=any -o %t/foobar.exe foobar.c \ -RUN: | FileCheck --check-prefix=BOTH %s - -RUN: %build -n --verbose --arch=32 --mode=compile-and-link --compiler=any -o %t/foobar.exe foo.c bar.c \ -RUN: | FileCheck 
--check-prefix=BOTH-MULTI %s - - -COMPILE: compiling foobar.c -> foo.out - -COMPILE-MULTI: compiling foo.c -> foo.o{{(bj)?}} -COMPILE-MULTI: compiling bar.c -> bar.o{{(bj)?}} - - -LINK: linking foobar.obj -> foo.exe - -LINK-MULTI: linking foo.obj+bar.obj -> foobar.exe - -BOTH: compiling foobar.c -> [[OBJFOO:foobar.exe-foobar.o(bj)?]] -BOTH: linking [[OBJFOO]] -> foobar.exe - -BOTH-MULTI: compiling foo.c -> [[OBJFOO:foobar.exe-foo.o(bj)?]] -BOTH-MULTI: compiling bar.c -> [[OBJBAR:foobar.exe-bar.o(bj)?]] -BOTH-MULTI: linking [[OBJFOO]]+[[OBJBAR]] -> foobar.exe +RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any -o %t/foo.out foobar.c \ +RUN: | FileCheck --check-prefix=COMPILE %s + +RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any --outdir %t foo.c bar.c \ +RUN: | FileCheck --check-prefix=COMPILE-MULTI %s + +RUN: %build -n --verbose --arch=32 --mode=link --compiler=any -o %t/foo.exe foobar.obj \ +RUN: | FileCheck --check-prefix=LINK %s + +RUN: %build -n --verbose --arch=32 --mode=link --compiler=any -o %t/foobar.exe foo.obj bar.obj \ +RUN: | FileCheck --check-prefix=LINK-MULTI %s + +RUN: %build -n --verbose --arch=32 --mode=compile-and-link --compiler=any -o %t/foobar.exe foobar.c \ +RUN: | FileCheck --check-prefix=BOTH %s + +RUN: %build -n --verbose --arch=32 --mode=compile-and-link --compiler=any -o %t/foobar.exe foo.c bar.c \ +RUN: | FileCheck --check-prefix=BOTH-MULTI %s + + +COMPILE: compiling foobar.c -> foo.out + +COMPILE-MULTI: compiling foo.c -> foo.o{{(bj)?}} +COMPILE-MULTI: compiling bar.c -> bar.o{{(bj)?}} + + +LINK: linking foobar.obj -> foo.exe + +LINK-MULTI: linking foo.obj+bar.obj -> foobar.exe + +BOTH: compiling foobar.c -> [[OBJFOO:foobar.exe-foobar.o(bj)?]] +BOTH: linking [[OBJFOO]] -> foobar.exe + +BOTH-MULTI: compiling foo.c -> [[OBJFOO:foobar.exe-foo.o(bj)?]] +BOTH-MULTI: compiling bar.c -> [[OBJBAR:foobar.exe-bar.o(bj)?]] +BOTH-MULTI: linking [[OBJFOO]]+[[OBJBAR]] -> foobar.exe diff --git 
a/lldb/test/Shell/BuildScript/script-args.test b/lldb/test/Shell/BuildScript/script-args.test index 13e8a516094267..647a48e4442b12 100644 --- a/lldb/test/Shell/BuildScript/script-args.test +++ b/lldb/test/Shell/BuildScript/script-args.test @@ -1,32 +1,32 @@ -RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any -o %t/foo.out foobar.c \ -RUN: | FileCheck %s -RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any --outdir %t foo.c bar.c \ -RUN: | FileCheck --check-prefix=MULTI-INPUT %s - - -CHECK: Script Arguments: -CHECK-NEXT: Arch: 32 -CHECK: Compiler: any -CHECK: Outdir: {{.*}}script-args.test.tmp -CHECK: Output: {{.*}}script-args.test.tmp{{.}}foo.out -CHECK: Nodefaultlib: False -CHECK: Opt: none -CHECK: Mode: compile -CHECK: Clean: True -CHECK: Verbose: True -CHECK: Dryrun: True -CHECK: Inputs: foobar.c - -MULTI-INPUT: Script Arguments: -MULTI-INPUT-NEXT: Arch: 32 -MULTI-INPUT-NEXT: Compiler: any -MULTI-INPUT-NEXT: Outdir: {{.*}}script-args.test.tmp -MULTI-INPUT-NEXT: Output: -MULTI-INPUT-NEXT: Nodefaultlib: False -MULTI-INPUT-NEXT: Opt: none -MULTI-INPUT-NEXT: Mode: compile -MULTI-INPUT-NEXT: Clean: True -MULTI-INPUT-NEXT: Verbose: True -MULTI-INPUT-NEXT: Dryrun: True -MULTI-INPUT-NEXT: Inputs: foo.c -MULTI-INPUT-NEXT: bar.c +RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any -o %t/foo.out foobar.c \ +RUN: | FileCheck %s +RUN: %build -n --verbose --arch=32 --mode=compile --compiler=any --outdir %t foo.c bar.c \ +RUN: | FileCheck --check-prefix=MULTI-INPUT %s + + +CHECK: Script Arguments: +CHECK-NEXT: Arch: 32 +CHECK: Compiler: any +CHECK: Outdir: {{.*}}script-args.test.tmp +CHECK: Output: {{.*}}script-args.test.tmp{{.}}foo.out +CHECK: Nodefaultlib: False +CHECK: Opt: none +CHECK: Mode: compile +CHECK: Clean: True +CHECK: Verbose: True +CHECK: Dryrun: True +CHECK: Inputs: foobar.c + +MULTI-INPUT: Script Arguments: +MULTI-INPUT-NEXT: Arch: 32 +MULTI-INPUT-NEXT: Compiler: any +MULTI-INPUT-NEXT: Outdir: {{.*}}script-args.test.tmp 
+MULTI-INPUT-NEXT: Output: +MULTI-INPUT-NEXT: Nodefaultlib: False +MULTI-INPUT-NEXT: Opt: none +MULTI-INPUT-NEXT: Mode: compile +MULTI-INPUT-NEXT: Clean: True +MULTI-INPUT-NEXT: Verbose: True +MULTI-INPUT-NEXT: Dryrun: True +MULTI-INPUT-NEXT: Inputs: foo.c +MULTI-INPUT-NEXT: bar.c diff --git a/lldb/test/Shell/BuildScript/toolchain-clang-cl.test b/lldb/test/Shell/BuildScript/toolchain-clang-cl.test index 8c9ea9fddb8a50..4f64859a02b607 100644 --- a/lldb/test/Shell/BuildScript/toolchain-clang-cl.test +++ b/lldb/test/Shell/BuildScript/toolchain-clang-cl.test @@ -1,49 +1,49 @@ -REQUIRES: lld, system-windows - -RUN: %build -n --verbose --arch=32 --compiler=clang-cl --mode=compile-and-link -o %t/foo.exe foobar.c \ -RUN: | FileCheck --check-prefix=CHECK-32 %s - -RUN: %build -n --verbose --arch=64 --compiler=clang-cl --mode=compile-and-link -o %t/foo.exe foobar.c \ -RUN: | FileCheck --check-prefix=CHECK-64 %s - -CHECK-32: Script Arguments: -CHECK-32: Arch: 32 -CHECK-32: Compiler: clang-cl -CHECK-32: Outdir: {{.*}} -CHECK-32: Output: {{.*}}toolchain-clang-cl.test.tmp\foo.exe -CHECK-32: Nodefaultlib: False -CHECK-32: Opt: none -CHECK-32: Mode: compile -CHECK-32: Clean: True -CHECK-32: Verbose: True -CHECK-32: Dryrun: True -CHECK-32: Inputs: foobar.c -CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foobar.ilk -CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe-foobar.obj -CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.pdb -CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe -CHECK-32: compiling foobar.c -> foo.exe-foobar.obj -CHECK-32: {{.*}}clang-cl{{(\.EXE)?}} -m32 -CHECK-32: linking foo.exe-foobar.obj -> foo.exe -CHECK-32: {{.*}}lld-link{{(\.EXE)?}} - -CHECK-64: Script Arguments: -CHECK-64: Arch: 64 -CHECK-64: Compiler: clang-cl -CHECK-64: Outdir: {{.*}} -CHECK-64: Output: {{.*}}toolchain-clang-cl.test.tmp\foo.exe -CHECK-64: Nodefaultlib: False -CHECK-64: Opt: none -CHECK-64: Mode: compile -CHECK-64: Clean: True -CHECK-64: 
Verbose: True -CHECK-64: Dryrun: True -CHECK-64: Inputs: foobar.c -CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foobar.ilk -CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe-foobar.obj -CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.pdb -CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe -CHECK-64: compiling foobar.c -> foo.exe-foobar.obj -CHECK-64: {{.*}}clang-cl{{(\.EXE)?}} -m64 -CHECK-64: linking foo.exe-foobar.obj -> foo.exe -CHECK-64: {{.*}}lld-link{{(\.EXE)?}} +REQUIRES: lld, system-windows + +RUN: %build -n --verbose --arch=32 --compiler=clang-cl --mode=compile-and-link -o %t/foo.exe foobar.c \ +RUN: | FileCheck --check-prefix=CHECK-32 %s + +RUN: %build -n --verbose --arch=64 --compiler=clang-cl --mode=compile-and-link -o %t/foo.exe foobar.c \ +RUN: | FileCheck --check-prefix=CHECK-64 %s + +CHECK-32: Script Arguments: +CHECK-32: Arch: 32 +CHECK-32: Compiler: clang-cl +CHECK-32: Outdir: {{.*}} +CHECK-32: Output: {{.*}}toolchain-clang-cl.test.tmp\foo.exe +CHECK-32: Nodefaultlib: False +CHECK-32: Opt: none +CHECK-32: Mode: compile +CHECK-32: Clean: True +CHECK-32: Verbose: True +CHECK-32: Dryrun: True +CHECK-32: Inputs: foobar.c +CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foobar.ilk +CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe-foobar.obj +CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.pdb +CHECK-32: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe +CHECK-32: compiling foobar.c -> foo.exe-foobar.obj +CHECK-32: {{.*}}clang-cl{{(\.EXE)?}} -m32 +CHECK-32: linking foo.exe-foobar.obj -> foo.exe +CHECK-32: {{.*}}lld-link{{(\.EXE)?}} + +CHECK-64: Script Arguments: +CHECK-64: Arch: 64 +CHECK-64: Compiler: clang-cl +CHECK-64: Outdir: {{.*}} +CHECK-64: Output: {{.*}}toolchain-clang-cl.test.tmp\foo.exe +CHECK-64: Nodefaultlib: False +CHECK-64: Opt: none +CHECK-64: Mode: compile +CHECK-64: Clean: True +CHECK-64: Verbose: True +CHECK-64: Dryrun: True +CHECK-64: 
Inputs: foobar.c +CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foobar.ilk +CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe-foobar.obj +CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.pdb +CHECK-64: Cleaning {{.*}}toolchain-clang-cl.test.tmp{{.}}foo.exe +CHECK-64: compiling foobar.c -> foo.exe-foobar.obj +CHECK-64: {{.*}}clang-cl{{(\.EXE)?}} -m64 +CHECK-64: linking foo.exe-foobar.obj -> foo.exe +CHECK-64: {{.*}}lld-link{{(\.EXE)?}} diff --git a/lldb/test/Shell/Minidump/Windows/Sigsegv/Inputs/sigsegv.cpp b/lldb/test/Shell/Minidump/Windows/Sigsegv/Inputs/sigsegv.cpp index 6bf78b5dc43b29..d5b96472eb117f 100644 --- a/lldb/test/Shell/Minidump/Windows/Sigsegv/Inputs/sigsegv.cpp +++ b/lldb/test/Shell/Minidump/Windows/Sigsegv/Inputs/sigsegv.cpp @@ -1,40 +1,40 @@ - -// nodefaultlib build: cl -Zi sigsegv.cpp /link /nodefaultlib - -#ifdef USE_CRT -#include -#else -int main(); -extern "C" -{ - int _fltused; - void mainCRTStartup() { main(); } - void printf(const char*, ...) {} -} -#endif - -void crash(bool crash_self) -{ - printf("Before...\n"); - if(crash_self) - { - printf("Crashing in 3, 2, 1 ...\n"); - *(volatile int*)nullptr = 0; - } - printf("After...\n"); -} - -int foo(int x, float y, const char* msg) -{ - bool flag = x > y; - if(flag) - printf("x = %d, y = %f, msg = %s\n", x, y, msg); - crash(flag); - return x << 1; -} - -int main() -{ - foo(10, 3.14, "testing"); -} - + +// nodefaultlib build: cl -Zi sigsegv.cpp /link /nodefaultlib + +#ifdef USE_CRT +#include +#else +int main(); +extern "C" +{ + int _fltused; + void mainCRTStartup() { main(); } + void printf(const char*, ...) 
{} +} +#endif + +void crash(bool crash_self) +{ + printf("Before...\n"); + if(crash_self) + { + printf("Crashing in 3, 2, 1 ...\n"); + *(volatile int*)nullptr = 0; + } + printf("After...\n"); +} + +int foo(int x, float y, const char* msg) +{ + bool flag = x > y; + if(flag) + printf("x = %d, y = %f, msg = %s\n", x, y, msg); + crash(flag); + return x << 1; +} + +int main() +{ + foo(10, 3.14, "testing"); +} + diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites.s b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites.s index aac8f4c1698038..a9d248758bfcec 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites.s +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites.s @@ -1,622 +1,622 @@ -# Compiled from the following files, but replaced the call to abort with nop. -# clang-cl -fuse-ld=lld-link /Z7 /O1 /Faa.asm /winsysroot~/win_toolchain a.cpp -# a.cpp: -# #include "a.h" -# int main(int argc, char** argv) { -# volatile int main_local = Namespace1::foo(2); -# return 0; -# } -# a.h: -# #include -# #include "b.h" -# namespace Namespace1 { -# inline int foo(int x) { -# volatile int foo_local = x + 1; -# ++foo_local; -# if (!foo_local) -# abort(); -# return Class1::bar(foo_local); -# } -# } // namespace Namespace1 -# b.h: -# #include "c.h" -# class Class1 { -# public: -# inline static int bar(int x) { -# volatile int bar_local = x + 1; -# ++bar_local; -# return Namespace2::Class2::func(bar_local); -# } -# }; -# c.h: -# namespace Namespace2 { -# class Class2 { -# public: -# inline static int func(int x) { -# volatile int func_local = x + 1; -# func_local += x; -# return func_local; -# } -# }; -# } // namespace Namespace2 - - .text - .def @feat.00; - .scl 3; - .type 0; - .endef - .globl @feat.00 -.set @feat.00, 0 - .intel_syntax noprefix - .file "a.cpp" - .def main; - .scl 2; - .type 32; - .endef - .section .text,"xr",one_only,main - .globl main # -- Begin function main -main: # @main -.Lfunc_begin0: - .cv_func_id 0 - .cv_file 1 
"/tmp/a.cpp" "4FFB96E5DF1A95CE7DB9732CFFE001D7" 1 - .cv_loc 0 1 2 0 # a.cpp:2:0 -.seh_proc main -# %bb.0: - #DEBUG_VALUE: main:argv <- $rdx - #DEBUG_VALUE: main:argc <- $ecx - #DEBUG_VALUE: foo:x <- 2 - sub rsp, 56 - .seh_stackalloc 56 - .seh_endprologue -.Ltmp0: - .cv_file 2 "/tmp/./a.h" "BBFED90EF093E9C1D032CC9B05B5D167" 1 - .cv_inline_site_id 1 within 0 inlined_at 1 3 0 - .cv_loc 1 2 5 0 # ./a.h:5:0 - mov dword ptr [rsp + 44], 3 - .cv_loc 1 2 6 0 # ./a.h:6:0 - inc dword ptr [rsp + 44] - .cv_loc 1 2 7 0 # ./a.h:7:0 - mov eax, dword ptr [rsp + 44] - test eax, eax - je .LBB0_2 -.Ltmp1: -# %bb.1: - #DEBUG_VALUE: main:argv <- $rdx - #DEBUG_VALUE: main:argc <- $ecx - #DEBUG_VALUE: foo:x <- 2 - .cv_loc 1 2 9 0 # ./a.h:9:0 - mov eax, dword ptr [rsp + 44] -.Ltmp2: - #DEBUG_VALUE: bar:x <- $eax - .cv_file 3 "/tmp/./b.h" "A26CC743A260115F33AF91AB11F95877" 1 - .cv_inline_site_id 2 within 1 inlined_at 2 9 0 - .cv_loc 2 3 5 0 # ./b.h:5:0 - inc eax -.Ltmp3: - mov dword ptr [rsp + 52], eax - .cv_loc 2 3 6 0 # ./b.h:6:0 - inc dword ptr [rsp + 52] - .cv_loc 2 3 7 0 # ./b.h:7:0 - mov eax, dword ptr [rsp + 52] -.Ltmp4: - #DEBUG_VALUE: func:x <- $eax - .cv_file 4 "/tmp/./c.h" "8AF4613F78624BBE96D1C408ABA39B2D" 1 - .cv_inline_site_id 3 within 2 inlined_at 3 7 0 - .cv_loc 3 4 5 0 # ./c.h:5:0 - lea ecx, [rax + 1] -.Ltmp5: - #DEBUG_VALUE: main:argc <- [DW_OP_LLVM_entry_value 1] $ecx - mov dword ptr [rsp + 48], ecx - .cv_loc 3 4 6 0 # ./c.h:6:0 - add dword ptr [rsp + 48], eax - .cv_loc 3 4 7 0 # ./c.h:7:0 - mov eax, dword ptr [rsp + 48] -.Ltmp6: - .cv_loc 0 1 3 0 # a.cpp:3:0 - mov dword ptr [rsp + 48], eax - .cv_loc 0 1 4 0 # a.cpp:4:0 - xor eax, eax - # Use fake debug info to tests inline info. 
- .cv_loc 1 2 20 0 - add rsp, 56 - ret -.Ltmp7: -.LBB0_2: - #DEBUG_VALUE: main:argv <- $rdx - #DEBUG_VALUE: main:argc <- $ecx - #DEBUG_VALUE: foo:x <- 2 - .cv_loc 1 2 8 0 # ./a.h:8:0 - nop -.Ltmp8: - int3 -.Ltmp9: - #DEBUG_VALUE: main:argc <- [DW_OP_LLVM_entry_value 1] $ecx - #DEBUG_VALUE: main:argv <- [DW_OP_LLVM_entry_value 1] $rdx -.Lfunc_end0: - .seh_endproc - # -- End function - .section .drectve,"yn" - .ascii " /DEFAULTLIB:libcmt.lib" - .ascii " /DEFAULTLIB:oldnames.lib" - .section .debug$S,"dr" - .p2align 2 - .long 4 # Debug section magic - .long 241 - .long .Ltmp11-.Ltmp10 # Subsection size -.Ltmp10: - .short .Ltmp13-.Ltmp12 # Record length -.Ltmp12: - .short 4353 # Record kind: S_OBJNAME - .long 0 # Signature - .asciz "/tmp/a-2b2ba0.obj" # Object name - .p2align 2 -.Ltmp13: - .short .Ltmp15-.Ltmp14 # Record length -.Ltmp14: - .short 4412 # Record kind: S_COMPILE3 - .long 1 # Flags and language - .short 208 # CPUType - .short 15 # Frontend version - .short 0 - .short 0 - .short 0 - .short 15000 # Backend version - .short 0 - .short 0 - .short 0 - .asciz "clang version 15.0.0" # Null-terminated compiler version string - .p2align 2 -.Ltmp15: -.Ltmp11: - .p2align 2 - .long 246 # Inlinee lines subsection - .long .Ltmp17-.Ltmp16 # Subsection size -.Ltmp16: - .long 0 # Inlinee lines signature - - # Inlined function foo starts at ./a.h:4 - .long 4099 # Type index of inlined function - .cv_filechecksumoffset 2 # Offset into filechecksum table - .long 4 # Starting line number - - # Inlined function bar starts at ./b.h:4 - .long 4106 # Type index of inlined function - .cv_filechecksumoffset 3 # Offset into filechecksum table - .long 4 # Starting line number - - # Inlined function func starts at ./c.h:4 - .long 4113 # Type index of inlined function - .cv_filechecksumoffset 4 # Offset into filechecksum table - .long 4 # Starting line number -.Ltmp17: - .p2align 2 - .section .debug$S,"dr",associative,main - .p2align 2 - .long 4 # Debug section magic - .long 241 # Symbol 
subsection for main - .long .Ltmp19-.Ltmp18 # Subsection size -.Ltmp18: - .short .Ltmp21-.Ltmp20 # Record length -.Ltmp20: - .short 4423 # Record kind: S_GPROC32_ID - .long 0 # PtrParent - .long 0 # PtrEnd - .long 0 # PtrNext - .long .Lfunc_end0-main # Code size - .long 0 # Offset after prologue - .long 0 # Offset before epilogue - .long 4117 # Function type index - .secrel32 main # Function section relative address - .secidx main # Function section index - .byte 0 # Flags - .asciz "main" # Function name - .p2align 2 -.Ltmp21: - .short .Ltmp23-.Ltmp22 # Record length -.Ltmp22: - .short 4114 # Record kind: S_FRAMEPROC - .long 56 # FrameSize - .long 0 # Padding - .long 0 # Offset of padding - .long 0 # Bytes of callee saved registers - .long 0 # Exception handler offset - .short 0 # Exception handler section - .long 81920 # Flags (defines frame register) - .p2align 2 -.Ltmp23: - .short .Ltmp25-.Ltmp24 # Record length -.Ltmp24: - .short 4414 # Record kind: S_LOCAL - .long 116 # TypeIndex - .short 1 # Flags - .asciz "argc" - .p2align 2 -.Ltmp25: - .cv_def_range .Lfunc_begin0 .Ltmp5 .Ltmp7 .Ltmp8, reg, 18 - .short .Ltmp27-.Ltmp26 # Record length -.Ltmp26: - .short 4414 # Record kind: S_LOCAL - .long 4114 # TypeIndex - .short 1 # Flags - .asciz "argv" - .p2align 2 -.Ltmp27: - .cv_def_range .Lfunc_begin0 .Ltmp8, reg, 331 - .short .Ltmp29-.Ltmp28 # Record length -.Ltmp28: - .short 4414 # Record kind: S_LOCAL - .long 4118 # TypeIndex - .short 0 # Flags - .asciz "main_local" - .p2align 2 -.Ltmp29: - .cv_def_range .Ltmp0 .Ltmp9, frame_ptr_rel, 48 - .short .Ltmp31-.Ltmp30 # Record length -.Ltmp30: - .short 4429 # Record kind: S_INLINESITE - .long 0 # PtrParent - .long 0 # PtrEnd - .long 4099 # Inlinee type index - .cv_inline_linetable 1 2 4 .Lfunc_begin0 .Lfunc_end0 - .p2align 2 -.Ltmp31: - .short .Ltmp33-.Ltmp32 # Record length -.Ltmp32: - .short 4414 # Record kind: S_LOCAL - .long 116 # TypeIndex - .short 257 # Flags - .asciz "x" - .p2align 2 -.Ltmp33: - .short 
.Ltmp35-.Ltmp34 # Record length -.Ltmp34: - .short 4414 # Record kind: S_LOCAL - .long 4118 # TypeIndex - .short 0 # Flags - .asciz "foo_local" - .p2align 2 -.Ltmp35: - .cv_def_range .Ltmp0 .Ltmp6 .Ltmp7 .Ltmp9, frame_ptr_rel, 44 - .short .Ltmp37-.Ltmp36 # Record length -.Ltmp36: - .short 4429 # Record kind: S_INLINESITE - .long 0 # PtrParent - .long 0 # PtrEnd - .long 4106 # Inlinee type index - .cv_inline_linetable 2 3 4 .Lfunc_begin0 .Lfunc_end0 - .p2align 2 -.Ltmp37: - .short .Ltmp39-.Ltmp38 # Record length -.Ltmp38: - .short 4414 # Record kind: S_LOCAL - .long 116 # TypeIndex - .short 1 # Flags - .asciz "x" - .p2align 2 -.Ltmp39: - .cv_def_range .Ltmp2 .Ltmp3, reg, 17 - .short .Ltmp41-.Ltmp40 # Record length -.Ltmp40: - .short 4414 # Record kind: S_LOCAL - .long 4118 # TypeIndex - .short 0 # Flags - .asciz "bar_local" - .p2align 2 -.Ltmp41: - .cv_def_range .Ltmp2 .Ltmp6, frame_ptr_rel, 52 - .short .Ltmp43-.Ltmp42 # Record length -.Ltmp42: - .short 4429 # Record kind: S_INLINESITE - .long 0 # PtrParent - .long 0 # PtrEnd - .long 4113 # Inlinee type index - .cv_inline_linetable 3 4 4 .Lfunc_begin0 .Lfunc_end0 - .p2align 2 -.Ltmp43: - .short .Ltmp45-.Ltmp44 # Record length -.Ltmp44: - .short 4414 # Record kind: S_LOCAL - .long 116 # TypeIndex - .short 1 # Flags - .asciz "x" - .p2align 2 -.Ltmp45: - .cv_def_range .Ltmp4 .Ltmp6, reg, 17 - .short .Ltmp47-.Ltmp46 # Record length -.Ltmp46: - .short 4414 # Record kind: S_LOCAL - .long 4118 # TypeIndex - .short 0 # Flags - .asciz "func_local" - .p2align 2 -.Ltmp47: - .cv_def_range .Ltmp4 .Ltmp6, frame_ptr_rel, 48 - .short 2 # Record length - .short 4430 # Record kind: S_INLINESITE_END - .short 2 # Record length - .short 4430 # Record kind: S_INLINESITE_END - .short 2 # Record length - .short 4430 # Record kind: S_INLINESITE_END - .short 2 # Record length - .short 4431 # Record kind: S_PROC_ID_END -.Ltmp19: - .p2align 2 - .cv_linetable 0, main, .Lfunc_end0 - .section .debug$S,"dr" - .long 241 - .long .Ltmp49-.Ltmp48 # 
Subsection size -.Ltmp48: - .short .Ltmp51-.Ltmp50 # Record length -.Ltmp50: - .short 4360 # Record kind: S_UDT - .long 4103 # Type - .asciz "Class1" - .p2align 2 -.Ltmp51: - .short .Ltmp53-.Ltmp52 # Record length -.Ltmp52: - .short 4360 # Record kind: S_UDT - .long 4110 # Type - .asciz "Namespace2::Class2" - .p2align 2 -.Ltmp53: -.Ltmp49: - .p2align 2 - .cv_filechecksums # File index to string table offset subsection - .cv_stringtable # String table - .long 241 - .long .Ltmp55-.Ltmp54 # Subsection size -.Ltmp54: - .short .Ltmp57-.Ltmp56 # Record length -.Ltmp56: - .short 4428 # Record kind: S_BUILDINFO - .long 4124 # LF_BUILDINFO index - .p2align 2 -.Ltmp57: -.Ltmp55: - .p2align 2 - .section .debug$T,"dr" - .p2align 2 - .long 4 # Debug section magic - # StringId (0x1000) - .short 0x12 # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "Namespace1" # StringData - .byte 241 - # ArgList (0x1001) - .short 0xa # Record length - .short 0x1201 # Record kind: LF_ARGLIST - .long 0x1 # NumArgs - .long 0x74 # Argument: int - # Procedure (0x1002) - .short 0xe # Record length - .short 0x1008 # Record kind: LF_PROCEDURE - .long 0x74 # ReturnType: int - .byte 0x0 # CallingConvention: NearC - .byte 0x0 # FunctionOptions - .short 0x1 # NumParameters - .long 0x1001 # ArgListType: (int) - # FuncId (0x1003) - .short 0xe # Record length - .short 0x1601 # Record kind: LF_FUNC_ID - .long 0x1000 # ParentScope: Namespace1 - .long 0x1002 # FunctionType: int (int) - .asciz "foo" # Name - # Class (0x1004) - .short 0x2a # Record length - .short 0x1504 # Record kind: LF_CLASS - .short 0x0 # MemberCount - .short 0x280 # Properties ( ForwardReference (0x80) | HasUniqueName (0x200) ) - .long 0x0 # FieldList - .long 0x0 # DerivedFrom - .long 0x0 # VShape - .short 0x0 # SizeOf - .asciz "Class1" # Name - .asciz ".?AVClass1@@" # LinkageName - .byte 242 - .byte 241 - # MemberFunction (0x1005) - .short 0x1a # Record length - .short 0x1009 # Record kind: LF_MFUNCTION - 
.long 0x74 # ReturnType: int - .long 0x1004 # ClassType: Class1 - .long 0x0 # ThisType - .byte 0x0 # CallingConvention: NearC - .byte 0x0 # FunctionOptions - .short 0x1 # NumParameters - .long 0x1001 # ArgListType: (int) - .long 0x0 # ThisAdjustment - # FieldList (0x1006) - .short 0xe # Record length - .short 0x1203 # Record kind: LF_FIELDLIST - .short 0x1511 # Member kind: OneMethod ( LF_ONEMETHOD ) - .short 0xb # Attrs: Public, Static - .long 0x1005 # Type: int Class1::(int) - .asciz "bar" # Name - # Class (0x1007) - .short 0x2a # Record length - .short 0x1504 # Record kind: LF_CLASS - .short 0x1 # MemberCount - .short 0x200 # Properties ( HasUniqueName (0x200) ) - .long 0x1006 # FieldList: - .long 0x0 # DerivedFrom - .long 0x0 # VShape - .short 0x1 # SizeOf - .asciz "Class1" # Name - .asciz ".?AVClass1@@" # LinkageName - .byte 242 - .byte 241 - # StringId (0x1008) - .short 0x12 # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "/tmp/./b.h" # StringData - .byte 241 - # UdtSourceLine (0x1009) - .short 0xe # Record length - .short 0x1606 # Record kind: LF_UDT_SRC_LINE - .long 0x1007 # UDT: Class1 - .long 0x1008 # SourceFile: /tmp/./b.h - .long 0x2 # LineNumber - # MemberFuncId (0x100A) - .short 0xe # Record length - .short 0x1602 # Record kind: LF_MFUNC_ID - .long 0x1004 # ClassType: Class1 - .long 0x1005 # FunctionType: int Class1::(int) - .asciz "bar" # Name - # Class (0x100B) - .short 0x42 # Record length - .short 0x1504 # Record kind: LF_CLASS - .short 0x0 # MemberCount - .short 0x280 # Properties ( ForwardReference (0x80) | HasUniqueName (0x200) ) - .long 0x0 # FieldList - .long 0x0 # DerivedFrom - .long 0x0 # VShape - .short 0x0 # SizeOf - .asciz "Namespace2::Class2" # Name - .asciz ".?AVClass2@Namespace2@@" # LinkageName - .byte 243 - .byte 242 - .byte 241 - # MemberFunction (0x100C) - .short 0x1a # Record length - .short 0x1009 # Record kind: LF_MFUNCTION - .long 0x74 # ReturnType: int - .long 0x100b # ClassType: 
Namespace2::Class2 - .long 0x0 # ThisType - .byte 0x0 # CallingConvention: NearC - .byte 0x0 # FunctionOptions - .short 0x1 # NumParameters - .long 0x1001 # ArgListType: (int) - .long 0x0 # ThisAdjustment - # FieldList (0x100D) - .short 0x12 # Record length - .short 0x1203 # Record kind: LF_FIELDLIST - .short 0x1511 # Member kind: OneMethod ( LF_ONEMETHOD ) - .short 0xb # Attrs: Public, Static - .long 0x100c # Type: int Namespace2::Class2::(int) - .asciz "func" # Name - .byte 243 - .byte 242 - .byte 241 - # Class (0x100E) - .short 0x42 # Record length - .short 0x1504 # Record kind: LF_CLASS - .short 0x1 # MemberCount - .short 0x200 # Properties ( HasUniqueName (0x200) ) - .long 0x100d # FieldList: - .long 0x0 # DerivedFrom - .long 0x0 # VShape - .short 0x1 # SizeOf - .asciz "Namespace2::Class2" # Name - .asciz ".?AVClass2@Namespace2@@" # LinkageName - .byte 243 - .byte 242 - .byte 241 - # StringId (0x100F) - .short 0x12 # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "/tmp/./c.h" # StringData - .byte 241 - # UdtSourceLine (0x1010) - .short 0xe # Record length - .short 0x1606 # Record kind: LF_UDT_SRC_LINE - .long 0x100e # UDT: Namespace2::Class2 - .long 0x100f # SourceFile: /tmp/./c.h - .long 0x2 # LineNumber - # MemberFuncId (0x1011) - .short 0x12 # Record length - .short 0x1602 # Record kind: LF_MFUNC_ID - .long 0x100b # ClassType: Namespace2::Class2 - .long 0x100c # FunctionType: int Namespace2::Class2::(int) - .asciz "func" # Name - .byte 243 - .byte 242 - .byte 241 - # Pointer (0x1012) - .short 0xa # Record length - .short 0x1002 # Record kind: LF_POINTER - .long 0x670 # PointeeType: char* - .long 0x1000c # Attrs: [ Type: Near64, Mode: Pointer, SizeOf: 8 ] - # ArgList (0x1013) - .short 0xe # Record length - .short 0x1201 # Record kind: LF_ARGLIST - .long 0x2 # NumArgs - .long 0x74 # Argument: int - .long 0x1012 # Argument: char** - # Procedure (0x1014) - .short 0xe # Record length - .short 0x1008 # Record kind: 
LF_PROCEDURE - .long 0x74 # ReturnType: int - .byte 0x0 # CallingConvention: NearC - .byte 0x0 # FunctionOptions - .short 0x2 # NumParameters - .long 0x1013 # ArgListType: (int, char**) - # FuncId (0x1015) - .short 0x12 # Record length - .short 0x1601 # Record kind: LF_FUNC_ID - .long 0x0 # ParentScope - .long 0x1014 # FunctionType: int (int, char**) - .asciz "main" # Name - .byte 243 - .byte 242 - .byte 241 - # Modifier (0x1016) - .short 0xa # Record length - .short 0x1001 # Record kind: LF_MODIFIER - .long 0x74 # ModifiedType: int - .short 0x2 # Modifiers ( Volatile (0x2) ) - .byte 242 - .byte 241 - # StringId (0x1017) - .short 0xe # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "/tmp" # StringData - .byte 243 - .byte 242 - .byte 241 - # StringId (0x1018) - .short 0xe # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "a.cpp" # StringData - .byte 242 - .byte 241 - # StringId (0x1019) - .short 0xa # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .byte 0 # StringData - .byte 243 - .byte 242 - .byte 241 - # StringId (0x101A) - .short 0x4e # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "/usr/local/google/home/zequanwu/llvm-project/build/release/bin/clang" # StringData - .byte 243 - .byte 242 - .byte 241 - # StringId (0x101B) - .short 0x9f6 # Record length - .short 0x1605 # Record kind: LF_STRING_ID - .long 0x0 # Id - .asciz "\"-cc1\" \"-triple\" \"x86_64-pc-windows-msvc19.20.0\" \"-S\" \"-disable-free\" \"-clear-ast-before-backend\" \"-disable-llvm-verifier\" \"-discard-value-names\" \"-mrelocation-model\" \"pic\" \"-pic-level\" \"2\" \"-mframe-pointer=none\" \"-relaxed-aliasing\" \"-fmath-errno\" \"-ffp-contract=on\" \"-fno-rounding-math\" \"-mconstructor-aliases\" \"-funwind-tables=2\" \"-target-cpu\" \"x86-64\" \"-mllvm\" \"-x86-asm-syntax=intel\" \"-tune-cpu\" \"generic\" \"-mllvm\" 
\"-treat-scalable-fixed-error-as-warning\" \"-D_MT\" \"-flto-visibility-public-std\" \"--dependent-lib=libcmt\" \"--dependent-lib=oldnames\" \"-stack-protector\" \"2\" \"-fms-volatile\" \"-fdiagnostics-format\" \"msvc\" \"-gno-column-info\" \"-gcodeview\" \"-debug-info-kind=constructor\" \"-ffunction-sections\" \"-fcoverage-compilation-dir=/tmp\" \"-resource-dir\" \"/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/DIA SDK/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/atlmfc/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/ucrt\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/shared\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/um\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/winrt\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/cppwinrt\" \"-Os\" \"-fdeprecated-macro\" \"-fdebug-compilation-dir=/tmp\" \"-ferror-limit\" \"19\" 
\"-fno-use-cxa-atexit\" \"-fms-extensions\" \"-fms-compatibility\" \"-fms-compatibility-version=19.20\" \"-std=c++14\" \"-fdelayed-template-parsing\" \"-fcolor-diagnostics\" \"-vectorize-loops\" \"-vectorize-slp\" \"-faddrsig\" \"-x\" \"c++\"" # StringData - .byte 242 - .byte 241 - # BuildInfo (0x101C) - .short 0x1a # Record length - .short 0x1603 # Record kind: LF_BUILDINFO - .short 0x5 # NumArgs - .long 0x1017 # Argument: /tmp - .long 0x101a # Argument: /usr/local/google/home/zequanwu/llvm-project/build/release/bin/clang - .long 0x1018 # Argument: a.cpp - .long 0x1019 # Argument - .long 0x101b # Argument: "-cc1" "-triple" "x86_64-pc-windows-msvc19.20.0" "-S" "-disable-free" "-clear-ast-before-backend" "-disable-llvm-verifier" "-discard-value-names" "-mrelocation-model" "pic" "-pic-level" "2" "-mframe-pointer=none" "-relaxed-aliasing" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-mllvm" "-x86-asm-syntax=intel" "-tune-cpu" "generic" "-mllvm" "-treat-scalable-fixed-error-as-warning" "-D_MT" "-flto-visibility-public-std" "--dependent-lib=libcmt" "--dependent-lib=oldnames" "-stack-protector" "2" "-fms-volatile" "-fdiagnostics-format" "msvc" "-gno-column-info" "-gcodeview" "-debug-info-kind=constructor" "-ffunction-sections" "-fcoverage-compilation-dir=/tmp" "-resource-dir" "/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0" "-internal-isystem" "/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/DIA SDK/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/include" "-internal-isystem" 
"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/atlmfc/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/ucrt" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/shared" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/um" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/winrt" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/cppwinrt" "-Os" "-fdeprecated-macro" "-fdebug-compilation-dir=/tmp" "-ferror-limit" "19" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.20" "-std=c++14" "-fdelayed-template-parsing" "-fcolor-diagnostics" "-vectorize-loops" "-vectorize-slp" "-faddrsig" "-x" "c++" - .byte 242 - .byte 241 - .addrsig +# Compiled from the following files, but replaced the call to abort with nop. 
+# clang-cl -fuse-ld=lld-link /Z7 /O1 /Faa.asm /winsysroot~/win_toolchain a.cpp +# a.cpp: +# #include "a.h" +# int main(int argc, char** argv) { +# volatile int main_local = Namespace1::foo(2); +# return 0; +# } +# a.h: +# #include +# #include "b.h" +# namespace Namespace1 { +# inline int foo(int x) { +# volatile int foo_local = x + 1; +# ++foo_local; +# if (!foo_local) +# abort(); +# return Class1::bar(foo_local); +# } +# } // namespace Namespace1 +# b.h: +# #include "c.h" +# class Class1 { +# public: +# inline static int bar(int x) { +# volatile int bar_local = x + 1; +# ++bar_local; +# return Namespace2::Class2::func(bar_local); +# } +# }; +# c.h: +# namespace Namespace2 { +# class Class2 { +# public: +# inline static int func(int x) { +# volatile int func_local = x + 1; +# func_local += x; +# return func_local; +# } +# }; +# } // namespace Namespace2 + + .text + .def @feat.00; + .scl 3; + .type 0; + .endef + .globl @feat.00 +.set @feat.00, 0 + .intel_syntax noprefix + .file "a.cpp" + .def main; + .scl 2; + .type 32; + .endef + .section .text,"xr",one_only,main + .globl main # -- Begin function main +main: # @main +.Lfunc_begin0: + .cv_func_id 0 + .cv_file 1 "/tmp/a.cpp" "4FFB96E5DF1A95CE7DB9732CFFE001D7" 1 + .cv_loc 0 1 2 0 # a.cpp:2:0 +.seh_proc main +# %bb.0: + #DEBUG_VALUE: main:argv <- $rdx + #DEBUG_VALUE: main:argc <- $ecx + #DEBUG_VALUE: foo:x <- 2 + sub rsp, 56 + .seh_stackalloc 56 + .seh_endprologue +.Ltmp0: + .cv_file 2 "/tmp/./a.h" "BBFED90EF093E9C1D032CC9B05B5D167" 1 + .cv_inline_site_id 1 within 0 inlined_at 1 3 0 + .cv_loc 1 2 5 0 # ./a.h:5:0 + mov dword ptr [rsp + 44], 3 + .cv_loc 1 2 6 0 # ./a.h:6:0 + inc dword ptr [rsp + 44] + .cv_loc 1 2 7 0 # ./a.h:7:0 + mov eax, dword ptr [rsp + 44] + test eax, eax + je .LBB0_2 +.Ltmp1: +# %bb.1: + #DEBUG_VALUE: main:argv <- $rdx + #DEBUG_VALUE: main:argc <- $ecx + #DEBUG_VALUE: foo:x <- 2 + .cv_loc 1 2 9 0 # ./a.h:9:0 + mov eax, dword ptr [rsp + 44] +.Ltmp2: + #DEBUG_VALUE: bar:x <- $eax + .cv_file 3 
"/tmp/./b.h" "A26CC743A260115F33AF91AB11F95877" 1 + .cv_inline_site_id 2 within 1 inlined_at 2 9 0 + .cv_loc 2 3 5 0 # ./b.h:5:0 + inc eax +.Ltmp3: + mov dword ptr [rsp + 52], eax + .cv_loc 2 3 6 0 # ./b.h:6:0 + inc dword ptr [rsp + 52] + .cv_loc 2 3 7 0 # ./b.h:7:0 + mov eax, dword ptr [rsp + 52] +.Ltmp4: + #DEBUG_VALUE: func:x <- $eax + .cv_file 4 "/tmp/./c.h" "8AF4613F78624BBE96D1C408ABA39B2D" 1 + .cv_inline_site_id 3 within 2 inlined_at 3 7 0 + .cv_loc 3 4 5 0 # ./c.h:5:0 + lea ecx, [rax + 1] +.Ltmp5: + #DEBUG_VALUE: main:argc <- [DW_OP_LLVM_entry_value 1] $ecx + mov dword ptr [rsp + 48], ecx + .cv_loc 3 4 6 0 # ./c.h:6:0 + add dword ptr [rsp + 48], eax + .cv_loc 3 4 7 0 # ./c.h:7:0 + mov eax, dword ptr [rsp + 48] +.Ltmp6: + .cv_loc 0 1 3 0 # a.cpp:3:0 + mov dword ptr [rsp + 48], eax + .cv_loc 0 1 4 0 # a.cpp:4:0 + xor eax, eax + # Use fake debug info to tests inline info. + .cv_loc 1 2 20 0 + add rsp, 56 + ret +.Ltmp7: +.LBB0_2: + #DEBUG_VALUE: main:argv <- $rdx + #DEBUG_VALUE: main:argc <- $ecx + #DEBUG_VALUE: foo:x <- 2 + .cv_loc 1 2 8 0 # ./a.h:8:0 + nop +.Ltmp8: + int3 +.Ltmp9: + #DEBUG_VALUE: main:argc <- [DW_OP_LLVM_entry_value 1] $ecx + #DEBUG_VALUE: main:argv <- [DW_OP_LLVM_entry_value 1] $rdx +.Lfunc_end0: + .seh_endproc + # -- End function + .section .drectve,"yn" + .ascii " /DEFAULTLIB:libcmt.lib" + .ascii " /DEFAULTLIB:oldnames.lib" + .section .debug$S,"dr" + .p2align 2 + .long 4 # Debug section magic + .long 241 + .long .Ltmp11-.Ltmp10 # Subsection size +.Ltmp10: + .short .Ltmp13-.Ltmp12 # Record length +.Ltmp12: + .short 4353 # Record kind: S_OBJNAME + .long 0 # Signature + .asciz "/tmp/a-2b2ba0.obj" # Object name + .p2align 2 +.Ltmp13: + .short .Ltmp15-.Ltmp14 # Record length +.Ltmp14: + .short 4412 # Record kind: S_COMPILE3 + .long 1 # Flags and language + .short 208 # CPUType + .short 15 # Frontend version + .short 0 + .short 0 + .short 0 + .short 15000 # Backend version + .short 0 + .short 0 + .short 0 + .asciz "clang version 15.0.0" # 
Null-terminated compiler version string + .p2align 2 +.Ltmp15: +.Ltmp11: + .p2align 2 + .long 246 # Inlinee lines subsection + .long .Ltmp17-.Ltmp16 # Subsection size +.Ltmp16: + .long 0 # Inlinee lines signature + + # Inlined function foo starts at ./a.h:4 + .long 4099 # Type index of inlined function + .cv_filechecksumoffset 2 # Offset into filechecksum table + .long 4 # Starting line number + + # Inlined function bar starts at ./b.h:4 + .long 4106 # Type index of inlined function + .cv_filechecksumoffset 3 # Offset into filechecksum table + .long 4 # Starting line number + + # Inlined function func starts at ./c.h:4 + .long 4113 # Type index of inlined function + .cv_filechecksumoffset 4 # Offset into filechecksum table + .long 4 # Starting line number +.Ltmp17: + .p2align 2 + .section .debug$S,"dr",associative,main + .p2align 2 + .long 4 # Debug section magic + .long 241 # Symbol subsection for main + .long .Ltmp19-.Ltmp18 # Subsection size +.Ltmp18: + .short .Ltmp21-.Ltmp20 # Record length +.Ltmp20: + .short 4423 # Record kind: S_GPROC32_ID + .long 0 # PtrParent + .long 0 # PtrEnd + .long 0 # PtrNext + .long .Lfunc_end0-main # Code size + .long 0 # Offset after prologue + .long 0 # Offset before epilogue + .long 4117 # Function type index + .secrel32 main # Function section relative address + .secidx main # Function section index + .byte 0 # Flags + .asciz "main" # Function name + .p2align 2 +.Ltmp21: + .short .Ltmp23-.Ltmp22 # Record length +.Ltmp22: + .short 4114 # Record kind: S_FRAMEPROC + .long 56 # FrameSize + .long 0 # Padding + .long 0 # Offset of padding + .long 0 # Bytes of callee saved registers + .long 0 # Exception handler offset + .short 0 # Exception handler section + .long 81920 # Flags (defines frame register) + .p2align 2 +.Ltmp23: + .short .Ltmp25-.Ltmp24 # Record length +.Ltmp24: + .short 4414 # Record kind: S_LOCAL + .long 116 # TypeIndex + .short 1 # Flags + .asciz "argc" + .p2align 2 +.Ltmp25: + .cv_def_range .Lfunc_begin0 .Ltmp5 .Ltmp7 
.Ltmp8, reg, 18 + .short .Ltmp27-.Ltmp26 # Record length +.Ltmp26: + .short 4414 # Record kind: S_LOCAL + .long 4114 # TypeIndex + .short 1 # Flags + .asciz "argv" + .p2align 2 +.Ltmp27: + .cv_def_range .Lfunc_begin0 .Ltmp8, reg, 331 + .short .Ltmp29-.Ltmp28 # Record length +.Ltmp28: + .short 4414 # Record kind: S_LOCAL + .long 4118 # TypeIndex + .short 0 # Flags + .asciz "main_local" + .p2align 2 +.Ltmp29: + .cv_def_range .Ltmp0 .Ltmp9, frame_ptr_rel, 48 + .short .Ltmp31-.Ltmp30 # Record length +.Ltmp30: + .short 4429 # Record kind: S_INLINESITE + .long 0 # PtrParent + .long 0 # PtrEnd + .long 4099 # Inlinee type index + .cv_inline_linetable 1 2 4 .Lfunc_begin0 .Lfunc_end0 + .p2align 2 +.Ltmp31: + .short .Ltmp33-.Ltmp32 # Record length +.Ltmp32: + .short 4414 # Record kind: S_LOCAL + .long 116 # TypeIndex + .short 257 # Flags + .asciz "x" + .p2align 2 +.Ltmp33: + .short .Ltmp35-.Ltmp34 # Record length +.Ltmp34: + .short 4414 # Record kind: S_LOCAL + .long 4118 # TypeIndex + .short 0 # Flags + .asciz "foo_local" + .p2align 2 +.Ltmp35: + .cv_def_range .Ltmp0 .Ltmp6 .Ltmp7 .Ltmp9, frame_ptr_rel, 44 + .short .Ltmp37-.Ltmp36 # Record length +.Ltmp36: + .short 4429 # Record kind: S_INLINESITE + .long 0 # PtrParent + .long 0 # PtrEnd + .long 4106 # Inlinee type index + .cv_inline_linetable 2 3 4 .Lfunc_begin0 .Lfunc_end0 + .p2align 2 +.Ltmp37: + .short .Ltmp39-.Ltmp38 # Record length +.Ltmp38: + .short 4414 # Record kind: S_LOCAL + .long 116 # TypeIndex + .short 1 # Flags + .asciz "x" + .p2align 2 +.Ltmp39: + .cv_def_range .Ltmp2 .Ltmp3, reg, 17 + .short .Ltmp41-.Ltmp40 # Record length +.Ltmp40: + .short 4414 # Record kind: S_LOCAL + .long 4118 # TypeIndex + .short 0 # Flags + .asciz "bar_local" + .p2align 2 +.Ltmp41: + .cv_def_range .Ltmp2 .Ltmp6, frame_ptr_rel, 52 + .short .Ltmp43-.Ltmp42 # Record length +.Ltmp42: + .short 4429 # Record kind: S_INLINESITE + .long 0 # PtrParent + .long 0 # PtrEnd + .long 4113 # Inlinee type index + .cv_inline_linetable 3 4 4 
.Lfunc_begin0 .Lfunc_end0 + .p2align 2 +.Ltmp43: + .short .Ltmp45-.Ltmp44 # Record length +.Ltmp44: + .short 4414 # Record kind: S_LOCAL + .long 116 # TypeIndex + .short 1 # Flags + .asciz "x" + .p2align 2 +.Ltmp45: + .cv_def_range .Ltmp4 .Ltmp6, reg, 17 + .short .Ltmp47-.Ltmp46 # Record length +.Ltmp46: + .short 4414 # Record kind: S_LOCAL + .long 4118 # TypeIndex + .short 0 # Flags + .asciz "func_local" + .p2align 2 +.Ltmp47: + .cv_def_range .Ltmp4 .Ltmp6, frame_ptr_rel, 48 + .short 2 # Record length + .short 4430 # Record kind: S_INLINESITE_END + .short 2 # Record length + .short 4430 # Record kind: S_INLINESITE_END + .short 2 # Record length + .short 4430 # Record kind: S_INLINESITE_END + .short 2 # Record length + .short 4431 # Record kind: S_PROC_ID_END +.Ltmp19: + .p2align 2 + .cv_linetable 0, main, .Lfunc_end0 + .section .debug$S,"dr" + .long 241 + .long .Ltmp49-.Ltmp48 # Subsection size +.Ltmp48: + .short .Ltmp51-.Ltmp50 # Record length +.Ltmp50: + .short 4360 # Record kind: S_UDT + .long 4103 # Type + .asciz "Class1" + .p2align 2 +.Ltmp51: + .short .Ltmp53-.Ltmp52 # Record length +.Ltmp52: + .short 4360 # Record kind: S_UDT + .long 4110 # Type + .asciz "Namespace2::Class2" + .p2align 2 +.Ltmp53: +.Ltmp49: + .p2align 2 + .cv_filechecksums # File index to string table offset subsection + .cv_stringtable # String table + .long 241 + .long .Ltmp55-.Ltmp54 # Subsection size +.Ltmp54: + .short .Ltmp57-.Ltmp56 # Record length +.Ltmp56: + .short 4428 # Record kind: S_BUILDINFO + .long 4124 # LF_BUILDINFO index + .p2align 2 +.Ltmp57: +.Ltmp55: + .p2align 2 + .section .debug$T,"dr" + .p2align 2 + .long 4 # Debug section magic + # StringId (0x1000) + .short 0x12 # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "Namespace1" # StringData + .byte 241 + # ArgList (0x1001) + .short 0xa # Record length + .short 0x1201 # Record kind: LF_ARGLIST + .long 0x1 # NumArgs + .long 0x74 # Argument: int + # Procedure (0x1002) + .short 0xe # 
Record length + .short 0x1008 # Record kind: LF_PROCEDURE + .long 0x74 # ReturnType: int + .byte 0x0 # CallingConvention: NearC + .byte 0x0 # FunctionOptions + .short 0x1 # NumParameters + .long 0x1001 # ArgListType: (int) + # FuncId (0x1003) + .short 0xe # Record length + .short 0x1601 # Record kind: LF_FUNC_ID + .long 0x1000 # ParentScope: Namespace1 + .long 0x1002 # FunctionType: int (int) + .asciz "foo" # Name + # Class (0x1004) + .short 0x2a # Record length + .short 0x1504 # Record kind: LF_CLASS + .short 0x0 # MemberCount + .short 0x280 # Properties ( ForwardReference (0x80) | HasUniqueName (0x200) ) + .long 0x0 # FieldList + .long 0x0 # DerivedFrom + .long 0x0 # VShape + .short 0x0 # SizeOf + .asciz "Class1" # Name + .asciz ".?AVClass1@@" # LinkageName + .byte 242 + .byte 241 + # MemberFunction (0x1005) + .short 0x1a # Record length + .short 0x1009 # Record kind: LF_MFUNCTION + .long 0x74 # ReturnType: int + .long 0x1004 # ClassType: Class1 + .long 0x0 # ThisType + .byte 0x0 # CallingConvention: NearC + .byte 0x0 # FunctionOptions + .short 0x1 # NumParameters + .long 0x1001 # ArgListType: (int) + .long 0x0 # ThisAdjustment + # FieldList (0x1006) + .short 0xe # Record length + .short 0x1203 # Record kind: LF_FIELDLIST + .short 0x1511 # Member kind: OneMethod ( LF_ONEMETHOD ) + .short 0xb # Attrs: Public, Static + .long 0x1005 # Type: int Class1::(int) + .asciz "bar" # Name + # Class (0x1007) + .short 0x2a # Record length + .short 0x1504 # Record kind: LF_CLASS + .short 0x1 # MemberCount + .short 0x200 # Properties ( HasUniqueName (0x200) ) + .long 0x1006 # FieldList: + .long 0x0 # DerivedFrom + .long 0x0 # VShape + .short 0x1 # SizeOf + .asciz "Class1" # Name + .asciz ".?AVClass1@@" # LinkageName + .byte 242 + .byte 241 + # StringId (0x1008) + .short 0x12 # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "/tmp/./b.h" # StringData + .byte 241 + # UdtSourceLine (0x1009) + .short 0xe # Record length + .short 0x1606 # Record 
kind: LF_UDT_SRC_LINE + .long 0x1007 # UDT: Class1 + .long 0x1008 # SourceFile: /tmp/./b.h + .long 0x2 # LineNumber + # MemberFuncId (0x100A) + .short 0xe # Record length + .short 0x1602 # Record kind: LF_MFUNC_ID + .long 0x1004 # ClassType: Class1 + .long 0x1005 # FunctionType: int Class1::(int) + .asciz "bar" # Name + # Class (0x100B) + .short 0x42 # Record length + .short 0x1504 # Record kind: LF_CLASS + .short 0x0 # MemberCount + .short 0x280 # Properties ( ForwardReference (0x80) | HasUniqueName (0x200) ) + .long 0x0 # FieldList + .long 0x0 # DerivedFrom + .long 0x0 # VShape + .short 0x0 # SizeOf + .asciz "Namespace2::Class2" # Name + .asciz ".?AVClass2@Namespace2@@" # LinkageName + .byte 243 + .byte 242 + .byte 241 + # MemberFunction (0x100C) + .short 0x1a # Record length + .short 0x1009 # Record kind: LF_MFUNCTION + .long 0x74 # ReturnType: int + .long 0x100b # ClassType: Namespace2::Class2 + .long 0x0 # ThisType + .byte 0x0 # CallingConvention: NearC + .byte 0x0 # FunctionOptions + .short 0x1 # NumParameters + .long 0x1001 # ArgListType: (int) + .long 0x0 # ThisAdjustment + # FieldList (0x100D) + .short 0x12 # Record length + .short 0x1203 # Record kind: LF_FIELDLIST + .short 0x1511 # Member kind: OneMethod ( LF_ONEMETHOD ) + .short 0xb # Attrs: Public, Static + .long 0x100c # Type: int Namespace2::Class2::(int) + .asciz "func" # Name + .byte 243 + .byte 242 + .byte 241 + # Class (0x100E) + .short 0x42 # Record length + .short 0x1504 # Record kind: LF_CLASS + .short 0x1 # MemberCount + .short 0x200 # Properties ( HasUniqueName (0x200) ) + .long 0x100d # FieldList: + .long 0x0 # DerivedFrom + .long 0x0 # VShape + .short 0x1 # SizeOf + .asciz "Namespace2::Class2" # Name + .asciz ".?AVClass2@Namespace2@@" # LinkageName + .byte 243 + .byte 242 + .byte 241 + # StringId (0x100F) + .short 0x12 # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "/tmp/./c.h" # StringData + .byte 241 + # UdtSourceLine (0x1010) + .short 0xe # 
Record length + .short 0x1606 # Record kind: LF_UDT_SRC_LINE + .long 0x100e # UDT: Namespace2::Class2 + .long 0x100f # SourceFile: /tmp/./c.h + .long 0x2 # LineNumber + # MemberFuncId (0x1011) + .short 0x12 # Record length + .short 0x1602 # Record kind: LF_MFUNC_ID + .long 0x100b # ClassType: Namespace2::Class2 + .long 0x100c # FunctionType: int Namespace2::Class2::(int) + .asciz "func" # Name + .byte 243 + .byte 242 + .byte 241 + # Pointer (0x1012) + .short 0xa # Record length + .short 0x1002 # Record kind: LF_POINTER + .long 0x670 # PointeeType: char* + .long 0x1000c # Attrs: [ Type: Near64, Mode: Pointer, SizeOf: 8 ] + # ArgList (0x1013) + .short 0xe # Record length + .short 0x1201 # Record kind: LF_ARGLIST + .long 0x2 # NumArgs + .long 0x74 # Argument: int + .long 0x1012 # Argument: char** + # Procedure (0x1014) + .short 0xe # Record length + .short 0x1008 # Record kind: LF_PROCEDURE + .long 0x74 # ReturnType: int + .byte 0x0 # CallingConvention: NearC + .byte 0x0 # FunctionOptions + .short 0x2 # NumParameters + .long 0x1013 # ArgListType: (int, char**) + # FuncId (0x1015) + .short 0x12 # Record length + .short 0x1601 # Record kind: LF_FUNC_ID + .long 0x0 # ParentScope + .long 0x1014 # FunctionType: int (int, char**) + .asciz "main" # Name + .byte 243 + .byte 242 + .byte 241 + # Modifier (0x1016) + .short 0xa # Record length + .short 0x1001 # Record kind: LF_MODIFIER + .long 0x74 # ModifiedType: int + .short 0x2 # Modifiers ( Volatile (0x2) ) + .byte 242 + .byte 241 + # StringId (0x1017) + .short 0xe # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "/tmp" # StringData + .byte 243 + .byte 242 + .byte 241 + # StringId (0x1018) + .short 0xe # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "a.cpp" # StringData + .byte 242 + .byte 241 + # StringId (0x1019) + .short 0xa # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .byte 0 # StringData + .byte 243 + .byte 242 + 
.byte 241 + # StringId (0x101A) + .short 0x4e # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "/usr/local/google/home/zequanwu/llvm-project/build/release/bin/clang" # StringData + .byte 243 + .byte 242 + .byte 241 + # StringId (0x101B) + .short 0x9f6 # Record length + .short 0x1605 # Record kind: LF_STRING_ID + .long 0x0 # Id + .asciz "\"-cc1\" \"-triple\" \"x86_64-pc-windows-msvc19.20.0\" \"-S\" \"-disable-free\" \"-clear-ast-before-backend\" \"-disable-llvm-verifier\" \"-discard-value-names\" \"-mrelocation-model\" \"pic\" \"-pic-level\" \"2\" \"-mframe-pointer=none\" \"-relaxed-aliasing\" \"-fmath-errno\" \"-ffp-contract=on\" \"-fno-rounding-math\" \"-mconstructor-aliases\" \"-funwind-tables=2\" \"-target-cpu\" \"x86-64\" \"-mllvm\" \"-x86-asm-syntax=intel\" \"-tune-cpu\" \"generic\" \"-mllvm\" \"-treat-scalable-fixed-error-as-warning\" \"-D_MT\" \"-flto-visibility-public-std\" \"--dependent-lib=libcmt\" \"--dependent-lib=oldnames\" \"-stack-protector\" \"2\" \"-fms-volatile\" \"-fdiagnostics-format\" \"msvc\" \"-gno-column-info\" \"-gcodeview\" \"-debug-info-kind=constructor\" \"-ffunction-sections\" \"-fcoverage-compilation-dir=/tmp\" \"-resource-dir\" \"/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/DIA SDK/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/include\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/atlmfc/include\" \"-internal-isystem\" 
\"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/ucrt\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/shared\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/um\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/winrt\" \"-internal-isystem\" \"/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/cppwinrt\" \"-Os\" \"-fdeprecated-macro\" \"-fdebug-compilation-dir=/tmp\" \"-ferror-limit\" \"19\" \"-fno-use-cxa-atexit\" \"-fms-extensions\" \"-fms-compatibility\" \"-fms-compatibility-version=19.20\" \"-std=c++14\" \"-fdelayed-template-parsing\" \"-fcolor-diagnostics\" \"-vectorize-loops\" \"-vectorize-slp\" \"-faddrsig\" \"-x\" \"c++\"" # StringData + .byte 242 + .byte 241 + # BuildInfo (0x101C) + .short 0x1a # Record length + .short 0x1603 # Record kind: LF_BUILDINFO + .short 0x5 # NumArgs + .long 0x1017 # Argument: /tmp + .long 0x101a # Argument: /usr/local/google/home/zequanwu/llvm-project/build/release/bin/clang + .long 0x1018 # Argument: a.cpp + .long 0x1019 # Argument + .long 0x101b # Argument: "-cc1" "-triple" "x86_64-pc-windows-msvc19.20.0" "-S" "-disable-free" "-clear-ast-before-backend" "-disable-llvm-verifier" "-discard-value-names" "-mrelocation-model" "pic" "-pic-level" "2" "-mframe-pointer=none" "-relaxed-aliasing" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-mllvm" "-x86-asm-syntax=intel" "-tune-cpu" "generic" "-mllvm" 
"-treat-scalable-fixed-error-as-warning" "-D_MT" "-flto-visibility-public-std" "--dependent-lib=libcmt" "--dependent-lib=oldnames" "-stack-protector" "2" "-fms-volatile" "-fdiagnostics-format" "msvc" "-gno-column-info" "-gcodeview" "-debug-info-kind=constructor" "-ffunction-sections" "-fcoverage-compilation-dir=/tmp" "-resource-dir" "/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0" "-internal-isystem" "/usr/local/google/home/zequanwu/llvm-project/build/release/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/DIA SDK/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/VC/Tools/MSVC/14.26.28801/atlmfc/include" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/ucrt" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/shared" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/um" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/winrt" "-internal-isystem" "/usr/local/google/home/zequanwu/chromium/src/third_party/depot_tools/win_toolchain/vs_files/3bda71a11e/Windows Kits/10/Include/10.0.19041.0/cppwinrt" "-Os" "-fdeprecated-macro" "-fdebug-compilation-dir=/tmp" "-ferror-limit" "19" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.20" 
"-std=c++14" "-fdelayed-template-parsing" "-fcolor-diagnostics" "-vectorize-loops" "-vectorize-slp" "-faddrsig" "-x" "c++" + .byte 242 + .byte 241 + .addrsig diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites_live.lldbinit b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites_live.lldbinit index 2291c7c4527175..eab5061dafbdcd 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites_live.lldbinit +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/inline_sites_live.lldbinit @@ -1,7 +1,7 @@ -br set -p BP_bar -f inline_sites_live.cpp -br set -p BP_foo -f inline_sites_live.cpp -run -expression param -continue -expression param -expression local +br set -p BP_bar -f inline_sites_live.cpp +br set -p BP_foo -f inline_sites_live.cpp +run +expression param +continue +expression param +expression local diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/local-variables-registers.lldbinit b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/local-variables-registers.lldbinit index ad080da24dab71..feda7485675792 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/local-variables-registers.lldbinit +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/local-variables-registers.lldbinit @@ -1,35 +1,35 @@ -image lookup -a 0x140001000 -v -image lookup -a 0x140001003 -v -image lookup -a 0x140001006 -v - -image lookup -a 0x140001011 -v -image lookup -a 0x140001017 -v -image lookup -a 0x140001019 -v -image lookup -a 0x14000101e -v -image lookup -a 0x14000102c -v - -image lookup -a 0x140001031 -v -image lookup -a 0x140001032 -v -image lookup -a 0x140001033 -v -image lookup -a 0x140001034 -v -image lookup -a 0x140001035 -v -image lookup -a 0x140001036 -v -image lookup -a 0x140001037 -v -image lookup -a 0x14000103b -v -image lookup -a 0x14000103d -v -image lookup -a 0x14000103f -v -image lookup -a 0x140001041 -v -image lookup -a 0x140001043 -v -image lookup -a 0x140001045 -v -image lookup -a 0x140001046 -v -image lookup -a 0x140001047 -v -image lookup 
-a 0x140001048 -v -image lookup -a 0x140001049 -v -image lookup -a 0x14000104a -v -image lookup -a 0x14000104b -v -image lookup -a 0x14000104c -v -image lookup -a 0x14000104e -v -image lookup -a 0x14000104f -v -image lookup -a 0x140001050 -v -image lookup -a 0x140001051 -v -exit +image lookup -a 0x140001000 -v +image lookup -a 0x140001003 -v +image lookup -a 0x140001006 -v + +image lookup -a 0x140001011 -v +image lookup -a 0x140001017 -v +image lookup -a 0x140001019 -v +image lookup -a 0x14000101e -v +image lookup -a 0x14000102c -v + +image lookup -a 0x140001031 -v +image lookup -a 0x140001032 -v +image lookup -a 0x140001033 -v +image lookup -a 0x140001034 -v +image lookup -a 0x140001035 -v +image lookup -a 0x140001036 -v +image lookup -a 0x140001037 -v +image lookup -a 0x14000103b -v +image lookup -a 0x14000103d -v +image lookup -a 0x14000103f -v +image lookup -a 0x140001041 -v +image lookup -a 0x140001043 -v +image lookup -a 0x140001045 -v +image lookup -a 0x140001046 -v +image lookup -a 0x140001047 -v +image lookup -a 0x140001048 -v +image lookup -a 0x140001049 -v +image lookup -a 0x14000104a -v +image lookup -a 0x14000104b -v +image lookup -a 0x14000104c -v +image lookup -a 0x14000104e -v +image lookup -a 0x14000104f -v +image lookup -a 0x140001050 -v +image lookup -a 0x140001051 -v +exit diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/lookup-by-types.lldbinit b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/lookup-by-types.lldbinit index afe3f2c8b943e3..3f639eb2e539bc 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/lookup-by-types.lldbinit +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/lookup-by-types.lldbinit @@ -1,4 +1,4 @@ -image lookup -type A -image lookup -type B - +image lookup -type A +image lookup -type B + quit \ No newline at end of file diff --git a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/subfield_register_simple_type.lldbinit b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/subfield_register_simple_type.lldbinit index 
3dc33fd789dac0..32758f1fbc51f3 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/Inputs/subfield_register_simple_type.lldbinit +++ b/lldb/test/Shell/SymbolFile/NativePDB/Inputs/subfield_register_simple_type.lldbinit @@ -1,2 +1,2 @@ -image lookup -a 0x40102f -v -quit +image lookup -a 0x40102f -v +quit diff --git a/lldb/test/Shell/SymbolFile/NativePDB/function-types-classes.cpp b/lldb/test/Shell/SymbolFile/NativePDB/function-types-classes.cpp index ca2a84de7698a4..f0fac90e5065a1 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/function-types-classes.cpp +++ b/lldb/test/Shell/SymbolFile/NativePDB/function-types-classes.cpp @@ -113,9 +113,9 @@ auto incomplete = &three; // CHECK: |-CXXRecordDecl {{.*}} union U // CHECK: |-EnumDecl {{.*}} E // CHECK: |-CXXRecordDecl {{.*}} struct S -// CHECK: |-VarDecl {{.*}} a 'S (*)(C *, U &, E &&)' -// CHECK: |-VarDecl {{.*}} b 'E (*)(const S *, const C &, const U &&)' -// CHECK: |-VarDecl {{.*}} c 'U (*)(volatile E *, volatile S &, volatile C &&)' +// CHECK: |-VarDecl {{.*}} a 'S (*)(C *, U &, E &&)' +// CHECK: |-VarDecl {{.*}} b 'E (*)(const S *, const C &, const U &&)' +// CHECK: |-VarDecl {{.*}} c 'U (*)(volatile E *, volatile S &, volatile C &&)' // CHECK: |-VarDecl {{.*}} d 'C (*)(const volatile U *, const volatile E &, const volatile S &&)' // CHECK: |-CXXRecordDecl {{.*}} struct B // CHECK: | `-CXXRecordDecl {{.*}} struct A @@ -125,14 +125,14 @@ auto incomplete = &three; // CHECK: | | `-CXXRecordDecl {{.*}} struct S // CHECK: | `-NamespaceDecl {{.*}} B // CHECK: | `-CXXRecordDecl {{.*}} struct S -// CHECK: |-VarDecl {{.*}} e 'A::B::S *(*)(B::A::S *, A::C::S &)' -// CHECK: |-VarDecl {{.*}} f 'A::C::S &(*)(A::B::S *, B::A::S *)' +// CHECK: |-VarDecl {{.*}} e 'A::B::S *(*)(B::A::S *, A::C::S &)' +// CHECK: |-VarDecl {{.*}} f 'A::C::S &(*)(A::B::S *, B::A::S *)' // CHECK: |-VarDecl {{.*}} g 'B::A::S *(*)(A::C::S &, A::B::S *)' // CHECK: |-CXXRecordDecl {{.*}} struct TC // CHECK: |-CXXRecordDecl {{.*}} struct TC> // CHECK: 
|-CXXRecordDecl {{.*}} struct TC // CHECK: |-CXXRecordDecl {{.*}} struct TC -// CHECK: |-VarDecl {{.*}} h 'TC (*)(TC, TC>, TC)' +// CHECK: |-VarDecl {{.*}} h 'TC (*)(TC, TC>, TC)' // CHECK: |-VarDecl {{.*}} i 'A::B::S (*)()' // CHECK: |-CXXRecordDecl {{.*}} struct Incomplete // CHECK: `-VarDecl {{.*}} incomplete 'Incomplete *(*)(Incomplete **, const Incomplete *)' diff --git a/lldb/test/Shell/SymbolFile/NativePDB/inline_sites_live.cpp b/lldb/test/Shell/SymbolFile/NativePDB/inline_sites_live.cpp index 767149ea18c468..40298272696580 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/inline_sites_live.cpp +++ b/lldb/test/Shell/SymbolFile/NativePDB/inline_sites_live.cpp @@ -1,34 +1,34 @@ -// clang-format off -// REQUIRES: system-windows - -// RUN: %build -o %t.exe -- %s -// RUN: env LLDB_USE_NATIVE_PDB_READER=1 %lldb -f %t.exe -s \ -// RUN: %p/Inputs/inline_sites_live.lldbinit 2>&1 | FileCheck %s - -void use(int) {} - -void __attribute__((always_inline)) bar(int param) { - use(param); // BP_bar -} - -void __attribute__((always_inline)) foo(int param) { - int local = param+1; - bar(local); - use(param); - use(local); // BP_foo -} - -int main(int argc, char** argv) { - foo(argc); -} - -// CHECK: * thread #1, stop reason = breakpoint 1 -// CHECK-NEXT: frame #0: {{.*}}`main [inlined] bar(param=2) -// CHECK: (lldb) expression param -// CHECK-NEXT: (int) $0 = 2 -// CHECK: * thread #1, stop reason = breakpoint 2 -// CHECK-NEXT: frame #0: {{.*}}`main [inlined] foo(param=1) -// CHECK: (lldb) expression param -// CHECK-NEXT: (int) $1 = 1 -// CHECK-NEXT: (lldb) expression local -// CHECK-NEXT: (int) $2 = 2 +// clang-format off +// REQUIRES: system-windows + +// RUN: %build -o %t.exe -- %s +// RUN: env LLDB_USE_NATIVE_PDB_READER=1 %lldb -f %t.exe -s \ +// RUN: %p/Inputs/inline_sites_live.lldbinit 2>&1 | FileCheck %s + +void use(int) {} + +void __attribute__((always_inline)) bar(int param) { + use(param); // BP_bar +} + +void __attribute__((always_inline)) foo(int param) { + int 
local = param+1; + bar(local); + use(param); + use(local); // BP_foo +} + +int main(int argc, char** argv) { + foo(argc); +} + +// CHECK: * thread #1, stop reason = breakpoint 1 +// CHECK-NEXT: frame #0: {{.*}}`main [inlined] bar(param=2) +// CHECK: (lldb) expression param +// CHECK-NEXT: (int) $0 = 2 +// CHECK: * thread #1, stop reason = breakpoint 2 +// CHECK-NEXT: frame #0: {{.*}}`main [inlined] foo(param=1) +// CHECK: (lldb) expression param +// CHECK-NEXT: (int) $1 = 1 +// CHECK-NEXT: (lldb) expression local +// CHECK-NEXT: (int) $2 = 2 diff --git a/lldb/test/Shell/SymbolFile/NativePDB/lookup-by-types.cpp b/lldb/test/Shell/SymbolFile/NativePDB/lookup-by-types.cpp index f3aea8115f3858..cd5bbfc30fa0e1 100644 --- a/lldb/test/Shell/SymbolFile/NativePDB/lookup-by-types.cpp +++ b/lldb/test/Shell/SymbolFile/NativePDB/lookup-by-types.cpp @@ -1,46 +1,46 @@ -// clang-format off - -// RUN: %build -o %t.exe -- %s -// RUN: env LLDB_USE_NATIVE_PDB_READER=1 %lldb -f %t.exe -s \ -// RUN: %p/Inputs/lookup-by-types.lldbinit 2>&1 | FileCheck %s - -class B; -class A { -public: - static const A constA; - static A a; - static B b; - int val = 1; -}; -class B { -public: - static A a; - int val = 2; -}; -A varA; -B varB; -const A A::constA = varA; -A A::a = varA; -B A::b = varB; -A B::a = varA; - -int main(int argc, char **argv) { - return varA.val + varB.val; -} - -// CHECK: image lookup -type A -// CHECK-NEXT: 1 match found in {{.*}}.exe -// CHECK-NEXT: compiler_type = "class A { -// CHECK-NEXT: static const A constA; -// CHECK-NEXT: static A a; -// CHECK-NEXT: static B b; -// CHECK-NEXT: public: -// CHECK-NEXT: int val; -// CHECK-NEXT: }" -// CHECK: image lookup -type B -// CHECK-NEXT: 1 match found in {{.*}}.exe -// CHECK-NEXT: compiler_type = "class B { -// CHECK-NEXT: static A a; -// CHECK-NEXT: public: -// CHECK-NEXT: int val; -// CHECK-NEXT: }" +// clang-format off + +// RUN: %build -o %t.exe -- %s +// RUN: env LLDB_USE_NATIVE_PDB_READER=1 %lldb -f %t.exe -s \ +// RUN: 
%p/Inputs/lookup-by-types.lldbinit 2>&1 | FileCheck %s + +class B; +class A { +public: + static const A constA; + static A a; + static B b; + int val = 1; +}; +class B { +public: + static A a; + int val = 2; +}; +A varA; +B varB; +const A A::constA = varA; +A A::a = varA; +B A::b = varB; +A B::a = varA; + +int main(int argc, char **argv) { + return varA.val + varB.val; +} + +// CHECK: image lookup -type A +// CHECK-NEXT: 1 match found in {{.*}}.exe +// CHECK-NEXT: compiler_type = "class A { +// CHECK-NEXT: static const A constA; +// CHECK-NEXT: static A a; +// CHECK-NEXT: static B b; +// CHECK-NEXT: public: +// CHECK-NEXT: int val; +// CHECK-NEXT: }" +// CHECK: image lookup -type B +// CHECK-NEXT: 1 match found in {{.*}}.exe +// CHECK-NEXT: compiler_type = "class B { +// CHECK-NEXT: static A a; +// CHECK-NEXT: public: +// CHECK-NEXT: int val; +// CHECK-NEXT: }" diff --git a/lldb/unittests/Breakpoint/CMakeLists.txt b/lldb/unittests/Breakpoint/CMakeLists.txt index 757c2da1a4d9de..db985bc82dc5e2 100644 --- a/lldb/unittests/Breakpoint/CMakeLists.txt +++ b/lldb/unittests/Breakpoint/CMakeLists.txt @@ -1,10 +1,10 @@ -add_lldb_unittest(LLDBBreakpointTests - BreakpointIDTest.cpp - WatchpointAlgorithmsTests.cpp - - LINK_LIBS - lldbBreakpoint - lldbCore - LINK_COMPONENTS - Support - ) +add_lldb_unittest(LLDBBreakpointTests + BreakpointIDTest.cpp + WatchpointAlgorithmsTests.cpp + + LINK_LIBS + lldbBreakpoint + lldbCore + LINK_COMPONENTS + Support + ) diff --git a/llvm/benchmarks/FormatVariadicBM.cpp b/llvm/benchmarks/FormatVariadicBM.cpp index c03ead400d0d5c..e351db338730e9 100644 --- a/llvm/benchmarks/FormatVariadicBM.cpp +++ b/llvm/benchmarks/FormatVariadicBM.cpp @@ -1,63 +1,63 @@ -//===- FormatVariadicBM.cpp - formatv() benchmark ---------- --------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. 
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -#include "benchmark/benchmark.h" -#include "llvm/Support/FormatVariadic.h" -#include -#include -#include - -using namespace llvm; -using namespace std; - -// Generate a list of format strings that have `NumReplacements` replacements -// by permuting the replacements and some literal text. -static vector getFormatStrings(int NumReplacements) { - vector Components; - for (int I = 0; I < NumReplacements; I++) - Components.push_back("{" + to_string(I) + "}"); - // Intersperse these with some other literal text (_). - const string_view Literal = "____"; - for (char C : Literal) - Components.push_back(string(1, C)); - - vector Formats; - do { - string Concat; - for (const string &C : Components) - Concat += C; - Formats.emplace_back(Concat); - } while (next_permutation(Components.begin(), Components.end())); - return Formats; -} - -// Generate the set of formats to exercise outside the benchmark code. -static const vector> Formats = { - getFormatStrings(1), getFormatStrings(2), getFormatStrings(3), - getFormatStrings(4), getFormatStrings(5), -}; - -// Benchmark formatv() for a variety of format strings and 1-5 replacements. -static void BM_FormatVariadic(benchmark::State &state) { - for (auto _ : state) { - for (const string &Fmt : Formats[0]) - formatv(Fmt.c_str(), 1).str(); - for (const string &Fmt : Formats[1]) - formatv(Fmt.c_str(), 1, 2).str(); - for (const string &Fmt : Formats[2]) - formatv(Fmt.c_str(), 1, 2, 3).str(); - for (const string &Fmt : Formats[3]) - formatv(Fmt.c_str(), 1, 2, 3, 4).str(); - for (const string &Fmt : Formats[4]) - formatv(Fmt.c_str(), 1, 2, 3, 4, 5).str(); - } -} - -BENCHMARK(BM_FormatVariadic); - -BENCHMARK_MAIN(); +//===- FormatVariadicBM.cpp - formatv() benchmark ---------- --------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. 
+// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "benchmark/benchmark.h" +#include "llvm/Support/FormatVariadic.h" +#include +#include +#include + +using namespace llvm; +using namespace std; + +// Generate a list of format strings that have `NumReplacements` replacements +// by permuting the replacements and some literal text. +static vector getFormatStrings(int NumReplacements) { + vector Components; + for (int I = 0; I < NumReplacements; I++) + Components.push_back("{" + to_string(I) + "}"); + // Intersperse these with some other literal text (_). + const string_view Literal = "____"; + for (char C : Literal) + Components.push_back(string(1, C)); + + vector Formats; + do { + string Concat; + for (const string &C : Components) + Concat += C; + Formats.emplace_back(Concat); + } while (next_permutation(Components.begin(), Components.end())); + return Formats; +} + +// Generate the set of formats to exercise outside the benchmark code. +static const vector> Formats = { + getFormatStrings(1), getFormatStrings(2), getFormatStrings(3), + getFormatStrings(4), getFormatStrings(5), +}; + +// Benchmark formatv() for a variety of format strings and 1-5 replacements. 
+static void BM_FormatVariadic(benchmark::State &state) { + for (auto _ : state) { + for (const string &Fmt : Formats[0]) + formatv(Fmt.c_str(), 1).str(); + for (const string &Fmt : Formats[1]) + formatv(Fmt.c_str(), 1, 2).str(); + for (const string &Fmt : Formats[2]) + formatv(Fmt.c_str(), 1, 2, 3).str(); + for (const string &Fmt : Formats[3]) + formatv(Fmt.c_str(), 1, 2, 3, 4).str(); + for (const string &Fmt : Formats[4]) + formatv(Fmt.c_str(), 1, 2, 3, 4, 5).str(); + } +} + +BENCHMARK(BM_FormatVariadic); + +BENCHMARK_MAIN(); diff --git a/llvm/benchmarks/GetIntrinsicForClangBuiltin.cpp b/llvm/benchmarks/GetIntrinsicForClangBuiltin.cpp index fa9c528424c95f..953d9125e11ee2 100644 --- a/llvm/benchmarks/GetIntrinsicForClangBuiltin.cpp +++ b/llvm/benchmarks/GetIntrinsicForClangBuiltin.cpp @@ -1,50 +1,50 @@ -#include "benchmark/benchmark.h" -#include "llvm/IR/Intrinsics.h" - -using namespace llvm; -using namespace Intrinsic; - -// Benchmark intrinsic lookup from a variety of targets. -static void BM_GetIntrinsicForClangBuiltin(benchmark::State &state) { - static const char *Builtins[] = { - "__builtin_adjust_trampoline", - "__builtin_trap", - "__builtin_arm_ttest", - "__builtin_amdgcn_cubetc", - "__builtin_amdgcn_udot2", - "__builtin_arm_stc", - "__builtin_bpf_compare", - "__builtin_HEXAGON_A2_max", - "__builtin_lasx_xvabsd_b", - "__builtin_mips_dlsa", - "__nvvm_floor_f", - "__builtin_altivec_vslb", - "__builtin_r600_read_tgid_x", - "__builtin_riscv_aes64im", - "__builtin_s390_vcksm", - "__builtin_ve_vl_pvfmksge_Mvl", - "__builtin_ia32_axor64", - "__builtin_bitrev", - }; - static const char *Targets[] = {"", "aarch64", "amdgcn", "mips", - "nvvm", "r600", "riscv"}; - - for (auto _ : state) { - for (auto Builtin : Builtins) - for (auto Target : Targets) - getIntrinsicForClangBuiltin(Target, Builtin); - } -} - -static void -BM_GetIntrinsicForClangBuiltinHexagonFirst(benchmark::State &state) { - // Exercise the worst case by looking for the first builtin for a target - // 
that has a lot of builtins. - for (auto _ : state) - getIntrinsicForClangBuiltin("hexagon", "__builtin_HEXAGON_A2_abs"); -} - -BENCHMARK(BM_GetIntrinsicForClangBuiltin); -BENCHMARK(BM_GetIntrinsicForClangBuiltinHexagonFirst); - -BENCHMARK_MAIN(); +#include "benchmark/benchmark.h" +#include "llvm/IR/Intrinsics.h" + +using namespace llvm; +using namespace Intrinsic; + +// Benchmark intrinsic lookup from a variety of targets. +static void BM_GetIntrinsicForClangBuiltin(benchmark::State &state) { + static const char *Builtins[] = { + "__builtin_adjust_trampoline", + "__builtin_trap", + "__builtin_arm_ttest", + "__builtin_amdgcn_cubetc", + "__builtin_amdgcn_udot2", + "__builtin_arm_stc", + "__builtin_bpf_compare", + "__builtin_HEXAGON_A2_max", + "__builtin_lasx_xvabsd_b", + "__builtin_mips_dlsa", + "__nvvm_floor_f", + "__builtin_altivec_vslb", + "__builtin_r600_read_tgid_x", + "__builtin_riscv_aes64im", + "__builtin_s390_vcksm", + "__builtin_ve_vl_pvfmksge_Mvl", + "__builtin_ia32_axor64", + "__builtin_bitrev", + }; + static const char *Targets[] = {"", "aarch64", "amdgcn", "mips", + "nvvm", "r600", "riscv"}; + + for (auto _ : state) { + for (auto Builtin : Builtins) + for (auto Target : Targets) + getIntrinsicForClangBuiltin(Target, Builtin); + } +} + +static void +BM_GetIntrinsicForClangBuiltinHexagonFirst(benchmark::State &state) { + // Exercise the worst case by looking for the first builtin for a target + // that has a lot of builtins. 
+ for (auto _ : state) + getIntrinsicForClangBuiltin("hexagon", "__builtin_HEXAGON_A2_abs"); +} + +BENCHMARK(BM_GetIntrinsicForClangBuiltin); +BENCHMARK(BM_GetIntrinsicForClangBuiltinHexagonFirst); + +BENCHMARK_MAIN(); diff --git a/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp b/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp index 7f3bd3bc9eb6b3..758291274675d6 100644 --- a/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp +++ b/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp @@ -1,30 +1,30 @@ -//===- GetIntrinsicInfoTableEntries.cpp - IIT signature benchmark ---------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -#include "benchmark/benchmark.h" -#include "llvm/ADT/SmallVector.h" -#include "llvm/IR/Intrinsics.h" - -using namespace llvm; -using namespace Intrinsic; - -static void BM_GetIntrinsicInfoTableEntries(benchmark::State &state) { - SmallVector Table; - for (auto _ : state) { - for (ID ID = 1; ID < num_intrinsics; ++ID) { - // This makes sure the vector does not keep growing, as well as after the - // first iteration does not result in additional allocations. - Table.clear(); - getIntrinsicInfoTableEntries(ID, Table); - } - } -} - -BENCHMARK(BM_GetIntrinsicInfoTableEntries); - -BENCHMARK_MAIN(); +//===- GetIntrinsicInfoTableEntries.cpp - IIT signature benchmark ---------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "benchmark/benchmark.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/IR/Intrinsics.h" + +using namespace llvm; +using namespace Intrinsic; + +static void BM_GetIntrinsicInfoTableEntries(benchmark::State &state) { + SmallVector Table; + for (auto _ : state) { + for (ID ID = 1; ID < num_intrinsics; ++ID) { + // This makes sure the vector does not keep growing, as well as after the + // first iteration does not result in additional allocations. + Table.clear(); + getIntrinsicInfoTableEntries(ID, Table); + } + } +} + +BENCHMARK(BM_GetIntrinsicInfoTableEntries); + +BENCHMARK_MAIN(); diff --git a/llvm/docs/_static/LoopOptWG_invite.ics b/llvm/docs/_static/LoopOptWG_invite.ics index 65597d90a9c852..7c92e4048cc3d1 100644 --- a/llvm/docs/_static/LoopOptWG_invite.ics +++ b/llvm/docs/_static/LoopOptWG_invite.ics @@ -1,80 +1,80 @@ -BEGIN:VCALENDAR -PRODID:-//Google Inc//Google Calendar 70.9054//EN -VERSION:2.0 -CALSCALE:GREGORIAN -METHOD:PUBLISH -X-WR-CALNAME:LLVM Loop Optimization Discussion -X-WR-TIMEZONE:Europe/Berlin -BEGIN:VTIMEZONE -TZID:America/New_York -X-LIC-LOCATION:America/New_York -BEGIN:DAYLIGHT -TZOFFSETFROM:-0500 -TZOFFSETTO:-0400 -TZNAME:EDT -DTSTART:19700308T020000 -RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU -END:DAYLIGHT -BEGIN:STANDARD -TZOFFSETFROM:-0400 -TZOFFSETTO:-0500 -TZNAME:EST -DTSTART:19701101T020000 -RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU -END:STANDARD -END:VTIMEZONE -BEGIN:VEVENT -DTSTART;TZID=America/New_York:20240904T110000 -DTEND;TZID=America/New_York:20240904T120000 -RRULE:FREQ=MONTHLY;BYDAY=1WE -DTSTAMP:20240821T160951Z -UID:58h3f0kd3aooohmeii0johh23c at google.com -X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg -CREATED:20240821T151507Z -DESCRIPTION:LLVM Loop Optimization Discussion
Video call link: https://meet.google.c - om/fmz-gspu-odg
Agenda/Minutes/Discussion: https://docs.google.com/document/d/1sdzoyB - 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing\n\n-::~:~::~:~:~ - :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\ - nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE) - +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm - z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp - ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n - -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~ - :~:~::~:~::- -LAST-MODIFIED:20240821T160941Z -SEQUENCE:0 -STATUS:CONFIRMED -SUMMARY:LLVM Loop Optimization Discussion -TRANSP:OPAQUE -END:VEVENT -BEGIN:VEVENT -DTSTART;TZID=America/New_York:20240904T110000 -DTEND;TZID=America/New_York:20240904T120000 -DTSTAMP:20240821T160951Z -UID:58h3f0kd3aooohmeii0johh23c at google.com -X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg -RECURRENCE-ID;TZID=America/New_York:20240904T110000 -CREATED:20240821T151507Z -DESCRIPTION:LLVM Loop Optimization Discussion
Video call link: https://meet.google.c - om/fmz-gspu-odg
Agenda/Minutes/Discussion: https://docs.google.com/document/d/1sdzoyB - 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing\n\n-::~:~::~:~:~ - :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\ - nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE) - +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm - z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp - ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n - -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~ - :~:~::~:~::- -LAST-MODIFIED:20240821T160941Z -SEQUENCE:0 -STATUS:CONFIRMED -SUMMARY:LLVM Loop Optimization Discussion -TRANSP:OPAQUE -END:VEVENT -END:VCALENDAR +BEGIN:VCALENDAR +PRODID:-//Google Inc//Google Calendar 70.9054//EN +VERSION:2.0 +CALSCALE:GREGORIAN +METHOD:PUBLISH +X-WR-CALNAME:LLVM Loop Optimization Discussion +X-WR-TIMEZONE:Europe/Berlin +BEGIN:VTIMEZONE +TZID:America/New_York +X-LIC-LOCATION:America/New_York +BEGIN:DAYLIGHT +TZOFFSETFROM:-0500 +TZOFFSETTO:-0400 +TZNAME:EDT +DTSTART:19700308T020000 +RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU +END:DAYLIGHT +BEGIN:STANDARD +TZOFFSETFROM:-0400 +TZOFFSETTO:-0500 +TZNAME:EST +DTSTART:19701101T020000 +RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU +END:STANDARD +END:VTIMEZONE +BEGIN:VEVENT +DTSTART;TZID=America/New_York:20240904T110000 +DTEND;TZID=America/New_York:20240904T120000 +RRULE:FREQ=MONTHLY;BYDAY=1WE +DTSTAMP:20240821T160951Z +UID:58h3f0kd3aooohmeii0johh23c at google.com +X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg +CREATED:20240821T151507Z +DESCRIPTION:LLVM Loop Optimization Discussion
Video call link: https://meet.google.c + om/fmz-gspu-odg
Agenda/Minutes/Discussion: https://docs.google.com/document/d/1sdzoyB + 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing\n\n-::~:~::~:~:~ + :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\ + nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE) + +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm + z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp + ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n + -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~ + :~:~::~:~::- +LAST-MODIFIED:20240821T160941Z +SEQUENCE:0 +STATUS:CONFIRMED +SUMMARY:LLVM Loop Optimization Discussion +TRANSP:OPAQUE +END:VEVENT +BEGIN:VEVENT +DTSTART;TZID=America/New_York:20240904T110000 +DTEND;TZID=America/New_York:20240904T120000 +DTSTAMP:20240821T160951Z +UID:58h3f0kd3aooohmeii0johh23c at google.com +X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg +RECURRENCE-ID;TZID=America/New_York:20240904T110000 +CREATED:20240821T151507Z +DESCRIPTION:LLVM Loop Optimization Discussion
Video call link: https://meet.google.c + om/fmz-gspu-odg
Agenda/Minutes/Discussion: https://docs.google.com/document/d/1sdzoyB + 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing\n\n-::~:~::~:~:~ + :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\ + nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE) + +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm + z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp + ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n + -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~ + :~:~::~:~::- +LAST-MODIFIED:20240821T160941Z +SEQUENCE:0 +STATUS:CONFIRMED +SUMMARY:LLVM Loop Optimization Discussion +TRANSP:OPAQUE +END:VEVENT +END:VCALENDAR diff --git a/llvm/lib/Support/rpmalloc/CACHE.md b/llvm/lib/Support/rpmalloc/CACHE.md index 052320baf53275..645093026debf1 100644 --- a/llvm/lib/Support/rpmalloc/CACHE.md +++ b/llvm/lib/Support/rpmalloc/CACHE.md @@ -1,19 +1,19 @@ -# Thread caches -rpmalloc has a thread cache of free memory blocks which can be used in allocations without interfering with other threads or going to system to map more memory, as well as a global cache shared by all threads to let spans of memory pages flow between threads. Configuring the size of these caches can be crucial to obtaining good performance while minimizing memory overhead blowup. Below is a simple case study using the benchmark tool to compare different thread cache configurations for rpmalloc. - -The rpmalloc thread cache is configured to be unlimited, performance oriented as meaning default values, size oriented where both thread cache and global cache is reduced significantly, or disabled where both thread and global caches are disabled and completely free pages are directly unmapped. - -The benchmark is configured to run threads allocating 150000 blocks distributed in the `[16, 16000]` bytes range with a linear falloff probability. 
It runs 1000 loops, and every iteration 75000 blocks (50%) are freed and allocated in a scattered pattern. There are no cross thread allocations/deallocations. Parameters: `benchmark n 0 0 0 1000 150000 75000 16 16000`. The benchmarks are run on an Ubuntu 16.10 machine with 8 cores (4 physical, HT) and 12GiB RAM. - -The benchmark also includes results for the standard library malloc implementation as a reference for comparison with the nocache setting. - -![Ubuntu 16.10 random [16, 16000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=387883204&format=image) -![Ubuntu 16.10 random [16, 16000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=1644710241&format=image) - -For single threaded case the unlimited cache and performance oriented cache settings have identical performance and memory overhead, indicating that the memory pages fit in the combined thread and global cache. As number of threads increase to 2-4 threads, the performance settings have slightly higher performance which can seem odd at first, but can be explained by low contention on the global cache where some memory pages can flow between threads without stalling, reducing the overall number of calls to map new memory pages (also indicated by the slightly lower memory overhead). - -As threads increase even more to 5-10 threads, the increased contention and eventual limit of global cache cause the unlimited setting to gain a slight advantage in performance. As expected the memory overhead remains constant for unlimited caches, while going down for performance setting when number of threads increases. - -The size oriented setting maintain good performance compared to the standard library while reducing the memory overhead compared to the performance setting with a decent amount. 
- -The nocache setting still outperforms the reference standard library allocator for workloads up to 6 threads while maintaining a near zero memory overhead, which is even slightly lower than the standard library. For use case scenarios where number of allocation of each size class is lower the overhead in rpmalloc from the 64KiB span size will of course increase. +# Thread caches +rpmalloc has a thread cache of free memory blocks which can be used in allocations without interfering with other threads or going to system to map more memory, as well as a global cache shared by all threads to let spans of memory pages flow between threads. Configuring the size of these caches can be crucial to obtaining good performance while minimizing memory overhead blowup. Below is a simple case study using the benchmark tool to compare different thread cache configurations for rpmalloc. + +The rpmalloc thread cache is configured to be unlimited, performance oriented as meaning default values, size oriented where both thread cache and global cache is reduced significantly, or disabled where both thread and global caches are disabled and completely free pages are directly unmapped. + +The benchmark is configured to run threads allocating 150000 blocks distributed in the `[16, 16000]` bytes range with a linear falloff probability. It runs 1000 loops, and every iteration 75000 blocks (50%) are freed and allocated in a scattered pattern. There are no cross thread allocations/deallocations. Parameters: `benchmark n 0 0 0 1000 150000 75000 16 16000`. The benchmarks are run on an Ubuntu 16.10 machine with 8 cores (4 physical, HT) and 12GiB RAM. + +The benchmark also includes results for the standard library malloc implementation as a reference for comparison with the nocache setting. 
+ +![Ubuntu 16.10 random [16, 16000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=387883204&format=image) +![Ubuntu 16.10 random [16, 16000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=1644710241&format=image) + +For single threaded case the unlimited cache and performance oriented cache settings have identical performance and memory overhead, indicating that the memory pages fit in the combined thread and global cache. As number of threads increase to 2-4 threads, the performance settings have slightly higher performance which can seem odd at first, but can be explained by low contention on the global cache where some memory pages can flow between threads without stalling, reducing the overall number of calls to map new memory pages (also indicated by the slightly lower memory overhead). + +As threads increase even more to 5-10 threads, the increased contention and eventual limit of global cache cause the unlimited setting to gain a slight advantage in performance. As expected the memory overhead remains constant for unlimited caches, while going down for performance setting when number of threads increases. + +The size oriented setting maintain good performance compared to the standard library while reducing the memory overhead compared to the performance setting with a decent amount. + +The nocache setting still outperforms the reference standard library allocator for workloads up to 6 threads while maintaining a near zero memory overhead, which is even slightly lower than the standard library. For use case scenarios where number of allocation of each size class is lower the overhead in rpmalloc from the 64KiB span size will of course increase. 
diff --git a/llvm/lib/Support/rpmalloc/README.md b/llvm/lib/Support/rpmalloc/README.md index 916bca0118d868..2233df9da42d52 100644 --- a/llvm/lib/Support/rpmalloc/README.md +++ b/llvm/lib/Support/rpmalloc/README.md @@ -1,220 +1,220 @@ -# rpmalloc - General Purpose Memory Allocator -This library provides a cross platform lock free thread caching 16-byte aligned memory allocator implemented in C. -This is a fork of rpmalloc 1.4.5. - -Platforms currently supported: - -- Windows -- MacOS -- iOS -- Linux -- Android -- Haiku - -The code should be easily portable to any platform with atomic operations and an mmap-style virtual memory management API. The API used to map/unmap memory pages can be configured in runtime to a custom implementation and mapping granularity/size. - -This library is put in the public domain; you can redistribute it and/or modify it without any restrictions. Or, if you choose, you can use it under the MIT license. - -# Performance -We believe rpmalloc is faster than most popular memory allocators like tcmalloc, hoard, ptmalloc3 and others without causing extra allocated memory overhead in the thread caches compared to these allocators. We also believe the implementation to be easier to read and modify compared to these allocators, as it is a single source file of ~3000 lines of C code. All allocations have a natural 16-byte alignment. - -Contained in a parallel repository is a benchmark utility that performs interleaved unaligned allocations and deallocations (both in-thread and cross-thread) in multiple threads. It measures number of memory operations performed per CPU second, as well as memory overhead by comparing the virtual memory mapped with the number of bytes requested in allocation calls. The setup of number of thread, cross-thread deallocation rate and allocation size limits is configured by command line arguments. 
-
-https://github.com/mjansson/rpmalloc-benchmark
-
-Below is an example performance comparison chart of rpmalloc and other popular allocator implementations, with default configurations used.
-
-![Ubuntu 16.10, random [16, 8000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=301017877&format=image)
-
-The benchmark producing these numbers were run on an Ubuntu 16.10 machine with 8 logical cores (4 physical, HT). The actual numbers are not to be interpreted as absolute performance figures, but rather as relative comparisons between the different allocators. For additional benchmark results, see the [BENCHMARKS](BENCHMARKS.md) file.
-
-Configuration of the thread and global caches can be important depending on your use pattern. See [CACHE](CACHE.md) for a case study and some comments/guidelines.
-
-# Required functions
-
-Before calling any other function in the API, you __MUST__ call the initialization function, either __rpmalloc_initialize__ or __rpmalloc_initialize_config__, or you will get undefined behaviour when calling other rpmalloc entry point.
-
-Before terminating your use of the allocator, you __SHOULD__ call __rpmalloc_finalize__ in order to release caches and unmap virtual memory, as well as prepare the allocator for global scope cleanup at process exit or dynamic library unload depending on your use case.
-
-# Using
-The easiest way to use the library is simply adding __rpmalloc.[h|c]__ to your project and compile them along with your sources. This contains only the rpmalloc specific entry points and does not provide internal hooks to process and/or thread creation at the moment. You are required to call these functions from your own code in order to initialize and finalize the allocator in your process and threads:
-
-__rpmalloc_initialize__ : Call at process start to initialize the allocator
-
-__rpmalloc_initialize_config__ : Optional entry point to call at process start to initialize the allocator with a custom memory mapping backend, memory page size and mapping granularity.
-
-__rpmalloc_finalize__: Call at process exit to finalize the allocator
-
-__rpmalloc_thread_initialize__: Call at each thread start to initialize the thread local data for the allocator
-
-__rpmalloc_thread_finalize__: Call at each thread exit to finalize and release thread cache back to global cache
-
-__rpmalloc_config__: Get the current runtime configuration of the allocator
-
-Then simply use the __rpmalloc__/__rpfree__ and the other malloc style replacement functions. Remember all allocations are 16-byte aligned, so no need to call the explicit rpmemalign/rpaligned_alloc/rpposix_memalign functions unless you need greater alignment, they are simply wrappers to make it easier to replace in existing code.
-
-If you wish to override the standard library malloc family of functions and have automatic initialization/finalization of process and threads, define __ENABLE_OVERRIDE__ to non-zero which will include the `malloc.c` file in compilation of __rpmalloc.c__, and then rebuild the library or your project where you added the rpmalloc source. If you compile rpmalloc as a separate library you must make the linker use the override symbols from the library by referencing at least one symbol. The easiest way is to simply include `rpmalloc.h` in at least one source file and call `rpmalloc_linker_reference` somewhere - it's a dummy empty function. On Windows platforms and C++ overrides you have to `#include ` in at least one source file and also manually handle the initialize/finalize of the process and all threads. The list of libc entry points replaced may not be complete, use libc/stdc++ replacement only as a convenience for testing the library on an existing code base, not a final solution.
-
-For explicit first class heaps, see the __rpmalloc_heap_*__ API under [first class heaps](#first-class-heaps) section, requiring __RPMALLOC_FIRST_CLASS_HEAPS__ tp be defined to 1.
-
-# Building
-To compile as a static library run the configure python script which generates a Ninja build script, then build using ninja. The ninja build produces two static libraries, one named `rpmalloc` and one named `rpmallocwrap`, where the latter includes the libc entry point overrides.
-
-The configure + ninja build also produces two shared object/dynamic libraries. The `rpmallocwrap` shared library can be used with LD_PRELOAD/DYLD_INSERT_LIBRARIES to inject in a preexisting binary, replacing any malloc/free family of function calls. This is only implemented for Linux and macOS targets. The list of libc entry points replaced may not be complete, use preloading as a convenience for testing the library on an existing binary, not a final solution. The dynamic library also provides automatic init/fini of process and threads for all platforms.
-
-The latest stable release is available in the master branch. For latest development code, use the develop branch.
-
-# Cache configuration options
-Free memory pages are cached both per thread and in a global cache for all threads. The size of the thread caches is determined by an adaptive scheme where each cache is limited by a percentage of the maximum allocation count of the corresponding size class. The size of the global caches is determined by a multiple of the maximum of all thread caches. The factors controlling the cache sizes can be set by editing the individual defines in the `rpmalloc.c` source file for fine tuned control.
-
-__ENABLE_UNLIMITED_CACHE__: By default defined to 0, set to 1 to make all caches infinite, i.e never release spans to global cache unless thread finishes and never unmap memory pages back to the OS. Highest performance but largest memory overhead.
-
-__ENABLE_UNLIMITED_GLOBAL_CACHE__: By default defined to 0, set to 1 to make global caches infinite, i.e never unmap memory pages back to the OS.
-
-__ENABLE_UNLIMITED_THREAD_CACHE__: By default defined to 0, set to 1 to make thread caches infinite, i.e never release spans to global cache unless thread finishes.
-
-__ENABLE_GLOBAL_CACHE__: By default defined to 1, enables the global cache shared between all threads. Set to 0 to disable the global cache and directly unmap pages evicted from the thread cache.
-
-__ENABLE_THREAD_CACHE__: By default defined to 1, enables the per-thread cache. Set to 0 to disable the thread cache and directly unmap pages no longer in use (also disables the global cache).
-
-__ENABLE_ADAPTIVE_THREAD_CACHE__: Introduces a simple heuristics in the thread cache size, keeping 25% of the high water mark for each span count class.
-
-# Other configuration options
-Detailed statistics are available if __ENABLE_STATISTICS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. This will cause a slight overhead in runtime to collect statistics for each memory operation, and will also add 4 bytes overhead per allocation to track sizes.
-
-Integer safety checks on all calls are enabled if __ENABLE_VALIDATE_ARGS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. If enabled, size arguments to the global entry points are verified not to cause integer overflows in calculations.
-
-Asserts are enabled if __ENABLE_ASSERTS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`.
-
-To include __malloc.c__ in compilation and provide overrides of standard library malloc entry points define __ENABLE_OVERRIDE__ to 1. To enable automatic initialization of finalization of process and threads in order to preload the library into executables using standard library malloc, define __ENABLE_PRELOAD__ to 1.
-
-To enable the runtime configurable memory page and span sizes, define __RPMALLOC_CONFIGURABLE__ to 1. By default, memory page size is determined by system APIs and memory span size is set to 64KiB.
-
-To enable support for first class heaps, define __RPMALLOC_FIRST_CLASS_HEAPS__ to 1. By default, the first class heap API is disabled.
-
-# Huge pages
-The allocator has support for huge/large pages on Windows, Linux and MacOS. To enable it, pass a non-zero value in the config value `enable_huge_pages` when initializing the allocator with `rpmalloc_initialize_config`. If the system does not support huge pages it will be automatically disabled. You can query the status by looking at `enable_huge_pages` in the config returned from a call to `rpmalloc_config` after initialization is done.
-
-# Quick overview
-The allocator is similar in spirit to tcmalloc from the [Google Performance Toolkit](https://github.com/gperftools/gperftools). It uses separate heaps for each thread and partitions memory blocks according to a preconfigured set of size classes, up to 2MiB. Larger blocks are mapped and unmapped directly. Allocations for different size classes will be served from different set of memory pages, each "span" of pages is dedicated to one size class. Spans of pages can flow between threads when the thread cache overflows and are released to a global cache, or when the thread ends. Unlike tcmalloc, single blocks do not flow between threads, only entire spans of pages.
-
-# Implementation details
-The allocator is based on a fixed but configurable page alignment (defaults to 64KiB) and 16 byte block alignment, where all runs of memory pages (spans) are mapped to this alignment boundary. On Windows this is automatically guaranteed up to 64KiB by the VirtualAlloc granularity, and on mmap systems it is achieved by oversizing the mapping and aligning the returned virtual memory address to the required boundaries. By aligning to a fixed size the free operation can locate the header of the memory span without having to do a table lookup (as tcmalloc does) by simply masking out the low bits of the address (for 64KiB this would be the low 16 bits).
-
-Memory blocks are divided into three categories. For 64KiB span size/alignment the small blocks are [16, 1024] bytes, medium blocks (1024, 32256] bytes, and large blocks (32256, 2097120] bytes. The three categories are further divided in size classes. If the span size is changed, the small block classes remain but medium blocks go from (1024, span size] bytes.
-
-Small blocks have a size class granularity of 16 bytes each in 64 buckets. Medium blocks have a granularity of 512 bytes, 61 buckets (default). Large blocks have the same granularity as the configured span size (default 64KiB). All allocations are fitted to these size class boundaries (an allocation of 36 bytes will allocate a block of 48 bytes). Each small and medium size class has an associated span (meaning a contiguous set of memory pages) configuration describing how many pages the size class will allocate each time the cache is empty and a new allocation is requested.
-
-Spans for small and medium blocks are cached in four levels to avoid calls to map/unmap memory pages. The first level is a per thread single active span for each size class. The second level is a per thread list of partially free spans for each size class. The third level is a per thread list of free spans. The fourth level is a global list of free spans.
-
-Each span for a small and medium size class keeps track of how many blocks are allocated/free, as well as a list of which blocks that are free for allocation. To avoid locks, each span is completely owned by the allocating thread, and all cross-thread deallocations will be deferred to the owner thread through a separate free list per span.
-
-Large blocks, or super spans, are cached in two levels. The first level is a per thread list of free super spans. The second level is a global list of free super spans.
-
-# Memory mapping
-By default the allocator uses OS APIs to map virtual memory pages as needed, either `VirtualAlloc` on Windows or `mmap` on POSIX systems. If you want to use your own custom memory mapping provider you can use __rpmalloc_initialize_config__ and pass function pointers to map and unmap virtual memory. These function should reserve and free the requested number of bytes.
-
-The returned memory address from the memory map function MUST be aligned to the memory page size and the memory span size (which ever is larger), both of which is configurable. Either provide the page and span sizes during initialization using __rpmalloc_initialize_config__, or use __rpmalloc_config__ to find the required alignment which is equal to the maximum of page and span size. The span size MUST be a power of two in [4096, 262144] range, and be a multiple or divisor of the memory page size.
-
-Memory mapping requests are always done in multiples of the memory page size. You can specify a custom page size when initializing rpmalloc with __rpmalloc_initialize_config__, or pass 0 to let rpmalloc determine the system memory page size using OS APIs. The page size MUST be a power of two.
-
-To reduce system call overhead, memory spans are mapped in batches controlled by the `span_map_count` configuration variable (which defaults to the `DEFAULT_SPAN_MAP_COUNT` value if 0, which in turn is sized according to the cache configuration define, defaulting to 64). If the memory page size is larger than the span size, the number of spans to map in a single call will be adjusted to guarantee a multiple of the page size, and the spans will be kept mapped until the entire span range can be unmapped in one call (to avoid trying to unmap partial pages).
-
-On macOS and iOS mmap requests are tagged with tag 240 for easy identification with the vmmap tool.
-
-# Span breaking
-Super spans (spans a multiple > 1 of the span size) can be subdivided into smaller spans to fulfill a need to map a new span of memory. By default the allocator will greedily grab and break any larger span from the available caches before mapping new virtual memory. However, spans can currently not be glued together to form larger super spans again. Subspans can traverse the cache and be used by different threads individually.
-
-A span that is a subspan of a larger super span can be individually decommitted to reduce physical memory pressure when the span is evicted from caches and scheduled to be unmapped. The entire original super span will keep track of the subspans it is broken up into, and when the entire range is decommitted the super span will be unmapped. This allows platforms like Windows that require the entire virtual memory range that was mapped in a call to VirtualAlloc to be unmapped in one call to VirtualFree, while still decommitting individual pages in subspans (if the page size is smaller than the span size).
-
-If you use a custom memory map/unmap function you need to take this into account by looking at the `release` parameter given to the `memory_unmap` function. It is set to 0 for decommitting individual pages and the total super span byte size for finally releasing the entire super span memory range.
-
-# Memory fragmentation
-There is no memory fragmentation by the allocator in the sense that it will not leave unallocated and unusable "holes" in the memory pages by calls to allocate and free blocks of different sizes. This is due to the fact that the memory pages allocated for each size class is split up in perfectly aligned blocks which are not reused for a request of a different size. The block freed by a call to `rpfree` will always be immediately available for an allocation request within the same size class.
-
-However, there is memory fragmentation in the meaning that a request for x bytes followed by a request of y bytes where x and y are at least one size class different in size will return blocks that are at least one memory page apart in virtual address space. Only blocks of the same size will potentially be within the same memory page span.
-
-rpmalloc keeps an "active span" and free list for each size class. This leads to back-to-back allocations will most likely be served from within the same span of memory pages (unless the span runs out of free blocks). The rpmalloc implementation will also use any "holes" in memory pages in semi-filled spans before using a completely free span.
-
-# First class heaps
-rpmalloc provides a first class heap type with explicit heap control API. Heaps are maintained with calls to __rpmalloc_heap_acquire__ and __rpmalloc_heap_release__ and allocations/frees are done with __rpmalloc_heap_alloc__ and __rpmalloc_heap_free__. See the `rpmalloc.h` documentation for the full list of functions in the heap API. The main use case of explicit heap control is to scope allocations in a heap and release everything with a single call to __rpmalloc_heap_free_all__ without having to maintain ownership of memory blocks. Note that the heap API is not thread-safe, the caller must make sure that each heap is only used in a single thread at any given time.
-
-# Producer-consumer scenario
-Compared to the some other allocators, rpmalloc does not suffer as much from a producer-consumer thread scenario where one thread allocates memory blocks and another thread frees the blocks. In some allocators the free blocks need to traverse both the thread cache of the thread doing the free operations as well as the global cache before being reused in the allocating thread. In rpmalloc the freed blocks will be reused as soon as the allocating thread needs to get new spans from the thread cache. This enables faster release of completely freed memory pages as blocks in a memory page will not be aliased between different owning threads.
-
-# Best case scenarios
-Threads that keep ownership of allocated memory blocks within the thread and free the blocks from the same thread will have optimal performance.
-
-Threads that have allocation patterns where the difference in memory usage high and low water marks fit within the thread cache thresholds in the allocator will never touch the global cache except during thread init/fini and have optimal performance. Tweaking the cache limits can be done on a per-size-class basis.
-
-# Worst case scenarios
-Since each thread cache maps spans of memory pages per size class, a thread that allocates just a few blocks of each size class (16, 32, ...) for many size classes will never fill each bucket, and thus map a lot of memory pages while only using a small fraction of the mapped memory. However, the wasted memory will always be less than 4KiB (or the configured memory page size) per size class as each span is initialized one memory page at a time. The cache for free spans will be reused by all size classes.
-
-Threads that perform a lot of allocations and deallocations in a pattern that have a large difference in high and low water marks, and that difference is larger than the thread cache size, will put a lot of contention on the global cache. What will happen is the thread cache will overflow on each low water mark causing pages to be released to the global cache, then underflow on high water mark causing pages to be re-acquired from the global cache. This can be mitigated by changing the __MAX_SPAN_CACHE_DIVISOR__ define in the source code (at the cost of higher average memory overhead).
-
-# Caveats
-VirtualAlloc has an internal granularity of 64KiB. However, mmap lacks this granularity control, and the implementation instead oversizes the memory mapping with configured span size to be able to always return a memory area with the required alignment. Since the extra memory pages are never touched this will not result in extra committed physical memory pages, but rather only increase virtual memory address space.
-
-All entry points assume the passed values are valid, for example passing an invalid pointer to free would most likely result in a segmentation fault. __The library does not try to guard against errors!__.
-
-To support global scope data doing dynamic allocation/deallocation such as C++ objects with custom constructors and destructors, the call to __rpmalloc_finalize__ will not completely terminate the allocator but rather empty all caches and put the allocator in finalization mode. Once this call has been made, the allocator is no longer thread safe and expects all remaining calls to originate from global data destruction on main thread. Any spans or heaps becoming free during this phase will be immediately unmapped to allow correct teardown of the process or dynamic library without any leaks.
-
-# Other languages
-
-[Johan Andersson](https://github.com/repi) at Embark has created a Rust wrapper available at [rpmalloc-rs](https://github.com/EmbarkStudios/rpmalloc-rs)
-
-[Stas Denisov](https://github.com/nxrighthere) has created a C# wrapper available at [Rpmalloc-CSharp](https://github.com/nxrighthere/Rpmalloc-CSharp)
-
-# License
-
-This is free and unencumbered software released into the public domain.
-
-Anyone is free to copy, modify, publish, use, compile, sell, or
-distribute this software, either in source code form or as a compiled
-binary, for any purpose, commercial or non-commercial, and by any
-means.
-
-In jurisdictions that recognize copyright laws, the author or authors
-of this software dedicate any and all copyright interest in the
-software to the public domain. We make this dedication for the benefit
-of the public at large and to the detriment of our heirs and
-successors. We intend this dedication to be an overt act of
-relinquishment in perpetuity of all present and future rights to this
-software under copyright law.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
-IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
-OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
-ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
-OTHER DEALINGS IN THE SOFTWARE.
-
-For more information, please refer to
-
-
-You can also use this software under the MIT license if public domain is
-not recognized in your country
-
-
-The MIT License (MIT)
-
-Copyright (c) 2017 Mattias Jansson
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.
+# rpmalloc - General Purpose Memory Allocator
+This library provides a cross platform lock free thread caching 16-byte aligned memory allocator implemented in C.
+This is a fork of rpmalloc 1.4.5.
+
+Platforms currently supported:
+
+- Windows
+- MacOS
+- iOS
+- Linux
+- Android
+- Haiku
+
+The code should be easily portable to any platform with atomic operations and an mmap-style virtual memory management API. The API used to map/unmap memory pages can be configured in runtime to a custom implementation and mapping granularity/size.
+
+This library is put in the public domain; you can redistribute it and/or modify it without any restrictions. Or, if you choose, you can use it under the MIT license.
+
+# Performance
+We believe rpmalloc is faster than most popular memory allocators like tcmalloc, hoard, ptmalloc3 and others without causing extra allocated memory overhead in the thread caches compared to these allocators. We also believe the implementation to be easier to read and modify compared to these allocators, as it is a single source file of ~3000 lines of C code. All allocations have a natural 16-byte alignment.
+
+Contained in a parallel repository is a benchmark utility that performs interleaved unaligned allocations and deallocations (both in-thread and cross-thread) in multiple threads. It measures number of memory operations performed per CPU second, as well as memory overhead by comparing the virtual memory mapped with the number of bytes requested in allocation calls. The setup of number of thread, cross-thread deallocation rate and allocation size limits is configured by command line arguments.
+
+https://github.com/mjansson/rpmalloc-benchmark
+
+Below is an example performance comparison chart of rpmalloc and other popular allocator implementations, with default configurations used.
+
+![Ubuntu 16.10, random [16, 8000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=301017877&format=image)
+
+The benchmark producing these numbers were run on an Ubuntu 16.10 machine with 8 logical cores (4 physical, HT). The actual numbers are not to be interpreted as absolute performance figures, but rather as relative comparisons between the different allocators. For additional benchmark results, see the [BENCHMARKS](BENCHMARKS.md) file.
+
+Configuration of the thread and global caches can be important depending on your use pattern. See [CACHE](CACHE.md) for a case study and some comments/guidelines.
+
+# Required functions
+
+Before calling any other function in the API, you __MUST__ call the initialization function, either __rpmalloc_initialize__ or __rpmalloc_initialize_config__, or you will get undefined behaviour when calling other rpmalloc entry point.
+
+Before terminating your use of the allocator, you __SHOULD__ call __rpmalloc_finalize__ in order to release caches and unmap virtual memory, as well as prepare the allocator for global scope cleanup at process exit or dynamic library unload depending on your use case.
+
+# Using
+The easiest way to use the library is simply adding __rpmalloc.[h|c]__ to your project and compile them along with your sources. This contains only the rpmalloc specific entry points and does not provide internal hooks to process and/or thread creation at the moment. You are required to call these functions from your own code in order to initialize and finalize the allocator in your process and threads:
+
+__rpmalloc_initialize__ : Call at process start to initialize the allocator
+
+__rpmalloc_initialize_config__ : Optional entry point to call at process start to initialize the allocator with a custom memory mapping backend, memory page size and mapping granularity.
+
+__rpmalloc_finalize__: Call at process exit to finalize the allocator
+
+__rpmalloc_thread_initialize__: Call at each thread start to initialize the thread local data for the allocator
+
+__rpmalloc_thread_finalize__: Call at each thread exit to finalize and release thread cache back to global cache
+
+__rpmalloc_config__: Get the current runtime configuration of the allocator
+
+Then simply use the __rpmalloc__/__rpfree__ and the other malloc style replacement functions. Remember all allocations are 16-byte aligned, so no need to call the explicit rpmemalign/rpaligned_alloc/rpposix_memalign functions unless you need greater alignment, they are simply wrappers to make it easier to replace in existing code.
+
+If you wish to override the standard library malloc family of functions and have automatic initialization/finalization of process and threads, define __ENABLE_OVERRIDE__ to non-zero which will include the `malloc.c` file in compilation of __rpmalloc.c__, and then rebuild the library or your project where you added the rpmalloc source. If you compile rpmalloc as a separate library you must make the linker use the override symbols from the library by referencing at least one symbol. The easiest way is to simply include `rpmalloc.h` in at least one source file and call `rpmalloc_linker_reference` somewhere - it's a dummy empty function. On Windows platforms and C++ overrides you have to `#include ` in at least one source file and also manually handle the initialize/finalize of the process and all threads. The list of libc entry points replaced may not be complete, use libc/stdc++ replacement only as a convenience for testing the library on an existing code base, not a final solution.
+
+For explicit first class heaps, see the __rpmalloc_heap_*__ API under [first class heaps](#first-class-heaps) section, requiring __RPMALLOC_FIRST_CLASS_HEAPS__ tp be defined to 1.
+
+# Building
+To compile as a static library run the configure python script which generates a Ninja build script, then build using ninja. The ninja build produces two static libraries, one named `rpmalloc` and one named `rpmallocwrap`, where the latter includes the libc entry point overrides.
+
+The configure + ninja build also produces two shared object/dynamic libraries. The `rpmallocwrap` shared library can be used with LD_PRELOAD/DYLD_INSERT_LIBRARIES to inject in a preexisting binary, replacing any malloc/free family of function calls. This is only implemented for Linux and macOS targets. The list of libc entry points replaced may not be complete, use preloading as a convenience for testing the library on an existing binary, not a final solution. The dynamic library also provides automatic init/fini of process and threads for all platforms.
+
+The latest stable release is available in the master branch. For latest development code, use the develop branch.
+
+# Cache configuration options
+Free memory pages are cached both per thread and in a global cache for all threads. The size of the thread caches is determined by an adaptive scheme where each cache is limited by a percentage of the maximum allocation count of the corresponding size class. The size of the global caches is determined by a multiple of the maximum of all thread caches. The factors controlling the cache sizes can be set by editing the individual defines in the `rpmalloc.c` source file for fine tuned control.
+
+__ENABLE_UNLIMITED_CACHE__: By default defined to 0, set to 1 to make all caches infinite, i.e never release spans to global cache unless thread finishes and never unmap memory pages back to the OS. Highest performance but largest memory overhead.
+
+__ENABLE_UNLIMITED_GLOBAL_CACHE__: By default defined to 0, set to 1 to make global caches infinite, i.e never unmap memory pages back to the OS.
+
+__ENABLE_UNLIMITED_THREAD_CACHE__: By default defined to 0, set to 1 to make thread caches infinite, i.e never release spans to global cache unless thread finishes.
+
+__ENABLE_GLOBAL_CACHE__: By default defined to 1, enables the global cache shared between all threads. Set to 0 to disable the global cache and directly unmap pages evicted from the thread cache.
+
+__ENABLE_THREAD_CACHE__: By default defined to 1, enables the per-thread cache. Set to 0 to disable the thread cache and directly unmap pages no longer in use (also disables the global cache).
+
+__ENABLE_ADAPTIVE_THREAD_CACHE__: Introduces a simple heuristics in the thread cache size, keeping 25% of the high water mark for each span count class.
+
+# Other configuration options
+Detailed statistics are available if __ENABLE_STATISTICS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. This will cause a slight overhead in runtime to collect statistics for each memory operation, and will also add 4 bytes overhead per allocation to track sizes.
+
+Integer safety checks on all calls are enabled if __ENABLE_VALIDATE_ARGS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`. If enabled, size arguments to the global entry points are verified not to cause integer overflows in calculations.
+
+Asserts are enabled if __ENABLE_ASSERTS__ is defined to 1 (default is 0, or disabled), either on compile command line or by setting the value in `rpmalloc.c`.
+
+To include __malloc.c__ in compilation and provide overrides of standard library malloc entry points define __ENABLE_OVERRIDE__ to 1. To enable automatic initialization of finalization of process and threads in order to preload the library into executables using standard library malloc, define __ENABLE_PRELOAD__ to 1.
+
+To enable the runtime configurable memory page and span sizes, define __RPMALLOC_CONFIGURABLE__ to 1. By default, memory page size is determined by system APIs and memory span size is set to 64KiB.
+
+To enable support for first class heaps, define __RPMALLOC_FIRST_CLASS_HEAPS__ to 1. By default, the first class heap API is disabled.
+
+# Huge pages
+The allocator has support for huge/large pages on Windows, Linux and MacOS. To enable it, pass a non-zero value in the config value `enable_huge_pages` when initializing the allocator with `rpmalloc_initialize_config`. If the system does not support huge pages it will be automatically disabled. You can query the status by looking at `enable_huge_pages` in the config returned from a call to `rpmalloc_config` after initialization is done.
+
+# Quick overview
+The allocator is similar in spirit to tcmalloc from the [Google Performance Toolkit](https://github.com/gperftools/gperftools). It uses separate heaps for each thread and partitions memory blocks according to a preconfigured set of size classes, up to 2MiB. Larger blocks are mapped and unmapped directly. Allocations for different size classes will be served from different set of memory pages, each "span" of pages is dedicated to one size class. Spans of pages can flow between threads when the thread cache overflows and are released to a global cache, or when the thread ends. Unlike tcmalloc, single blocks do not flow between threads, only entire spans of pages.
+ +# Implementation details +The allocator is based on a fixed but configurable page alignment (defaults to 64KiB) and 16 byte block alignment, where all runs of memory pages (spans) are mapped to this alignment boundary. On Windows this is automatically guaranteed up to 64KiB by the VirtualAlloc granularity, and on mmap systems it is achieved by oversizing the mapping and aligning the returned virtual memory address to the required boundaries. By aligning to a fixed size the free operation can locate the header of the memory span without having to do a table lookup (as tcmalloc does) by simply masking out the low bits of the address (for 64KiB this would be the low 16 bits). + +Memory blocks are divided into three categories. For 64KiB span size/alignment the small blocks are [16, 1024] bytes, medium blocks (1024, 32256] bytes, and large blocks (32256, 2097120] bytes. The three categories are further divided in size classes. If the span size is changed, the small block classes remain but medium blocks go from (1024, span size] bytes. + +Small blocks have a size class granularity of 16 bytes each in 64 buckets. Medium blocks have a granularity of 512 bytes, 61 buckets (default). Large blocks have the same granularity as the configured span size (default 64KiB). All allocations are fitted to these size class boundaries (an allocation of 36 bytes will allocate a block of 48 bytes). Each small and medium size class has an associated span (meaning a contiguous set of memory pages) configuration describing how many pages the size class will allocate each time the cache is empty and a new allocation is requested. + +Spans for small and medium blocks are cached in four levels to avoid calls to map/unmap memory pages. The first level is a per thread single active span for each size class. The second level is a per thread list of partially free spans for each size class. The third level is a per thread list of free spans. The fourth level is a global list of free spans. 
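The address masking and size class rounding described above can be sketched as follows. This is a minimal illustration assuming the default 64KiB span size and the 16 byte small block granularity stated in the text; the names `sketch_span_base` and `sketch_small_block_size` are hypothetical and do not appear in rpmalloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed defaults taken from the text: 64KiB spans, 16-byte small
 * block granularity, small blocks up to 1024 bytes. */
#define SKETCH_SPAN_SIZE ((uintptr_t)65536)
#define SKETCH_SPAN_MASK (~(SKETCH_SPAN_SIZE - 1))
#define SKETCH_SMALL_GRANULARITY ((size_t)16)
#define SKETCH_SMALL_LIMIT ((size_t)1024)

/* Locate the start of the span containing a block by clearing the low
 * 16 bits of its address. No table lookup is needed because every span
 * is aligned to the span size. */
static uintptr_t sketch_span_base(uintptr_t block_address) {
  return block_address & SKETCH_SPAN_MASK;
}

/* Round a small request up to the next multiple of 16 bytes, which is
 * the size class boundary the block would be fitted to. */
static size_t sketch_small_block_size(size_t request) {
  assert(request > 0 && request <= SKETCH_SMALL_LIMIT);
  return (request + SKETCH_SMALL_GRANULARITY - 1) &
         ~(SKETCH_SMALL_GRANULARITY - 1);
}
```

For example, a block at address `0x12345678` would belong to the span starting at `0x12340000`, and a request for 36 bytes would be fitted to a 48 byte block, matching the example in the text.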
+ +Each span for a small and medium size class keeps track of how many blocks are allocated/free, as well as a list of which blocks are free for allocation. To avoid locks, each span is completely owned by the allocating thread, and all cross-thread deallocations will be deferred to the owner thread through a separate free list per span. + +Large blocks, or super spans, are cached in two levels. The first level is a per thread list of free super spans. The second level is a global list of free super spans. + +# Memory mapping +By default the allocator uses OS APIs to map virtual memory pages as needed, either `VirtualAlloc` on Windows or `mmap` on POSIX systems. If you want to use your own custom memory mapping provider you can use __rpmalloc_initialize_config__ and pass function pointers to map and unmap virtual memory. These functions should reserve and free the requested number of bytes. + +The returned memory address from the memory map function MUST be aligned to the memory page size and the memory span size (whichever is larger), both of which are configurable. Either provide the page and span sizes during initialization using __rpmalloc_initialize_config__, or use __rpmalloc_config__ to find the required alignment, which is equal to the maximum of page and span size. The span size MUST be a power of two in the [4096, 262144] range, and be a multiple or divisor of the memory page size. + +Memory mapping requests are always done in multiples of the memory page size. You can specify a custom page size when initializing rpmalloc with __rpmalloc_initialize_config__, or pass 0 to let rpmalloc determine the system memory page size using OS APIs. The page size MUST be a power of two. + +To reduce system call overhead, memory spans are mapped in batches controlled by the `span_map_count` configuration variable (which defaults to the `DEFAULT_SPAN_MAP_COUNT` value if 0, which in turn is sized according to the cache configuration define, defaulting to 64).
If the memory page size is larger than the span size, the number of spans to map in a single call will be adjusted to guarantee a multiple of the page size, and the spans will be kept mapped until the entire span range can be unmapped in one call (to avoid trying to unmap partial pages). + +On macOS and iOS mmap requests are tagged with tag 240 for easy identification with the vmmap tool. + +# Span breaking +Super spans (spans a multiple > 1 of the span size) can be subdivided into smaller spans to fulfill a need to map a new span of memory. By default the allocator will greedily grab and break any larger span from the available caches before mapping new virtual memory. However, spans can currently not be glued together to form larger super spans again. Subspans can traverse the cache and be used by different threads individually. + +A span that is a subspan of a larger super span can be individually decommitted to reduce physical memory pressure when the span is evicted from caches and scheduled to be unmapped. The entire original super span will keep track of the subspans it is broken up into, and when the entire range is decommitted the super span will be unmapped. This allows platforms like Windows that require the entire virtual memory range that was mapped in a call to VirtualAlloc to be unmapped in one call to VirtualFree, while still decommitting individual pages in subspans (if the page size is smaller than the span size). + +If you use a custom memory map/unmap function you need to take this into account by looking at the `release` parameter given to the `memory_unmap` function. It is set to 0 for decommitting individual pages and the total super span byte size for finally releasing the entire super span memory range. + +# Memory fragmentation +There is no memory fragmentation by the allocator in the sense that it will not leave unallocated and unusable "holes" in the memory pages by calls to allocate and free blocks of different sizes. 
This is due to the fact that the memory pages allocated for each size class are split up in perfectly aligned blocks which are not reused for a request of a different size. The block freed by a call to `rpfree` will always be immediately available for an allocation request within the same size class. + +However, there is memory fragmentation in the sense that a request for x bytes followed by a request of y bytes where x and y are at least one size class different in size will return blocks that are at least one memory page apart in virtual address space. Only blocks of the same size will potentially be within the same memory page span. + +rpmalloc keeps an "active span" and free list for each size class. As a result, back-to-back allocations will most likely be served from within the same span of memory pages (unless the span runs out of free blocks). The rpmalloc implementation will also use any "holes" in memory pages in semi-filled spans before using a completely free span. + +# First class heaps +rpmalloc provides a first class heap type with explicit heap control API. Heaps are maintained with calls to __rpmalloc_heap_acquire__ and __rpmalloc_heap_release__ and allocations/frees are done with __rpmalloc_heap_alloc__ and __rpmalloc_heap_free__. See the `rpmalloc.h` documentation for the full list of functions in the heap API. The main use case of explicit heap control is to scope allocations in a heap and release everything with a single call to __rpmalloc_heap_free_all__ without having to maintain ownership of memory blocks. Note that the heap API is not thread-safe; the caller must make sure that each heap is only used in a single thread at any given time. + +# Producer-consumer scenario +Compared to some other allocators, rpmalloc does not suffer as much from a producer-consumer thread scenario where one thread allocates memory blocks and another thread frees the blocks.
In some allocators the free blocks need to traverse both the thread cache of the thread doing the free operations as well as the global cache before being reused in the allocating thread. In rpmalloc the freed blocks will be reused as soon as the allocating thread needs to get new spans from the thread cache. This enables faster release of completely freed memory pages as blocks in a memory page will not be aliased between different owning threads. + +# Best case scenarios +Threads that keep ownership of allocated memory blocks within the thread and free the blocks from the same thread will have optimal performance. + +Threads that have allocation patterns where the difference in memory usage high and low water marks fits within the thread cache thresholds in the allocator will never touch the global cache except during thread init/fini and have optimal performance. Tweaking the cache limits can be done on a per-size-class basis. + +# Worst case scenarios +Since each thread cache maps spans of memory pages per size class, a thread that allocates just a few blocks of each size class (16, 32, ...) for many size classes will never fill each bucket, and thus map a lot of memory pages while only using a small fraction of the mapped memory. However, the wasted memory will always be less than 4KiB (or the configured memory page size) per size class as each span is initialized one memory page at a time. The cache for free spans will be reused by all size classes. + +Threads that perform a lot of allocations and deallocations in a pattern that has a large difference in high and low water marks, and that difference is larger than the thread cache size, will put a lot of contention on the global cache. The thread cache will overflow on each low water mark causing pages to be released to the global cache, then underflow on high water mark causing pages to be re-acquired from the global cache.
This can be mitigated by changing the __MAX_SPAN_CACHE_DIVISOR__ define in the source code (at the cost of higher average memory overhead). + +# Caveats +VirtualAlloc has an internal granularity of 64KiB. However, mmap lacks this granularity control, and the implementation instead oversizes the memory mapping by the configured span size to be able to always return a memory area with the required alignment. Since the extra memory pages are never touched this will not result in extra committed physical memory pages, but rather only increase virtual memory address space usage. + +All entry points assume the passed values are valid; for example, passing an invalid pointer to free would most likely result in a segmentation fault. __The library does not try to guard against errors!__ + +To support global scope data doing dynamic allocation/deallocation such as C++ objects with custom constructors and destructors, the call to __rpmalloc_finalize__ will not completely terminate the allocator but rather empty all caches and put the allocator in finalization mode. Once this call has been made, the allocator is no longer thread safe and expects all remaining calls to originate from global data destruction on the main thread. Any spans or heaps becoming free during this phase will be immediately unmapped to allow correct teardown of the process or dynamic library without any leaks. + +# Other languages + +[Johan Andersson](https://github.com/repi) at Embark has created a Rust wrapper available at [rpmalloc-rs](https://github.com/EmbarkStudios/rpmalloc-rs) + +[Stas Denisov](https://github.com/nxrighthere) has created a C# wrapper available at [Rpmalloc-CSharp](https://github.com/nxrighthere/Rpmalloc-CSharp) + +# License + +This is free and unencumbered software released into the public domain.
+ +Anyone is free to copy, modify, publish, use, compile, sell, or +distribute this software, either in source code form or as a compiled +binary, for any purpose, commercial or non-commercial, and by any +means. + +In jurisdictions that recognize copyright laws, the author or authors +of this software dedicate any and all copyright interest in the +software to the public domain. We make this dedication for the benefit +of the public at large and to the detriment of our heirs and +successors. We intend this dedication to be an overt act of +relinquishment in perpetuity of all present and future rights to this +software under copyright law. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR +OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, +ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR +OTHER DEALINGS IN THE SOFTWARE. + +For more information, please refer to + + +You can also use this software under the MIT license if public domain is +not recognized in your country + + +The MIT License (MIT) + +Copyright (c) 2017 Mattias Jansson + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. 
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. diff --git a/llvm/lib/Support/rpmalloc/malloc.c b/llvm/lib/Support/rpmalloc/malloc.c index 3fcfe848250c6b..59e13aab3ef7ed 100644 --- a/llvm/lib/Support/rpmalloc/malloc.c +++ b/llvm/lib/Support/rpmalloc/malloc.c @@ -1,724 +1,724 @@ -//===------------------------ malloc.c ------------------*- C -*-=============// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -// -// This library provides a cross-platform lock free thread caching malloc -// implementation in C11. 
-// -// -// This file provides overrides for the standard library malloc entry points for -// C and new/delete operators for C++ It also provides automatic -// initialization/finalization of process and threads -// -//===----------------------------------------------------------------------===// - -#if defined(__TINYC__) -#include -#endif - -#ifndef ARCH_64BIT -#if defined(__LLP64__) || defined(__LP64__) || defined(_WIN64) -#define ARCH_64BIT 1 -_Static_assert(sizeof(size_t) == 8, "Data type size mismatch"); -_Static_assert(sizeof(void *) == 8, "Data type size mismatch"); -#else -#define ARCH_64BIT 0 -_Static_assert(sizeof(size_t) == 4, "Data type size mismatch"); -_Static_assert(sizeof(void *) == 4, "Data type size mismatch"); -#endif -#endif - -#if (defined(__GNUC__) || defined(__clang__)) -#pragma GCC visibility push(default) -#endif - -#define USE_IMPLEMENT 1 -#define USE_INTERPOSE 0 -#define USE_ALIAS 0 - -#if defined(__APPLE__) -#undef USE_INTERPOSE -#define USE_INTERPOSE 1 - -typedef struct interpose_t { - void *new_func; - void *orig_func; -} interpose_t; - -#define MAC_INTERPOSE_PAIR(newf, oldf) {(void *)newf, (void *)oldf} -#define MAC_INTERPOSE_SINGLE(newf, oldf) \ - __attribute__((used)) static const interpose_t macinterpose##newf##oldf \ - __attribute__((section("__DATA, __interpose"))) = \ - MAC_INTERPOSE_PAIR(newf, oldf) - -#endif - -#if !defined(_WIN32) && !defined(__APPLE__) -#undef USE_IMPLEMENT -#undef USE_ALIAS -#define USE_IMPLEMENT 0 -#define USE_ALIAS 1 -#endif - -#ifdef _MSC_VER -#pragma warning(disable : 4100) -#undef malloc -#undef free -#undef calloc -#define RPMALLOC_RESTRICT __declspec(restrict) -#else -#define RPMALLOC_RESTRICT -#endif - -#if ENABLE_OVERRIDE - -typedef struct rp_nothrow_t { - int __dummy; -} rp_nothrow_t; - -#if USE_IMPLEMENT - -extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL malloc(size_t size) { - return rpmalloc(size); -} -extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL calloc(size_t count, - size_t size) 
{ - return rpcalloc(count, size); -} -extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL realloc(void *ptr, - size_t size) { - return rprealloc(ptr, size); -} -extern inline void *RPMALLOC_CDECL reallocf(void *ptr, size_t size) { - return rprealloc(ptr, size); -} -extern inline void *RPMALLOC_CDECL aligned_alloc(size_t alignment, - size_t size) { - return rpaligned_alloc(alignment, size); -} -extern inline void *RPMALLOC_CDECL memalign(size_t alignment, size_t size) { - return rpmemalign(alignment, size); -} -extern inline int RPMALLOC_CDECL posix_memalign(void **memptr, size_t alignment, - size_t size) { - return rpposix_memalign(memptr, alignment, size); -} -extern inline void RPMALLOC_CDECL free(void *ptr) { rpfree(ptr); } -extern inline void RPMALLOC_CDECL cfree(void *ptr) { rpfree(ptr); } -extern inline size_t RPMALLOC_CDECL malloc_usable_size(void *ptr) { - return rpmalloc_usable_size(ptr); -} -extern inline size_t RPMALLOC_CDECL malloc_size(void *ptr) { - return rpmalloc_usable_size(ptr); -} - -#ifdef _WIN32 -extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL _malloc_base(size_t size) { - return rpmalloc(size); -} -extern inline void RPMALLOC_CDECL _free_base(void *ptr) { rpfree(ptr); } -extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL _calloc_base(size_t count, - size_t size) { - return rpcalloc(count, size); -} -extern inline size_t RPMALLOC_CDECL _msize(void *ptr) { - return rpmalloc_usable_size(ptr); -} -extern inline size_t RPMALLOC_CDECL _msize_base(void *ptr) { - return rpmalloc_usable_size(ptr); -} -extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL -_realloc_base(void *ptr, size_t size) { - return rprealloc(ptr, size); -} -#endif - -#ifdef _WIN32 -// For Windows, #include in one source file to get the C++ operator -// overrides implemented in your module -#else -// Overload the C++ operators using the mangled names -// (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling) operators -// delete and delete[] -#define RPDEFVIS 
__attribute__((visibility("default"))) -extern void _ZdlPv(void *p); -void RPDEFVIS _ZdlPv(void *p) { rpfree(p); } -extern void _ZdaPv(void *p); -void RPDEFVIS _ZdaPv(void *p) { rpfree(p); } -#if ARCH_64BIT -// 64-bit operators new and new[], normal and aligned -extern void *_Znwm(uint64_t size); -void *RPDEFVIS _Znwm(uint64_t size) { return rpmalloc(size); } -extern void *_Znam(uint64_t size); -void *RPDEFVIS _Znam(uint64_t size) { return rpmalloc(size); } -extern void *_Znwmm(uint64_t size, uint64_t align); -void *RPDEFVIS _Znwmm(uint64_t size, uint64_t align) { - return rpaligned_alloc(align, size); -} -extern void *_Znamm(uint64_t size, uint64_t align); -void *RPDEFVIS _Znamm(uint64_t size, uint64_t align) { - return rpaligned_alloc(align, size); -} -extern void *_ZnwmSt11align_val_t(uint64_t size, uint64_t align); -void *RPDEFVIS _ZnwmSt11align_val_t(uint64_t size, uint64_t align) { - return rpaligned_alloc(align, size); -} -extern void *_ZnamSt11align_val_t(uint64_t size, uint64_t align); -void *RPDEFVIS _ZnamSt11align_val_t(uint64_t size, uint64_t align) { - return rpaligned_alloc(align, size); -} -extern void *_ZnwmRKSt9nothrow_t(uint64_t size, rp_nothrow_t t); -void *RPDEFVIS _ZnwmRKSt9nothrow_t(uint64_t size, rp_nothrow_t t) { - (void)sizeof(t); - return rpmalloc(size); -} -extern void *_ZnamRKSt9nothrow_t(uint64_t size, rp_nothrow_t t); -void *RPDEFVIS _ZnamRKSt9nothrow_t(uint64_t size, rp_nothrow_t t) { - (void)sizeof(t); - return rpmalloc(size); -} -extern void *_ZnwmSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, - rp_nothrow_t t); -void *RPDEFVIS _ZnwmSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, - rp_nothrow_t t) { - (void)sizeof(t); - return rpaligned_alloc(align, size); -} -extern void *_ZnamSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, - rp_nothrow_t t); -void *RPDEFVIS _ZnamSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, - rp_nothrow_t t) { - (void)sizeof(t); - return rpaligned_alloc(align, 
size); -} -// 64-bit operators sized delete and delete[], normal and aligned -extern void _ZdlPvm(void *p, uint64_t size); -void RPDEFVIS _ZdlPvm(void *p, uint64_t size) { - rpfree(p); - (void)sizeof(size); -} -extern void _ZdaPvm(void *p, uint64_t size); -void RPDEFVIS _ZdaPvm(void *p, uint64_t size) { - rpfree(p); - (void)sizeof(size); -} -extern void _ZdlPvSt11align_val_t(void *p, uint64_t align); -void RPDEFVIS _ZdlPvSt11align_val_t(void *p, uint64_t align) { - rpfree(p); - (void)sizeof(align); -} -extern void _ZdaPvSt11align_val_t(void *p, uint64_t align); -void RPDEFVIS _ZdaPvSt11align_val_t(void *p, uint64_t align) { - rpfree(p); - (void)sizeof(align); -} -extern void _ZdlPvmSt11align_val_t(void *p, uint64_t size, uint64_t align); -void RPDEFVIS _ZdlPvmSt11align_val_t(void *p, uint64_t size, uint64_t align) { - rpfree(p); - (void)sizeof(size); - (void)sizeof(align); -} -extern void _ZdaPvmSt11align_val_t(void *p, uint64_t size, uint64_t align); -void RPDEFVIS _ZdaPvmSt11align_val_t(void *p, uint64_t size, uint64_t align) { - rpfree(p); - (void)sizeof(size); - (void)sizeof(align); -} -#else -// 32-bit operators new and new[], normal and aligned -extern void *_Znwj(uint32_t size); -void *RPDEFVIS _Znwj(uint32_t size) { return rpmalloc(size); } -extern void *_Znaj(uint32_t size); -void *RPDEFVIS _Znaj(uint32_t size) { return rpmalloc(size); } -extern void *_Znwjj(uint32_t size, uint32_t align); -void *RPDEFVIS _Znwjj(uint32_t size, uint32_t align) { - return rpaligned_alloc(align, size); -} -extern void *_Znajj(uint32_t size, uint32_t align); -void *RPDEFVIS _Znajj(uint32_t size, uint32_t align) { - return rpaligned_alloc(align, size); -} -extern void *_ZnwjSt11align_val_t(size_t size, size_t align); -void *RPDEFVIS _ZnwjSt11align_val_t(size_t size, size_t align) { - return rpaligned_alloc(align, size); -} -extern void *_ZnajSt11align_val_t(size_t size, size_t align); -void *RPDEFVIS _ZnajSt11align_val_t(size_t size, size_t align) { - return 
rpaligned_alloc(align, size); -} -extern void *_ZnwjRKSt9nothrow_t(size_t size, rp_nothrow_t t); -void *RPDEFVIS _ZnwjRKSt9nothrow_t(size_t size, rp_nothrow_t t) { - (void)sizeof(t); - return rpmalloc(size); -} -extern void *_ZnajRKSt9nothrow_t(size_t size, rp_nothrow_t t); -void *RPDEFVIS _ZnajRKSt9nothrow_t(size_t size, rp_nothrow_t t) { - (void)sizeof(t); - return rpmalloc(size); -} -extern void *_ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, - rp_nothrow_t t); -void *RPDEFVIS _ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, - rp_nothrow_t t) { - (void)sizeof(t); - return rpaligned_alloc(align, size); -} -extern void *_ZnajSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, - rp_nothrow_t t); -void *RPDEFVIS _ZnajSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, - rp_nothrow_t t) { - (void)sizeof(t); - return rpaligned_alloc(align, size); -} -// 32-bit operators sized delete and delete[], normal and aligned -extern void _ZdlPvj(void *p, uint64_t size); -void RPDEFVIS _ZdlPvj(void *p, uint64_t size) { - rpfree(p); - (void)sizeof(size); -} -extern void _ZdaPvj(void *p, uint64_t size); -void RPDEFVIS _ZdaPvj(void *p, uint64_t size) { - rpfree(p); - (void)sizeof(size); -} -extern void _ZdlPvSt11align_val_t(void *p, uint32_t align); -void RPDEFVIS _ZdlPvSt11align_val_t(void *p, uint64_t a) { - rpfree(p); - (void)sizeof(align); -} -extern void _ZdaPvSt11align_val_t(void *p, uint32_t align); -void RPDEFVIS _ZdaPvSt11align_val_t(void *p, uint64_t a) { - rpfree(p); - (void)sizeof(align); -} -extern void _ZdlPvjSt11align_val_t(void *p, uint32_t size, uint32_t align); -void RPDEFVIS _ZdlPvjSt11align_val_t(void *p, uint64_t size, uint64_t align) { - rpfree(p); - (void)sizeof(size); - (void)sizeof(a); -} -extern void _ZdaPvjSt11align_val_t(void *p, uint32_t size, uint32_t align); -void RPDEFVIS _ZdaPvjSt11align_val_t(void *p, uint64_t size, uint64_t align) { - rpfree(p); - (void)sizeof(size); - (void)sizeof(a); -} -#endif -#endif 
-#endif - -#if USE_INTERPOSE || USE_ALIAS - -static void *rpmalloc_nothrow(size_t size, rp_nothrow_t t) { - (void)sizeof(t); - return rpmalloc(size); -} -static void *rpaligned_alloc_reverse(size_t size, size_t align) { - return rpaligned_alloc(align, size); -} -static void *rpaligned_alloc_reverse_nothrow(size_t size, size_t align, - rp_nothrow_t t) { - (void)sizeof(t); - return rpaligned_alloc(align, size); -} -static void rpfree_size(void *p, size_t size) { - (void)sizeof(size); - rpfree(p); -} -static void rpfree_aligned(void *p, size_t align) { - (void)sizeof(align); - rpfree(p); -} -static void rpfree_size_aligned(void *p, size_t size, size_t align) { - (void)sizeof(size); - (void)sizeof(align); - rpfree(p); -} - -#endif - -#if USE_INTERPOSE - -__attribute__((used)) static const interpose_t macinterpose_malloc[] - __attribute__((section("__DATA, __interpose"))) = { - // new and new[] - MAC_INTERPOSE_PAIR(rpmalloc, _Znwm), - MAC_INTERPOSE_PAIR(rpmalloc, _Znam), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _Znwmm), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _Znamm), - MAC_INTERPOSE_PAIR(rpmalloc_nothrow, _ZnwmRKSt9nothrow_t), - MAC_INTERPOSE_PAIR(rpmalloc_nothrow, _ZnamRKSt9nothrow_t), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _ZnwmSt11align_val_t), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _ZnamSt11align_val_t), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse_nothrow, - _ZnwmSt11align_val_tRKSt9nothrow_t), - MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse_nothrow, - _ZnamSt11align_val_tRKSt9nothrow_t), - // delete and delete[] - MAC_INTERPOSE_PAIR(rpfree, _ZdlPv), MAC_INTERPOSE_PAIR(rpfree, _ZdaPv), - MAC_INTERPOSE_PAIR(rpfree_size, _ZdlPvm), - MAC_INTERPOSE_PAIR(rpfree_size, _ZdaPvm), - MAC_INTERPOSE_PAIR(rpfree_aligned, _ZdlPvSt11align_val_t), - MAC_INTERPOSE_PAIR(rpfree_aligned, _ZdaPvSt11align_val_t), - MAC_INTERPOSE_PAIR(rpfree_size_aligned, _ZdlPvmSt11align_val_t), - MAC_INTERPOSE_PAIR(rpfree_size_aligned, _ZdaPvmSt11align_val_t), - // libc entry 
points - MAC_INTERPOSE_PAIR(rpmalloc, malloc), - MAC_INTERPOSE_PAIR(rpmalloc, calloc), - MAC_INTERPOSE_PAIR(rprealloc, realloc), - MAC_INTERPOSE_PAIR(rprealloc, reallocf), -#if defined(__MAC_10_15) && __MAC_OS_X_VERSION_MIN_REQUIRED >= __MAC_10_15 - MAC_INTERPOSE_PAIR(rpaligned_alloc, aligned_alloc), -#endif - MAC_INTERPOSE_PAIR(rpmemalign, memalign), - MAC_INTERPOSE_PAIR(rpposix_memalign, posix_memalign), - MAC_INTERPOSE_PAIR(rpfree, free), MAC_INTERPOSE_PAIR(rpfree, cfree), - MAC_INTERPOSE_PAIR(rpmalloc_usable_size, malloc_usable_size), - MAC_INTERPOSE_PAIR(rpmalloc_usable_size, malloc_size)}; - -#endif - -#if USE_ALIAS - -#define RPALIAS(fn) __attribute__((alias(#fn), used, visibility("default"))); - -// Alias the C++ operators using the mangled names -// (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling) - -// operators delete and delete[] -void _ZdlPv(void *p) RPALIAS(rpfree) void _ZdaPv(void *p) RPALIAS(rpfree) - -#if ARCH_64BIT - // 64-bit operators new and new[], normal and aligned - void *_Znwm(uint64_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) - RPALIAS(rpmalloc) void *_Znam(uint64_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) RPALIAS(rpmalloc) void *_Znwmm(uint64_t size, - uint64_t align) - RPALIAS(rpaligned_alloc_reverse) void *_Znamm(uint64_t size, - uint64_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnwmSt11align_val_t( - size_t size, size_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnamSt11align_val_t( - size_t size, size_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnwmRKSt9nothrow_t( - size_t size, rp_nothrow_t t) - RPALIAS(rpmalloc_nothrow) void *_ZnamRKSt9nothrow_t( - size_t size, - rp_nothrow_t t) RPALIAS(rpmalloc_nothrow) void - *_ZnwmSt11align_val_tRKSt9nothrow_t(size_t size, - size_t align, - rp_nothrow_t t) - RPALIAS(rpaligned_alloc_reverse_nothrow) void - *_ZnamSt11align_val_tRKSt9nothrow_t( - size_t size, size_t align, - rp_nothrow_t t) - 
RPALIAS(rpaligned_alloc_reverse_nothrow) - // 64-bit operators delete and delete[], sized and aligned - void _ZdlPvm(void *p, size_t n) RPALIAS(rpfree_size) void _ZdaPvm(void *p, - size_t n) - RPALIAS(rpfree_size) void _ZdlPvSt11align_val_t(void *p, size_t a) - RPALIAS(rpfree_aligned) void _ZdaPvSt11align_val_t(void *p, - size_t a) - RPALIAS(rpfree_aligned) void _ZdlPvmSt11align_val_t(void *p, - size_t n, - size_t a) - RPALIAS(rpfree_size_aligned) void _ZdaPvmSt11align_val_t( - void *p, size_t n, size_t a) - RPALIAS(rpfree_size_aligned) -#else - // 32-bit operators new and new[], normal and aligned - void *_Znwj(uint32_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) - RPALIAS(rpmalloc) void *_Znaj(uint32_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) RPALIAS(rpmalloc) void *_Znwjj(uint32_t size, - uint32_t align) - RPALIAS(rpaligned_alloc_reverse) void *_Znajj(uint32_t size, - uint32_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnwjSt11align_val_t( - size_t size, size_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnajSt11align_val_t( - size_t size, size_t align) - RPALIAS(rpaligned_alloc_reverse) void *_ZnwjRKSt9nothrow_t( - size_t size, rp_nothrow_t t) - RPALIAS(rpmalloc_nothrow) void *_ZnajRKSt9nothrow_t( - size_t size, - rp_nothrow_t t) RPALIAS(rpmalloc_nothrow) void - *_ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, - size_t align, - rp_nothrow_t t) - RPALIAS(rpaligned_alloc_reverse_nothrow) void - *_ZnajSt11align_val_tRKSt9nothrow_t( - size_t size, size_t align, - rp_nothrow_t t) - RPALIAS(rpaligned_alloc_reverse_nothrow) - // 32-bit operators delete and delete[], sized and aligned - void _ZdlPvj(void *p, size_t n) RPALIAS(rpfree_size) void _ZdaPvj(void *p, - size_t n) - RPALIAS(rpfree_size) void _ZdlPvSt11align_val_t(void *p, size_t a) - RPALIAS(rpfree_aligned) void _ZdaPvSt11align_val_t(void *p, - size_t a) - RPALIAS(rpfree_aligned) void _ZdlPvjSt11align_val_t(void *p, - size_t n, - size_t a) - 
RPALIAS(rpfree_size_aligned) void _ZdaPvjSt11align_val_t( - void *p, size_t n, size_t a) - RPALIAS(rpfree_size_aligned) -#endif - - void *malloc(size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) - RPALIAS(rpmalloc) void *calloc(size_t count, size_t size) - RPALIAS(rpcalloc) void *realloc(void *ptr, size_t size) - RPALIAS(rprealloc) void *reallocf(void *ptr, size_t size) - RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2) - RPALIAS(rprealloc) void *aligned_alloc(size_t alignment, size_t size) - RPALIAS(rpaligned_alloc) void *memalign( - size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2) - RPALIAS(rpmemalign) int posix_memalign(void **memptr, size_t alignment, - size_t size) - RPALIAS(rpposix_memalign) void free(void *ptr) - RPALIAS(rpfree) void cfree(void *ptr) RPALIAS(rpfree) -#if defined(__ANDROID__) || defined(__FreeBSD__) - size_t - malloc_usable_size(const void *ptr) RPALIAS(rpmalloc_usable_size) -#else - size_t - malloc_usable_size(void *ptr) RPALIAS(rpmalloc_usable_size) -#endif - size_t malloc_size(void *ptr) RPALIAS(rpmalloc_usable_size) - -#endif - - static inline size_t _rpmalloc_page_size(void) { - return _memory_page_size; -} - -extern void *RPMALLOC_CDECL reallocarray(void *ptr, size_t count, size_t size); - -extern void *RPMALLOC_CDECL reallocarray(void *ptr, size_t count, size_t size) { - size_t total; -#if ENABLE_VALIDATE_ARGS -#ifdef _MSC_VER - int err = SizeTMult(count, size, &total); - if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#else - int err = __builtin_umull_overflow(count, size, &total); - if (err || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#endif -#else - total = count * size; -#endif - return realloc(ptr, total); -} - -extern inline void *RPMALLOC_CDECL valloc(size_t size) { - get_thread_heap(); - return rpaligned_alloc(_rpmalloc_page_size(), size); -} - -extern inline void *RPMALLOC_CDECL pvalloc(size_t size) { - 
get_thread_heap(); - const size_t page_size = _rpmalloc_page_size(); - const size_t aligned_size = ((size + page_size - 1) / page_size) * page_size; -#if ENABLE_VALIDATE_ARGS - if (aligned_size < size) { - errno = EINVAL; - return 0; - } -#endif - return rpaligned_alloc(_rpmalloc_page_size(), aligned_size); -} - -#endif // ENABLE_OVERRIDE - -#if ENABLE_PRELOAD - -#ifdef _WIN32 - -#if defined(BUILD_DYNAMIC_LINK) && BUILD_DYNAMIC_LINK - -extern __declspec(dllexport) BOOL WINAPI DllMain(HINSTANCE instance, - DWORD reason, LPVOID reserved); - -extern __declspec(dllexport) BOOL WINAPI DllMain(HINSTANCE instance, - DWORD reason, - LPVOID reserved) { - (void)sizeof(reserved); - (void)sizeof(instance); - if (reason == DLL_PROCESS_ATTACH) - rpmalloc_initialize(); - else if (reason == DLL_PROCESS_DETACH) - rpmalloc_finalize(); - else if (reason == DLL_THREAD_ATTACH) - rpmalloc_thread_initialize(); - else if (reason == DLL_THREAD_DETACH) - rpmalloc_thread_finalize(1); - return TRUE; -} - -// end BUILD_DYNAMIC_LINK -#else - -extern void _global_rpmalloc_init(void) { - rpmalloc_set_main_thread(); - rpmalloc_initialize(); -} - -#if defined(__clang__) || defined(__GNUC__) - -static void __attribute__((constructor)) initializer(void) { - _global_rpmalloc_init(); -} - -#elif defined(_MSC_VER) - -static int _global_rpmalloc_xib(void) { - _global_rpmalloc_init(); - return 0; -} - -#pragma section(".CRT$XIB", read) -__declspec(allocate(".CRT$XIB")) void (*_rpmalloc_module_init)(void) = - _global_rpmalloc_xib; -#if defined(_M_IX86) || defined(__i386__) -#pragma comment(linker, "/include:" \ - "__rpmalloc_module_init") -#else -#pragma comment(linker, "/include:" \ - "_rpmalloc_module_init") -#endif - -#endif - -// end !BUILD_DYNAMIC_LINK -#endif - -#else - -#include -#include -#include -#include - -extern void rpmalloc_set_main_thread(void); - -static pthread_key_t destructor_key; - -static void thread_destructor(void *); - -static void __attribute__((constructor)) initializer(void) { - 
rpmalloc_set_main_thread(); - rpmalloc_initialize(); - pthread_key_create(&destructor_key, thread_destructor); -} - -static void __attribute__((destructor)) finalizer(void) { rpmalloc_finalize(); } - -typedef struct { - void *(*real_start)(void *); - void *real_arg; -} thread_starter_arg; - -static void *thread_starter(void *argptr) { - thread_starter_arg *arg = argptr; - void *(*real_start)(void *) = arg->real_start; - void *real_arg = arg->real_arg; - rpmalloc_thread_initialize(); - rpfree(argptr); - pthread_setspecific(destructor_key, (void *)1); - return (*real_start)(real_arg); -} - -static void thread_destructor(void *value) { - (void)sizeof(value); - rpmalloc_thread_finalize(1); -} - -#ifdef __APPLE__ - -static int pthread_create_proxy(pthread_t *thread, const pthread_attr_t *attr, - void *(*start_routine)(void *), void *arg) { - rpmalloc_initialize(); - thread_starter_arg *starter_arg = rpmalloc(sizeof(thread_starter_arg)); - starter_arg->real_start = start_routine; - starter_arg->real_arg = arg; - return pthread_create(thread, attr, thread_starter, starter_arg); -} - -MAC_INTERPOSE_SINGLE(pthread_create_proxy, pthread_create); - -#else - -#include - -int pthread_create(pthread_t *thread, const pthread_attr_t *attr, - void *(*start_routine)(void *), void *arg) { -#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__) || \ - defined(__NetBSD__) || defined(__DragonFly__) || defined(__APPLE__) || \ - defined(__HAIKU__) - char fname[] = "pthread_create"; -#else - char fname[] = "_pthread_create"; -#endif - void *real_pthread_create = dlsym(RTLD_NEXT, fname); - rpmalloc_thread_initialize(); - thread_starter_arg *starter_arg = rpmalloc(sizeof(thread_starter_arg)); - starter_arg->real_start = start_routine; - starter_arg->real_arg = arg; - return (*(int (*)(pthread_t *, const pthread_attr_t *, void *(*)(void *), - void *))real_pthread_create)(thread, attr, thread_starter, - starter_arg); -} - -#endif - -#endif - -#endif - -#if ENABLE_OVERRIDE - 
-#if defined(__GLIBC__) && defined(__linux__) - -void *__libc_malloc(size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(1) - RPALIAS(rpmalloc) void *__libc_calloc(size_t count, size_t size) - RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE2(1, 2) - RPALIAS(rpcalloc) void *__libc_realloc(void *p, size_t size) - RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2) RPALIAS(rprealloc) void __libc_free(void *p) - RPALIAS(rpfree) void __libc_cfree(void *p) - RPALIAS(rpfree) void *__libc_memalign(size_t align, size_t size) - RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2) - RPALIAS(rpmemalign) int __posix_memalign(void **p, size_t align, - size_t size) - RPALIAS(rpposix_memalign) - - extern void *__libc_valloc(size_t size); -extern void *__libc_pvalloc(size_t size); - -void *__libc_valloc(size_t size) { return valloc(size); } - -void *__libc_pvalloc(size_t size) { return pvalloc(size); } - -#endif - -#endif - -#if (defined(__GNUC__) || defined(__clang__)) -#pragma GCC visibility pop -#endif +//===------------------------ malloc.c ------------------*- C -*-=============// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This library provides a cross-platform lock free thread caching malloc +// implementation in C11. 
+// +// +// This file provides overrides for the standard library malloc entry points for +// C and new/delete operators for C++ It also provides automatic +// initialization/finalization of process and threads +// +//===----------------------------------------------------------------------===// + +#if defined(__TINYC__) +#include +#endif + +#ifndef ARCH_64BIT +#if defined(__LLP64__) || defined(__LP64__) || defined(_WIN64) +#define ARCH_64BIT 1 +_Static_assert(sizeof(size_t) == 8, "Data type size mismatch"); +_Static_assert(sizeof(void *) == 8, "Data type size mismatch"); +#else +#define ARCH_64BIT 0 +_Static_assert(sizeof(size_t) == 4, "Data type size mismatch"); +_Static_assert(sizeof(void *) == 4, "Data type size mismatch"); +#endif +#endif + +#if (defined(__GNUC__) || defined(__clang__)) +#pragma GCC visibility push(default) +#endif + +#define USE_IMPLEMENT 1 +#define USE_INTERPOSE 0 +#define USE_ALIAS 0 + +#if defined(__APPLE__) +#undef USE_INTERPOSE +#define USE_INTERPOSE 1 + +typedef struct interpose_t { + void *new_func; + void *orig_func; +} interpose_t; + +#define MAC_INTERPOSE_PAIR(newf, oldf) {(void *)newf, (void *)oldf} +#define MAC_INTERPOSE_SINGLE(newf, oldf) \ + __attribute__((used)) static const interpose_t macinterpose##newf##oldf \ + __attribute__((section("__DATA, __interpose"))) = \ + MAC_INTERPOSE_PAIR(newf, oldf) + +#endif + +#if !defined(_WIN32) && !defined(__APPLE__) +#undef USE_IMPLEMENT +#undef USE_ALIAS +#define USE_IMPLEMENT 0 +#define USE_ALIAS 1 +#endif + +#ifdef _MSC_VER +#pragma warning(disable : 4100) +#undef malloc +#undef free +#undef calloc +#define RPMALLOC_RESTRICT __declspec(restrict) +#else +#define RPMALLOC_RESTRICT +#endif + +#if ENABLE_OVERRIDE + +typedef struct rp_nothrow_t { + int __dummy; +} rp_nothrow_t; + +#if USE_IMPLEMENT + +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL malloc(size_t size) { + return rpmalloc(size); +} +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL calloc(size_t count, + size_t size) 
{ + return rpcalloc(count, size); +} +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL realloc(void *ptr, + size_t size) { + return rprealloc(ptr, size); +} +extern inline void *RPMALLOC_CDECL reallocf(void *ptr, size_t size) { + return rprealloc(ptr, size); +} +extern inline void *RPMALLOC_CDECL aligned_alloc(size_t alignment, + size_t size) { + return rpaligned_alloc(alignment, size); +} +extern inline void *RPMALLOC_CDECL memalign(size_t alignment, size_t size) { + return rpmemalign(alignment, size); +} +extern inline int RPMALLOC_CDECL posix_memalign(void **memptr, size_t alignment, + size_t size) { + return rpposix_memalign(memptr, alignment, size); +} +extern inline void RPMALLOC_CDECL free(void *ptr) { rpfree(ptr); } +extern inline void RPMALLOC_CDECL cfree(void *ptr) { rpfree(ptr); } +extern inline size_t RPMALLOC_CDECL malloc_usable_size(void *ptr) { + return rpmalloc_usable_size(ptr); +} +extern inline size_t RPMALLOC_CDECL malloc_size(void *ptr) { + return rpmalloc_usable_size(ptr); +} + +#ifdef _WIN32 +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL _malloc_base(size_t size) { + return rpmalloc(size); +} +extern inline void RPMALLOC_CDECL _free_base(void *ptr) { rpfree(ptr); } +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL _calloc_base(size_t count, + size_t size) { + return rpcalloc(count, size); +} +extern inline size_t RPMALLOC_CDECL _msize(void *ptr) { + return rpmalloc_usable_size(ptr); +} +extern inline size_t RPMALLOC_CDECL _msize_base(void *ptr) { + return rpmalloc_usable_size(ptr); +} +extern inline RPMALLOC_RESTRICT void *RPMALLOC_CDECL +_realloc_base(void *ptr, size_t size) { + return rprealloc(ptr, size); +} +#endif + +#ifdef _WIN32 +// For Windows, #include in one source file to get the C++ operator +// overrides implemented in your module +#else +// Overload the C++ operators using the mangled names +// (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling) operators +// delete and delete[] +#define RPDEFVIS 
__attribute__((visibility("default"))) +extern void _ZdlPv(void *p); +void RPDEFVIS _ZdlPv(void *p) { rpfree(p); } +extern void _ZdaPv(void *p); +void RPDEFVIS _ZdaPv(void *p) { rpfree(p); } +#if ARCH_64BIT +// 64-bit operators new and new[], normal and aligned +extern void *_Znwm(uint64_t size); +void *RPDEFVIS _Znwm(uint64_t size) { return rpmalloc(size); } +extern void *_Znam(uint64_t size); +void *RPDEFVIS _Znam(uint64_t size) { return rpmalloc(size); } +extern void *_Znwmm(uint64_t size, uint64_t align); +void *RPDEFVIS _Znwmm(uint64_t size, uint64_t align) { + return rpaligned_alloc(align, size); +} +extern void *_Znamm(uint64_t size, uint64_t align); +void *RPDEFVIS _Znamm(uint64_t size, uint64_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnwmSt11align_val_t(uint64_t size, uint64_t align); +void *RPDEFVIS _ZnwmSt11align_val_t(uint64_t size, uint64_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnamSt11align_val_t(uint64_t size, uint64_t align); +void *RPDEFVIS _ZnamSt11align_val_t(uint64_t size, uint64_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnwmRKSt9nothrow_t(uint64_t size, rp_nothrow_t t); +void *RPDEFVIS _ZnwmRKSt9nothrow_t(uint64_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +extern void *_ZnamRKSt9nothrow_t(uint64_t size, rp_nothrow_t t); +void *RPDEFVIS _ZnamRKSt9nothrow_t(uint64_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +extern void *_ZnwmSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, + rp_nothrow_t t); +void *RPDEFVIS _ZnwmSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, size); +} +extern void *_ZnamSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, + rp_nothrow_t t); +void *RPDEFVIS _ZnamSt11align_val_tRKSt9nothrow_t(uint64_t size, uint64_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, 
size); +} +// 64-bit operators sized delete and delete[], normal and aligned +extern void _ZdlPvm(void *p, uint64_t size); +void RPDEFVIS _ZdlPvm(void *p, uint64_t size) { + rpfree(p); + (void)sizeof(size); +} +extern void _ZdaPvm(void *p, uint64_t size); +void RPDEFVIS _ZdaPvm(void *p, uint64_t size) { + rpfree(p); + (void)sizeof(size); +} +extern void _ZdlPvSt11align_val_t(void *p, uint64_t align); +void RPDEFVIS _ZdlPvSt11align_val_t(void *p, uint64_t align) { + rpfree(p); + (void)sizeof(align); +} +extern void _ZdaPvSt11align_val_t(void *p, uint64_t align); +void RPDEFVIS _ZdaPvSt11align_val_t(void *p, uint64_t align) { + rpfree(p); + (void)sizeof(align); +} +extern void _ZdlPvmSt11align_val_t(void *p, uint64_t size, uint64_t align); +void RPDEFVIS _ZdlPvmSt11align_val_t(void *p, uint64_t size, uint64_t align) { + rpfree(p); + (void)sizeof(size); + (void)sizeof(align); +} +extern void _ZdaPvmSt11align_val_t(void *p, uint64_t size, uint64_t align); +void RPDEFVIS _ZdaPvmSt11align_val_t(void *p, uint64_t size, uint64_t align) { + rpfree(p); + (void)sizeof(size); + (void)sizeof(align); +} +#else +// 32-bit operators new and new[], normal and aligned +extern void *_Znwj(uint32_t size); +void *RPDEFVIS _Znwj(uint32_t size) { return rpmalloc(size); } +extern void *_Znaj(uint32_t size); +void *RPDEFVIS _Znaj(uint32_t size) { return rpmalloc(size); } +extern void *_Znwjj(uint32_t size, uint32_t align); +void *RPDEFVIS _Znwjj(uint32_t size, uint32_t align) { + return rpaligned_alloc(align, size); +} +extern void *_Znajj(uint32_t size, uint32_t align); +void *RPDEFVIS _Znajj(uint32_t size, uint32_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnwjSt11align_val_t(size_t size, size_t align); +void *RPDEFVIS _ZnwjSt11align_val_t(size_t size, size_t align) { + return rpaligned_alloc(align, size); +} +extern void *_ZnajSt11align_val_t(size_t size, size_t align); +void *RPDEFVIS _ZnajSt11align_val_t(size_t size, size_t align) { + return 
rpaligned_alloc(align, size); +} +extern void *_ZnwjRKSt9nothrow_t(size_t size, rp_nothrow_t t); +void *RPDEFVIS _ZnwjRKSt9nothrow_t(size_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +extern void *_ZnajRKSt9nothrow_t(size_t size, rp_nothrow_t t); +void *RPDEFVIS _ZnajRKSt9nothrow_t(size_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +extern void *_ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, + rp_nothrow_t t); +void *RPDEFVIS _ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, size); +} +extern void *_ZnajSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, + rp_nothrow_t t); +void *RPDEFVIS _ZnajSt11align_val_tRKSt9nothrow_t(size_t size, size_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, size); +} +// 32-bit operators sized delete and delete[], normal and aligned +extern void _ZdlPvj(void *p, uint64_t size); +void RPDEFVIS _ZdlPvj(void *p, uint64_t size) { + rpfree(p); + (void)sizeof(size); +} +extern void _ZdaPvj(void *p, uint64_t size); +void RPDEFVIS _ZdaPvj(void *p, uint64_t size) { + rpfree(p); + (void)sizeof(size); +} +extern void _ZdlPvSt11align_val_t(void *p, uint32_t align); +void RPDEFVIS _ZdlPvSt11align_val_t(void *p, uint64_t a) { + rpfree(p); + (void)sizeof(align); +} +extern void _ZdaPvSt11align_val_t(void *p, uint32_t align); +void RPDEFVIS _ZdaPvSt11align_val_t(void *p, uint64_t a) { + rpfree(p); + (void)sizeof(align); +} +extern void _ZdlPvjSt11align_val_t(void *p, uint32_t size, uint32_t align); +void RPDEFVIS _ZdlPvjSt11align_val_t(void *p, uint64_t size, uint64_t align) { + rpfree(p); + (void)sizeof(size); + (void)sizeof(a); +} +extern void _ZdaPvjSt11align_val_t(void *p, uint32_t size, uint32_t align); +void RPDEFVIS _ZdaPvjSt11align_val_t(void *p, uint64_t size, uint64_t align) { + rpfree(p); + (void)sizeof(size); + (void)sizeof(a); +} +#endif +#endif 
+#endif + +#if USE_INTERPOSE || USE_ALIAS + +static void *rpmalloc_nothrow(size_t size, rp_nothrow_t t) { + (void)sizeof(t); + return rpmalloc(size); +} +static void *rpaligned_alloc_reverse(size_t size, size_t align) { + return rpaligned_alloc(align, size); +} +static void *rpaligned_alloc_reverse_nothrow(size_t size, size_t align, + rp_nothrow_t t) { + (void)sizeof(t); + return rpaligned_alloc(align, size); +} +static void rpfree_size(void *p, size_t size) { + (void)sizeof(size); + rpfree(p); +} +static void rpfree_aligned(void *p, size_t align) { + (void)sizeof(align); + rpfree(p); +} +static void rpfree_size_aligned(void *p, size_t size, size_t align) { + (void)sizeof(size); + (void)sizeof(align); + rpfree(p); +} + +#endif + +#if USE_INTERPOSE + +__attribute__((used)) static const interpose_t macinterpose_malloc[] + __attribute__((section("__DATA, __interpose"))) = { + // new and new[] + MAC_INTERPOSE_PAIR(rpmalloc, _Znwm), + MAC_INTERPOSE_PAIR(rpmalloc, _Znam), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _Znwmm), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _Znamm), + MAC_INTERPOSE_PAIR(rpmalloc_nothrow, _ZnwmRKSt9nothrow_t), + MAC_INTERPOSE_PAIR(rpmalloc_nothrow, _ZnamRKSt9nothrow_t), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _ZnwmSt11align_val_t), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse, _ZnamSt11align_val_t), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse_nothrow, + _ZnwmSt11align_val_tRKSt9nothrow_t), + MAC_INTERPOSE_PAIR(rpaligned_alloc_reverse_nothrow, + _ZnamSt11align_val_tRKSt9nothrow_t), + // delete and delete[] + MAC_INTERPOSE_PAIR(rpfree, _ZdlPv), MAC_INTERPOSE_PAIR(rpfree, _ZdaPv), + MAC_INTERPOSE_PAIR(rpfree_size, _ZdlPvm), + MAC_INTERPOSE_PAIR(rpfree_size, _ZdaPvm), + MAC_INTERPOSE_PAIR(rpfree_aligned, _ZdlPvSt11align_val_t), + MAC_INTERPOSE_PAIR(rpfree_aligned, _ZdaPvSt11align_val_t), + MAC_INTERPOSE_PAIR(rpfree_size_aligned, _ZdlPvmSt11align_val_t), + MAC_INTERPOSE_PAIR(rpfree_size_aligned, _ZdaPvmSt11align_val_t), + // libc entry 
points + MAC_INTERPOSE_PAIR(rpmalloc, malloc), + MAC_INTERPOSE_PAIR(rpmalloc, calloc), + MAC_INTERPOSE_PAIR(rprealloc, realloc), + MAC_INTERPOSE_PAIR(rprealloc, reallocf), +#if defined(__MAC_10_15) && __MAC_OS_X_VERSION_MIN_REQUIRED >= __MAC_10_15 + MAC_INTERPOSE_PAIR(rpaligned_alloc, aligned_alloc), +#endif + MAC_INTERPOSE_PAIR(rpmemalign, memalign), + MAC_INTERPOSE_PAIR(rpposix_memalign, posix_memalign), + MAC_INTERPOSE_PAIR(rpfree, free), MAC_INTERPOSE_PAIR(rpfree, cfree), + MAC_INTERPOSE_PAIR(rpmalloc_usable_size, malloc_usable_size), + MAC_INTERPOSE_PAIR(rpmalloc_usable_size, malloc_size)}; + +#endif + +#if USE_ALIAS + +#define RPALIAS(fn) __attribute__((alias(#fn), used, visibility("default"))); + +// Alias the C++ operators using the mangled names +// (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling) + +// operators delete and delete[] +void _ZdlPv(void *p) RPALIAS(rpfree) void _ZdaPv(void *p) RPALIAS(rpfree) + +#if ARCH_64BIT + // 64-bit operators new and new[], normal and aligned + void *_Znwm(uint64_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) + RPALIAS(rpmalloc) void *_Znam(uint64_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) RPALIAS(rpmalloc) void *_Znwmm(uint64_t size, + uint64_t align) + RPALIAS(rpaligned_alloc_reverse) void *_Znamm(uint64_t size, + uint64_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnwmSt11align_val_t( + size_t size, size_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnamSt11align_val_t( + size_t size, size_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnwmRKSt9nothrow_t( + size_t size, rp_nothrow_t t) + RPALIAS(rpmalloc_nothrow) void *_ZnamRKSt9nothrow_t( + size_t size, + rp_nothrow_t t) RPALIAS(rpmalloc_nothrow) void + *_ZnwmSt11align_val_tRKSt9nothrow_t(size_t size, + size_t align, + rp_nothrow_t t) + RPALIAS(rpaligned_alloc_reverse_nothrow) void + *_ZnamSt11align_val_tRKSt9nothrow_t( + size_t size, size_t align, + rp_nothrow_t t) + 
RPALIAS(rpaligned_alloc_reverse_nothrow) + // 64-bit operators delete and delete[], sized and aligned + void _ZdlPvm(void *p, size_t n) RPALIAS(rpfree_size) void _ZdaPvm(void *p, + size_t n) + RPALIAS(rpfree_size) void _ZdlPvSt11align_val_t(void *p, size_t a) + RPALIAS(rpfree_aligned) void _ZdaPvSt11align_val_t(void *p, + size_t a) + RPALIAS(rpfree_aligned) void _ZdlPvmSt11align_val_t(void *p, + size_t n, + size_t a) + RPALIAS(rpfree_size_aligned) void _ZdaPvmSt11align_val_t( + void *p, size_t n, size_t a) + RPALIAS(rpfree_size_aligned) +#else + // 32-bit operators new and new[], normal and aligned + void *_Znwj(uint32_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) + RPALIAS(rpmalloc) void *_Znaj(uint32_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) RPALIAS(rpmalloc) void *_Znwjj(uint32_t size, + uint32_t align) + RPALIAS(rpaligned_alloc_reverse) void *_Znajj(uint32_t size, + uint32_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnwjSt11align_val_t( + size_t size, size_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnajSt11align_val_t( + size_t size, size_t align) + RPALIAS(rpaligned_alloc_reverse) void *_ZnwjRKSt9nothrow_t( + size_t size, rp_nothrow_t t) + RPALIAS(rpmalloc_nothrow) void *_ZnajRKSt9nothrow_t( + size_t size, + rp_nothrow_t t) RPALIAS(rpmalloc_nothrow) void + *_ZnwjSt11align_val_tRKSt9nothrow_t(size_t size, + size_t align, + rp_nothrow_t t) + RPALIAS(rpaligned_alloc_reverse_nothrow) void + *_ZnajSt11align_val_tRKSt9nothrow_t( + size_t size, size_t align, + rp_nothrow_t t) + RPALIAS(rpaligned_alloc_reverse_nothrow) + // 32-bit operators delete and delete[], sized and aligned + void _ZdlPvj(void *p, size_t n) RPALIAS(rpfree_size) void _ZdaPvj(void *p, + size_t n) + RPALIAS(rpfree_size) void _ZdlPvSt11align_val_t(void *p, size_t a) + RPALIAS(rpfree_aligned) void _ZdaPvSt11align_val_t(void *p, + size_t a) + RPALIAS(rpfree_aligned) void _ZdlPvjSt11align_val_t(void *p, + size_t n, + size_t a) + 
RPALIAS(rpfree_size_aligned) void _ZdaPvjSt11align_val_t( + void *p, size_t n, size_t a) + RPALIAS(rpfree_size_aligned) +#endif + + void *malloc(size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) + RPALIAS(rpmalloc) void *calloc(size_t count, size_t size) + RPALIAS(rpcalloc) void *realloc(void *ptr, size_t size) + RPALIAS(rprealloc) void *reallocf(void *ptr, size_t size) + RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2) + RPALIAS(rprealloc) void *aligned_alloc(size_t alignment, size_t size) + RPALIAS(rpaligned_alloc) void *memalign( + size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2) + RPALIAS(rpmemalign) int posix_memalign(void **memptr, size_t alignment, + size_t size) + RPALIAS(rpposix_memalign) void free(void *ptr) + RPALIAS(rpfree) void cfree(void *ptr) RPALIAS(rpfree) +#if defined(__ANDROID__) || defined(__FreeBSD__) + size_t + malloc_usable_size(const void *ptr) RPALIAS(rpmalloc_usable_size) +#else + size_t + malloc_usable_size(void *ptr) RPALIAS(rpmalloc_usable_size) +#endif + size_t malloc_size(void *ptr) RPALIAS(rpmalloc_usable_size) + +#endif + + static inline size_t _rpmalloc_page_size(void) { + return _memory_page_size; +} + +extern void *RPMALLOC_CDECL reallocarray(void *ptr, size_t count, size_t size); + +extern void *RPMALLOC_CDECL reallocarray(void *ptr, size_t count, size_t size) { + size_t total; +#if ENABLE_VALIDATE_ARGS +#ifdef _MSC_VER + int err = SizeTMult(count, size, &total); + if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#else + int err = __builtin_umull_overflow(count, size, &total); + if (err || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#endif +#else + total = count * size; +#endif + return realloc(ptr, total); +} + +extern inline void *RPMALLOC_CDECL valloc(size_t size) { + get_thread_heap(); + return rpaligned_alloc(_rpmalloc_page_size(), size); +} + +extern inline void *RPMALLOC_CDECL pvalloc(size_t size) { + 
get_thread_heap(); + const size_t page_size = _rpmalloc_page_size(); + const size_t aligned_size = ((size + page_size - 1) / page_size) * page_size; +#if ENABLE_VALIDATE_ARGS + if (aligned_size < size) { + errno = EINVAL; + return 0; + } +#endif + return rpaligned_alloc(_rpmalloc_page_size(), aligned_size); +} + +#endif // ENABLE_OVERRIDE + +#if ENABLE_PRELOAD + +#ifdef _WIN32 + +#if defined(BUILD_DYNAMIC_LINK) && BUILD_DYNAMIC_LINK + +extern __declspec(dllexport) BOOL WINAPI DllMain(HINSTANCE instance, + DWORD reason, LPVOID reserved); + +extern __declspec(dllexport) BOOL WINAPI DllMain(HINSTANCE instance, + DWORD reason, + LPVOID reserved) { + (void)sizeof(reserved); + (void)sizeof(instance); + if (reason == DLL_PROCESS_ATTACH) + rpmalloc_initialize(); + else if (reason == DLL_PROCESS_DETACH) + rpmalloc_finalize(); + else if (reason == DLL_THREAD_ATTACH) + rpmalloc_thread_initialize(); + else if (reason == DLL_THREAD_DETACH) + rpmalloc_thread_finalize(1); + return TRUE; +} + +// end BUILD_DYNAMIC_LINK +#else + +extern void _global_rpmalloc_init(void) { + rpmalloc_set_main_thread(); + rpmalloc_initialize(); +} + +#if defined(__clang__) || defined(__GNUC__) + +static void __attribute__((constructor)) initializer(void) { + _global_rpmalloc_init(); +} + +#elif defined(_MSC_VER) + +static int _global_rpmalloc_xib(void) { + _global_rpmalloc_init(); + return 0; +} + +#pragma section(".CRT$XIB", read) +__declspec(allocate(".CRT$XIB")) void (*_rpmalloc_module_init)(void) = + _global_rpmalloc_xib; +#if defined(_M_IX86) || defined(__i386__) +#pragma comment(linker, "/include:" \ + "__rpmalloc_module_init") +#else +#pragma comment(linker, "/include:" \ + "_rpmalloc_module_init") +#endif + +#endif + +// end !BUILD_DYNAMIC_LINK +#endif + +#else + +#include +#include +#include +#include + +extern void rpmalloc_set_main_thread(void); + +static pthread_key_t destructor_key; + +static void thread_destructor(void *); + +static void __attribute__((constructor)) initializer(void) { + 
rpmalloc_set_main_thread(); + rpmalloc_initialize(); + pthread_key_create(&destructor_key, thread_destructor); +} + +static void __attribute__((destructor)) finalizer(void) { rpmalloc_finalize(); } + +typedef struct { + void *(*real_start)(void *); + void *real_arg; +} thread_starter_arg; + +static void *thread_starter(void *argptr) { + thread_starter_arg *arg = argptr; + void *(*real_start)(void *) = arg->real_start; + void *real_arg = arg->real_arg; + rpmalloc_thread_initialize(); + rpfree(argptr); + pthread_setspecific(destructor_key, (void *)1); + return (*real_start)(real_arg); +} + +static void thread_destructor(void *value) { + (void)sizeof(value); + rpmalloc_thread_finalize(1); +} + +#ifdef __APPLE__ + +static int pthread_create_proxy(pthread_t *thread, const pthread_attr_t *attr, + void *(*start_routine)(void *), void *arg) { + rpmalloc_initialize(); + thread_starter_arg *starter_arg = rpmalloc(sizeof(thread_starter_arg)); + starter_arg->real_start = start_routine; + starter_arg->real_arg = arg; + return pthread_create(thread, attr, thread_starter, starter_arg); +} + +MAC_INTERPOSE_SINGLE(pthread_create_proxy, pthread_create); + +#else + +#include + +int pthread_create(pthread_t *thread, const pthread_attr_t *attr, + void *(*start_routine)(void *), void *arg) { +#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__) || \ + defined(__NetBSD__) || defined(__DragonFly__) || defined(__APPLE__) || \ + defined(__HAIKU__) + char fname[] = "pthread_create"; +#else + char fname[] = "_pthread_create"; +#endif + void *real_pthread_create = dlsym(RTLD_NEXT, fname); + rpmalloc_thread_initialize(); + thread_starter_arg *starter_arg = rpmalloc(sizeof(thread_starter_arg)); + starter_arg->real_start = start_routine; + starter_arg->real_arg = arg; + return (*(int (*)(pthread_t *, const pthread_attr_t *, void *(*)(void *), + void *))real_pthread_create)(thread, attr, thread_starter, + starter_arg); +} + +#endif + +#endif + +#endif + +#if ENABLE_OVERRIDE + 
+#if defined(__GLIBC__) && defined(__linux__) + +void *__libc_malloc(size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(1) + RPALIAS(rpmalloc) void *__libc_calloc(size_t count, size_t size) + RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE2(1, 2) + RPALIAS(rpcalloc) void *__libc_realloc(void *p, size_t size) + RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2) RPALIAS(rprealloc) void __libc_free(void *p) + RPALIAS(rpfree) void __libc_cfree(void *p) + RPALIAS(rpfree) void *__libc_memalign(size_t align, size_t size) + RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2) + RPALIAS(rpmemalign) int __posix_memalign(void **p, size_t align, + size_t size) + RPALIAS(rpposix_memalign) + + extern void *__libc_valloc(size_t size); +extern void *__libc_pvalloc(size_t size); + +void *__libc_valloc(size_t size) { return valloc(size); } + +void *__libc_pvalloc(size_t size) { return pvalloc(size); } + +#endif + +#endif + +#if (defined(__GNUC__) || defined(__clang__)) +#pragma GCC visibility pop +#endif diff --git a/llvm/lib/Support/rpmalloc/rpmalloc.c b/llvm/lib/Support/rpmalloc/rpmalloc.c index a06d3cdb5b52ef..0976ec8ae6af4e 100644 --- a/llvm/lib/Support/rpmalloc/rpmalloc.c +++ b/llvm/lib/Support/rpmalloc/rpmalloc.c @@ -1,3992 +1,3992 @@ -//===---------------------- rpmalloc.c ------------------*- C -*-=============// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -// -// This library provides a cross-platform lock free thread caching malloc -// implementation in C11. 
-// -//===----------------------------------------------------------------------===// - -#include "rpmalloc.h" - -//////////// -/// -/// Build time configurable limits -/// -////// - -#if defined(__clang__) -#pragma clang diagnostic ignored "-Wunused-macros" -#pragma clang diagnostic ignored "-Wunused-function" -#if __has_warning("-Wreserved-identifier") -#pragma clang diagnostic ignored "-Wreserved-identifier" -#endif -#if __has_warning("-Wstatic-in-inline") -#pragma clang diagnostic ignored "-Wstatic-in-inline" -#endif -#elif defined(__GNUC__) -#pragma GCC diagnostic ignored "-Wunused-macros" -#pragma GCC diagnostic ignored "-Wunused-function" -#endif - -#if !defined(__has_builtin) -#define __has_builtin(b) 0 -#endif - -#if defined(__GNUC__) || defined(__clang__) - -#if __has_builtin(__builtin_memcpy_inline) -#define _rpmalloc_memcpy_const(x, y, s) __builtin_memcpy_inline(x, y, s) -#else -#define _rpmalloc_memcpy_const(x, y, s) \ - do { \ - _Static_assert(__builtin_choose_expr(__builtin_constant_p(s), 1, 0), \ - "len must be a constant integer"); \ - memcpy(x, y, s); \ - } while (0) -#endif - -#if __has_builtin(__builtin_memset_inline) -#define _rpmalloc_memset_const(x, y, s) __builtin_memset_inline(x, y, s) -#else -#define _rpmalloc_memset_const(x, y, s) \ - do { \ - _Static_assert(__builtin_choose_expr(__builtin_constant_p(s), 1, 0), \ - "len must be a constant integer"); \ - memset(x, y, s); \ - } while (0) -#endif -#else -#define _rpmalloc_memcpy_const(x, y, s) memcpy(x, y, s) -#define _rpmalloc_memset_const(x, y, s) memset(x, y, s) -#endif - -#if __has_builtin(__builtin_assume) -#define rpmalloc_assume(cond) __builtin_assume(cond) -#elif defined(__GNUC__) -#define rpmalloc_assume(cond) \ - do { \ - if (!__builtin_expect(cond, 0)) \ - __builtin_unreachable(); \ - } while (0) -#elif defined(_MSC_VER) -#define rpmalloc_assume(cond) __assume(cond) -#else -#define rpmalloc_assume(cond) 0 -#endif - -#ifndef HEAP_ARRAY_SIZE -//! 
Size of heap hashmap -#define HEAP_ARRAY_SIZE 47 -#endif -#ifndef ENABLE_THREAD_CACHE -//! Enable per-thread cache -#define ENABLE_THREAD_CACHE 1 -#endif -#ifndef ENABLE_GLOBAL_CACHE -//! Enable global cache shared between all threads, requires thread cache -#define ENABLE_GLOBAL_CACHE 1 -#endif -#ifndef ENABLE_VALIDATE_ARGS -//! Enable validation of args to public entry points -#define ENABLE_VALIDATE_ARGS 0 -#endif -#ifndef ENABLE_STATISTICS -//! Enable statistics collection -#define ENABLE_STATISTICS 0 -#endif -#ifndef ENABLE_ASSERTS -//! Enable asserts -#define ENABLE_ASSERTS 0 -#endif -#ifndef ENABLE_OVERRIDE -//! Override standard library malloc/free and new/delete entry points -#define ENABLE_OVERRIDE 0 -#endif -#ifndef ENABLE_PRELOAD -//! Support preloading -#define ENABLE_PRELOAD 0 -#endif -#ifndef DISABLE_UNMAP -//! Disable unmapping memory pages (also enables unlimited cache) -#define DISABLE_UNMAP 0 -#endif -#ifndef ENABLE_UNLIMITED_CACHE -//! Enable unlimited global cache (no unmapping until finalization) -#define ENABLE_UNLIMITED_CACHE 0 -#endif -#ifndef ENABLE_ADAPTIVE_THREAD_CACHE -//! Enable adaptive thread cache size based on use heuristics -#define ENABLE_ADAPTIVE_THREAD_CACHE 0 -#endif -#ifndef DEFAULT_SPAN_MAP_COUNT -//! Default number of spans to map in call to map more virtual memory (default -//! values yield 4MiB here) -#define DEFAULT_SPAN_MAP_COUNT 64 -#endif -#ifndef GLOBAL_CACHE_MULTIPLIER -//! 
Multiplier for global cache -#define GLOBAL_CACHE_MULTIPLIER 8 -#endif - -#if DISABLE_UNMAP && !ENABLE_GLOBAL_CACHE -#error Must use global cache if unmap is disabled -#endif - -#if DISABLE_UNMAP -#undef ENABLE_UNLIMITED_CACHE -#define ENABLE_UNLIMITED_CACHE 1 -#endif - -#if !ENABLE_GLOBAL_CACHE -#undef ENABLE_UNLIMITED_CACHE -#define ENABLE_UNLIMITED_CACHE 0 -#endif - -#if !ENABLE_THREAD_CACHE -#undef ENABLE_ADAPTIVE_THREAD_CACHE -#define ENABLE_ADAPTIVE_THREAD_CACHE 0 -#endif - -#if defined(_WIN32) || defined(__WIN32__) || defined(_WIN64) -#define PLATFORM_WINDOWS 1 -#define PLATFORM_POSIX 0 -#else -#define PLATFORM_WINDOWS 0 -#define PLATFORM_POSIX 1 -#endif - -/// Platform and arch specifics -#if defined(_MSC_VER) && !defined(__clang__) -#pragma warning(disable : 5105) -#ifndef FORCEINLINE -#define FORCEINLINE inline __forceinline -#endif -#define _Static_assert static_assert -#else -#ifndef FORCEINLINE -#define FORCEINLINE inline __attribute__((__always_inline__)) -#endif -#endif -#if PLATFORM_WINDOWS -#ifndef WIN32_LEAN_AND_MEAN -#define WIN32_LEAN_AND_MEAN -#endif -#include -#if ENABLE_VALIDATE_ARGS -#include -#endif -#else -#include -#include -#include -#include -#if defined(__linux__) || defined(__ANDROID__) -#include -#if !defined(PR_SET_VMA) -#define PR_SET_VMA 0x53564d41 -#define PR_SET_VMA_ANON_NAME 0 -#endif -#endif -#if defined(__APPLE__) -#include -#if !TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR -#include -#include -#endif -#include -#endif -#if defined(__HAIKU__) || defined(__TINYC__) -#include -#endif -#endif - -#include -#include -#include - -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) -#include -static DWORD fls_key; -#endif - -#if PLATFORM_POSIX -#include -#include -#ifdef __FreeBSD__ -#include -#define MAP_HUGETLB MAP_ALIGNED_SUPER -#ifndef PROT_MAX -#define PROT_MAX(f) 0 -#endif -#else -#define PROT_MAX(f) 0 -#endif -#ifdef __sun -extern int madvise(caddr_t, size_t, int); -#endif -#ifndef MAP_UNINITIALIZED 
-#define MAP_UNINITIALIZED 0 -#endif -#endif -#include - -#if ENABLE_ASSERTS -#undef NDEBUG -#if defined(_MSC_VER) && !defined(_DEBUG) -#define _DEBUG -#endif -#include -#define RPMALLOC_TOSTRING_M(x) #x -#define RPMALLOC_TOSTRING(x) RPMALLOC_TOSTRING_M(x) -#define rpmalloc_assert(truth, message) \ - do { \ - if (!(truth)) { \ - if (_memory_config.error_callback) { \ - _memory_config.error_callback(message " (" RPMALLOC_TOSTRING( \ - truth) ") at " __FILE__ ":" RPMALLOC_TOSTRING(__LINE__)); \ - } else { \ - assert((truth) && message); \ - } \ - } \ - } while (0) -#else -#define rpmalloc_assert(truth, message) \ - do { \ - } while (0) -#endif -#if ENABLE_STATISTICS -#include -#endif - -////// -/// -/// Atomic access abstraction (since MSVC does not do C11 yet) -/// -////// - -#if defined(_MSC_VER) && !defined(__clang__) - -typedef volatile long atomic32_t; -typedef volatile long long atomic64_t; -typedef volatile void *atomicptr_t; - -static FORCEINLINE int32_t atomic_load32(atomic32_t *src) { return *src; } -static FORCEINLINE void atomic_store32(atomic32_t *dst, int32_t val) { - *dst = val; -} -static FORCEINLINE int32_t atomic_incr32(atomic32_t *val) { - return (int32_t)InterlockedIncrement(val); -} -static FORCEINLINE int32_t atomic_decr32(atomic32_t *val) { - return (int32_t)InterlockedDecrement(val); -} -static FORCEINLINE int32_t atomic_add32(atomic32_t *val, int32_t add) { - return (int32_t)InterlockedExchangeAdd(val, add) + add; -} -static FORCEINLINE int atomic_cas32_acquire(atomic32_t *dst, int32_t val, - int32_t ref) { - return (InterlockedCompareExchange(dst, val, ref) == ref) ? 
1 : 0; -} -static FORCEINLINE void atomic_store32_release(atomic32_t *dst, int32_t val) { - *dst = val; -} -static FORCEINLINE int64_t atomic_load64(atomic64_t *src) { return *src; } -static FORCEINLINE int64_t atomic_add64(atomic64_t *val, int64_t add) { - return (int64_t)InterlockedExchangeAdd64(val, add) + add; -} -static FORCEINLINE void *atomic_load_ptr(atomicptr_t *src) { - return (void *)*src; -} -static FORCEINLINE void atomic_store_ptr(atomicptr_t *dst, void *val) { - *dst = val; -} -static FORCEINLINE void atomic_store_ptr_release(atomicptr_t *dst, void *val) { - *dst = val; -} -static FORCEINLINE void *atomic_exchange_ptr_acquire(atomicptr_t *dst, - void *val) { - return (void *)InterlockedExchangePointer((void *volatile *)dst, val); -} -static FORCEINLINE int atomic_cas_ptr(atomicptr_t *dst, void *val, void *ref) { - return (InterlockedCompareExchangePointer((void *volatile *)dst, val, ref) == - ref) - ? 1 - : 0; -} - -#define EXPECTED(x) (x) -#define UNEXPECTED(x) (x) - -#else - -#include - -typedef volatile _Atomic(int32_t) atomic32_t; -typedef volatile _Atomic(int64_t) atomic64_t; -typedef volatile _Atomic(void *) atomicptr_t; - -static FORCEINLINE int32_t atomic_load32(atomic32_t *src) { - return atomic_load_explicit(src, memory_order_relaxed); -} -static FORCEINLINE void atomic_store32(atomic32_t *dst, int32_t val) { - atomic_store_explicit(dst, val, memory_order_relaxed); -} -static FORCEINLINE int32_t atomic_incr32(atomic32_t *val) { - return atomic_fetch_add_explicit(val, 1, memory_order_relaxed) + 1; -} -static FORCEINLINE int32_t atomic_decr32(atomic32_t *val) { - return atomic_fetch_add_explicit(val, -1, memory_order_relaxed) - 1; -} -static FORCEINLINE int32_t atomic_add32(atomic32_t *val, int32_t add) { - return atomic_fetch_add_explicit(val, add, memory_order_relaxed) + add; -} -static FORCEINLINE int atomic_cas32_acquire(atomic32_t *dst, int32_t val, - int32_t ref) { - return atomic_compare_exchange_weak_explicit( - dst, &ref, val, 
memory_order_acquire, memory_order_relaxed); -} -static FORCEINLINE void atomic_store32_release(atomic32_t *dst, int32_t val) { - atomic_store_explicit(dst, val, memory_order_release); -} -static FORCEINLINE int64_t atomic_load64(atomic64_t *val) { - return atomic_load_explicit(val, memory_order_relaxed); -} -static FORCEINLINE int64_t atomic_add64(atomic64_t *val, int64_t add) { - return atomic_fetch_add_explicit(val, add, memory_order_relaxed) + add; -} -static FORCEINLINE void *atomic_load_ptr(atomicptr_t *src) { - return atomic_load_explicit(src, memory_order_relaxed); -} -static FORCEINLINE void atomic_store_ptr(atomicptr_t *dst, void *val) { - atomic_store_explicit(dst, val, memory_order_relaxed); -} -static FORCEINLINE void atomic_store_ptr_release(atomicptr_t *dst, void *val) { - atomic_store_explicit(dst, val, memory_order_release); -} -static FORCEINLINE void *atomic_exchange_ptr_acquire(atomicptr_t *dst, - void *val) { - return atomic_exchange_explicit(dst, val, memory_order_acquire); -} -static FORCEINLINE int atomic_cas_ptr(atomicptr_t *dst, void *val, void *ref) { - return atomic_compare_exchange_weak_explicit( - dst, &ref, val, memory_order_relaxed, memory_order_relaxed); -} - -#define EXPECTED(x) __builtin_expect((x), 1) -#define UNEXPECTED(x) __builtin_expect((x), 0) - -#endif - -//////////// -/// -/// Statistics related functions (evaluate to nothing when statistics not -/// enabled) -/// -////// - -#if ENABLE_STATISTICS -#define _rpmalloc_stat_inc(counter) atomic_incr32(counter) -#define _rpmalloc_stat_dec(counter) atomic_decr32(counter) -#define _rpmalloc_stat_add(counter, value) \ - atomic_add32(counter, (int32_t)(value)) -#define _rpmalloc_stat_add64(counter, value) \ - atomic_add64(counter, (int64_t)(value)) -#define _rpmalloc_stat_add_peak(counter, value, peak) \ - do { \ - int32_t _cur_count = atomic_add32(counter, (int32_t)(value)); \ - if (_cur_count > (peak)) \ - peak = _cur_count; \ - } while (0) -#define _rpmalloc_stat_sub(counter, 
value) \ - atomic_add32(counter, -(int32_t)(value)) -#define _rpmalloc_stat_inc_alloc(heap, class_idx) \ - do { \ - int32_t alloc_current = \ - atomic_incr32(&heap->size_class_use[class_idx].alloc_current); \ - if (alloc_current > heap->size_class_use[class_idx].alloc_peak) \ - heap->size_class_use[class_idx].alloc_peak = alloc_current; \ - atomic_incr32(&heap->size_class_use[class_idx].alloc_total); \ - } while (0) -#define _rpmalloc_stat_inc_free(heap, class_idx) \ - do { \ - atomic_decr32(&heap->size_class_use[class_idx].alloc_current); \ - atomic_incr32(&heap->size_class_use[class_idx].free_total); \ - } while (0) -#else -#define _rpmalloc_stat_inc(counter) \ - do { \ - } while (0) -#define _rpmalloc_stat_dec(counter) \ - do { \ - } while (0) -#define _rpmalloc_stat_add(counter, value) \ - do { \ - } while (0) -#define _rpmalloc_stat_add64(counter, value) \ - do { \ - } while (0) -#define _rpmalloc_stat_add_peak(counter, value, peak) \ - do { \ - } while (0) -#define _rpmalloc_stat_sub(counter, value) \ - do { \ - } while (0) -#define _rpmalloc_stat_inc_alloc(heap, class_idx) \ - do { \ - } while (0) -#define _rpmalloc_stat_inc_free(heap, class_idx) \ - do { \ - } while (0) -#endif - -/// -/// Preconfigured limits and sizes -/// - -//! Granularity of a small allocation block (must be power of two) -#define SMALL_GRANULARITY 16 -//! Small granularity shift count -#define SMALL_GRANULARITY_SHIFT 4 -//! Number of small block size classes -#define SMALL_CLASS_COUNT 65 -//! Maximum size of a small block -#define SMALL_SIZE_LIMIT (SMALL_GRANULARITY * (SMALL_CLASS_COUNT - 1)) -//! Granularity of a medium allocation block -#define MEDIUM_GRANULARITY 512 -//! Medium granularity shift count -#define MEDIUM_GRANULARITY_SHIFT 9 -//! Number of medium block size classes -#define MEDIUM_CLASS_COUNT 61 -//! Total number of small + medium size classes -#define SIZE_CLASS_COUNT (SMALL_CLASS_COUNT + MEDIUM_CLASS_COUNT) -//! 
Number of large block size classes -#define LARGE_CLASS_COUNT 63 -//! Maximum size of a medium block -#define MEDIUM_SIZE_LIMIT \ - (SMALL_SIZE_LIMIT + (MEDIUM_GRANULARITY * MEDIUM_CLASS_COUNT)) -//! Maximum size of a large block -#define LARGE_SIZE_LIMIT \ - ((LARGE_CLASS_COUNT * _memory_span_size) - SPAN_HEADER_SIZE) -//! Size of a span header (must be a multiple of SMALL_GRANULARITY and a power -//! of two) -#define SPAN_HEADER_SIZE 128 -//! Number of spans in thread cache -#define MAX_THREAD_SPAN_CACHE 400 -//! Number of spans to transfer between thread and global cache -#define THREAD_SPAN_CACHE_TRANSFER 64 -//! Number of spans in thread cache for large spans (must be greater than -//! LARGE_CLASS_COUNT / 2) -#define MAX_THREAD_SPAN_LARGE_CACHE 100 -//! Number of spans to transfer between thread and global cache for large spans -#define THREAD_SPAN_LARGE_CACHE_TRANSFER 6 - -_Static_assert((SMALL_GRANULARITY & (SMALL_GRANULARITY - 1)) == 0, - "Small granularity must be power of two"); -_Static_assert((SPAN_HEADER_SIZE & (SPAN_HEADER_SIZE - 1)) == 0, - "Span header size must be power of two"); - -#if ENABLE_VALIDATE_ARGS -//! Maximum allocation size to avoid integer overflow -#undef MAX_ALLOC_SIZE -#define MAX_ALLOC_SIZE (((size_t) - 1) - _memory_span_size) -#endif - -#define pointer_offset(ptr, ofs) (void *)((char *)(ptr) + (ptrdiff_t)(ofs)) -#define pointer_diff(first, second) \ - (ptrdiff_t)((const char *)(first) - (const char *)(second)) - -#define INVALID_POINTER ((void *)((uintptr_t) - 1)) - -#define SIZE_CLASS_LARGE SIZE_CLASS_COUNT -#define SIZE_CLASS_HUGE ((uint32_t) - 1) - -//////////// -/// -/// Data types -/// -////// - -//! A memory heap, per thread -typedef struct heap_t heap_t; -//! Span of memory pages -typedef struct span_t span_t; -//! Span list -typedef struct span_list_t span_list_t; -//! Span active data -typedef struct span_active_t span_active_t; -//! Size class definition -typedef struct size_class_t size_class_t; -//! 
Global cache -typedef struct global_cache_t global_cache_t; - -//! Flag indicating span is the first (master) span of a split superspan -#define SPAN_FLAG_MASTER 1U -//! Flag indicating span is a secondary (sub) span of a split superspan -#define SPAN_FLAG_SUBSPAN 2U -//! Flag indicating span has blocks with increased alignment -#define SPAN_FLAG_ALIGNED_BLOCKS 4U -//! Flag indicating an unmapped master span -#define SPAN_FLAG_UNMAPPED_MASTER 8U - -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS -struct span_use_t { - //! Current number of spans used (actually used, not in cache) - atomic32_t current; - //! High water mark of spans used - atomic32_t high; -#if ENABLE_STATISTICS - //! Number of spans in deferred list - atomic32_t spans_deferred; - //! Number of spans transitioned to global cache - atomic32_t spans_to_global; - //! Number of spans transitioned from global cache - atomic32_t spans_from_global; - //! Number of spans transitioned to thread cache - atomic32_t spans_to_cache; - //! Number of spans transitioned from thread cache - atomic32_t spans_from_cache; - //! Number of spans transitioned to reserved state - atomic32_t spans_to_reserved; - //! Number of spans transitioned from reserved state - atomic32_t spans_from_reserved; - //! Number of raw memory map calls - atomic32_t spans_map_calls; -#endif -}; -typedef struct span_use_t span_use_t; -#endif - -#if ENABLE_STATISTICS -struct size_class_use_t { - //! Current number of allocations - atomic32_t alloc_current; - //! Peak number of allocations - int32_t alloc_peak; - //! Total number of allocations - atomic32_t alloc_total; - //! Total number of frees - atomic32_t free_total; - //! Number of spans in use - atomic32_t spans_current; - //! Number of spans transitioned to cache - int32_t spans_peak; - //! Number of spans transitioned to cache - atomic32_t spans_to_cache; - //! Number of spans transitioned from cache - atomic32_t spans_from_cache; - //! 
Number of spans transitioned from reserved state - atomic32_t spans_from_reserved; - //! Number of spans mapped - atomic32_t spans_map_calls; - int32_t unused; -}; -typedef struct size_class_use_t size_class_use_t; -#endif - -// A span can either represent a single span of memory pages with size declared -// by span_map_count configuration variable, or a set of spans in a continuous -// region, a super span. Any reference to the term "span" usually refers to both -// a single span or a super span. A super span can further be divided into -// multiple spans (or this, super spans), where the first (super)span is the -// master and subsequent (super)spans are subspans. The master span keeps track -// of how many subspans that are still alive and mapped in virtual memory, and -// once all subspans and master have been unmapped the entire superspan region -// is released and unmapped (on Windows for example, the entire superspan range -// has to be released in the same call to release the virtual memory range, but -// individual subranges can be decommitted individually to reduce physical -// memory use). -struct span_t { - //! Free list - void *free_list; - //! Total block count of size class - uint32_t block_count; - //! Size class - uint32_t size_class; - //! Index of last block initialized in free list - uint32_t free_list_limit; - //! Number of used blocks remaining when in partial state - uint32_t used_count; - //! Deferred free list - atomicptr_t free_list_deferred; - //! Size of deferred free list, or list of spans when part of a cache list - uint32_t list_size; - //! Size of a block - uint32_t block_size; - //! Flags and counters - uint32_t flags; - //! Number of spans - uint32_t span_count; - //! Total span counter for master spans - uint32_t total_spans; - //! Offset from master span for subspans - uint32_t offset_from_master; - //! Remaining span counter, for master spans - atomic32_t remaining_spans; - //! Alignment offset - uint32_t align_offset; - //! 
Owning heap - heap_t *heap; - //! Next span - span_t *next; - //! Previous span - span_t *prev; -}; -_Static_assert(sizeof(span_t) <= SPAN_HEADER_SIZE, "span size mismatch"); - -struct span_cache_t { - size_t count; - span_t *span[MAX_THREAD_SPAN_CACHE]; -}; -typedef struct span_cache_t span_cache_t; - -struct span_large_cache_t { - size_t count; - span_t *span[MAX_THREAD_SPAN_LARGE_CACHE]; -}; -typedef struct span_large_cache_t span_large_cache_t; - -struct heap_size_class_t { - //! Free list of active span - void *free_list; - //! Double linked list of partially used spans with free blocks. - // Previous span pointer in head points to tail span of list. - span_t *partial_span; - //! Early level cache of fully free spans - span_t *cache; -}; -typedef struct heap_size_class_t heap_size_class_t; - -// Control structure for a heap, either a thread heap or a first class heap if -// enabled -struct heap_t { - //! Owning thread ID - uintptr_t owner_thread; - //! Free lists for each size class - heap_size_class_t size_class[SIZE_CLASS_COUNT]; -#if ENABLE_THREAD_CACHE - //! Arrays of fully freed spans, single span - span_cache_t span_cache; -#endif - //! List of deferred free spans (single linked list) - atomicptr_t span_free_deferred; - //! Number of full spans - size_t full_span_count; - //! Mapped but unused spans - span_t *span_reserve; - //! Master span for mapped but unused spans - span_t *span_reserve_master; - //! Number of mapped but unused spans - uint32_t spans_reserved; - //! Child count - atomic32_t child_count; - //! Next heap in id list - heap_t *next_heap; - //! Next heap in orphan list - heap_t *next_orphan; - //! Heap ID - int32_t id; - //! Finalization state flag - int finalize; - //! Master heap owning the memory pages - heap_t *master_heap; -#if ENABLE_THREAD_CACHE - //! Arrays of fully freed spans, large spans with > 1 span count - span_large_cache_t span_large_cache[LARGE_CLASS_COUNT - 1]; -#endif -#if RPMALLOC_FIRST_CLASS_HEAPS - //! 
Double linked list of fully utilized spans with free blocks for each size - //! class. - // Previous span pointer in head points to tail span of list. - span_t *full_span[SIZE_CLASS_COUNT]; - //! Double linked list of large and huge spans allocated by this heap - span_t *large_huge_span; -#endif -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS - //! Current and high water mark of spans used per span count - span_use_t span_use[LARGE_CLASS_COUNT]; -#endif -#if ENABLE_STATISTICS - //! Allocation stats per size class - size_class_use_t size_class_use[SIZE_CLASS_COUNT + 1]; - //! Number of bytes transitioned thread -> global - atomic64_t thread_to_global; - //! Number of bytes transitioned global -> thread - atomic64_t global_to_thread; -#endif -}; - -// Size class for defining a block size bucket -struct size_class_t { - //! Size of blocks in this class - uint32_t block_size; - //! Number of blocks in each chunk - uint16_t block_count; - //! Class index this class is merged with - uint16_t class_idx; -}; -_Static_assert(sizeof(size_class_t) == 8, "Size class size mismatch"); - -struct global_cache_t { - //! Cache lock - atomic32_t lock; - //! Cache count - uint32_t count; -#if ENABLE_STATISTICS - //! Insert count - size_t insert_count; - //! Extract count - size_t extract_count; -#endif - //! Cached spans - span_t *span[GLOBAL_CACHE_MULTIPLIER * MAX_THREAD_SPAN_CACHE]; - //! Unlimited cache overflow - span_t *overflow; -}; - -//////////// -/// -/// Global data -/// -////// - -//! Default span size (64KiB) -#define _memory_default_span_size (64 * 1024) -#define _memory_default_span_size_shift 16 -#define _memory_default_span_mask (~((uintptr_t)(_memory_span_size - 1))) - -//! Initialized flag -static int _rpmalloc_initialized; -//! Main thread ID -static uintptr_t _rpmalloc_main_thread_id; -//! Configuration -static rpmalloc_config_t _memory_config; -//! Memory page size -static size_t _memory_page_size; -//! 
Shift to divide by page size -static size_t _memory_page_size_shift; -//! Granularity at which memory pages are mapped by OS -static size_t _memory_map_granularity; -#if RPMALLOC_CONFIGURABLE -//! Size of a span of memory pages -static size_t _memory_span_size; -//! Shift to divide by span size -static size_t _memory_span_size_shift; -//! Mask to get to start of a memory span -static uintptr_t _memory_span_mask; -#else -//! Hardwired span size -#define _memory_span_size _memory_default_span_size -#define _memory_span_size_shift _memory_default_span_size_shift -#define _memory_span_mask _memory_default_span_mask -#endif -//! Number of spans to map in each map call -static size_t _memory_span_map_count; -//! Number of spans to keep reserved in each heap -static size_t _memory_heap_reserve_count; -//! Global size classes -static size_class_t _memory_size_class[SIZE_CLASS_COUNT]; -//! Run-time size limit of medium blocks -static size_t _memory_medium_size_limit; -//! Heap ID counter -static atomic32_t _memory_heap_id; -//! Huge page support -static int _memory_huge_pages; -#if ENABLE_GLOBAL_CACHE -//! Global span cache -static global_cache_t _memory_span_cache[LARGE_CLASS_COUNT]; -#endif -//! Global reserved spans -static span_t *_memory_global_reserve; -//! Global reserved count -static size_t _memory_global_reserve_count; -//! Global reserved master -static span_t *_memory_global_reserve_master; -//! All heaps -static heap_t *_memory_heaps[HEAP_ARRAY_SIZE]; -//! Used to restrict access to mapping memory for huge pages -static atomic32_t _memory_global_lock; -//! Orphaned heaps -static heap_t *_memory_orphan_heaps; -#if RPMALLOC_FIRST_CLASS_HEAPS -//! Orphaned heaps (first class heaps) -static heap_t *_memory_first_class_orphan_heaps; -#endif -#if ENABLE_STATISTICS -//! Allocations counter -static atomic64_t _allocation_counter; -//! Deallocations counter -static atomic64_t _deallocation_counter; -//! Active heap count -static atomic32_t _memory_active_heaps; -//! 
Number of currently mapped memory pages -static atomic32_t _mapped_pages; -//! Peak number of concurrently mapped memory pages -static int32_t _mapped_pages_peak; -//! Number of mapped master spans -static atomic32_t _master_spans; -//! Number of unmapped dangling master spans -static atomic32_t _unmapped_master_spans; -//! Running counter of total number of mapped memory pages since start -static atomic32_t _mapped_total; -//! Running counter of total number of unmapped memory pages since start -static atomic32_t _unmapped_total; -//! Number of currently mapped memory pages in OS calls -static atomic32_t _mapped_pages_os; -//! Number of currently allocated pages in huge allocations -static atomic32_t _huge_pages_current; -//! Peak number of currently allocated pages in huge allocations -static int32_t _huge_pages_peak; -#endif - -//////////// -/// -/// Thread local heap and ID -/// -////// - -//! Current thread heap -#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ - defined(__TINYC__) -static pthread_key_t _memory_thread_heap; -#else -#ifdef _MSC_VER -#define _Thread_local __declspec(thread) -#define TLS_MODEL -#else -#ifndef __HAIKU__ -#define TLS_MODEL __attribute__((tls_model("initial-exec"))) -#else -#define TLS_MODEL -#endif -#if !defined(__clang__) && defined(__GNUC__) -#define _Thread_local __thread -#endif -#endif -static _Thread_local heap_t *_memory_thread_heap TLS_MODEL; -#endif - -static inline heap_t *get_thread_heap_raw(void) { -#if (defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD - return pthread_getspecific(_memory_thread_heap); -#else - return _memory_thread_heap; -#endif -} - -//! Get the current thread heap -static inline heap_t *get_thread_heap(void) { - heap_t *heap = get_thread_heap_raw(); -#if ENABLE_PRELOAD - if (EXPECTED(heap != 0)) - return heap; - rpmalloc_initialize(); - return get_thread_heap_raw(); -#else - return heap; -#endif -} - -//! 
Fast thread ID -static inline uintptr_t get_thread_id(void) { -#if defined(_WIN32) - return (uintptr_t)((void *)NtCurrentTeb()); -#elif (defined(__GNUC__) || defined(__clang__)) && !defined(__CYGWIN__) - uintptr_t tid; -#if defined(__i386__) - __asm__("movl %%gs:0, %0" : "=r"(tid) : :); -#elif defined(__x86_64__) -#if defined(__MACH__) - __asm__("movq %%gs:0, %0" : "=r"(tid) : :); -#else - __asm__("movq %%fs:0, %0" : "=r"(tid) : :); -#endif -#elif defined(__arm__) - __asm__ volatile("mrc p15, 0, %0, c13, c0, 3" : "=r"(tid)); -#elif defined(__aarch64__) -#if defined(__MACH__) - // tpidr_el0 likely unused, always return 0 on iOS - __asm__ volatile("mrs %0, tpidrro_el0" : "=r"(tid)); -#else - __asm__ volatile("mrs %0, tpidr_el0" : "=r"(tid)); -#endif -#else -#error This platform needs implementation of get_thread_id() -#endif - return tid; -#else -#error This platform needs implementation of get_thread_id() -#endif -} - -//! Set the current thread heap -static void set_thread_heap(heap_t *heap) { -#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ - defined(__TINYC__) - pthread_setspecific(_memory_thread_heap, heap); -#else - _memory_thread_heap = heap; -#endif - if (heap) - heap->owner_thread = get_thread_id(); -} - -//! Set main thread ID -extern void rpmalloc_set_main_thread(void); - -void rpmalloc_set_main_thread(void) { - _rpmalloc_main_thread_id = get_thread_id(); -} - -static void _rpmalloc_spin(void) { -#if defined(_MSC_VER) -#if defined(_M_ARM64) - __yield(); -#else - _mm_pause(); -#endif -#elif defined(__x86_64__) || defined(__i386__) - __asm__ volatile("pause" ::: "memory"); -#elif defined(__aarch64__) || (defined(__arm__) && __ARM_ARCH >= 7) - __asm__ volatile("yield" ::: "memory"); -#elif defined(__powerpc__) || defined(__powerpc64__) - // No idea if ever been compiled in such archs but ... 
as precaution - __asm__ volatile("or 27,27,27"); -#elif defined(__sparc__) - __asm__ volatile("rd %ccr, %g0 \n\trd %ccr, %g0 \n\trd %ccr, %g0"); -#else - struct timespec ts = {0}; - nanosleep(&ts, 0); -#endif -} - -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) -static void NTAPI _rpmalloc_thread_destructor(void *value) { -#if ENABLE_OVERRIDE - // If this is called on main thread it means rpmalloc_finalize - // has not been called and shutdown is forced (through _exit) or unclean - if (get_thread_id() == _rpmalloc_main_thread_id) - return; -#endif - if (value) - rpmalloc_thread_finalize(1); -} -#endif - -//////////// -/// -/// Low level memory map/unmap -/// -////// - -static void _rpmalloc_set_name(void *address, size_t size) { -#if defined(__linux__) || defined(__ANDROID__) - const char *name = _memory_huge_pages ? _memory_config.huge_page_name - : _memory_config.page_name; - if (address == MAP_FAILED || !name) - return; - // If the kernel does not support CONFIG_ANON_VMA_NAME or if the call fails - // (e.g. invalid name) it is a no-op basically. - (void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (uintptr_t)address, size, - (uintptr_t)name); -#else - (void)sizeof(size); - (void)sizeof(address); -#endif -} - -//! Map more virtual memory -// size is number of bytes to map -// offset receives the offset in bytes from start of mapped region -// returns address to start of mapped region to use -static void *_rpmalloc_mmap(size_t size, size_t *offset) { - rpmalloc_assert(!(size % _memory_page_size), "Invalid mmap size"); - rpmalloc_assert(size >= _memory_page_size, "Invalid mmap size"); - void *address = _memory_config.memory_map(size, offset); - if (EXPECTED(address != 0)) { - _rpmalloc_stat_add_peak(&_mapped_pages, (size >> _memory_page_size_shift), - _mapped_pages_peak); - _rpmalloc_stat_add(&_mapped_total, (size >> _memory_page_size_shift)); - } - return address; -} - -//! 
Unmap virtual memory -// address is the memory address to unmap, as returned from _memory_map -// size is the number of bytes to unmap, which might be less than full region -// for a partial unmap offset is the offset in bytes to the actual mapped -// region, as set by _memory_map release is set to 0 for partial unmap, or size -// of entire range for a full unmap -static void _rpmalloc_unmap(void *address, size_t size, size_t offset, - size_t release) { - rpmalloc_assert(!release || (release >= size), "Invalid unmap size"); - rpmalloc_assert(!release || (release >= _memory_page_size), - "Invalid unmap size"); - if (release) { - rpmalloc_assert(!(release % _memory_page_size), "Invalid unmap size"); - _rpmalloc_stat_sub(&_mapped_pages, (release >> _memory_page_size_shift)); - _rpmalloc_stat_add(&_unmapped_total, (release >> _memory_page_size_shift)); - } - _memory_config.memory_unmap(address, size, offset, release); -} - -//! Default implementation to map new pages to virtual memory -static void *_rpmalloc_mmap_os(size_t size, size_t *offset) { - // Either size is a heap (a single page) or a (multiple) span - we only need - // to align spans, and only if larger than map granularity - size_t padding = ((size >= _memory_span_size) && - (_memory_span_size > _memory_map_granularity)) - ? _memory_span_size - : 0; - rpmalloc_assert(size >= _memory_page_size, "Invalid mmap size"); -#if PLATFORM_WINDOWS - // Ok to MEM_COMMIT - according to MSDN, "actual physical pages are not - // allocated unless/until the virtual addresses are actually accessed" - void *ptr = VirtualAlloc(0, size + padding, - (_memory_huge_pages ? 
MEM_LARGE_PAGES : 0) | - MEM_RESERVE | MEM_COMMIT, - PAGE_READWRITE); - if (!ptr) { - if (_memory_config.map_fail_callback) { - if (_memory_config.map_fail_callback(size + padding)) - return _rpmalloc_mmap_os(size, offset); - } else { - rpmalloc_assert(ptr, "Failed to map virtual memory block"); - } - return 0; - } -#else - int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_UNINITIALIZED; -#if defined(__APPLE__) && !TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR - int fd = (int)VM_MAKE_TAG(240U); - if (_memory_huge_pages) - fd |= VM_FLAGS_SUPERPAGE_SIZE_2MB; - void *ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, fd, 0); -#elif defined(MAP_HUGETLB) - void *ptr = mmap(0, size + padding, - PROT_READ | PROT_WRITE | PROT_MAX(PROT_READ | PROT_WRITE), - (_memory_huge_pages ? MAP_HUGETLB : 0) | flags, -1, 0); -#if defined(MADV_HUGEPAGE) - // In some configurations, huge pages allocations might fail thus - // we fallback to normal allocations and promote the region as transparent - // huge page - if ((ptr == MAP_FAILED || !ptr) && _memory_huge_pages) { - ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, -1, 0); - if (ptr && ptr != MAP_FAILED) { - int prm = madvise(ptr, size + padding, MADV_HUGEPAGE); - (void)prm; - rpmalloc_assert((prm == 0), "Failed to promote the page to THP"); - } - } -#endif - _rpmalloc_set_name(ptr, size + padding); -#elif defined(MAP_ALIGNED) - const size_t align = - (sizeof(size_t) * 8) - (size_t)(__builtin_clzl(size - 1)); - void *ptr = - mmap(0, size + padding, PROT_READ | PROT_WRITE, - (_memory_huge_pages ? MAP_ALIGNED(align) : 0) | flags, -1, 0); -#elif defined(MAP_ALIGN) - caddr_t base = (_memory_huge_pages ? (caddr_t)(4 << 20) : 0); - void *ptr = mmap(base, size + padding, PROT_READ | PROT_WRITE, - (_memory_huge_pages ? 
MAP_ALIGN : 0) | flags, -1, 0); -#else - void *ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, -1, 0); -#endif - if ((ptr == MAP_FAILED) || !ptr) { - if (_memory_config.map_fail_callback) { - if (_memory_config.map_fail_callback(size + padding)) - return _rpmalloc_mmap_os(size, offset); - } else if (errno != ENOMEM) { - rpmalloc_assert((ptr != MAP_FAILED) && ptr, - "Failed to map virtual memory block"); - } - return 0; - } -#endif - _rpmalloc_stat_add(&_mapped_pages_os, - (int32_t)((size + padding) >> _memory_page_size_shift)); - if (padding) { - size_t final_padding = padding - ((uintptr_t)ptr & ~_memory_span_mask); - rpmalloc_assert(final_padding <= _memory_span_size, - "Internal failure in padding"); - rpmalloc_assert(final_padding <= padding, "Internal failure in padding"); - rpmalloc_assert(!(final_padding % 8), "Internal failure in padding"); - ptr = pointer_offset(ptr, final_padding); - *offset = final_padding >> 3; - } - rpmalloc_assert((size < _memory_span_size) || - !((uintptr_t)ptr & ~_memory_span_mask), - "Internal failure in padding"); - return ptr; -} - -//! Default implementation to unmap pages from virtual memory -static void _rpmalloc_unmap_os(void *address, size_t size, size_t offset, - size_t release) { - rpmalloc_assert(release || (offset == 0), "Invalid unmap size"); - rpmalloc_assert(!release || (release >= _memory_page_size), - "Invalid unmap size"); - rpmalloc_assert(size >= _memory_page_size, "Invalid unmap size"); - if (release && offset) { - offset <<= 3; - address = pointer_offset(address, -(int32_t)offset); - if ((release >= _memory_span_size) && - (_memory_span_size > _memory_map_granularity)) { - // Padding is always one span size - release += _memory_span_size; - } - } -#if !DISABLE_UNMAP -#if PLATFORM_WINDOWS - if (!VirtualFree(address, release ? 0 : size, - release ? 
MEM_RELEASE : MEM_DECOMMIT)) { - rpmalloc_assert(0, "Failed to unmap virtual memory block"); - } -#else - if (release) { - if (munmap(address, release)) { - rpmalloc_assert(0, "Failed to unmap virtual memory block"); - } - } else { -#if defined(MADV_FREE_REUSABLE) - int ret; - while ((ret = madvise(address, size, MADV_FREE_REUSABLE)) == -1 && - (errno == EAGAIN)) - errno = 0; - if ((ret == -1) && (errno != 0)) { -#elif defined(MADV_DONTNEED) - if (madvise(address, size, MADV_DONTNEED)) { -#elif defined(MADV_PAGEOUT) - if (madvise(address, size, MADV_PAGEOUT)) { -#elif defined(MADV_FREE) - if (madvise(address, size, MADV_FREE)) { -#else - if (posix_madvise(address, size, POSIX_MADV_DONTNEED)) { -#endif - rpmalloc_assert(0, "Failed to madvise virtual memory block as free"); - } - } -#endif -#endif - if (release) - _rpmalloc_stat_sub(&_mapped_pages_os, release >> _memory_page_size_shift); -} - -static void _rpmalloc_span_mark_as_subspan_unless_master(span_t *master, - span_t *subspan, - size_t span_count); - -//! Use global reserved spans to fulfill a memory map request (reserve size must -//! be checked by caller) -static span_t *_rpmalloc_global_get_reserved_spans(size_t span_count) { - span_t *span = _memory_global_reserve; - _rpmalloc_span_mark_as_subspan_unless_master(_memory_global_reserve_master, - span, span_count); - _memory_global_reserve_count -= span_count; - if (_memory_global_reserve_count) - _memory_global_reserve = - (span_t *)pointer_offset(span, span_count << _memory_span_size_shift); - else - _memory_global_reserve = 0; - return span; -} - -//! Store the given spans as global reserve (must only be called from within new -//! 
heap allocation, not thread safe) -static void _rpmalloc_global_set_reserved_spans(span_t *master, span_t *reserve, - size_t reserve_span_count) { - _memory_global_reserve_master = master; - _memory_global_reserve_count = reserve_span_count; - _memory_global_reserve = reserve; -} - -//////////// -/// -/// Span linked list management -/// -////// - -//! Add a span to double linked list at the head -static void _rpmalloc_span_double_link_list_add(span_t **head, span_t *span) { - if (*head) - (*head)->prev = span; - span->next = *head; - *head = span; -} - -//! Pop head span from double linked list -static void _rpmalloc_span_double_link_list_pop_head(span_t **head, - span_t *span) { - rpmalloc_assert(*head == span, "Linked list corrupted"); - span = *head; - *head = span->next; -} - -//! Remove a span from double linked list -static void _rpmalloc_span_double_link_list_remove(span_t **head, - span_t *span) { - rpmalloc_assert(*head, "Linked list corrupted"); - if (*head == span) { - *head = span->next; - } else { - span_t *next_span = span->next; - span_t *prev_span = span->prev; - prev_span->next = next_span; - if (EXPECTED(next_span != 0)) - next_span->prev = prev_span; - } -} - -//////////// -/// -/// Span control -/// -////// - -static void _rpmalloc_heap_cache_insert(heap_t *heap, span_t *span); - -static void _rpmalloc_heap_finalize(heap_t *heap); - -static void _rpmalloc_heap_set_reserved_spans(heap_t *heap, span_t *master, - span_t *reserve, - size_t reserve_span_count); - -//! Declare the span to be a subspan and store distance from master span and -//! 
span count -static void _rpmalloc_span_mark_as_subspan_unless_master(span_t *master, - span_t *subspan, - size_t span_count) { - rpmalloc_assert((subspan != master) || (subspan->flags & SPAN_FLAG_MASTER), - "Span master pointer and/or flag mismatch"); - if (subspan != master) { - subspan->flags = SPAN_FLAG_SUBSPAN; - subspan->offset_from_master = - (uint32_t)((uintptr_t)pointer_diff(subspan, master) >> - _memory_span_size_shift); - subspan->align_offset = 0; - } - subspan->span_count = (uint32_t)span_count; -} - -//! Use reserved spans to fulfill a memory map request (reserve size must be -//! checked by caller) -static span_t *_rpmalloc_span_map_from_reserve(heap_t *heap, - size_t span_count) { - // Update the heap span reserve - span_t *span = heap->span_reserve; - heap->span_reserve = - (span_t *)pointer_offset(span, span_count * _memory_span_size); - heap->spans_reserved -= (uint32_t)span_count; - - _rpmalloc_span_mark_as_subspan_unless_master(heap->span_reserve_master, span, - span_count); - if (span_count <= LARGE_CLASS_COUNT) - _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_from_reserved); - - return span; -} - -//! Get the aligned number of spans to map in based on wanted count, configured -//! mapping granularity and the page size -static size_t _rpmalloc_span_align_count(size_t span_count) { - size_t request_count = (span_count > _memory_span_map_count) - ? span_count - : _memory_span_map_count; - if ((_memory_page_size > _memory_span_size) && - ((request_count * _memory_span_size) % _memory_page_size)) - request_count += - _memory_span_map_count - (request_count % _memory_span_map_count); - return request_count; -} - -//! 
Setup a newly mapped span -static void _rpmalloc_span_initialize(span_t *span, size_t total_span_count, - size_t span_count, size_t align_offset) { - span->total_spans = (uint32_t)total_span_count; - span->span_count = (uint32_t)span_count; - span->align_offset = (uint32_t)align_offset; - span->flags = SPAN_FLAG_MASTER; - atomic_store32(&span->remaining_spans, (int32_t)total_span_count); -} - -static void _rpmalloc_span_unmap(span_t *span); - -//! Map an aligned set of spans, taking configured mapping granularity and the -//! page size into account -static span_t *_rpmalloc_span_map_aligned_count(heap_t *heap, - size_t span_count) { - // If we already have some, but not enough, reserved spans, release those to - // heap cache and map a new full set of spans. Otherwise we would waste memory - // if page size > span size (huge pages) - size_t aligned_span_count = _rpmalloc_span_align_count(span_count); - size_t align_offset = 0; - span_t *span = (span_t *)_rpmalloc_mmap( - aligned_span_count * _memory_span_size, &align_offset); - if (!span) - return 0; - _rpmalloc_span_initialize(span, aligned_span_count, span_count, align_offset); - _rpmalloc_stat_inc(&_master_spans); - if (span_count <= LARGE_CLASS_COUNT) - _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_map_calls); - if (aligned_span_count > span_count) { - span_t *reserved_spans = - (span_t *)pointer_offset(span, span_count * _memory_span_size); - size_t reserved_count = aligned_span_count - span_count; - if (heap->spans_reserved) { - _rpmalloc_span_mark_as_subspan_unless_master( - heap->span_reserve_master, heap->span_reserve, heap->spans_reserved); - _rpmalloc_heap_cache_insert(heap, heap->span_reserve); - } - if (reserved_count > _memory_heap_reserve_count) { - // If huge pages or eager spam map count, the global reserve spin lock is - // held by caller, _rpmalloc_span_map - rpmalloc_assert(atomic_load32(&_memory_global_lock) == 1, - "Global spin lock not held as expected"); - size_t remain_count = 
reserved_count - _memory_heap_reserve_count; - reserved_count = _memory_heap_reserve_count; - span_t *remain_span = (span_t *)pointer_offset( - reserved_spans, reserved_count * _memory_span_size); - if (_memory_global_reserve) { - _rpmalloc_span_mark_as_subspan_unless_master( - _memory_global_reserve_master, _memory_global_reserve, - _memory_global_reserve_count); - _rpmalloc_span_unmap(_memory_global_reserve); - } - _rpmalloc_global_set_reserved_spans(span, remain_span, remain_count); - } - _rpmalloc_heap_set_reserved_spans(heap, span, reserved_spans, - reserved_count); - } - return span; -} - -//! Map in memory pages for the given number of spans (or use previously -//! reserved pages) -static span_t *_rpmalloc_span_map(heap_t *heap, size_t span_count) { - if (span_count <= heap->spans_reserved) - return _rpmalloc_span_map_from_reserve(heap, span_count); - span_t *span = 0; - int use_global_reserve = - (_memory_page_size > _memory_span_size) || - (_memory_span_map_count > _memory_heap_reserve_count); - if (use_global_reserve) { - // If huge pages, make sure only one thread maps more memory to avoid bloat - while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) - _rpmalloc_spin(); - if (_memory_global_reserve_count >= span_count) { - size_t reserve_count = - (!heap->spans_reserved ? 
_memory_heap_reserve_count : span_count); - if (_memory_global_reserve_count < reserve_count) - reserve_count = _memory_global_reserve_count; - span = _rpmalloc_global_get_reserved_spans(reserve_count); - if (span) { - if (reserve_count > span_count) { - span_t *reserved_span = (span_t *)pointer_offset( - span, span_count << _memory_span_size_shift); - _rpmalloc_heap_set_reserved_spans(heap, _memory_global_reserve_master, - reserved_span, - reserve_count - span_count); - } - // Already marked as subspan in _rpmalloc_global_get_reserved_spans - span->span_count = (uint32_t)span_count; - } - } - } - if (!span) - span = _rpmalloc_span_map_aligned_count(heap, span_count); - if (use_global_reserve) - atomic_store32_release(&_memory_global_lock, 0); - return span; -} - -//! Unmap memory pages for the given number of spans (or mark as unused if no -//! partial unmappings) -static void _rpmalloc_span_unmap(span_t *span) { - rpmalloc_assert((span->flags & SPAN_FLAG_MASTER) || - (span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - rpmalloc_assert(!(span->flags & SPAN_FLAG_MASTER) || - !(span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - - int is_master = !!(span->flags & SPAN_FLAG_MASTER); - span_t *master = - is_master ? 
span - : ((span_t *)pointer_offset( - span, -(intptr_t)((uintptr_t)span->offset_from_master * - _memory_span_size))); - rpmalloc_assert(is_master || (span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - rpmalloc_assert(master->flags & SPAN_FLAG_MASTER, "Span flag corrupted"); - - size_t span_count = span->span_count; - if (!is_master) { - // Directly unmap subspans (unless huge pages, in which case we defer and - // unmap entire page range with master) - rpmalloc_assert(span->align_offset == 0, "Span align offset corrupted"); - if (_memory_span_size >= _memory_page_size) - _rpmalloc_unmap(span, span_count * _memory_span_size, 0, 0); - } else { - // Special double flag to denote an unmapped master - // It must be kept in memory since span header must be used - span->flags |= - SPAN_FLAG_MASTER | SPAN_FLAG_SUBSPAN | SPAN_FLAG_UNMAPPED_MASTER; - _rpmalloc_stat_add(&_unmapped_master_spans, 1); - } - - if (atomic_add32(&master->remaining_spans, -(int32_t)span_count) <= 0) { - // Everything unmapped, unmap the master span with release flag to unmap the - // entire range of the super span - rpmalloc_assert(!!(master->flags & SPAN_FLAG_MASTER) && - !!(master->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - size_t unmap_count = master->span_count; - if (_memory_span_size < _memory_page_size) - unmap_count = master->total_spans; - _rpmalloc_stat_sub(&_master_spans, 1); - _rpmalloc_stat_sub(&_unmapped_master_spans, 1); - _rpmalloc_unmap(master, unmap_count * _memory_span_size, - master->align_offset, - (size_t)master->total_spans * _memory_span_size); - } -} - -//! Move the span (used for small or medium allocations) to the heap thread -//! 
cache -static void _rpmalloc_span_release_to_cache(heap_t *heap, span_t *span) { - rpmalloc_assert(heap == span->heap, "Span heap pointer corrupted"); - rpmalloc_assert(span->size_class < SIZE_CLASS_COUNT, - "Invalid span size class"); - rpmalloc_assert(span->span_count == 1, "Invalid span count"); -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS - atomic_decr32(&heap->span_use[0].current); -#endif - _rpmalloc_stat_dec(&heap->size_class_use[span->size_class].spans_current); - if (!heap->finalize) { - _rpmalloc_stat_inc(&heap->span_use[0].spans_to_cache); - _rpmalloc_stat_inc(&heap->size_class_use[span->size_class].spans_to_cache); - if (heap->size_class[span->size_class].cache) - _rpmalloc_heap_cache_insert(heap, - heap->size_class[span->size_class].cache); - heap->size_class[span->size_class].cache = span; - } else { - _rpmalloc_span_unmap(span); - } -} - -//! Initialize a (partial) free list up to next system memory page, while -//! reserving the first block as allocated, returning number of blocks in list -static uint32_t free_list_partial_init(void **list, void **first_block, - void *page_start, void *block_start, - uint32_t block_count, - uint32_t block_size) { - rpmalloc_assert(block_count, "Internal failure"); - *first_block = block_start; - if (block_count > 1) { - void *free_block = pointer_offset(block_start, block_size); - void *block_end = - pointer_offset(block_start, (size_t)block_size * block_count); - // If block size is less than half a memory page, bound init to next memory - // page boundary - if (block_size < (_memory_page_size >> 1)) { - void *page_end = pointer_offset(page_start, _memory_page_size); - if (page_end < block_end) - block_end = page_end; - } - *list = free_block; - block_count = 2; - void *next_block = pointer_offset(free_block, block_size); - while (next_block < block_end) { - *((void **)free_block) = next_block; - free_block = next_block; - ++block_count; - next_block = pointer_offset(next_block, block_size); - } - *((void 
**)free_block) = 0; - } else { - *list = 0; - } - return block_count; -} - -//! Initialize an unused span (from cache or mapped) to be new active span, -//! putting the initial free list in heap class free list -static void *_rpmalloc_span_initialize_new(heap_t *heap, - heap_size_class_t *heap_size_class, - span_t *span, uint32_t class_idx) { - rpmalloc_assert(span->span_count == 1, "Internal failure"); - size_class_t *size_class = _memory_size_class + class_idx; - span->size_class = class_idx; - span->heap = heap; - span->flags &= ~SPAN_FLAG_ALIGNED_BLOCKS; - span->block_size = size_class->block_size; - span->block_count = size_class->block_count; - span->free_list = 0; - span->list_size = 0; - atomic_store_ptr_release(&span->free_list_deferred, 0); - - // Setup free list. Only initialize one system page worth of free blocks in - // list - void *block; - span->free_list_limit = - free_list_partial_init(&heap_size_class->free_list, &block, span, - pointer_offset(span, SPAN_HEADER_SIZE), - size_class->block_count, size_class->block_size); - // Link span as partial if there remains blocks to be initialized as free - // list, or full if fully initialized - if (span->free_list_limit < span->block_count) { - _rpmalloc_span_double_link_list_add(&heap_size_class->partial_span, span); - span->used_count = span->free_list_limit; - } else { -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_add(&heap->full_span[class_idx], span); -#endif - ++heap->full_span_count; - span->used_count = span->block_count; - } - return block; -} - -static void _rpmalloc_span_extract_free_list_deferred(span_t *span) { - // We need acquire semantics on the CAS operation since we are interested in - // the list size Refer to _rpmalloc_deallocate_defer_small_or_medium for - // further comments on this dependency - do { - span->free_list = - atomic_exchange_ptr_acquire(&span->free_list_deferred, INVALID_POINTER); - } while (span->free_list == INVALID_POINTER); - span->used_count -= 
span->list_size; - span->list_size = 0; - atomic_store_ptr_release(&span->free_list_deferred, 0); -} - -static int _rpmalloc_span_is_fully_utilized(span_t *span) { - rpmalloc_assert(span->free_list_limit <= span->block_count, - "Span free list corrupted"); - return !span->free_list && (span->free_list_limit >= span->block_count); -} - -static int _rpmalloc_span_finalize(heap_t *heap, size_t iclass, span_t *span, - span_t **list_head) { - void *free_list = heap->size_class[iclass].free_list; - span_t *class_span = (span_t *)((uintptr_t)free_list & _memory_span_mask); - if (span == class_span) { - // Adopt the heap class free list back into the span free list - void *block = span->free_list; - void *last_block = 0; - while (block) { - last_block = block; - block = *((void **)block); - } - uint32_t free_count = 0; - block = free_list; - while (block) { - ++free_count; - block = *((void **)block); - } - if (last_block) { - *((void **)last_block) = free_list; - } else { - span->free_list = free_list; - } - heap->size_class[iclass].free_list = 0; - span->used_count -= free_count; - } - // If this assert triggers you have memory leaks - rpmalloc_assert(span->list_size == span->used_count, "Memory leak detected"); - if (span->list_size == span->used_count) { - _rpmalloc_stat_dec(&heap->span_use[0].current); - _rpmalloc_stat_dec(&heap->size_class_use[iclass].spans_current); - // This function only used for spans in double linked lists - if (list_head) - _rpmalloc_span_double_link_list_remove(list_head, span); - _rpmalloc_span_unmap(span); - return 1; - } - return 0; -} - -//////////// -/// -/// Global cache -/// -////// - -#if ENABLE_GLOBAL_CACHE - -//! 
Finalize a global cache -static void _rpmalloc_global_cache_finalize(global_cache_t *cache) { - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - - for (size_t ispan = 0; ispan < cache->count; ++ispan) - _rpmalloc_span_unmap(cache->span[ispan]); - cache->count = 0; - - while (cache->overflow) { - span_t *span = cache->overflow; - cache->overflow = span->next; - _rpmalloc_span_unmap(span); - } - - atomic_store32_release(&cache->lock, 0); -} - -static void _rpmalloc_global_cache_insert_spans(span_t **span, - size_t span_count, - size_t count) { - const size_t cache_limit = - (span_count == 1) ? GLOBAL_CACHE_MULTIPLIER * MAX_THREAD_SPAN_CACHE - : GLOBAL_CACHE_MULTIPLIER * - (MAX_THREAD_SPAN_LARGE_CACHE - (span_count >> 1)); - - global_cache_t *cache = &_memory_span_cache[span_count - 1]; - - size_t insert_count = count; - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - -#if ENABLE_STATISTICS - cache->insert_count += count; -#endif - if ((cache->count + insert_count) > cache_limit) - insert_count = cache_limit - cache->count; - - memcpy(cache->span + cache->count, span, sizeof(span_t *) * insert_count); - cache->count += (uint32_t)insert_count; - -#if ENABLE_UNLIMITED_CACHE - while (insert_count < count) { -#else - // Enable unlimited cache if huge pages, or we will leak since it is unlikely - // that an entire huge page will be unmapped, and we're unable to partially - // decommit a huge page - while ((_memory_page_size > _memory_span_size) && (insert_count < count)) { -#endif - span_t *current_span = span[insert_count++]; - current_span->next = cache->overflow; - cache->overflow = current_span; - } - atomic_store32_release(&cache->lock, 0); - - span_t *keep = 0; - for (size_t ispan = insert_count; ispan < count; ++ispan) { - span_t *current_span = span[ispan]; - // Keep master spans that has remaining subspans to avoid dangling them - if ((current_span->flags & SPAN_FLAG_MASTER) && - 
(atomic_load32(&current_span->remaining_spans) > - (int32_t)current_span->span_count)) { - current_span->next = keep; - keep = current_span; - } else { - _rpmalloc_span_unmap(current_span); - } - } - - if (keep) { - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - - size_t islot = 0; - while (keep) { - for (; islot < cache->count; ++islot) { - span_t *current_span = cache->span[islot]; - if (!(current_span->flags & SPAN_FLAG_MASTER) || - ((current_span->flags & SPAN_FLAG_MASTER) && - (atomic_load32(&current_span->remaining_spans) <= - (int32_t)current_span->span_count))) { - _rpmalloc_span_unmap(current_span); - cache->span[islot] = keep; - break; - } - } - if (islot == cache->count) - break; - keep = keep->next; - } - - if (keep) { - span_t *tail = keep; - while (tail->next) - tail = tail->next; - tail->next = cache->overflow; - cache->overflow = keep; - } - - atomic_store32_release(&cache->lock, 0); - } -} - -static size_t _rpmalloc_global_cache_extract_spans(span_t **span, - size_t span_count, - size_t count) { - global_cache_t *cache = &_memory_span_cache[span_count - 1]; - - size_t extract_count = 0; - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - -#if ENABLE_STATISTICS - cache->extract_count += count; -#endif - size_t want = count - extract_count; - if (want > cache->count) - want = cache->count; - - memcpy(span + extract_count, cache->span + (cache->count - want), - sizeof(span_t *) * want); - cache->count -= (uint32_t)want; - extract_count += want; - - while ((extract_count < count) && cache->overflow) { - span_t *current_span = cache->overflow; - span[extract_count++] = current_span; - cache->overflow = current_span->next; - } - -#if ENABLE_ASSERTS - for (size_t ispan = 0; ispan < extract_count; ++ispan) { - rpmalloc_assert(span[ispan]->span_count == span_count, - "Global cache span count mismatch"); - } -#endif - - atomic_store32_release(&cache->lock, 0); - - return extract_count; -} - -#endif - -//////////// -/// -/// 
Heap control -/// -////// - -static void _rpmalloc_deallocate_huge(span_t *); - -//! Store the given spans as reserve in the given heap -static void _rpmalloc_heap_set_reserved_spans(heap_t *heap, span_t *master, - span_t *reserve, - size_t reserve_span_count) { - heap->span_reserve_master = master; - heap->span_reserve = reserve; - heap->spans_reserved = (uint32_t)reserve_span_count; -} - -//! Adopt the deferred span cache list, optionally extracting the first single -//! span for immediate re-use -static void _rpmalloc_heap_cache_adopt_deferred(heap_t *heap, - span_t **single_span) { - span_t *span = (span_t *)((void *)atomic_exchange_ptr_acquire( - &heap->span_free_deferred, 0)); - while (span) { - span_t *next_span = (span_t *)span->free_list; - rpmalloc_assert(span->heap == heap, "Span heap pointer corrupted"); - if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) { - rpmalloc_assert(heap->full_span_count, "Heap span counter corrupted"); - --heap->full_span_count; - _rpmalloc_stat_dec(&heap->span_use[0].spans_deferred); -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&heap->full_span[span->size_class], - span); -#endif - _rpmalloc_stat_dec(&heap->span_use[0].current); - _rpmalloc_stat_dec(&heap->size_class_use[span->size_class].spans_current); - if (single_span && !*single_span) - *single_span = span; - else - _rpmalloc_heap_cache_insert(heap, span); - } else { - if (span->size_class == SIZE_CLASS_HUGE) { - _rpmalloc_deallocate_huge(span); - } else { - rpmalloc_assert(span->size_class == SIZE_CLASS_LARGE, - "Span size class invalid"); - rpmalloc_assert(heap->full_span_count, "Heap span counter corrupted"); - --heap->full_span_count; -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&heap->large_huge_span, span); -#endif - uint32_t idx = span->span_count - 1; - _rpmalloc_stat_dec(&heap->span_use[idx].spans_deferred); - _rpmalloc_stat_dec(&heap->span_use[idx].current); - if (!idx && single_span && !*single_span) - 
*single_span = span; - else - _rpmalloc_heap_cache_insert(heap, span); - } - } - span = next_span; - } -} - -static void _rpmalloc_heap_unmap(heap_t *heap) { - if (!heap->master_heap) { - if ((heap->finalize > 1) && !atomic_load32(&heap->child_count)) { - span_t *span = (span_t *)((uintptr_t)heap & _memory_span_mask); - _rpmalloc_span_unmap(span); - } - } else { - if (atomic_decr32(&heap->master_heap->child_count) == 0) { - _rpmalloc_heap_unmap(heap->master_heap); - } - } -} - -static void _rpmalloc_heap_global_finalize(heap_t *heap) { - if (heap->finalize++ > 1) { - --heap->finalize; - return; - } - - _rpmalloc_heap_finalize(heap); - -#if ENABLE_THREAD_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - for (size_t ispan = 0; ispan < span_cache->count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[ispan]); - span_cache->count = 0; - } -#endif - - if (heap->full_span_count) { - --heap->finalize; - return; - } - - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - if (heap->size_class[iclass].free_list || - heap->size_class[iclass].partial_span) { - --heap->finalize; - return; - } - } - // Heap is now completely free, unmap and remove from heap list - size_t list_idx = (size_t)heap->id % HEAP_ARRAY_SIZE; - heap_t *list_heap = _memory_heaps[list_idx]; - if (list_heap == heap) { - _memory_heaps[list_idx] = heap->next_heap; - } else { - while (list_heap->next_heap != heap) - list_heap = list_heap->next_heap; - list_heap->next_heap = heap->next_heap; - } - - _rpmalloc_heap_unmap(heap); -} - -//! Insert a single span into thread heap cache, releasing to global cache if -//! 
overflow -static void _rpmalloc_heap_cache_insert(heap_t *heap, span_t *span) { - if (UNEXPECTED(heap->finalize != 0)) { - _rpmalloc_span_unmap(span); - _rpmalloc_heap_global_finalize(heap); - return; - } -#if ENABLE_THREAD_CACHE - size_t span_count = span->span_count; - _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_to_cache); - if (span_count == 1) { - span_cache_t *span_cache = &heap->span_cache; - span_cache->span[span_cache->count++] = span; - if (span_cache->count == MAX_THREAD_SPAN_CACHE) { - const size_t remain_count = - MAX_THREAD_SPAN_CACHE - THREAD_SPAN_CACHE_TRANSFER; -#if ENABLE_GLOBAL_CACHE - _rpmalloc_stat_add64(&heap->thread_to_global, - THREAD_SPAN_CACHE_TRANSFER * _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_to_global, - THREAD_SPAN_CACHE_TRANSFER); - _rpmalloc_global_cache_insert_spans(span_cache->span + remain_count, - span_count, - THREAD_SPAN_CACHE_TRANSFER); -#else - for (size_t ispan = 0; ispan < THREAD_SPAN_CACHE_TRANSFER; ++ispan) - _rpmalloc_span_unmap(span_cache->span[remain_count + ispan]); -#endif - span_cache->count = remain_count; - } - } else { - size_t cache_idx = span_count - 2; - span_large_cache_t *span_cache = heap->span_large_cache + cache_idx; - span_cache->span[span_cache->count++] = span; - const size_t cache_limit = - (MAX_THREAD_SPAN_LARGE_CACHE - (span_count >> 1)); - if (span_cache->count == cache_limit) { - const size_t transfer_limit = 2 + (cache_limit >> 2); - const size_t transfer_count = - (THREAD_SPAN_LARGE_CACHE_TRANSFER <= transfer_limit - ? 
THREAD_SPAN_LARGE_CACHE_TRANSFER - : transfer_limit); - const size_t remain_count = cache_limit - transfer_count; -#if ENABLE_GLOBAL_CACHE - _rpmalloc_stat_add64(&heap->thread_to_global, - transfer_count * span_count * _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_to_global, - transfer_count); - _rpmalloc_global_cache_insert_spans(span_cache->span + remain_count, - span_count, transfer_count); -#else - for (size_t ispan = 0; ispan < transfer_count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[remain_count + ispan]); -#endif - span_cache->count = remain_count; - } - } -#else - (void)sizeof(heap); - _rpmalloc_span_unmap(span); -#endif -} - -//! Extract the given number of spans from the different cache levels -static span_t *_rpmalloc_heap_thread_cache_extract(heap_t *heap, - size_t span_count) { - span_t *span = 0; -#if ENABLE_THREAD_CACHE - span_cache_t *span_cache; - if (span_count == 1) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (span_count - 2)); - if (span_cache->count) { - _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_from_cache); - return span_cache->span[--span_cache->count]; - } -#endif - return span; -} - -static span_t *_rpmalloc_heap_thread_cache_deferred_extract(heap_t *heap, - size_t span_count) { - span_t *span = 0; - if (span_count == 1) { - _rpmalloc_heap_cache_adopt_deferred(heap, &span); - } else { - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - span = _rpmalloc_heap_thread_cache_extract(heap, span_count); - } - return span; -} - -static span_t *_rpmalloc_heap_reserved_extract(heap_t *heap, - size_t span_count) { - if (heap->spans_reserved >= span_count) - return _rpmalloc_span_map(heap, span_count); - return 0; -} - -//! 
Extract a span from the global cache -static span_t *_rpmalloc_heap_global_cache_extract(heap_t *heap, - size_t span_count) { -#if ENABLE_GLOBAL_CACHE -#if ENABLE_THREAD_CACHE - span_cache_t *span_cache; - size_t wanted_count; - if (span_count == 1) { - span_cache = &heap->span_cache; - wanted_count = THREAD_SPAN_CACHE_TRANSFER; - } else { - span_cache = (span_cache_t *)(heap->span_large_cache + (span_count - 2)); - wanted_count = THREAD_SPAN_LARGE_CACHE_TRANSFER; - } - span_cache->count = _rpmalloc_global_cache_extract_spans( - span_cache->span, span_count, wanted_count); - if (span_cache->count) { - _rpmalloc_stat_add64(&heap->global_to_thread, - span_count * span_cache->count * _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_from_global, - span_cache->count); - return span_cache->span[--span_cache->count]; - } -#else - span_t *span = 0; - size_t count = _rpmalloc_global_cache_extract_spans(&span, span_count, 1); - if (count) { - _rpmalloc_stat_add64(&heap->global_to_thread, - span_count * count * _memory_span_size); - _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_from_global, - count); - return span; - } -#endif -#endif - (void)sizeof(heap); - (void)sizeof(span_count); - return 0; -} - -static void _rpmalloc_inc_span_statistics(heap_t *heap, size_t span_count, - uint32_t class_idx) { - (void)sizeof(heap); - (void)sizeof(span_count); - (void)sizeof(class_idx); -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS - uint32_t idx = (uint32_t)span_count - 1; - uint32_t current_count = - (uint32_t)atomic_incr32(&heap->span_use[idx].current); - if (current_count > (uint32_t)atomic_load32(&heap->span_use[idx].high)) - atomic_store32(&heap->span_use[idx].high, (int32_t)current_count); - _rpmalloc_stat_add_peak(&heap->size_class_use[class_idx].spans_current, 1, - heap->size_class_use[class_idx].spans_peak); -#endif -} - -//! Get a span from one of the cache levels (thread cache, reserved, global -//! 
cache) or fallback to mapping more memory -static span_t * -_rpmalloc_heap_extract_new_span(heap_t *heap, - heap_size_class_t *heap_size_class, - size_t span_count, uint32_t class_idx) { - span_t *span; -#if ENABLE_THREAD_CACHE - if (heap_size_class && heap_size_class->cache) { - span = heap_size_class->cache; - heap_size_class->cache = - (heap->span_cache.count - ? heap->span_cache.span[--heap->span_cache.count] - : 0); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } -#endif - (void)sizeof(class_idx); - // Allow 50% overhead to increase cache hits - size_t base_span_count = span_count; - size_t limit_span_count = - (span_count > 2) ? (span_count + (span_count >> 1)) : span_count; - if (limit_span_count > LARGE_CLASS_COUNT) - limit_span_count = LARGE_CLASS_COUNT; - do { - span = _rpmalloc_heap_thread_cache_extract(heap, span_count); - if (EXPECTED(span != 0)) { - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } - span = _rpmalloc_heap_thread_cache_deferred_extract(heap, span_count); - if (EXPECTED(span != 0)) { - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } - span = _rpmalloc_heap_global_cache_extract(heap, span_count); - if (EXPECTED(span != 0)) { - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } - span = _rpmalloc_heap_reserved_extract(heap, span_count); - if (EXPECTED(span != 0)) { - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_reserved); - _rpmalloc_inc_span_statistics(heap, span_count, class_idx); - return span; - } - ++span_count; - } while (span_count <= limit_span_count); - // Final fallback, map in more virtual memory - span = _rpmalloc_span_map(heap, base_span_count); - 
_rpmalloc_inc_span_statistics(heap, base_span_count, class_idx); - _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_map_calls); - return span; -} - -static void _rpmalloc_heap_initialize(heap_t *heap) { - _rpmalloc_memset_const(heap, 0, sizeof(heap_t)); - // Get a new heap ID - heap->id = 1 + atomic_incr32(&_memory_heap_id); - - // Link in heap in heap ID map - size_t list_idx = (size_t)heap->id % HEAP_ARRAY_SIZE; - heap->next_heap = _memory_heaps[list_idx]; - _memory_heaps[list_idx] = heap; -} - -static void _rpmalloc_heap_orphan(heap_t *heap, int first_class) { - heap->owner_thread = (uintptr_t)-1; -#if RPMALLOC_FIRST_CLASS_HEAPS - heap_t **heap_list = - (first_class ? &_memory_first_class_orphan_heaps : &_memory_orphan_heaps); -#else - (void)sizeof(first_class); - heap_t **heap_list = &_memory_orphan_heaps; -#endif - heap->next_orphan = *heap_list; - *heap_list = heap; -} - -//! Allocate a new heap from newly mapped memory pages -static heap_t *_rpmalloc_heap_allocate_new(void) { - // Map in pages for a 16 heaps. If page size is greater than required size for - // this, map a page and use first part for heaps and remaining part for spans - // for allocations. 
Adds a lot of complexity, but saves a lot of memory on - // systems where page size > 64 spans (4MiB) - size_t heap_size = sizeof(heap_t); - size_t aligned_heap_size = 16 * ((heap_size + 15) / 16); - size_t request_heap_count = 16; - size_t heap_span_count = ((aligned_heap_size * request_heap_count) + - sizeof(span_t) + _memory_span_size - 1) / - _memory_span_size; - size_t block_size = _memory_span_size * heap_span_count; - size_t span_count = heap_span_count; - span_t *span = 0; - // If there are global reserved spans, use these first - if (_memory_global_reserve_count >= heap_span_count) { - span = _rpmalloc_global_get_reserved_spans(heap_span_count); - } - if (!span) { - if (_memory_page_size > block_size) { - span_count = _memory_page_size / _memory_span_size; - block_size = _memory_page_size; - // If using huge pages, make sure to grab enough heaps to avoid - // reallocating a huge page just to serve new heaps - size_t possible_heap_count = - (block_size - sizeof(span_t)) / aligned_heap_size; - if (possible_heap_count >= (request_heap_count * 16)) - request_heap_count *= 16; - else if (possible_heap_count < request_heap_count) - request_heap_count = possible_heap_count; - heap_span_count = ((aligned_heap_size * request_heap_count) + - sizeof(span_t) + _memory_span_size - 1) / - _memory_span_size; - } - - size_t align_offset = 0; - span = (span_t *)_rpmalloc_mmap(block_size, &align_offset); - if (!span) - return 0; - - // Master span will contain the heaps - _rpmalloc_stat_inc(&_master_spans); - _rpmalloc_span_initialize(span, span_count, heap_span_count, align_offset); - } - - size_t remain_size = _memory_span_size - sizeof(span_t); - heap_t *heap = (heap_t *)pointer_offset(span, sizeof(span_t)); - _rpmalloc_heap_initialize(heap); - - // Put extra heaps as orphans - size_t num_heaps = remain_size / aligned_heap_size; - if (num_heaps < request_heap_count) - num_heaps = request_heap_count; - atomic_store32(&heap->child_count, (int32_t)num_heaps - 1); - heap_t 
*extra_heap = (heap_t *)pointer_offset(heap, aligned_heap_size);
-  while (num_heaps > 1) {
-    _rpmalloc_heap_initialize(extra_heap);
-    extra_heap->master_heap = heap;
-    _rpmalloc_heap_orphan(extra_heap, 1);
-    extra_heap = (heap_t *)pointer_offset(extra_heap, aligned_heap_size);
-    --num_heaps;
-  }
-
-  if (span_count > heap_span_count) {
-    // Cap reserved spans
-    size_t remain_count = span_count - heap_span_count;
-    size_t reserve_count =
-        (remain_count > _memory_heap_reserve_count ? _memory_heap_reserve_count
-                                                   : remain_count);
-    span_t *remain_span =
-        (span_t *)pointer_offset(span, heap_span_count * _memory_span_size);
-    _rpmalloc_heap_set_reserved_spans(heap, span, remain_span, reserve_count);
-
-    if (remain_count > reserve_count) {
-      // Set to global reserved spans
-      remain_span = (span_t *)pointer_offset(remain_span,
-                                             reserve_count * _memory_span_size);
-      reserve_count = remain_count - reserve_count;
-      _rpmalloc_global_set_reserved_spans(span, remain_span, reserve_count);
-    }
-  }
-
-  return heap;
-}
-
-static heap_t *_rpmalloc_heap_extract_orphan(heap_t **heap_list) {
-  heap_t *heap = *heap_list;
-  *heap_list = (heap ? heap->next_orphan : 0);
-  return heap;
-}
-
-//!
Allocate a new heap, potentially reusing a previously orphaned heap
-static heap_t *_rpmalloc_heap_allocate(int first_class) {
-  heap_t *heap = 0;
-  while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0))
-    _rpmalloc_spin();
-  if (first_class == 0)
-    heap = _rpmalloc_heap_extract_orphan(&_memory_orphan_heaps);
-#if RPMALLOC_FIRST_CLASS_HEAPS
-  if (!heap)
-    heap = _rpmalloc_heap_extract_orphan(&_memory_first_class_orphan_heaps);
-#endif
-  if (!heap)
-    heap = _rpmalloc_heap_allocate_new();
-  atomic_store32_release(&_memory_global_lock, 0);
-  if (heap)
-    _rpmalloc_heap_cache_adopt_deferred(heap, 0);
-  return heap;
-}
-
-static void _rpmalloc_heap_release(void *heapptr, int first_class,
-                                   int release_cache) {
-  heap_t *heap = (heap_t *)heapptr;
-  if (!heap)
-    return;
-  // Release thread cache spans back to global cache
-  _rpmalloc_heap_cache_adopt_deferred(heap, 0);
-  if (release_cache || heap->finalize) {
-#if ENABLE_THREAD_CACHE
-    for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) {
-      span_cache_t *span_cache;
-      if (!iclass)
-        span_cache = &heap->span_cache;
-      else
-        span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1));
-      if (!span_cache->count)
-        continue;
-#if ENABLE_GLOBAL_CACHE
-      if (heap->finalize) {
-        for (size_t ispan = 0; ispan < span_cache->count; ++ispan)
-          _rpmalloc_span_unmap(span_cache->span[ispan]);
-      } else {
-        _rpmalloc_stat_add64(&heap->thread_to_global, span_cache->count *
-                                                          (iclass + 1) *
-                                                          _memory_span_size);
-        _rpmalloc_stat_add(&heap->span_use[iclass].spans_to_global,
-                           span_cache->count);
-        _rpmalloc_global_cache_insert_spans(span_cache->span, iclass + 1,
-                                            span_cache->count);
-      }
-#else
-      for (size_t ispan = 0; ispan < span_cache->count; ++ispan)
-        _rpmalloc_span_unmap(span_cache->span[ispan]);
-#endif
-      span_cache->count = 0;
-    }
-#endif
-  }
-
-  if (get_thread_heap_raw() == heap)
-    set_thread_heap(0);
-
-#if ENABLE_STATISTICS
-  atomic_decr32(&_memory_active_heaps);
-
rpmalloc_assert(atomic_load32(&_memory_active_heaps) >= 0, - "Still active heaps during finalization"); -#endif - - // If we are forcibly terminating with _exit the state of the - // lock atomic is unknown and it's best to just go ahead and exit - if (get_thread_id() != _rpmalloc_main_thread_id) { - while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) - _rpmalloc_spin(); - } - _rpmalloc_heap_orphan(heap, first_class); - atomic_store32_release(&_memory_global_lock, 0); -} - -static void _rpmalloc_heap_release_raw(void *heapptr, int release_cache) { - _rpmalloc_heap_release(heapptr, 0, release_cache); -} - -static void _rpmalloc_heap_release_raw_fc(void *heapptr) { - _rpmalloc_heap_release_raw(heapptr, 1); -} - -static void _rpmalloc_heap_finalize(heap_t *heap) { - if (heap->spans_reserved) { - span_t *span = _rpmalloc_span_map(heap, heap->spans_reserved); - _rpmalloc_span_unmap(span); - heap->spans_reserved = 0; - } - - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - if (heap->size_class[iclass].cache) - _rpmalloc_span_unmap(heap->size_class[iclass].cache); - heap->size_class[iclass].cache = 0; - span_t *span = heap->size_class[iclass].partial_span; - while (span) { - span_t *next = span->next; - _rpmalloc_span_finalize(heap, iclass, span, - &heap->size_class[iclass].partial_span); - span = next; - } - // If class still has a free list it must be a full span - if (heap->size_class[iclass].free_list) { - span_t *class_span = - (span_t *)((uintptr_t)heap->size_class[iclass].free_list & - _memory_span_mask); - span_t **list = 0; -#if RPMALLOC_FIRST_CLASS_HEAPS - list = &heap->full_span[iclass]; -#endif - --heap->full_span_count; - if (!_rpmalloc_span_finalize(heap, iclass, class_span, list)) { - if (list) - _rpmalloc_span_double_link_list_remove(list, class_span); - _rpmalloc_span_double_link_list_add( - &heap->size_class[iclass].partial_span, class_span); - } - } - } - -#if ENABLE_THREAD_CACHE - 
for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - for (size_t ispan = 0; ispan < span_cache->count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[ispan]); - span_cache->count = 0; - } -#endif - rpmalloc_assert(!atomic_load_ptr(&heap->span_free_deferred), - "Heaps still active during finalization"); -} - -//////////// -/// -/// Allocation entry points -/// -////// - -//! Pop first block from a free list -static void *free_list_pop(void **list) { - void *block = *list; - *list = *((void **)block); - return block; -} - -//! Allocate a small/medium sized memory block from the given heap -static void *_rpmalloc_allocate_from_heap_fallback( - heap_t *heap, heap_size_class_t *heap_size_class, uint32_t class_idx) { - span_t *span = heap_size_class->partial_span; - rpmalloc_assume(heap != 0); - if (EXPECTED(span != 0)) { - rpmalloc_assert(span->block_count == - _memory_size_class[span->size_class].block_count, - "Span block count corrupted"); - rpmalloc_assert(!_rpmalloc_span_is_fully_utilized(span), - "Internal failure"); - void *block; - if (span->free_list) { - // Span local free list is not empty, swap to size class free list - block = free_list_pop(&span->free_list); - heap_size_class->free_list = span->free_list; - span->free_list = 0; - } else { - // If the span did not fully initialize free list, link up another page - // worth of blocks - void *block_start = pointer_offset( - span, SPAN_HEADER_SIZE + - ((size_t)span->free_list_limit * span->block_size)); - span->free_list_limit += free_list_partial_init( - &heap_size_class->free_list, &block, - (void *)((uintptr_t)block_start & ~(_memory_page_size - 1)), - block_start, span->block_count - span->free_list_limit, - span->block_size); - } - rpmalloc_assert(span->free_list_limit <= span->block_count, - "Span block count corrupted"); - 
span->used_count = span->free_list_limit;
-
-    // Swap in deferred free list if present
-    if (atomic_load_ptr(&span->free_list_deferred))
-      _rpmalloc_span_extract_free_list_deferred(span);
-
-    // If span is still not fully utilized keep it in partial list and early
-    // return block
-    if (!_rpmalloc_span_is_fully_utilized(span))
-      return block;
-
-    // The span is fully utilized, unlink from partial list and add to fully
-    // utilized list
-    _rpmalloc_span_double_link_list_pop_head(&heap_size_class->partial_span,
-                                             span);
-#if RPMALLOC_FIRST_CLASS_HEAPS
-    _rpmalloc_span_double_link_list_add(&heap->full_span[class_idx], span);
-#endif
-    ++heap->full_span_count;
-    return block;
-  }
-
-  // Find a span in one of the cache levels
-  span = _rpmalloc_heap_extract_new_span(heap, heap_size_class, 1, class_idx);
-  if (EXPECTED(span != 0)) {
-    // Mark span as owned by this heap and set base data, return first block
-    return _rpmalloc_span_initialize_new(heap, heap_size_class, span,
-                                         class_idx);
-  }
-
-  return 0;
-}
-
-//! Allocate a small sized memory block from the given heap
-static void *_rpmalloc_allocate_small(heap_t *heap, size_t size) {
-  rpmalloc_assert(heap, "No thread heap");
-  // Small sizes have unique size classes
-  const uint32_t class_idx =
-      (uint32_t)((size + (SMALL_GRANULARITY - 1)) >> SMALL_GRANULARITY_SHIFT);
-  heap_size_class_t *heap_size_class = heap->size_class + class_idx;
-  _rpmalloc_stat_inc_alloc(heap, class_idx);
-  if (EXPECTED(heap_size_class->free_list != 0))
-    return free_list_pop(&heap_size_class->free_list);
-  return _rpmalloc_allocate_from_heap_fallback(heap, heap_size_class,
-                                               class_idx);
-}
-
-//!
Allocate a medium sized memory block from the given heap
-static void *_rpmalloc_allocate_medium(heap_t *heap, size_t size) {
-  rpmalloc_assert(heap, "No thread heap");
-  // Calculate the size class index and do a dependent lookup of the final class
-  // index (in case of merged classes)
-  const uint32_t base_idx =
-      (uint32_t)(SMALL_CLASS_COUNT +
-                 ((size - (SMALL_SIZE_LIMIT + 1)) >> MEDIUM_GRANULARITY_SHIFT));
-  const uint32_t class_idx = _memory_size_class[base_idx].class_idx;
-  heap_size_class_t *heap_size_class = heap->size_class + class_idx;
-  _rpmalloc_stat_inc_alloc(heap, class_idx);
-  if (EXPECTED(heap_size_class->free_list != 0))
-    return free_list_pop(&heap_size_class->free_list);
-  return _rpmalloc_allocate_from_heap_fallback(heap, heap_size_class,
-                                               class_idx);
-}
-
-//! Allocate a large sized memory block from the given heap
-static void *_rpmalloc_allocate_large(heap_t *heap, size_t size) {
-  rpmalloc_assert(heap, "No thread heap");
-  // Calculate number of needed max sized spans (including header)
-  // Since this function is never called if size > LARGE_SIZE_LIMIT
-  // the span_count is guaranteed to be <= LARGE_CLASS_COUNT
-  size += SPAN_HEADER_SIZE;
-  size_t span_count = size >> _memory_span_size_shift;
-  if (size & (_memory_span_size - 1))
-    ++span_count;
-
-  // Find a span in one of the cache levels
-  span_t *span =
-      _rpmalloc_heap_extract_new_span(heap, 0, span_count, SIZE_CLASS_LARGE);
-  if (!span)
-    return span;
-
-  // Mark span as owned by this heap and set base data
-  rpmalloc_assert(span->span_count >= span_count, "Internal failure");
-  span->size_class = SIZE_CLASS_LARGE;
-  span->heap = heap;
-
-#if RPMALLOC_FIRST_CLASS_HEAPS
-  _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span);
-#endif
-  ++heap->full_span_count;
-
-  return pointer_offset(span, SPAN_HEADER_SIZE);
-}
-
-//!
Allocate a huge block by mapping memory pages directly
-static void *_rpmalloc_allocate_huge(heap_t *heap, size_t size) {
-  rpmalloc_assert(heap, "No thread heap");
-  _rpmalloc_heap_cache_adopt_deferred(heap, 0);
-  size += SPAN_HEADER_SIZE;
-  size_t num_pages = size >> _memory_page_size_shift;
-  if (size & (_memory_page_size - 1))
-    ++num_pages;
-  size_t align_offset = 0;
-  span_t *span =
-      (span_t *)_rpmalloc_mmap(num_pages * _memory_page_size, &align_offset);
-  if (!span)
-    return span;
-
-  // Store page count in span_count
-  span->size_class = SIZE_CLASS_HUGE;
-  span->span_count = (uint32_t)num_pages;
-  span->align_offset = (uint32_t)align_offset;
-  span->heap = heap;
-  _rpmalloc_stat_add_peak(&_huge_pages_current, num_pages, _huge_pages_peak);
-
-#if RPMALLOC_FIRST_CLASS_HEAPS
-  _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span);
-#endif
-  ++heap->full_span_count;
-
-  return pointer_offset(span, SPAN_HEADER_SIZE);
-}
-
-//! Allocate a block of the given size
-static void *_rpmalloc_allocate(heap_t *heap, size_t size) {
-  _rpmalloc_stat_add64(&_allocation_counter, 1);
-  if (EXPECTED(size <= SMALL_SIZE_LIMIT))
-    return _rpmalloc_allocate_small(heap, size);
-  else if (size <= _memory_medium_size_limit)
-    return _rpmalloc_allocate_medium(heap, size);
-  else if (size <= LARGE_SIZE_LIMIT)
-    return _rpmalloc_allocate_large(heap, size);
-  return _rpmalloc_allocate_huge(heap, size);
-}
-
-static void *_rpmalloc_aligned_allocate(heap_t *heap, size_t alignment,
-                                        size_t size) {
-  if (alignment <= SMALL_GRANULARITY)
-    return _rpmalloc_allocate(heap, size);
-
-#if ENABLE_VALIDATE_ARGS
-  if ((size + alignment) < size) {
-    errno = EINVAL;
-    return 0;
-  }
-  if (alignment & (alignment - 1)) {
-    errno = EINVAL;
-    return 0;
-  }
-#endif
-
-  if ((alignment <= SPAN_HEADER_SIZE) &&
-      ((size + SPAN_HEADER_SIZE) < _memory_medium_size_limit)) {
-    // If alignment is less or equal to span header size (which is power of
-    // two), and size aligned to span header size
multiples is less than size + - // alignment, then use natural alignment of blocks to provide alignment - size_t multiple_size = size ? (size + (SPAN_HEADER_SIZE - 1)) & - ~(uintptr_t)(SPAN_HEADER_SIZE - 1) - : SPAN_HEADER_SIZE; - rpmalloc_assert(!(multiple_size % SPAN_HEADER_SIZE), - "Failed alignment calculation"); - if (multiple_size <= (size + alignment)) - return _rpmalloc_allocate(heap, multiple_size); - } - - void *ptr = 0; - size_t align_mask = alignment - 1; - if (alignment <= _memory_page_size) { - ptr = _rpmalloc_allocate(heap, size + alignment); - if ((uintptr_t)ptr & align_mask) { - ptr = (void *)(((uintptr_t)ptr & ~(uintptr_t)align_mask) + alignment); - // Mark as having aligned blocks - span_t *span = (span_t *)((uintptr_t)ptr & _memory_span_mask); - span->flags |= SPAN_FLAG_ALIGNED_BLOCKS; - } - return ptr; - } - - // Fallback to mapping new pages for this request. Since pointers passed - // to rpfree must be able to reach the start of the span by bitmasking of - // the address with the span size, the returned aligned pointer from this - // function must be with a span size of the start of the mapped area. - // In worst case this requires us to loop and map pages until we get a - // suitable memory address. 
It also means we can never align to span size - // or greater, since the span header will push alignment more than one - // span size away from span start (thus causing pointer mask to give us - // an invalid span start on free) - if (alignment & align_mask) { - errno = EINVAL; - return 0; - } - if (alignment >= _memory_span_size) { - errno = EINVAL; - return 0; - } - - size_t extra_pages = alignment / _memory_page_size; - - // Since each span has a header, we will at least need one extra memory page - size_t num_pages = 1 + (size / _memory_page_size); - if (size & (_memory_page_size - 1)) - ++num_pages; - - if (extra_pages > num_pages) - num_pages = 1 + extra_pages; - - size_t original_pages = num_pages; - size_t limit_pages = (_memory_span_size / _memory_page_size) * 2; - if (limit_pages < (original_pages * 2)) - limit_pages = original_pages * 2; - - size_t mapped_size, align_offset; - span_t *span; - -retry: - align_offset = 0; - mapped_size = num_pages * _memory_page_size; - - span = (span_t *)_rpmalloc_mmap(mapped_size, &align_offset); - if (!span) { - errno = ENOMEM; - return 0; - } - ptr = pointer_offset(span, SPAN_HEADER_SIZE); - - if ((uintptr_t)ptr & align_mask) - ptr = (void *)(((uintptr_t)ptr & ~(uintptr_t)align_mask) + alignment); - - if (((size_t)pointer_diff(ptr, span) >= _memory_span_size) || - (pointer_offset(ptr, size) > pointer_offset(span, mapped_size)) || - (((uintptr_t)ptr & _memory_span_mask) != (uintptr_t)span)) { - _rpmalloc_unmap(span, mapped_size, align_offset, mapped_size); - ++num_pages; - if (num_pages > limit_pages) { - errno = EINVAL; - return 0; - } - goto retry; - } - - // Store page count in span_count - span->size_class = SIZE_CLASS_HUGE; - span->span_count = (uint32_t)num_pages; - span->align_offset = (uint32_t)align_offset; - span->heap = heap; - _rpmalloc_stat_add_peak(&_huge_pages_current, num_pages, _huge_pages_peak); - -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); 
-#endif - ++heap->full_span_count; - - _rpmalloc_stat_add64(&_allocation_counter, 1); - - return ptr; -} - -//////////// -/// -/// Deallocation entry points -/// -////// - -//! Deallocate the given small/medium memory block in the current thread local -//! heap -static void _rpmalloc_deallocate_direct_small_or_medium(span_t *span, - void *block) { - heap_t *heap = span->heap; - rpmalloc_assert(heap->owner_thread == get_thread_id() || - !heap->owner_thread || heap->finalize, - "Internal failure"); - // Add block to free list - if (UNEXPECTED(_rpmalloc_span_is_fully_utilized(span))) { - span->used_count = span->block_count; -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&heap->full_span[span->size_class], - span); -#endif - _rpmalloc_span_double_link_list_add( - &heap->size_class[span->size_class].partial_span, span); - --heap->full_span_count; - } - *((void **)block) = span->free_list; - --span->used_count; - span->free_list = block; - if (UNEXPECTED(span->used_count == span->list_size)) { - // If there are no used blocks it is guaranteed that no other external - // thread is accessing the span - if (span->used_count) { - // Make sure we have synchronized the deferred list and list size by using - // acquire semantics and guarantee that no external thread is accessing - // span concurrently - void *free_list; - do { - free_list = atomic_exchange_ptr_acquire(&span->free_list_deferred, - INVALID_POINTER); - } while (free_list == INVALID_POINTER); - atomic_store_ptr_release(&span->free_list_deferred, free_list); - } - _rpmalloc_span_double_link_list_remove( - &heap->size_class[span->size_class].partial_span, span); - _rpmalloc_span_release_to_cache(heap, span); - } -} - -static void _rpmalloc_deallocate_defer_free_span(heap_t *heap, span_t *span) { - if (span->size_class != SIZE_CLASS_HUGE) - _rpmalloc_stat_inc(&heap->span_use[span->span_count - 1].spans_deferred); - // This list does not need ABA protection, no mutable side state - do { - 
span->free_list = (void *)atomic_load_ptr(&heap->span_free_deferred); - } while (!atomic_cas_ptr(&heap->span_free_deferred, span, span->free_list)); -} - -//! Put the block in the deferred free list of the owning span -static void _rpmalloc_deallocate_defer_small_or_medium(span_t *span, - void *block) { - // The memory ordering here is a bit tricky, to avoid having to ABA protect - // the deferred free list to avoid desynchronization of list and list size - // we need to have acquire semantics on successful CAS of the pointer to - // guarantee the list_size variable validity + release semantics on pointer - // store - void *free_list; - do { - free_list = - atomic_exchange_ptr_acquire(&span->free_list_deferred, INVALID_POINTER); - } while (free_list == INVALID_POINTER); - *((void **)block) = free_list; - uint32_t free_count = ++span->list_size; - int all_deferred_free = (free_count == span->block_count); - atomic_store_ptr_release(&span->free_list_deferred, block); - if (all_deferred_free) { - // Span was completely freed by this block. Due to the INVALID_POINTER spin - // lock no other thread can reach this state simultaneously on this span. 
-    // Safe to move to owner heap deferred cache
-    _rpmalloc_deallocate_defer_free_span(span->heap, span);
-  }
-}
-
-static void _rpmalloc_deallocate_small_or_medium(span_t *span, void *p) {
-  _rpmalloc_stat_inc_free(span->heap, span->size_class);
-  if (span->flags & SPAN_FLAG_ALIGNED_BLOCKS) {
-    // Realign pointer to block start
-    void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE);
-    uint32_t block_offset = (uint32_t)pointer_diff(p, blocks_start);
-    p = pointer_offset(p, -(int32_t)(block_offset % span->block_size));
-  }
-  // Check if block belongs to this heap or if deallocation should be deferred
-#if RPMALLOC_FIRST_CLASS_HEAPS
-  int defer =
-      (span->heap->owner_thread &&
-       (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize);
-#else
-  int defer =
-      ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize);
-#endif
-  if (!defer)
-    _rpmalloc_deallocate_direct_small_or_medium(span, p);
-  else
-    _rpmalloc_deallocate_defer_small_or_medium(span, p);
-}
-
-//!
Deallocate the given large memory block to the current heap -static void _rpmalloc_deallocate_large(span_t *span) { - rpmalloc_assert(span->size_class == SIZE_CLASS_LARGE, "Bad span size class"); - rpmalloc_assert(!(span->flags & SPAN_FLAG_MASTER) || - !(span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - rpmalloc_assert((span->flags & SPAN_FLAG_MASTER) || - (span->flags & SPAN_FLAG_SUBSPAN), - "Span flag corrupted"); - // We must always defer (unless finalizing) if from another heap since we - // cannot touch the list or counters of another heap -#if RPMALLOC_FIRST_CLASS_HEAPS - int defer = - (span->heap->owner_thread && - (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#else - int defer = - ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#endif - if (defer) { - _rpmalloc_deallocate_defer_free_span(span->heap, span); - return; - } - rpmalloc_assert(span->heap->full_span_count, "Heap span counter corrupted"); - --span->heap->full_span_count; -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&span->heap->large_huge_span, span); -#endif -#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS - // Decrease counter - size_t idx = span->span_count - 1; - atomic_decr32(&span->heap->span_use[idx].current); -#endif - heap_t *heap = span->heap; - rpmalloc_assert(heap, "No thread heap"); -#if ENABLE_THREAD_CACHE - const int set_as_reserved = - ((span->span_count > 1) && (heap->span_cache.count == 0) && - !heap->finalize && !heap->spans_reserved); -#else - const int set_as_reserved = - ((span->span_count > 1) && !heap->finalize && !heap->spans_reserved); -#endif - if (set_as_reserved) { - heap->span_reserve = span; - heap->spans_reserved = span->span_count; - if (span->flags & SPAN_FLAG_MASTER) { - heap->span_reserve_master = span; - } else { // SPAN_FLAG_SUBSPAN - span_t *master = (span_t *)pointer_offset( - span, - -(intptr_t)((size_t)span->offset_from_master * _memory_span_size)); - 
heap->span_reserve_master = master; - rpmalloc_assert(master->flags & SPAN_FLAG_MASTER, "Span flag corrupted"); - rpmalloc_assert(atomic_load32(&master->remaining_spans) >= - (int32_t)span->span_count, - "Master span count corrupted"); - } - _rpmalloc_stat_inc(&heap->span_use[idx].spans_to_reserved); - } else { - // Insert into cache list - _rpmalloc_heap_cache_insert(heap, span); - } -} - -//! Deallocate the given huge span -static void _rpmalloc_deallocate_huge(span_t *span) { - rpmalloc_assert(span->heap, "No span heap"); -#if RPMALLOC_FIRST_CLASS_HEAPS - int defer = - (span->heap->owner_thread && - (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#else - int defer = - ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); -#endif - if (defer) { - _rpmalloc_deallocate_defer_free_span(span->heap, span); - return; - } - rpmalloc_assert(span->heap->full_span_count, "Heap span counter corrupted"); - --span->heap->full_span_count; -#if RPMALLOC_FIRST_CLASS_HEAPS - _rpmalloc_span_double_link_list_remove(&span->heap->large_huge_span, span); -#endif - - // Oversized allocation, page count is stored in span_count - size_t num_pages = span->span_count; - _rpmalloc_unmap(span, num_pages * _memory_page_size, span->align_offset, - num_pages * _memory_page_size); - _rpmalloc_stat_sub(&_huge_pages_current, num_pages); -} - -//! 
Deallocate the given block -static void _rpmalloc_deallocate(void *p) { - _rpmalloc_stat_add64(&_deallocation_counter, 1); - // Grab the span (always at start of span, using span alignment) - span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); - if (UNEXPECTED(!span)) - return; - if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) - _rpmalloc_deallocate_small_or_medium(span, p); - else if (span->size_class == SIZE_CLASS_LARGE) - _rpmalloc_deallocate_large(span); - else - _rpmalloc_deallocate_huge(span); -} - -//////////// -/// -/// Reallocation entry points -/// -////// - -static size_t _rpmalloc_usable_size(void *p); - -//! Reallocate the given block to the given size -static void *_rpmalloc_reallocate(heap_t *heap, void *p, size_t size, - size_t oldsize, unsigned int flags) { - if (p) { - // Grab the span using guaranteed span alignment - span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); - if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) { - // Small/medium sized block - rpmalloc_assert(span->span_count == 1, "Span counter corrupted"); - void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); - uint32_t block_offset = (uint32_t)pointer_diff(p, blocks_start); - uint32_t block_idx = block_offset / span->block_size; - void *block = - pointer_offset(blocks_start, (size_t)block_idx * span->block_size); - if (!oldsize) - oldsize = - (size_t)((ptrdiff_t)span->block_size - pointer_diff(p, block)); - if ((size_t)span->block_size >= size) { - // Still fits in block, never mind trying to save memory, but preserve - // data if alignment changed - if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) - memmove(block, p, oldsize); - return block; - } - } else if (span->size_class == SIZE_CLASS_LARGE) { - // Large block - size_t total_size = size + SPAN_HEADER_SIZE; - size_t num_spans = total_size >> _memory_span_size_shift; - if (total_size & (_memory_span_mask - 1)) - ++num_spans; - size_t current_spans = span->span_count; - void *block = 
pointer_offset(span, SPAN_HEADER_SIZE); - if (!oldsize) - oldsize = (current_spans * _memory_span_size) - - (size_t)pointer_diff(p, block) - SPAN_HEADER_SIZE; - if ((current_spans >= num_spans) && (total_size >= (oldsize / 2))) { - // Still fits in block, never mind trying to save memory, but preserve - // data if alignment changed - if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) - memmove(block, p, oldsize); - return block; - } - } else { - // Oversized block - size_t total_size = size + SPAN_HEADER_SIZE; - size_t num_pages = total_size >> _memory_page_size_shift; - if (total_size & (_memory_page_size - 1)) - ++num_pages; - // Page count is stored in span_count - size_t current_pages = span->span_count; - void *block = pointer_offset(span, SPAN_HEADER_SIZE); - if (!oldsize) - oldsize = (current_pages * _memory_page_size) - - (size_t)pointer_diff(p, block) - SPAN_HEADER_SIZE; - if ((current_pages >= num_pages) && (num_pages >= (current_pages / 2))) { - // Still fits in block, never mind trying to save memory, but preserve - // data if alignment changed - if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) - memmove(block, p, oldsize); - return block; - } - } - } else { - oldsize = 0; - } - - if (!!(flags & RPMALLOC_GROW_OR_FAIL)) - return 0; - - // Size is greater than block size, need to allocate a new block and - // deallocate the old Avoid hysteresis by overallocating if increase is small - // (below 37%) - size_t lower_bound = oldsize + (oldsize >> 2) + (oldsize >> 3); - size_t new_size = - (size > lower_bound) ? size : ((size > oldsize) ? lower_bound : size); - void *block = _rpmalloc_allocate(heap, new_size); - if (p && block) { - if (!(flags & RPMALLOC_NO_PRESERVE)) - memcpy(block, p, oldsize < new_size ? 
oldsize : new_size); - _rpmalloc_deallocate(p); - } - - return block; -} - -static void *_rpmalloc_aligned_reallocate(heap_t *heap, void *ptr, - size_t alignment, size_t size, - size_t oldsize, unsigned int flags) { - if (alignment <= SMALL_GRANULARITY) - return _rpmalloc_reallocate(heap, ptr, size, oldsize, flags); - - int no_alloc = !!(flags & RPMALLOC_GROW_OR_FAIL); - size_t usablesize = (ptr ? _rpmalloc_usable_size(ptr) : 0); - if ((usablesize >= size) && !((uintptr_t)ptr & (alignment - 1))) { - if (no_alloc || (size >= (usablesize / 2))) - return ptr; - } - // Aligned alloc marks span as having aligned blocks - void *block = - (!no_alloc ? _rpmalloc_aligned_allocate(heap, alignment, size) : 0); - if (EXPECTED(block != 0)) { - if (!(flags & RPMALLOC_NO_PRESERVE) && ptr) { - if (!oldsize) - oldsize = usablesize; - memcpy(block, ptr, oldsize < size ? oldsize : size); - } - _rpmalloc_deallocate(ptr); - } - return block; -} - -//////////// -/// -/// Initialization, finalization and utility -/// -////// - -//! Get the usable size of the given block -static size_t _rpmalloc_usable_size(void *p) { - // Grab the span using guaranteed span alignment - span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); - if (span->size_class < SIZE_CLASS_COUNT) { - // Small/medium block - void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); - return span->block_size - - ((size_t)pointer_diff(p, blocks_start) % span->block_size); - } - if (span->size_class == SIZE_CLASS_LARGE) { - // Large block - size_t current_spans = span->span_count; - return (current_spans * _memory_span_size) - (size_t)pointer_diff(p, span); - } - // Oversized block, page count is stored in span_count - size_t current_pages = span->span_count; - return (current_pages * _memory_page_size) - (size_t)pointer_diff(p, span); -} - -//! 
Adjust and optimize the size class properties for the given class
-static void _rpmalloc_adjust_size_class(size_t iclass) {
-  size_t block_size = _memory_size_class[iclass].block_size;
-  size_t block_count = (_memory_span_size - SPAN_HEADER_SIZE) / block_size;
-
-  _memory_size_class[iclass].block_count = (uint16_t)block_count;
-  _memory_size_class[iclass].class_idx = (uint16_t)iclass;
-
-  // Check if previous size classes can be merged
-  if (iclass >= SMALL_CLASS_COUNT) {
-    size_t prevclass = iclass;
-    while (prevclass > 0) {
-      --prevclass;
-      // A class can be merged if number of pages and number of blocks are equal
-      if (_memory_size_class[prevclass].block_count ==
-          _memory_size_class[iclass].block_count)
-        _rpmalloc_memcpy_const(_memory_size_class + prevclass,
-                               _memory_size_class + iclass,
-                               sizeof(_memory_size_class[iclass]));
-      else
-        break;
-    }
-  }
-}
-
-//! Initialize the allocator and setup global data
-extern inline int rpmalloc_initialize(void) {
-  if (_rpmalloc_initialized) {
-    rpmalloc_thread_initialize();
-    return 0;
-  }
-  return rpmalloc_initialize_config(0);
-}
-
-int rpmalloc_initialize_config(const rpmalloc_config_t *config) {
-  if (_rpmalloc_initialized) {
-    rpmalloc_thread_initialize();
-    return 0;
-  }
-  _rpmalloc_initialized = 1;
-
-  if (config)
-    memcpy(&_memory_config, config, sizeof(rpmalloc_config_t));
-  else
-    _rpmalloc_memset_const(&_memory_config, 0, sizeof(rpmalloc_config_t));
-
-  if (!_memory_config.memory_map || !_memory_config.memory_unmap) {
-    _memory_config.memory_map = _rpmalloc_mmap_os;
-    _memory_config.memory_unmap = _rpmalloc_unmap_os;
-  }
-
-#if PLATFORM_WINDOWS
-  SYSTEM_INFO system_info;
-  memset(&system_info, 0, sizeof(system_info));
-  GetSystemInfo(&system_info);
-  _memory_map_granularity = system_info.dwAllocationGranularity;
-#else
-  _memory_map_granularity = (size_t)sysconf(_SC_PAGESIZE);
-#endif
-
-#if RPMALLOC_CONFIGURABLE
-  _memory_page_size = _memory_config.page_size;
-#else
-  _memory_page_size = 0;
-#endif
-
_memory_huge_pages = 0; - if (!_memory_page_size) { -#if PLATFORM_WINDOWS - _memory_page_size = system_info.dwPageSize; -#else - _memory_page_size = _memory_map_granularity; - if (_memory_config.enable_huge_pages) { -#if defined(__linux__) - size_t huge_page_size = 0; - FILE *meminfo = fopen("/proc/meminfo", "r"); - if (meminfo) { - char line[128]; - while (!huge_page_size && fgets(line, sizeof(line) - 1, meminfo)) { - line[sizeof(line) - 1] = 0; - if (strstr(line, "Hugepagesize:")) - huge_page_size = (size_t)strtol(line + 13, 0, 10) * 1024; - } - fclose(meminfo); - } - if (huge_page_size) { - _memory_huge_pages = 1; - _memory_page_size = huge_page_size; - _memory_map_granularity = huge_page_size; - } -#elif defined(__FreeBSD__) - int rc; - size_t sz = sizeof(rc); - - if (sysctlbyname("vm.pmap.pg_ps_enabled", &rc, &sz, NULL, 0) == 0 && - rc == 1) { - static size_t defsize = 2 * 1024 * 1024; - int nsize = 0; - size_t sizes[4] = {0}; - _memory_huge_pages = 1; - _memory_page_size = defsize; - if ((nsize = getpagesizes(sizes, 4)) >= 2) { - nsize--; - for (size_t csize = sizes[nsize]; nsize >= 0 && csize; - --nsize, csize = sizes[nsize]) { - //! Unlikely, but as a precaution.. 
- rpmalloc_assert(!(csize & (csize - 1)) && !(csize % 1024), - "Invalid page size"); - if (defsize < csize) { - _memory_page_size = csize; - break; - } - } - } - _memory_map_granularity = _memory_page_size; - } -#elif defined(__APPLE__) || defined(__NetBSD__) - _memory_huge_pages = 1; - _memory_page_size = 2 * 1024 * 1024; - _memory_map_granularity = _memory_page_size; -#endif - } -#endif - } else { - if (_memory_config.enable_huge_pages) - _memory_huge_pages = 1; - } - -#if PLATFORM_WINDOWS - if (_memory_config.enable_huge_pages) { - HANDLE token = 0; - size_t large_page_minimum = GetLargePageMinimum(); - if (large_page_minimum) - OpenProcessToken(GetCurrentProcess(), - TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token); - if (token) { - LUID luid; - if (LookupPrivilegeValue(0, SE_LOCK_MEMORY_NAME, &luid)) { - TOKEN_PRIVILEGES token_privileges; - memset(&token_privileges, 0, sizeof(token_privileges)); - token_privileges.PrivilegeCount = 1; - token_privileges.Privileges[0].Luid = luid; - token_privileges.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED; - if (AdjustTokenPrivileges(token, FALSE, &token_privileges, 0, 0, 0)) { - if (GetLastError() == ERROR_SUCCESS) - _memory_huge_pages = 1; - } - } - CloseHandle(token); - } - if (_memory_huge_pages) { - if (large_page_minimum > _memory_page_size) - _memory_page_size = large_page_minimum; - if (large_page_minimum > _memory_map_granularity) - _memory_map_granularity = large_page_minimum; - } - } -#endif - - size_t min_span_size = 256; - size_t max_page_size; -#if UINTPTR_MAX > 0xFFFFFFFF - max_page_size = 4096ULL * 1024ULL * 1024ULL; -#else - max_page_size = 4 * 1024 * 1024; -#endif - if (_memory_page_size < min_span_size) - _memory_page_size = min_span_size; - if (_memory_page_size > max_page_size) - _memory_page_size = max_page_size; - _memory_page_size_shift = 0; - size_t page_size_bit = _memory_page_size; - while (page_size_bit != 1) { - ++_memory_page_size_shift; - page_size_bit >>= 1; - } - _memory_page_size = 
((size_t)1 << _memory_page_size_shift); - -#if RPMALLOC_CONFIGURABLE - if (!_memory_config.span_size) { - _memory_span_size = _memory_default_span_size; - _memory_span_size_shift = _memory_default_span_size_shift; - _memory_span_mask = _memory_default_span_mask; - } else { - size_t span_size = _memory_config.span_size; - if (span_size > (256 * 1024)) - span_size = (256 * 1024); - _memory_span_size = 4096; - _memory_span_size_shift = 12; - while (_memory_span_size < span_size) { - _memory_span_size <<= 1; - ++_memory_span_size_shift; - } - _memory_span_mask = ~(uintptr_t)(_memory_span_size - 1); - } -#endif - - _memory_span_map_count = - (_memory_config.span_map_count ? _memory_config.span_map_count - : DEFAULT_SPAN_MAP_COUNT); - if ((_memory_span_size * _memory_span_map_count) < _memory_page_size) - _memory_span_map_count = (_memory_page_size / _memory_span_size); - if ((_memory_page_size >= _memory_span_size) && - ((_memory_span_map_count * _memory_span_size) % _memory_page_size)) - _memory_span_map_count = (_memory_page_size / _memory_span_size); - _memory_heap_reserve_count = (_memory_span_map_count > DEFAULT_SPAN_MAP_COUNT) - ? 
DEFAULT_SPAN_MAP_COUNT - : _memory_span_map_count; - - _memory_config.page_size = _memory_page_size; - _memory_config.span_size = _memory_span_size; - _memory_config.span_map_count = _memory_span_map_count; - _memory_config.enable_huge_pages = _memory_huge_pages; - -#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ - defined(__TINYC__) - if (pthread_key_create(&_memory_thread_heap, _rpmalloc_heap_release_raw_fc)) - return -1; -#endif -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) - fls_key = FlsAlloc(&_rpmalloc_thread_destructor); -#endif - - // Setup all small and medium size classes - size_t iclass = 0; - _memory_size_class[iclass].block_size = SMALL_GRANULARITY; - _rpmalloc_adjust_size_class(iclass); - for (iclass = 1; iclass < SMALL_CLASS_COUNT; ++iclass) { - size_t size = iclass * SMALL_GRANULARITY; - _memory_size_class[iclass].block_size = (uint32_t)size; - _rpmalloc_adjust_size_class(iclass); - } - // At least two blocks per span, then fall back to large allocations - _memory_medium_size_limit = (_memory_span_size - SPAN_HEADER_SIZE) >> 1; - if (_memory_medium_size_limit > MEDIUM_SIZE_LIMIT) - _memory_medium_size_limit = MEDIUM_SIZE_LIMIT; - for (iclass = 0; iclass < MEDIUM_CLASS_COUNT; ++iclass) { - size_t size = SMALL_SIZE_LIMIT + ((iclass + 1) * MEDIUM_GRANULARITY); - if (size > _memory_medium_size_limit) { - _memory_medium_size_limit = - SMALL_SIZE_LIMIT + (iclass * MEDIUM_GRANULARITY); - break; - } - _memory_size_class[SMALL_CLASS_COUNT + iclass].block_size = (uint32_t)size; - _rpmalloc_adjust_size_class(SMALL_CLASS_COUNT + iclass); - } - - _memory_orphan_heaps = 0; -#if RPMALLOC_FIRST_CLASS_HEAPS - _memory_first_class_orphan_heaps = 0; -#endif -#if ENABLE_STATISTICS - atomic_store32(&_memory_active_heaps, 0); - atomic_store32(&_mapped_pages, 0); - _mapped_pages_peak = 0; - atomic_store32(&_master_spans, 0); - atomic_store32(&_mapped_total, 0); - atomic_store32(&_unmapped_total, 0); - 
atomic_store32(&_mapped_pages_os, 0); - atomic_store32(&_huge_pages_current, 0); - _huge_pages_peak = 0; -#endif - memset(_memory_heaps, 0, sizeof(_memory_heaps)); - atomic_store32_release(&_memory_global_lock, 0); - - rpmalloc_linker_reference(); - - // Initialize this thread - rpmalloc_thread_initialize(); - return 0; -} - -//! Finalize the allocator -void rpmalloc_finalize(void) { - rpmalloc_thread_finalize(1); - // rpmalloc_dump_statistics(stdout); - - if (_memory_global_reserve) { - atomic_add32(&_memory_global_reserve_master->remaining_spans, - -(int32_t)_memory_global_reserve_count); - _memory_global_reserve_master = 0; - _memory_global_reserve_count = 0; - _memory_global_reserve = 0; - } - atomic_store32_release(&_memory_global_lock, 0); - - // Free all thread caches and fully free spans - for (size_t list_idx = 0; list_idx < HEAP_ARRAY_SIZE; ++list_idx) { - heap_t *heap = _memory_heaps[list_idx]; - while (heap) { - heap_t *next_heap = heap->next_heap; - heap->finalize = 1; - _rpmalloc_heap_global_finalize(heap); - heap = next_heap; - } - } - -#if ENABLE_GLOBAL_CACHE - // Free global caches - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) - _rpmalloc_global_cache_finalize(&_memory_span_cache[iclass]); -#endif - -#if (defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD - pthread_key_delete(_memory_thread_heap); -#endif -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) - FlsFree(fls_key); - fls_key = 0; -#endif -#if ENABLE_STATISTICS - // If you hit these asserts you probably have memory leaks (perhaps global - // scope data doing dynamic allocations) or double frees in your code - rpmalloc_assert(atomic_load32(&_mapped_pages) == 0, "Memory leak detected"); - rpmalloc_assert(atomic_load32(&_mapped_pages_os) == 0, - "Memory leak detected"); -#endif - - _rpmalloc_initialized = 0; -} - -//! 
Initialize thread, assign heap -extern inline void rpmalloc_thread_initialize(void) { - if (!get_thread_heap_raw()) { - heap_t *heap = _rpmalloc_heap_allocate(0); - if (heap) { - _rpmalloc_stat_inc(&_memory_active_heaps); - set_thread_heap(heap); -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) - FlsSetValue(fls_key, heap); -#endif - } - } -} - -//! Finalize thread, orphan heap -void rpmalloc_thread_finalize(int release_caches) { - heap_t *heap = get_thread_heap_raw(); - if (heap) - _rpmalloc_heap_release_raw(heap, release_caches); - set_thread_heap(0); -#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) - FlsSetValue(fls_key, 0); -#endif -} - -int rpmalloc_is_thread_initialized(void) { - return (get_thread_heap_raw() != 0) ? 1 : 0; -} - -const rpmalloc_config_t *rpmalloc_config(void) { return &_memory_config; } - -// Extern interface - -extern inline RPMALLOC_ALLOCATOR void *rpmalloc(size_t size) { -#if ENABLE_VALIDATE_ARGS - if (size >= MAX_ALLOC_SIZE) { - errno = EINVAL; - return 0; - } -#endif - heap_t *heap = get_thread_heap(); - return _rpmalloc_allocate(heap, size); -} - -extern inline void rpfree(void *ptr) { _rpmalloc_deallocate(ptr); } - -extern inline RPMALLOC_ALLOCATOR void *rpcalloc(size_t num, size_t size) { - size_t total; -#if ENABLE_VALIDATE_ARGS -#if PLATFORM_WINDOWS - int err = SizeTMult(num, size, &total); - if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#else - int err = __builtin_umull_overflow(num, size, &total); - if (err || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#endif -#else - total = num * size; -#endif - heap_t *heap = get_thread_heap(); - void *block = _rpmalloc_allocate(heap, total); - if (block) - memset(block, 0, total); - return block; -} - -extern inline RPMALLOC_ALLOCATOR void *rprealloc(void *ptr, size_t size) { -#if ENABLE_VALIDATE_ARGS - if (size >= MAX_ALLOC_SIZE) { - errno = EINVAL; - return ptr; - } -#endif - 
heap_t *heap = get_thread_heap(); - return _rpmalloc_reallocate(heap, ptr, size, 0, 0); -} - -extern RPMALLOC_ALLOCATOR void *rpaligned_realloc(void *ptr, size_t alignment, - size_t size, size_t oldsize, - unsigned int flags) { -#if ENABLE_VALIDATE_ARGS - if ((size + alignment < size) || (alignment > _memory_page_size)) { - errno = EINVAL; - return 0; - } -#endif - heap_t *heap = get_thread_heap(); - return _rpmalloc_aligned_reallocate(heap, ptr, alignment, size, oldsize, - flags); -} - -extern RPMALLOC_ALLOCATOR void *rpaligned_alloc(size_t alignment, size_t size) { - heap_t *heap = get_thread_heap(); - return _rpmalloc_aligned_allocate(heap, alignment, size); -} - -extern inline RPMALLOC_ALLOCATOR void * -rpaligned_calloc(size_t alignment, size_t num, size_t size) { - size_t total; -#if ENABLE_VALIDATE_ARGS -#if PLATFORM_WINDOWS - int err = SizeTMult(num, size, &total); - if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#else - int err = __builtin_umull_overflow(num, size, &total); - if (err || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#endif -#else - total = num * size; -#endif - void *block = rpaligned_alloc(alignment, total); - if (block) - memset(block, 0, total); - return block; -} - -extern inline RPMALLOC_ALLOCATOR void *rpmemalign(size_t alignment, - size_t size) { - return rpaligned_alloc(alignment, size); -} - -extern inline int rpposix_memalign(void **memptr, size_t alignment, - size_t size) { - if (memptr) - *memptr = rpaligned_alloc(alignment, size); - else - return EINVAL; - return *memptr ? 0 : ENOMEM; -} - -extern inline size_t rpmalloc_usable_size(void *ptr) { - return (ptr ? 
_rpmalloc_usable_size(ptr) : 0); -} - -extern inline void rpmalloc_thread_collect(void) {} - -void rpmalloc_thread_statistics(rpmalloc_thread_statistics_t *stats) { - memset(stats, 0, sizeof(rpmalloc_thread_statistics_t)); - heap_t *heap = get_thread_heap_raw(); - if (!heap) - return; - - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - size_class_t *size_class = _memory_size_class + iclass; - span_t *span = heap->size_class[iclass].partial_span; - while (span) { - size_t free_count = span->list_size; - size_t block_count = size_class->block_count; - if (span->free_list_limit < block_count) - block_count = span->free_list_limit; - free_count += (block_count - span->used_count); - stats->sizecache += free_count * size_class->block_size; - span = span->next; - } - } - -#if ENABLE_THREAD_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - stats->spancache += span_cache->count * (iclass + 1) * _memory_span_size; - } -#endif - - span_t *deferred = (span_t *)atomic_load_ptr(&heap->span_free_deferred); - while (deferred) { - if (deferred->size_class != SIZE_CLASS_HUGE) - stats->spancache += (size_t)deferred->span_count * _memory_span_size; - deferred = (span_t *)deferred->free_list; - } - -#if ENABLE_STATISTICS - stats->thread_to_global = (size_t)atomic_load64(&heap->thread_to_global); - stats->global_to_thread = (size_t)atomic_load64(&heap->global_to_thread); - - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - stats->span_use[iclass].current = - (size_t)atomic_load32(&heap->span_use[iclass].current); - stats->span_use[iclass].peak = - (size_t)atomic_load32(&heap->span_use[iclass].high); - stats->span_use[iclass].to_global = - (size_t)atomic_load32(&heap->span_use[iclass].spans_to_global); - stats->span_use[iclass].from_global = - 
(size_t)atomic_load32(&heap->span_use[iclass].spans_from_global); - stats->span_use[iclass].to_cache = - (size_t)atomic_load32(&heap->span_use[iclass].spans_to_cache); - stats->span_use[iclass].from_cache = - (size_t)atomic_load32(&heap->span_use[iclass].spans_from_cache); - stats->span_use[iclass].to_reserved = - (size_t)atomic_load32(&heap->span_use[iclass].spans_to_reserved); - stats->span_use[iclass].from_reserved = - (size_t)atomic_load32(&heap->span_use[iclass].spans_from_reserved); - stats->span_use[iclass].map_calls = - (size_t)atomic_load32(&heap->span_use[iclass].spans_map_calls); - } - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - stats->size_use[iclass].alloc_current = - (size_t)atomic_load32(&heap->size_class_use[iclass].alloc_current); - stats->size_use[iclass].alloc_peak = - (size_t)heap->size_class_use[iclass].alloc_peak; - stats->size_use[iclass].alloc_total = - (size_t)atomic_load32(&heap->size_class_use[iclass].alloc_total); - stats->size_use[iclass].free_total = - (size_t)atomic_load32(&heap->size_class_use[iclass].free_total); - stats->size_use[iclass].spans_to_cache = - (size_t)atomic_load32(&heap->size_class_use[iclass].spans_to_cache); - stats->size_use[iclass].spans_from_cache = - (size_t)atomic_load32(&heap->size_class_use[iclass].spans_from_cache); - stats->size_use[iclass].spans_from_reserved = (size_t)atomic_load32( - &heap->size_class_use[iclass].spans_from_reserved); - stats->size_use[iclass].map_calls = - (size_t)atomic_load32(&heap->size_class_use[iclass].spans_map_calls); - } -#endif -} - -void rpmalloc_global_statistics(rpmalloc_global_statistics_t *stats) { - memset(stats, 0, sizeof(rpmalloc_global_statistics_t)); -#if ENABLE_STATISTICS - stats->mapped = (size_t)atomic_load32(&_mapped_pages) * _memory_page_size; - stats->mapped_peak = (size_t)_mapped_pages_peak * _memory_page_size; - stats->mapped_total = - (size_t)atomic_load32(&_mapped_total) * _memory_page_size; - stats->unmapped_total = - 
(size_t)atomic_load32(&_unmapped_total) * _memory_page_size; - stats->huge_alloc = - (size_t)atomic_load32(&_huge_pages_current) * _memory_page_size; - stats->huge_alloc_peak = (size_t)_huge_pages_peak * _memory_page_size; -#endif -#if ENABLE_GLOBAL_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - global_cache_t *cache = &_memory_span_cache[iclass]; - while (!atomic_cas32_acquire(&cache->lock, 1, 0)) - _rpmalloc_spin(); - uint32_t count = cache->count; -#if ENABLE_UNLIMITED_CACHE - span_t *current_span = cache->overflow; - while (current_span) { - ++count; - current_span = current_span->next; - } -#endif - atomic_store32_release(&cache->lock, 0); - stats->cached += count * (iclass + 1) * _memory_span_size; - } -#endif -} - -#if ENABLE_STATISTICS - -static void _memory_heap_dump_statistics(heap_t *heap, void *file) { - fprintf(file, "Heap %d stats:\n", heap->id); - fprintf(file, "Class CurAlloc PeakAlloc TotAlloc TotFree BlkSize " - "BlkCount SpansCur SpansPeak PeakAllocMiB ToCacheMiB " - "FromCacheMiB FromReserveMiB MmapCalls\n"); - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - if (!atomic_load32(&heap->size_class_use[iclass].alloc_total)) - continue; - fprintf( - file, - "%3u: %10u %10u %10u %10u %8u %8u %8d %9d %13zu %11zu %12zu %14zu " - "%9u\n", - (uint32_t)iclass, - atomic_load32(&heap->size_class_use[iclass].alloc_current), - heap->size_class_use[iclass].alloc_peak, - atomic_load32(&heap->size_class_use[iclass].alloc_total), - atomic_load32(&heap->size_class_use[iclass].free_total), - _memory_size_class[iclass].block_size, - _memory_size_class[iclass].block_count, - atomic_load32(&heap->size_class_use[iclass].spans_current), - heap->size_class_use[iclass].spans_peak, - ((size_t)heap->size_class_use[iclass].alloc_peak * - (size_t)_memory_size_class[iclass].block_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->size_class_use[iclass].spans_to_cache) * - _memory_span_size) / - (size_t)(1024 * 1024), - 
((size_t)atomic_load32(&heap->size_class_use[iclass].spans_from_cache) * - _memory_span_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32( - &heap->size_class_use[iclass].spans_from_reserved) * - _memory_span_size) / - (size_t)(1024 * 1024), - atomic_load32(&heap->size_class_use[iclass].spans_map_calls)); - } - fprintf(file, "Spans Current Peak Deferred PeakMiB Cached ToCacheMiB " - "FromCacheMiB ToReserveMiB FromReserveMiB ToGlobalMiB " - "FromGlobalMiB MmapCalls\n"); - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - if (!atomic_load32(&heap->span_use[iclass].high) && - !atomic_load32(&heap->span_use[iclass].spans_map_calls)) - continue; - fprintf( - file, - "%4u: %8d %8u %8u %8zu %7u %11zu %12zu %12zu %14zu %11zu %13zu %10u\n", - (uint32_t)(iclass + 1), atomic_load32(&heap->span_use[iclass].current), - atomic_load32(&heap->span_use[iclass].high), - atomic_load32(&heap->span_use[iclass].spans_deferred), - ((size_t)atomic_load32(&heap->span_use[iclass].high) * - (size_t)_memory_span_size * (iclass + 1)) / - (size_t)(1024 * 1024), -#if ENABLE_THREAD_CACHE - (unsigned int)(!iclass ? 
heap->span_cache.count - : heap->span_large_cache[iclass - 1].count), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_cache) * - (iclass + 1) * _memory_span_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_cache) * - (iclass + 1) * _memory_span_size) / - (size_t)(1024 * 1024), -#else - 0, (size_t)0, (size_t)0, -#endif - ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_reserved) * - (iclass + 1) * _memory_span_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_reserved) * - (iclass + 1) * _memory_span_size) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_global) * - (size_t)_memory_span_size * (iclass + 1)) / - (size_t)(1024 * 1024), - ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_global) * - (size_t)_memory_span_size * (iclass + 1)) / - (size_t)(1024 * 1024), - atomic_load32(&heap->span_use[iclass].spans_map_calls)); - } - fprintf(file, "Full spans: %zu\n", heap->full_span_count); - fprintf(file, "ThreadToGlobalMiB GlobalToThreadMiB\n"); - fprintf( - file, "%17zu %17zu\n", - (size_t)atomic_load64(&heap->thread_to_global) / (size_t)(1024 * 1024), - (size_t)atomic_load64(&heap->global_to_thread) / (size_t)(1024 * 1024)); -} - -#endif - -void rpmalloc_dump_statistics(void *file) { -#if ENABLE_STATISTICS - for (size_t list_idx = 0; list_idx < HEAP_ARRAY_SIZE; ++list_idx) { - heap_t *heap = _memory_heaps[list_idx]; - while (heap) { - int need_dump = 0; - for (size_t iclass = 0; !need_dump && (iclass < SIZE_CLASS_COUNT); - ++iclass) { - if (!atomic_load32(&heap->size_class_use[iclass].alloc_total)) { - rpmalloc_assert( - !atomic_load32(&heap->size_class_use[iclass].free_total), - "Heap statistics counter mismatch"); - rpmalloc_assert( - !atomic_load32(&heap->size_class_use[iclass].spans_map_calls), - "Heap statistics counter mismatch"); - continue; - } - need_dump = 1; - } - for (size_t iclass = 0; !need_dump && (iclass < 
LARGE_CLASS_COUNT); - ++iclass) { - if (!atomic_load32(&heap->span_use[iclass].high) && - !atomic_load32(&heap->span_use[iclass].spans_map_calls)) - continue; - need_dump = 1; - } - if (need_dump) - _memory_heap_dump_statistics(heap, file); - heap = heap->next_heap; - } - } - fprintf(file, "Global stats:\n"); - size_t huge_current = - (size_t)atomic_load32(&_huge_pages_current) * _memory_page_size; - size_t huge_peak = (size_t)_huge_pages_peak * _memory_page_size; - fprintf(file, "HugeCurrentMiB HugePeakMiB\n"); - fprintf(file, "%14zu %11zu\n", huge_current / (size_t)(1024 * 1024), - huge_peak / (size_t)(1024 * 1024)); - -#if ENABLE_GLOBAL_CACHE - fprintf(file, "GlobalCacheMiB\n"); - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - global_cache_t *cache = _memory_span_cache + iclass; - size_t global_cache = (size_t)cache->count * iclass * _memory_span_size; - - size_t global_overflow_cache = 0; - span_t *span = cache->overflow; - while (span) { - global_overflow_cache += iclass * _memory_span_size; - span = span->next; - } - if (global_cache || global_overflow_cache || cache->insert_count || - cache->extract_count) - fprintf(file, - "%4zu: %8zuMiB (%8zuMiB overflow) %14zu insert %14zu extract\n", - iclass + 1, global_cache / (size_t)(1024 * 1024), - global_overflow_cache / (size_t)(1024 * 1024), - cache->insert_count, cache->extract_count); - } -#endif - - size_t mapped = (size_t)atomic_load32(&_mapped_pages) * _memory_page_size; - size_t mapped_os = - (size_t)atomic_load32(&_mapped_pages_os) * _memory_page_size; - size_t mapped_peak = (size_t)_mapped_pages_peak * _memory_page_size; - size_t mapped_total = - (size_t)atomic_load32(&_mapped_total) * _memory_page_size; - size_t unmapped_total = - (size_t)atomic_load32(&_unmapped_total) * _memory_page_size; - fprintf( - file, - "MappedMiB MappedOSMiB MappedPeakMiB MappedTotalMiB UnmappedTotalMiB\n"); - fprintf(file, "%9zu %11zu %13zu %14zu %16zu\n", - mapped / (size_t)(1024 * 1024), mapped_os / 
(size_t)(1024 * 1024), - mapped_peak / (size_t)(1024 * 1024), - mapped_total / (size_t)(1024 * 1024), - unmapped_total / (size_t)(1024 * 1024)); - - fprintf(file, "\n"); -#if 0 - int64_t allocated = atomic_load64(&_allocation_counter); - int64_t deallocated = atomic_load64(&_deallocation_counter); - fprintf(file, "Allocation count: %lli\n", allocated); - fprintf(file, "Deallocation count: %lli\n", deallocated); - fprintf(file, "Current allocations: %lli\n", (allocated - deallocated)); - fprintf(file, "Master spans: %d\n", atomic_load32(&_master_spans)); - fprintf(file, "Dangling master spans: %d\n", atomic_load32(&_unmapped_master_spans)); -#endif -#endif - (void)sizeof(file); -} - -#if RPMALLOC_FIRST_CLASS_HEAPS - -extern inline rpmalloc_heap_t *rpmalloc_heap_acquire(void) { - // Must be a pristine heap from newly mapped memory pages, or else memory - // blocks could already be allocated from the heap which would (wrongly) be - // released when heap is cleared with rpmalloc_heap_free_all(). Also heaps - // guaranteed to be pristine from the dedicated orphan list can be used. 
- heap_t *heap = _rpmalloc_heap_allocate(1); - rpmalloc_assume(heap != NULL); - heap->owner_thread = 0; - _rpmalloc_stat_inc(&_memory_active_heaps); - return heap; -} - -extern inline void rpmalloc_heap_release(rpmalloc_heap_t *heap) { - if (heap) - _rpmalloc_heap_release(heap, 1, 1); -} - -extern inline RPMALLOC_ALLOCATOR void * -rpmalloc_heap_alloc(rpmalloc_heap_t *heap, size_t size) { -#if ENABLE_VALIDATE_ARGS - if (size >= MAX_ALLOC_SIZE) { - errno = EINVAL; - return 0; - } -#endif - return _rpmalloc_allocate(heap, size); -} - -extern inline RPMALLOC_ALLOCATOR void * -rpmalloc_heap_aligned_alloc(rpmalloc_heap_t *heap, size_t alignment, - size_t size) { -#if ENABLE_VALIDATE_ARGS - if (size >= MAX_ALLOC_SIZE) { - errno = EINVAL; - return 0; - } -#endif - return _rpmalloc_aligned_allocate(heap, alignment, size); -} - -extern inline RPMALLOC_ALLOCATOR void * -rpmalloc_heap_calloc(rpmalloc_heap_t *heap, size_t num, size_t size) { - return rpmalloc_heap_aligned_calloc(heap, 0, num, size); -} - -extern inline RPMALLOC_ALLOCATOR void * -rpmalloc_heap_aligned_calloc(rpmalloc_heap_t *heap, size_t alignment, - size_t num, size_t size) { - size_t total; -#if ENABLE_VALIDATE_ARGS -#if PLATFORM_WINDOWS - int err = SizeTMult(num, size, &total); - if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#else - int err = __builtin_umull_overflow(num, size, &total); - if (err || (total >= MAX_ALLOC_SIZE)) { - errno = EINVAL; - return 0; - } -#endif -#else - total = num * size; -#endif - void *block = _rpmalloc_aligned_allocate(heap, alignment, total); - if (block) - memset(block, 0, total); - return block; -} - -extern inline RPMALLOC_ALLOCATOR void * -rpmalloc_heap_realloc(rpmalloc_heap_t *heap, void *ptr, size_t size, - unsigned int flags) { -#if ENABLE_VALIDATE_ARGS - if (size >= MAX_ALLOC_SIZE) { - errno = EINVAL; - return ptr; - } -#endif - return _rpmalloc_reallocate(heap, ptr, size, 0, flags); -} - -extern inline RPMALLOC_ALLOCATOR void * 
-rpmalloc_heap_aligned_realloc(rpmalloc_heap_t *heap, void *ptr, - size_t alignment, size_t size, - unsigned int flags) { -#if ENABLE_VALIDATE_ARGS - if ((size + alignment < size) || (alignment > _memory_page_size)) { - errno = EINVAL; - return 0; - } -#endif - return _rpmalloc_aligned_reallocate(heap, ptr, alignment, size, 0, flags); -} - -extern inline void rpmalloc_heap_free(rpmalloc_heap_t *heap, void *ptr) { - (void)sizeof(heap); - _rpmalloc_deallocate(ptr); -} - -extern inline void rpmalloc_heap_free_all(rpmalloc_heap_t *heap) { - span_t *span; - span_t *next_span; - - _rpmalloc_heap_cache_adopt_deferred(heap, 0); - - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - span = heap->size_class[iclass].partial_span; - while (span) { - next_span = span->next; - _rpmalloc_heap_cache_insert(heap, span); - span = next_span; - } - heap->size_class[iclass].partial_span = 0; - span = heap->full_span[iclass]; - while (span) { - next_span = span->next; - _rpmalloc_heap_cache_insert(heap, span); - span = next_span; - } - - span = heap->size_class[iclass].cache; - if (span) - _rpmalloc_heap_cache_insert(heap, span); - heap->size_class[iclass].cache = 0; - } - memset(heap->size_class, 0, sizeof(heap->size_class)); - memset(heap->full_span, 0, sizeof(heap->full_span)); - - span = heap->large_huge_span; - while (span) { - next_span = span->next; - if (UNEXPECTED(span->size_class == SIZE_CLASS_HUGE)) - _rpmalloc_deallocate_huge(span); - else - _rpmalloc_heap_cache_insert(heap, span); - span = next_span; - } - heap->large_huge_span = 0; - heap->full_span_count = 0; - -#if ENABLE_THREAD_CACHE - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - span_cache_t *span_cache; - if (!iclass) - span_cache = &heap->span_cache; - else - span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); - if (!span_cache->count) - continue; -#if ENABLE_GLOBAL_CACHE - _rpmalloc_stat_add64(&heap->thread_to_global, - span_cache->count * (iclass + 1) * 
_memory_span_size); - _rpmalloc_stat_add(&heap->span_use[iclass].spans_to_global, - span_cache->count); - _rpmalloc_global_cache_insert_spans(span_cache->span, iclass + 1, - span_cache->count); -#else - for (size_t ispan = 0; ispan < span_cache->count; ++ispan) - _rpmalloc_span_unmap(span_cache->span[ispan]); -#endif - span_cache->count = 0; - } -#endif - -#if ENABLE_STATISTICS - for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { - atomic_store32(&heap->size_class_use[iclass].alloc_current, 0); - atomic_store32(&heap->size_class_use[iclass].spans_current, 0); - } - for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { - atomic_store32(&heap->span_use[iclass].current, 0); - } -#endif -} - -extern inline void rpmalloc_heap_thread_set_current(rpmalloc_heap_t *heap) { - heap_t *prev_heap = get_thread_heap_raw(); - if (prev_heap != heap) { - set_thread_heap(heap); - if (prev_heap) - rpmalloc_heap_release(prev_heap); - } -} - -extern inline rpmalloc_heap_t *rpmalloc_get_heap_for_ptr(void *ptr) { - // Grab the span, and then the heap from the span - span_t *span = (span_t *)((uintptr_t)ptr & _memory_span_mask); - if (span) { - return span->heap; - } - return 0; -} - -#endif - -#if ENABLE_PRELOAD || ENABLE_OVERRIDE - -#include "malloc.c" - -#endif - -void rpmalloc_linker_reference(void) { (void)sizeof(_rpmalloc_initialized); } +//===---------------------- rpmalloc.c ------------------*- C -*-=============// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This library provides a cross-platform lock free thread caching malloc +// implementation in C11. 
+// +//===----------------------------------------------------------------------===// + +#include "rpmalloc.h" + +//////////// +/// +/// Build time configurable limits +/// +////// + +#if defined(__clang__) +#pragma clang diagnostic ignored "-Wunused-macros" +#pragma clang diagnostic ignored "-Wunused-function" +#if __has_warning("-Wreserved-identifier") +#pragma clang diagnostic ignored "-Wreserved-identifier" +#endif +#if __has_warning("-Wstatic-in-inline") +#pragma clang diagnostic ignored "-Wstatic-in-inline" +#endif +#elif defined(__GNUC__) +#pragma GCC diagnostic ignored "-Wunused-macros" +#pragma GCC diagnostic ignored "-Wunused-function" +#endif + +#if !defined(__has_builtin) +#define __has_builtin(b) 0 +#endif + +#if defined(__GNUC__) || defined(__clang__) + +#if __has_builtin(__builtin_memcpy_inline) +#define _rpmalloc_memcpy_const(x, y, s) __builtin_memcpy_inline(x, y, s) +#else +#define _rpmalloc_memcpy_const(x, y, s) \ + do { \ + _Static_assert(__builtin_choose_expr(__builtin_constant_p(s), 1, 0), \ + "len must be a constant integer"); \ + memcpy(x, y, s); \ + } while (0) +#endif + +#if __has_builtin(__builtin_memset_inline) +#define _rpmalloc_memset_const(x, y, s) __builtin_memset_inline(x, y, s) +#else +#define _rpmalloc_memset_const(x, y, s) \ + do { \ + _Static_assert(__builtin_choose_expr(__builtin_constant_p(s), 1, 0), \ + "len must be a constant integer"); \ + memset(x, y, s); \ + } while (0) +#endif +#else +#define _rpmalloc_memcpy_const(x, y, s) memcpy(x, y, s) +#define _rpmalloc_memset_const(x, y, s) memset(x, y, s) +#endif + +#if __has_builtin(__builtin_assume) +#define rpmalloc_assume(cond) __builtin_assume(cond) +#elif defined(__GNUC__) +#define rpmalloc_assume(cond) \ + do { \ + if (!__builtin_expect(cond, 0)) \ + __builtin_unreachable(); \ + } while (0) +#elif defined(_MSC_VER) +#define rpmalloc_assume(cond) __assume(cond) +#else +#define rpmalloc_assume(cond) 0 +#endif + +#ifndef HEAP_ARRAY_SIZE +//! 
Size of heap hashmap +#define HEAP_ARRAY_SIZE 47 +#endif +#ifndef ENABLE_THREAD_CACHE +//! Enable per-thread cache +#define ENABLE_THREAD_CACHE 1 +#endif +#ifndef ENABLE_GLOBAL_CACHE +//! Enable global cache shared between all threads, requires thread cache +#define ENABLE_GLOBAL_CACHE 1 +#endif +#ifndef ENABLE_VALIDATE_ARGS +//! Enable validation of args to public entry points +#define ENABLE_VALIDATE_ARGS 0 +#endif +#ifndef ENABLE_STATISTICS +//! Enable statistics collection +#define ENABLE_STATISTICS 0 +#endif +#ifndef ENABLE_ASSERTS +//! Enable asserts +#define ENABLE_ASSERTS 0 +#endif +#ifndef ENABLE_OVERRIDE +//! Override standard library malloc/free and new/delete entry points +#define ENABLE_OVERRIDE 0 +#endif +#ifndef ENABLE_PRELOAD +//! Support preloading +#define ENABLE_PRELOAD 0 +#endif +#ifndef DISABLE_UNMAP +//! Disable unmapping memory pages (also enables unlimited cache) +#define DISABLE_UNMAP 0 +#endif +#ifndef ENABLE_UNLIMITED_CACHE +//! Enable unlimited global cache (no unmapping until finalization) +#define ENABLE_UNLIMITED_CACHE 0 +#endif +#ifndef ENABLE_ADAPTIVE_THREAD_CACHE +//! Enable adaptive thread cache size based on use heuristics +#define ENABLE_ADAPTIVE_THREAD_CACHE 0 +#endif +#ifndef DEFAULT_SPAN_MAP_COUNT +//! Default number of spans to map in call to map more virtual memory (default +//! values yield 4MiB here) +#define DEFAULT_SPAN_MAP_COUNT 64 +#endif +#ifndef GLOBAL_CACHE_MULTIPLIER +//! 
Multiplier for global cache +#define GLOBAL_CACHE_MULTIPLIER 8 +#endif + +#if DISABLE_UNMAP && !ENABLE_GLOBAL_CACHE +#error Must use global cache if unmap is disabled +#endif + +#if DISABLE_UNMAP +#undef ENABLE_UNLIMITED_CACHE +#define ENABLE_UNLIMITED_CACHE 1 +#endif + +#if !ENABLE_GLOBAL_CACHE +#undef ENABLE_UNLIMITED_CACHE +#define ENABLE_UNLIMITED_CACHE 0 +#endif + +#if !ENABLE_THREAD_CACHE +#undef ENABLE_ADAPTIVE_THREAD_CACHE +#define ENABLE_ADAPTIVE_THREAD_CACHE 0 +#endif + +#if defined(_WIN32) || defined(__WIN32__) || defined(_WIN64) +#define PLATFORM_WINDOWS 1 +#define PLATFORM_POSIX 0 +#else +#define PLATFORM_WINDOWS 0 +#define PLATFORM_POSIX 1 +#endif + +/// Platform and arch specifics +#if defined(_MSC_VER) && !defined(__clang__) +#pragma warning(disable : 5105) +#ifndef FORCEINLINE +#define FORCEINLINE inline __forceinline +#endif +#define _Static_assert static_assert +#else +#ifndef FORCEINLINE +#define FORCEINLINE inline __attribute__((__always_inline__)) +#endif +#endif +#if PLATFORM_WINDOWS +#ifndef WIN32_LEAN_AND_MEAN +#define WIN32_LEAN_AND_MEAN +#endif +#include +#if ENABLE_VALIDATE_ARGS +#include +#endif +#else +#include +#include +#include +#include +#if defined(__linux__) || defined(__ANDROID__) +#include +#if !defined(PR_SET_VMA) +#define PR_SET_VMA 0x53564d41 +#define PR_SET_VMA_ANON_NAME 0 +#endif +#endif +#if defined(__APPLE__) +#include +#if !TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR +#include +#include +#endif +#include +#endif +#if defined(__HAIKU__) || defined(__TINYC__) +#include +#endif +#endif + +#include +#include +#include + +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) +#include +static DWORD fls_key; +#endif + +#if PLATFORM_POSIX +#include +#include +#ifdef __FreeBSD__ +#include +#define MAP_HUGETLB MAP_ALIGNED_SUPER +#ifndef PROT_MAX +#define PROT_MAX(f) 0 +#endif +#else +#define PROT_MAX(f) 0 +#endif +#ifdef __sun +extern int madvise(caddr_t, size_t, int); +#endif +#ifndef MAP_UNINITIALIZED 
+#define MAP_UNINITIALIZED 0 +#endif +#endif +#include + +#if ENABLE_ASSERTS +#undef NDEBUG +#if defined(_MSC_VER) && !defined(_DEBUG) +#define _DEBUG +#endif +#include +#define RPMALLOC_TOSTRING_M(x) #x +#define RPMALLOC_TOSTRING(x) RPMALLOC_TOSTRING_M(x) +#define rpmalloc_assert(truth, message) \ + do { \ + if (!(truth)) { \ + if (_memory_config.error_callback) { \ + _memory_config.error_callback(message " (" RPMALLOC_TOSTRING( \ + truth) ") at " __FILE__ ":" RPMALLOC_TOSTRING(__LINE__)); \ + } else { \ + assert((truth) && message); \ + } \ + } \ + } while (0) +#else +#define rpmalloc_assert(truth, message) \ + do { \ + } while (0) +#endif +#if ENABLE_STATISTICS +#include +#endif + +////// +/// +/// Atomic access abstraction (since MSVC does not do C11 yet) +/// +////// + +#if defined(_MSC_VER) && !defined(__clang__) + +typedef volatile long atomic32_t; +typedef volatile long long atomic64_t; +typedef volatile void *atomicptr_t; + +static FORCEINLINE int32_t atomic_load32(atomic32_t *src) { return *src; } +static FORCEINLINE void atomic_store32(atomic32_t *dst, int32_t val) { + *dst = val; +} +static FORCEINLINE int32_t atomic_incr32(atomic32_t *val) { + return (int32_t)InterlockedIncrement(val); +} +static FORCEINLINE int32_t atomic_decr32(atomic32_t *val) { + return (int32_t)InterlockedDecrement(val); +} +static FORCEINLINE int32_t atomic_add32(atomic32_t *val, int32_t add) { + return (int32_t)InterlockedExchangeAdd(val, add) + add; +} +static FORCEINLINE int atomic_cas32_acquire(atomic32_t *dst, int32_t val, + int32_t ref) { + return (InterlockedCompareExchange(dst, val, ref) == ref) ? 
1 : 0; +} +static FORCEINLINE void atomic_store32_release(atomic32_t *dst, int32_t val) { + *dst = val; +} +static FORCEINLINE int64_t atomic_load64(atomic64_t *src) { return *src; } +static FORCEINLINE int64_t atomic_add64(atomic64_t *val, int64_t add) { + return (int64_t)InterlockedExchangeAdd64(val, add) + add; +} +static FORCEINLINE void *atomic_load_ptr(atomicptr_t *src) { + return (void *)*src; +} +static FORCEINLINE void atomic_store_ptr(atomicptr_t *dst, void *val) { + *dst = val; +} +static FORCEINLINE void atomic_store_ptr_release(atomicptr_t *dst, void *val) { + *dst = val; +} +static FORCEINLINE void *atomic_exchange_ptr_acquire(atomicptr_t *dst, + void *val) { + return (void *)InterlockedExchangePointer((void *volatile *)dst, val); +} +static FORCEINLINE int atomic_cas_ptr(atomicptr_t *dst, void *val, void *ref) { + return (InterlockedCompareExchangePointer((void *volatile *)dst, val, ref) == + ref) + ? 1 + : 0; +} + +#define EXPECTED(x) (x) +#define UNEXPECTED(x) (x) + +#else + +#include + +typedef volatile _Atomic(int32_t) atomic32_t; +typedef volatile _Atomic(int64_t) atomic64_t; +typedef volatile _Atomic(void *) atomicptr_t; + +static FORCEINLINE int32_t atomic_load32(atomic32_t *src) { + return atomic_load_explicit(src, memory_order_relaxed); +} +static FORCEINLINE void atomic_store32(atomic32_t *dst, int32_t val) { + atomic_store_explicit(dst, val, memory_order_relaxed); +} +static FORCEINLINE int32_t atomic_incr32(atomic32_t *val) { + return atomic_fetch_add_explicit(val, 1, memory_order_relaxed) + 1; +} +static FORCEINLINE int32_t atomic_decr32(atomic32_t *val) { + return atomic_fetch_add_explicit(val, -1, memory_order_relaxed) - 1; +} +static FORCEINLINE int32_t atomic_add32(atomic32_t *val, int32_t add) { + return atomic_fetch_add_explicit(val, add, memory_order_relaxed) + add; +} +static FORCEINLINE int atomic_cas32_acquire(atomic32_t *dst, int32_t val, + int32_t ref) { + return atomic_compare_exchange_weak_explicit( + dst, &ref, val, 
memory_order_acquire, memory_order_relaxed); +} +static FORCEINLINE void atomic_store32_release(atomic32_t *dst, int32_t val) { + atomic_store_explicit(dst, val, memory_order_release); +} +static FORCEINLINE int64_t atomic_load64(atomic64_t *val) { + return atomic_load_explicit(val, memory_order_relaxed); +} +static FORCEINLINE int64_t atomic_add64(atomic64_t *val, int64_t add) { + return atomic_fetch_add_explicit(val, add, memory_order_relaxed) + add; +} +static FORCEINLINE void *atomic_load_ptr(atomicptr_t *src) { + return atomic_load_explicit(src, memory_order_relaxed); +} +static FORCEINLINE void atomic_store_ptr(atomicptr_t *dst, void *val) { + atomic_store_explicit(dst, val, memory_order_relaxed); +} +static FORCEINLINE void atomic_store_ptr_release(atomicptr_t *dst, void *val) { + atomic_store_explicit(dst, val, memory_order_release); +} +static FORCEINLINE void *atomic_exchange_ptr_acquire(atomicptr_t *dst, + void *val) { + return atomic_exchange_explicit(dst, val, memory_order_acquire); +} +static FORCEINLINE int atomic_cas_ptr(atomicptr_t *dst, void *val, void *ref) { + return atomic_compare_exchange_weak_explicit( + dst, &ref, val, memory_order_relaxed, memory_order_relaxed); +} + +#define EXPECTED(x) __builtin_expect((x), 1) +#define UNEXPECTED(x) __builtin_expect((x), 0) + +#endif + +//////////// +/// +/// Statistics related functions (evaluate to nothing when statistics not +/// enabled) +/// +////// + +#if ENABLE_STATISTICS +#define _rpmalloc_stat_inc(counter) atomic_incr32(counter) +#define _rpmalloc_stat_dec(counter) atomic_decr32(counter) +#define _rpmalloc_stat_add(counter, value) \ + atomic_add32(counter, (int32_t)(value)) +#define _rpmalloc_stat_add64(counter, value) \ + atomic_add64(counter, (int64_t)(value)) +#define _rpmalloc_stat_add_peak(counter, value, peak) \ + do { \ + int32_t _cur_count = atomic_add32(counter, (int32_t)(value)); \ + if (_cur_count > (peak)) \ + peak = _cur_count; \ + } while (0) +#define _rpmalloc_stat_sub(counter, 
value) \ + atomic_add32(counter, -(int32_t)(value)) +#define _rpmalloc_stat_inc_alloc(heap, class_idx) \ + do { \ + int32_t alloc_current = \ + atomic_incr32(&heap->size_class_use[class_idx].alloc_current); \ + if (alloc_current > heap->size_class_use[class_idx].alloc_peak) \ + heap->size_class_use[class_idx].alloc_peak = alloc_current; \ + atomic_incr32(&heap->size_class_use[class_idx].alloc_total); \ + } while (0) +#define _rpmalloc_stat_inc_free(heap, class_idx) \ + do { \ + atomic_decr32(&heap->size_class_use[class_idx].alloc_current); \ + atomic_incr32(&heap->size_class_use[class_idx].free_total); \ + } while (0) +#else +#define _rpmalloc_stat_inc(counter) \ + do { \ + } while (0) +#define _rpmalloc_stat_dec(counter) \ + do { \ + } while (0) +#define _rpmalloc_stat_add(counter, value) \ + do { \ + } while (0) +#define _rpmalloc_stat_add64(counter, value) \ + do { \ + } while (0) +#define _rpmalloc_stat_add_peak(counter, value, peak) \ + do { \ + } while (0) +#define _rpmalloc_stat_sub(counter, value) \ + do { \ + } while (0) +#define _rpmalloc_stat_inc_alloc(heap, class_idx) \ + do { \ + } while (0) +#define _rpmalloc_stat_inc_free(heap, class_idx) \ + do { \ + } while (0) +#endif + +/// +/// Preconfigured limits and sizes +/// + +//! Granularity of a small allocation block (must be power of two) +#define SMALL_GRANULARITY 16 +//! Small granularity shift count +#define SMALL_GRANULARITY_SHIFT 4 +//! Number of small block size classes +#define SMALL_CLASS_COUNT 65 +//! Maximum size of a small block +#define SMALL_SIZE_LIMIT (SMALL_GRANULARITY * (SMALL_CLASS_COUNT - 1)) +//! Granularity of a medium allocation block +#define MEDIUM_GRANULARITY 512 +//! Medium granularity shift count +#define MEDIUM_GRANULARITY_SHIFT 9 +//! Number of medium block size classes +#define MEDIUM_CLASS_COUNT 61 +//! Total number of small + medium size classes +#define SIZE_CLASS_COUNT (SMALL_CLASS_COUNT + MEDIUM_CLASS_COUNT) +//! 
Number of large block size classes +#define LARGE_CLASS_COUNT 63 +//! Maximum size of a medium block +#define MEDIUM_SIZE_LIMIT \ + (SMALL_SIZE_LIMIT + (MEDIUM_GRANULARITY * MEDIUM_CLASS_COUNT)) +//! Maximum size of a large block +#define LARGE_SIZE_LIMIT \ + ((LARGE_CLASS_COUNT * _memory_span_size) - SPAN_HEADER_SIZE) +//! Size of a span header (must be a multiple of SMALL_GRANULARITY and a power +//! of two) +#define SPAN_HEADER_SIZE 128 +//! Number of spans in thread cache +#define MAX_THREAD_SPAN_CACHE 400 +//! Number of spans to transfer between thread and global cache +#define THREAD_SPAN_CACHE_TRANSFER 64 +//! Number of spans in thread cache for large spans (must be greater than +//! LARGE_CLASS_COUNT / 2) +#define MAX_THREAD_SPAN_LARGE_CACHE 100 +//! Number of spans to transfer between thread and global cache for large spans +#define THREAD_SPAN_LARGE_CACHE_TRANSFER 6 + +_Static_assert((SMALL_GRANULARITY & (SMALL_GRANULARITY - 1)) == 0, + "Small granularity must be power of two"); +_Static_assert((SPAN_HEADER_SIZE & (SPAN_HEADER_SIZE - 1)) == 0, + "Span header size must be power of two"); + +#if ENABLE_VALIDATE_ARGS +//! Maximum allocation size to avoid integer overflow +#undef MAX_ALLOC_SIZE +#define MAX_ALLOC_SIZE (((size_t) - 1) - _memory_span_size) +#endif + +#define pointer_offset(ptr, ofs) (void *)((char *)(ptr) + (ptrdiff_t)(ofs)) +#define pointer_diff(first, second) \ + (ptrdiff_t)((const char *)(first) - (const char *)(second)) + +#define INVALID_POINTER ((void *)((uintptr_t) - 1)) + +#define SIZE_CLASS_LARGE SIZE_CLASS_COUNT +#define SIZE_CLASS_HUGE ((uint32_t) - 1) + +//////////// +/// +/// Data types +/// +////// + +//! A memory heap, per thread +typedef struct heap_t heap_t; +//! Span of memory pages +typedef struct span_t span_t; +//! Span list +typedef struct span_list_t span_list_t; +//! Span active data +typedef struct span_active_t span_active_t; +//! Size class definition +typedef struct size_class_t size_class_t; +//! 
Global cache +typedef struct global_cache_t global_cache_t; + +//! Flag indicating span is the first (master) span of a split superspan +#define SPAN_FLAG_MASTER 1U +//! Flag indicating span is a secondary (sub) span of a split superspan +#define SPAN_FLAG_SUBSPAN 2U +//! Flag indicating span has blocks with increased alignment +#define SPAN_FLAG_ALIGNED_BLOCKS 4U +//! Flag indicating an unmapped master span +#define SPAN_FLAG_UNMAPPED_MASTER 8U + +#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS +struct span_use_t { + //! Current number of spans used (actually used, not in cache) + atomic32_t current; + //! High water mark of spans used + atomic32_t high; +#if ENABLE_STATISTICS + //! Number of spans in deferred list + atomic32_t spans_deferred; + //! Number of spans transitioned to global cache + atomic32_t spans_to_global; + //! Number of spans transitioned from global cache + atomic32_t spans_from_global; + //! Number of spans transitioned to thread cache + atomic32_t spans_to_cache; + //! Number of spans transitioned from thread cache + atomic32_t spans_from_cache; + //! Number of spans transitioned to reserved state + atomic32_t spans_to_reserved; + //! Number of spans transitioned from reserved state + atomic32_t spans_from_reserved; + //! Number of raw memory map calls + atomic32_t spans_map_calls; +#endif +}; +typedef struct span_use_t span_use_t; +#endif + +#if ENABLE_STATISTICS +struct size_class_use_t { + //! Current number of allocations + atomic32_t alloc_current; + //! Peak number of allocations + int32_t alloc_peak; + //! Total number of allocations + atomic32_t alloc_total; + //! Total number of frees + atomic32_t free_total; + //! Number of spans in use + atomic32_t spans_current; + //! Peak number of spans in use + int32_t spans_peak; + //! Number of spans transitioned to cache + atomic32_t spans_to_cache; + //! Number of spans transitioned from cache + atomic32_t spans_from_cache; + //! 
Number of spans transitioned from reserved state + atomic32_t spans_from_reserved; + //! Number of spans mapped + atomic32_t spans_map_calls; + int32_t unused; +}; +typedef struct size_class_use_t size_class_use_t; +#endif + +// A span can either represent a single span of memory pages with size declared +// by the span_map_count configuration variable, or a set of spans in a +// contiguous region, a super span. Any reference to the term "span" usually +// refers to either a single span or a super span. A super span can further be +// divided into multiple spans (or thus, super spans), where the first +// (super)span is the master and subsequent (super)spans are subspans. The +// master span keeps track of how many subspans are still alive and mapped in +// virtual memory, and once all subspans and the master have been unmapped the +// entire superspan region is released and unmapped (on Windows for example, +// the entire superspan range has to be released in the same call to release +// the virtual memory range, but individual subranges can be decommitted +// individually to reduce physical memory use). +struct span_t { + //! Free list + void *free_list; + //! Total block count of size class + uint32_t block_count; + //! Size class + uint32_t size_class; + //! Index of last block initialized in free list + uint32_t free_list_limit; + //! Number of used blocks remaining when in partial state + uint32_t used_count; + //! Deferred free list + atomicptr_t free_list_deferred; + //! Size of deferred free list, or list of spans when part of a cache list + uint32_t list_size; + //! Size of a block + uint32_t block_size; + //! Flags and counters + uint32_t flags; + //! Number of spans + uint32_t span_count; + //! Total span counter for master spans + uint32_t total_spans; + //! Offset from master span for subspans + uint32_t offset_from_master; + //! Remaining span counter, for master spans + atomic32_t remaining_spans; + //! Alignment offset + uint32_t align_offset; + //! 
Owning heap + heap_t *heap; + //! Next span + span_t *next; + //! Previous span + span_t *prev; +}; +_Static_assert(sizeof(span_t) <= SPAN_HEADER_SIZE, "span size mismatch"); + +struct span_cache_t { + size_t count; + span_t *span[MAX_THREAD_SPAN_CACHE]; +}; +typedef struct span_cache_t span_cache_t; + +struct span_large_cache_t { + size_t count; + span_t *span[MAX_THREAD_SPAN_LARGE_CACHE]; +}; +typedef struct span_large_cache_t span_large_cache_t; + +struct heap_size_class_t { + //! Free list of active span + void *free_list; + //! Double linked list of partially used spans with free blocks. + // Previous span pointer in head points to tail span of list. + span_t *partial_span; + //! Early level cache of fully free spans + span_t *cache; +}; +typedef struct heap_size_class_t heap_size_class_t; + +// Control structure for a heap, either a thread heap or a first class heap if +// enabled +struct heap_t { + //! Owning thread ID + uintptr_t owner_thread; + //! Free lists for each size class + heap_size_class_t size_class[SIZE_CLASS_COUNT]; +#if ENABLE_THREAD_CACHE + //! Arrays of fully freed spans, single span + span_cache_t span_cache; +#endif + //! List of deferred free spans (single linked list) + atomicptr_t span_free_deferred; + //! Number of full spans + size_t full_span_count; + //! Mapped but unused spans + span_t *span_reserve; + //! Master span for mapped but unused spans + span_t *span_reserve_master; + //! Number of mapped but unused spans + uint32_t spans_reserved; + //! Child count + atomic32_t child_count; + //! Next heap in id list + heap_t *next_heap; + //! Next heap in orphan list + heap_t *next_orphan; + //! Heap ID + int32_t id; + //! Finalization state flag + int finalize; + //! Master heap owning the memory pages + heap_t *master_heap; +#if ENABLE_THREAD_CACHE + //! Arrays of fully freed spans, large spans with > 1 span count + span_large_cache_t span_large_cache[LARGE_CLASS_COUNT - 1]; +#endif +#if RPMALLOC_FIRST_CLASS_HEAPS + //! 
Double linked list of fully utilized spans with free blocks for each size + //! class. + // Previous span pointer in head points to tail span of list. + span_t *full_span[SIZE_CLASS_COUNT]; + //! Double linked list of large and huge spans allocated by this heap + span_t *large_huge_span; +#endif +#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS + //! Current and high water mark of spans used per span count + span_use_t span_use[LARGE_CLASS_COUNT]; +#endif +#if ENABLE_STATISTICS + //! Allocation stats per size class + size_class_use_t size_class_use[SIZE_CLASS_COUNT + 1]; + //! Number of bytes transitioned thread -> global + atomic64_t thread_to_global; + //! Number of bytes transitioned global -> thread + atomic64_t global_to_thread; +#endif +}; + +// Size class for defining a block size bucket +struct size_class_t { + //! Size of blocks in this class + uint32_t block_size; + //! Number of blocks in each chunk + uint16_t block_count; + //! Class index this class is merged with + uint16_t class_idx; +}; +_Static_assert(sizeof(size_class_t) == 8, "Size class size mismatch"); + +struct global_cache_t { + //! Cache lock + atomic32_t lock; + //! Cache count + uint32_t count; +#if ENABLE_STATISTICS + //! Insert count + size_t insert_count; + //! Extract count + size_t extract_count; +#endif + //! Cached spans + span_t *span[GLOBAL_CACHE_MULTIPLIER * MAX_THREAD_SPAN_CACHE]; + //! Unlimited cache overflow + span_t *overflow; +}; + +//////////// +/// +/// Global data +/// +////// + +//! Default span size (64KiB) +#define _memory_default_span_size (64 * 1024) +#define _memory_default_span_size_shift 16 +#define _memory_default_span_mask (~((uintptr_t)(_memory_span_size - 1))) + +//! Initialized flag +static int _rpmalloc_initialized; +//! Main thread ID +static uintptr_t _rpmalloc_main_thread_id; +//! Configuration +static rpmalloc_config_t _memory_config; +//! Memory page size +static size_t _memory_page_size; +//! 
Shift to divide by page size +static size_t _memory_page_size_shift; +//! Granularity at which memory pages are mapped by OS +static size_t _memory_map_granularity; +#if RPMALLOC_CONFIGURABLE +//! Size of a span of memory pages +static size_t _memory_span_size; +//! Shift to divide by span size +static size_t _memory_span_size_shift; +//! Mask to get to start of a memory span +static uintptr_t _memory_span_mask; +#else +//! Hardwired span size +#define _memory_span_size _memory_default_span_size +#define _memory_span_size_shift _memory_default_span_size_shift +#define _memory_span_mask _memory_default_span_mask +#endif +//! Number of spans to map in each map call +static size_t _memory_span_map_count; +//! Number of spans to keep reserved in each heap +static size_t _memory_heap_reserve_count; +//! Global size classes +static size_class_t _memory_size_class[SIZE_CLASS_COUNT]; +//! Run-time size limit of medium blocks +static size_t _memory_medium_size_limit; +//! Heap ID counter +static atomic32_t _memory_heap_id; +//! Huge page support +static int _memory_huge_pages; +#if ENABLE_GLOBAL_CACHE +//! Global span cache +static global_cache_t _memory_span_cache[LARGE_CLASS_COUNT]; +#endif +//! Global reserved spans +static span_t *_memory_global_reserve; +//! Global reserved count +static size_t _memory_global_reserve_count; +//! Global reserved master +static span_t *_memory_global_reserve_master; +//! All heaps +static heap_t *_memory_heaps[HEAP_ARRAY_SIZE]; +//! Used to restrict access to mapping memory for huge pages +static atomic32_t _memory_global_lock; +//! Orphaned heaps +static heap_t *_memory_orphan_heaps; +#if RPMALLOC_FIRST_CLASS_HEAPS +//! Orphaned heaps (first class heaps) +static heap_t *_memory_first_class_orphan_heaps; +#endif +#if ENABLE_STATISTICS +//! Allocations counter +static atomic64_t _allocation_counter; +//! Deallocations counter +static atomic64_t _deallocation_counter; +//! Active heap count +static atomic32_t _memory_active_heaps; +//! 
Number of currently mapped memory pages +static atomic32_t _mapped_pages; +//! Peak number of concurrently mapped memory pages +static int32_t _mapped_pages_peak; +//! Number of mapped master spans +static atomic32_t _master_spans; +//! Number of unmapped dangling master spans +static atomic32_t _unmapped_master_spans; +//! Running counter of total number of mapped memory pages since start +static atomic32_t _mapped_total; +//! Running counter of total number of unmapped memory pages since start +static atomic32_t _unmapped_total; +//! Number of currently mapped memory pages in OS calls +static atomic32_t _mapped_pages_os; +//! Number of currently allocated pages in huge allocations +static atomic32_t _huge_pages_current; +//! Peak number of currently allocated pages in huge allocations +static int32_t _huge_pages_peak; +#endif + +//////////// +/// +/// Thread local heap and ID +/// +////// + +//! Current thread heap +#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ + defined(__TINYC__) +static pthread_key_t _memory_thread_heap; +#else +#ifdef _MSC_VER +#define _Thread_local __declspec(thread) +#define TLS_MODEL +#else +#ifndef __HAIKU__ +#define TLS_MODEL __attribute__((tls_model("initial-exec"))) +#else +#define TLS_MODEL +#endif +#if !defined(__clang__) && defined(__GNUC__) +#define _Thread_local __thread +#endif +#endif +static _Thread_local heap_t *_memory_thread_heap TLS_MODEL; +#endif + +static inline heap_t *get_thread_heap_raw(void) { +#if (defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD + return pthread_getspecific(_memory_thread_heap); +#else + return _memory_thread_heap; +#endif +} + +//! Get the current thread heap +static inline heap_t *get_thread_heap(void) { + heap_t *heap = get_thread_heap_raw(); +#if ENABLE_PRELOAD + if (EXPECTED(heap != 0)) + return heap; + rpmalloc_initialize(); + return get_thread_heap_raw(); +#else + return heap; +#endif +} + +//! 
Fast thread ID +static inline uintptr_t get_thread_id(void) { +#if defined(_WIN32) + return (uintptr_t)((void *)NtCurrentTeb()); +#elif (defined(__GNUC__) || defined(__clang__)) && !defined(__CYGWIN__) + uintptr_t tid; +#if defined(__i386__) + __asm__("movl %%gs:0, %0" : "=r"(tid) : :); +#elif defined(__x86_64__) +#if defined(__MACH__) + __asm__("movq %%gs:0, %0" : "=r"(tid) : :); +#else + __asm__("movq %%fs:0, %0" : "=r"(tid) : :); +#endif +#elif defined(__arm__) + __asm__ volatile("mrc p15, 0, %0, c13, c0, 3" : "=r"(tid)); +#elif defined(__aarch64__) +#if defined(__MACH__) + // tpidr_el0 likely unused, always return 0 on iOS + __asm__ volatile("mrs %0, tpidrro_el0" : "=r"(tid)); +#else + __asm__ volatile("mrs %0, tpidr_el0" : "=r"(tid)); +#endif +#else +#error This platform needs implementation of get_thread_id() +#endif + return tid; +#else +#error This platform needs implementation of get_thread_id() +#endif +} + +//! Set the current thread heap +static void set_thread_heap(heap_t *heap) { +#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ + defined(__TINYC__) + pthread_setspecific(_memory_thread_heap, heap); +#else + _memory_thread_heap = heap; +#endif + if (heap) + heap->owner_thread = get_thread_id(); +} + +//! Set main thread ID +extern void rpmalloc_set_main_thread(void); + +void rpmalloc_set_main_thread(void) { + _rpmalloc_main_thread_id = get_thread_id(); +} + +static void _rpmalloc_spin(void) { +#if defined(_MSC_VER) +#if defined(_M_ARM64) + __yield(); +#else + _mm_pause(); +#endif +#elif defined(__x86_64__) || defined(__i386__) + __asm__ volatile("pause" ::: "memory"); +#elif defined(__aarch64__) || (defined(__arm__) && __ARM_ARCH >= 7) + __asm__ volatile("yield" ::: "memory"); +#elif defined(__powerpc__) || defined(__powerpc64__) + // No idea if ever been compiled in such archs but ... 
as precaution + __asm__ volatile("or 27,27,27"); +#elif defined(__sparc__) + __asm__ volatile("rd %ccr, %g0 \n\trd %ccr, %g0 \n\trd %ccr, %g0"); +#else + struct timespec ts = {0}; + nanosleep(&ts, 0); +#endif +} + +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) +static void NTAPI _rpmalloc_thread_destructor(void *value) { +#if ENABLE_OVERRIDE + // If this is called on main thread it means rpmalloc_finalize + // has not been called and shutdown is forced (through _exit) or unclean + if (get_thread_id() == _rpmalloc_main_thread_id) + return; +#endif + if (value) + rpmalloc_thread_finalize(1); +} +#endif + +//////////// +/// +/// Low level memory map/unmap +/// +////// + +static void _rpmalloc_set_name(void *address, size_t size) { +#if defined(__linux__) || defined(__ANDROID__) + const char *name = _memory_huge_pages ? _memory_config.huge_page_name + : _memory_config.page_name; + if (address == MAP_FAILED || !name) + return; + // If the kernel does not support CONFIG_ANON_VMA_NAME or if the call fails + // (e.g. invalid name) it is a no-op basically. + (void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (uintptr_t)address, size, + (uintptr_t)name); +#else + (void)sizeof(size); + (void)sizeof(address); +#endif +} + +//! Map more virtual memory +// size is number of bytes to map +// offset receives the offset in bytes from start of mapped region +// returns address to start of mapped region to use +static void *_rpmalloc_mmap(size_t size, size_t *offset) { + rpmalloc_assert(!(size % _memory_page_size), "Invalid mmap size"); + rpmalloc_assert(size >= _memory_page_size, "Invalid mmap size"); + void *address = _memory_config.memory_map(size, offset); + if (EXPECTED(address != 0)) { + _rpmalloc_stat_add_peak(&_mapped_pages, (size >> _memory_page_size_shift), + _mapped_pages_peak); + _rpmalloc_stat_add(&_mapped_total, (size >> _memory_page_size_shift)); + } + return address; +} + +//! 
Unmap virtual memory +// address is the memory address to unmap, as returned from _memory_map +// size is the number of bytes to unmap, which might be less than the full +// region for a partial unmap +// offset is the offset in bytes to the actual mapped region, as set by +// _memory_map +// release is set to 0 for a partial unmap, or the size of the entire range for +// a full unmap +static void _rpmalloc_unmap(void *address, size_t size, size_t offset, + size_t release) { + rpmalloc_assert(!release || (release >= size), "Invalid unmap size"); + rpmalloc_assert(!release || (release >= _memory_page_size), + "Invalid unmap size"); + if (release) { + rpmalloc_assert(!(release % _memory_page_size), "Invalid unmap size"); + _rpmalloc_stat_sub(&_mapped_pages, (release >> _memory_page_size_shift)); + _rpmalloc_stat_add(&_unmapped_total, (release >> _memory_page_size_shift)); + } + _memory_config.memory_unmap(address, size, offset, release); +} + +//! Default implementation to map new pages to virtual memory +static void *_rpmalloc_mmap_os(size_t size, size_t *offset) { + // Either size is a heap (a single page) or a (multiple) span - we only need + // to align spans, and only if larger than map granularity + size_t padding = ((size >= _memory_span_size) && + (_memory_span_size > _memory_map_granularity)) + ? _memory_span_size + : 0; + rpmalloc_assert(size >= _memory_page_size, "Invalid mmap size"); +#if PLATFORM_WINDOWS + // Ok to MEM_COMMIT - according to MSDN, "actual physical pages are not + // allocated unless/until the virtual addresses are actually accessed" + void *ptr = VirtualAlloc(0, size + padding, + (_memory_huge_pages ? 
MEM_LARGE_PAGES : 0) | + MEM_RESERVE | MEM_COMMIT, + PAGE_READWRITE); + if (!ptr) { + if (_memory_config.map_fail_callback) { + if (_memory_config.map_fail_callback(size + padding)) + return _rpmalloc_mmap_os(size, offset); + } else { + rpmalloc_assert(ptr, "Failed to map virtual memory block"); + } + return 0; + } +#else + int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_UNINITIALIZED; +#if defined(__APPLE__) && !TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR + int fd = (int)VM_MAKE_TAG(240U); + if (_memory_huge_pages) + fd |= VM_FLAGS_SUPERPAGE_SIZE_2MB; + void *ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, fd, 0); +#elif defined(MAP_HUGETLB) + void *ptr = mmap(0, size + padding, + PROT_READ | PROT_WRITE | PROT_MAX(PROT_READ | PROT_WRITE), + (_memory_huge_pages ? MAP_HUGETLB : 0) | flags, -1, 0); +#if defined(MADV_HUGEPAGE) + // In some configurations, huge page allocations might fail, so we + // fall back to normal allocations and promote the region to a transparent + // huge page + if ((ptr == MAP_FAILED || !ptr) && _memory_huge_pages) { + ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, -1, 0); + if (ptr && ptr != MAP_FAILED) { + int prm = madvise(ptr, size + padding, MADV_HUGEPAGE); + (void)prm; + rpmalloc_assert((prm == 0), "Failed to promote the page to THP"); + } + } +#endif + _rpmalloc_set_name(ptr, size + padding); +#elif defined(MAP_ALIGNED) + const size_t align = + (sizeof(size_t) * 8) - (size_t)(__builtin_clzl(size - 1)); + void *ptr = + mmap(0, size + padding, PROT_READ | PROT_WRITE, + (_memory_huge_pages ? MAP_ALIGNED(align) : 0) | flags, -1, 0); +#elif defined(MAP_ALIGN) + caddr_t base = (_memory_huge_pages ? (caddr_t)(4 << 20) : 0); + void *ptr = mmap(base, size + padding, PROT_READ | PROT_WRITE, + (_memory_huge_pages ? 
MAP_ALIGN : 0) | flags, -1, 0); +#else + void *ptr = mmap(0, size + padding, PROT_READ | PROT_WRITE, flags, -1, 0); +#endif + if ((ptr == MAP_FAILED) || !ptr) { + if (_memory_config.map_fail_callback) { + if (_memory_config.map_fail_callback(size + padding)) + return _rpmalloc_mmap_os(size, offset); + } else if (errno != ENOMEM) { + rpmalloc_assert((ptr != MAP_FAILED) && ptr, + "Failed to map virtual memory block"); + } + return 0; + } +#endif + _rpmalloc_stat_add(&_mapped_pages_os, + (int32_t)((size + padding) >> _memory_page_size_shift)); + if (padding) { + size_t final_padding = padding - ((uintptr_t)ptr & ~_memory_span_mask); + rpmalloc_assert(final_padding <= _memory_span_size, + "Internal failure in padding"); + rpmalloc_assert(final_padding <= padding, "Internal failure in padding"); + rpmalloc_assert(!(final_padding % 8), "Internal failure in padding"); + ptr = pointer_offset(ptr, final_padding); + *offset = final_padding >> 3; + } + rpmalloc_assert((size < _memory_span_size) || + !((uintptr_t)ptr & ~_memory_span_mask), + "Internal failure in padding"); + return ptr; +} + +//! Default implementation to unmap pages from virtual memory +static void _rpmalloc_unmap_os(void *address, size_t size, size_t offset, + size_t release) { + rpmalloc_assert(release || (offset == 0), "Invalid unmap size"); + rpmalloc_assert(!release || (release >= _memory_page_size), + "Invalid unmap size"); + rpmalloc_assert(size >= _memory_page_size, "Invalid unmap size"); + if (release && offset) { + offset <<= 3; + address = pointer_offset(address, -(int32_t)offset); + if ((release >= _memory_span_size) && + (_memory_span_size > _memory_map_granularity)) { + // Padding is always one span size + release += _memory_span_size; + } + } +#if !DISABLE_UNMAP +#if PLATFORM_WINDOWS + if (!VirtualFree(address, release ? 0 : size, + release ? 
MEM_RELEASE : MEM_DECOMMIT)) { + rpmalloc_assert(0, "Failed to unmap virtual memory block"); + } +#else + if (release) { + if (munmap(address, release)) { + rpmalloc_assert(0, "Failed to unmap virtual memory block"); + } + } else { +#if defined(MADV_FREE_REUSABLE) + int ret; + while ((ret = madvise(address, size, MADV_FREE_REUSABLE)) == -1 && + (errno == EAGAIN)) + errno = 0; + if ((ret == -1) && (errno != 0)) { +#elif defined(MADV_DONTNEED) + if (madvise(address, size, MADV_DONTNEED)) { +#elif defined(MADV_PAGEOUT) + if (madvise(address, size, MADV_PAGEOUT)) { +#elif defined(MADV_FREE) + if (madvise(address, size, MADV_FREE)) { +#else + if (posix_madvise(address, size, POSIX_MADV_DONTNEED)) { +#endif + rpmalloc_assert(0, "Failed to madvise virtual memory block as free"); + } + } +#endif +#endif + if (release) + _rpmalloc_stat_sub(&_mapped_pages_os, release >> _memory_page_size_shift); +} + +static void _rpmalloc_span_mark_as_subspan_unless_master(span_t *master, + span_t *subspan, + size_t span_count); + +//! Use global reserved spans to fulfill a memory map request (reserve size must +//! be checked by caller) +static span_t *_rpmalloc_global_get_reserved_spans(size_t span_count) { + span_t *span = _memory_global_reserve; + _rpmalloc_span_mark_as_subspan_unless_master(_memory_global_reserve_master, + span, span_count); + _memory_global_reserve_count -= span_count; + if (_memory_global_reserve_count) + _memory_global_reserve = + (span_t *)pointer_offset(span, span_count << _memory_span_size_shift); + else + _memory_global_reserve = 0; + return span; +} + +//! Store the given spans as global reserve (must only be called from within new +//! 
heap allocation, not thread safe) +static void _rpmalloc_global_set_reserved_spans(span_t *master, span_t *reserve, + size_t reserve_span_count) { + _memory_global_reserve_master = master; + _memory_global_reserve_count = reserve_span_count; + _memory_global_reserve = reserve; +} + +//////////// +/// +/// Span linked list management +/// +////// + +//! Add a span to double linked list at the head +static void _rpmalloc_span_double_link_list_add(span_t **head, span_t *span) { + if (*head) + (*head)->prev = span; + span->next = *head; + *head = span; +} + +//! Pop head span from double linked list +static void _rpmalloc_span_double_link_list_pop_head(span_t **head, + span_t *span) { + rpmalloc_assert(*head == span, "Linked list corrupted"); + span = *head; + *head = span->next; +} + +//! Remove a span from double linked list +static void _rpmalloc_span_double_link_list_remove(span_t **head, + span_t *span) { + rpmalloc_assert(*head, "Linked list corrupted"); + if (*head == span) { + *head = span->next; + } else { + span_t *next_span = span->next; + span_t *prev_span = span->prev; + prev_span->next = next_span; + if (EXPECTED(next_span != 0)) + next_span->prev = prev_span; + } +} + +//////////// +/// +/// Span control +/// +////// + +static void _rpmalloc_heap_cache_insert(heap_t *heap, span_t *span); + +static void _rpmalloc_heap_finalize(heap_t *heap); + +static void _rpmalloc_heap_set_reserved_spans(heap_t *heap, span_t *master, + span_t *reserve, + size_t reserve_span_count); + +//! Declare the span to be a subspan and store distance from master span and +//! 
span count +static void _rpmalloc_span_mark_as_subspan_unless_master(span_t *master, + span_t *subspan, + size_t span_count) { + rpmalloc_assert((subspan != master) || (subspan->flags & SPAN_FLAG_MASTER), + "Span master pointer and/or flag mismatch"); + if (subspan != master) { + subspan->flags = SPAN_FLAG_SUBSPAN; + subspan->offset_from_master = + (uint32_t)((uintptr_t)pointer_diff(subspan, master) >> + _memory_span_size_shift); + subspan->align_offset = 0; + } + subspan->span_count = (uint32_t)span_count; +} + +//! Use reserved spans to fulfill a memory map request (reserve size must be +//! checked by caller) +static span_t *_rpmalloc_span_map_from_reserve(heap_t *heap, + size_t span_count) { + // Update the heap span reserve + span_t *span = heap->span_reserve; + heap->span_reserve = + (span_t *)pointer_offset(span, span_count * _memory_span_size); + heap->spans_reserved -= (uint32_t)span_count; + + _rpmalloc_span_mark_as_subspan_unless_master(heap->span_reserve_master, span, + span_count); + if (span_count <= LARGE_CLASS_COUNT) + _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_from_reserved); + + return span; +} + +//! Get the aligned number of spans to map in based on wanted count, configured +//! mapping granularity and the page size +static size_t _rpmalloc_span_align_count(size_t span_count) { + size_t request_count = (span_count > _memory_span_map_count) + ? span_count + : _memory_span_map_count; + if ((_memory_page_size > _memory_span_size) && + ((request_count * _memory_span_size) % _memory_page_size)) + request_count += + _memory_span_map_count - (request_count % _memory_span_map_count); + return request_count; +} + +//! 
Set up a newly mapped span +static void _rpmalloc_span_initialize(span_t *span, size_t total_span_count, + size_t span_count, size_t align_offset) { + span->total_spans = (uint32_t)total_span_count; + span->span_count = (uint32_t)span_count; + span->align_offset = (uint32_t)align_offset; + span->flags = SPAN_FLAG_MASTER; + atomic_store32(&span->remaining_spans, (int32_t)total_span_count); +} + +static void _rpmalloc_span_unmap(span_t *span); + +//! Map an aligned set of spans, taking configured mapping granularity and the +//! page size into account +static span_t *_rpmalloc_span_map_aligned_count(heap_t *heap, + size_t span_count) { + // If we already have some, but not enough, reserved spans, release those to + // heap cache and map a new full set of spans. Otherwise we would waste memory + // if page size > span size (huge pages) + size_t aligned_span_count = _rpmalloc_span_align_count(span_count); + size_t align_offset = 0; + span_t *span = (span_t *)_rpmalloc_mmap( + aligned_span_count * _memory_span_size, &align_offset); + if (!span) + return 0; + _rpmalloc_span_initialize(span, aligned_span_count, span_count, align_offset); + _rpmalloc_stat_inc(&_master_spans); + if (span_count <= LARGE_CLASS_COUNT) + _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_map_calls); + if (aligned_span_count > span_count) { + span_t *reserved_spans = + (span_t *)pointer_offset(span, span_count * _memory_span_size); + size_t reserved_count = aligned_span_count - span_count; + if (heap->spans_reserved) { + _rpmalloc_span_mark_as_subspan_unless_master( + heap->span_reserve_master, heap->span_reserve, heap->spans_reserved); + _rpmalloc_heap_cache_insert(heap, heap->span_reserve); + } + if (reserved_count > _memory_heap_reserve_count) { + // If huge pages or eager span map count, the global reserve spin lock is + // held by caller, _rpmalloc_span_map + rpmalloc_assert(atomic_load32(&_memory_global_lock) == 1, + "Global spin lock not held as expected"); + size_t remain_count = 
reserved_count - _memory_heap_reserve_count; + reserved_count = _memory_heap_reserve_count; + span_t *remain_span = (span_t *)pointer_offset( + reserved_spans, reserved_count * _memory_span_size); + if (_memory_global_reserve) { + _rpmalloc_span_mark_as_subspan_unless_master( + _memory_global_reserve_master, _memory_global_reserve, + _memory_global_reserve_count); + _rpmalloc_span_unmap(_memory_global_reserve); + } + _rpmalloc_global_set_reserved_spans(span, remain_span, remain_count); + } + _rpmalloc_heap_set_reserved_spans(heap, span, reserved_spans, + reserved_count); + } + return span; +} + +//! Map in memory pages for the given number of spans (or use previously +//! reserved pages) +static span_t *_rpmalloc_span_map(heap_t *heap, size_t span_count) { + if (span_count <= heap->spans_reserved) + return _rpmalloc_span_map_from_reserve(heap, span_count); + span_t *span = 0; + int use_global_reserve = + (_memory_page_size > _memory_span_size) || + (_memory_span_map_count > _memory_heap_reserve_count); + if (use_global_reserve) { + // If huge pages, make sure only one thread maps more memory to avoid bloat + while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) + _rpmalloc_spin(); + if (_memory_global_reserve_count >= span_count) { + size_t reserve_count = + (!heap->spans_reserved ? 
_memory_heap_reserve_count : span_count); + if (_memory_global_reserve_count < reserve_count) + reserve_count = _memory_global_reserve_count; + span = _rpmalloc_global_get_reserved_spans(reserve_count); + if (span) { + if (reserve_count > span_count) { + span_t *reserved_span = (span_t *)pointer_offset( + span, span_count << _memory_span_size_shift); + _rpmalloc_heap_set_reserved_spans(heap, _memory_global_reserve_master, + reserved_span, + reserve_count - span_count); + } + // Already marked as subspan in _rpmalloc_global_get_reserved_spans + span->span_count = (uint32_t)span_count; + } + } + } + if (!span) + span = _rpmalloc_span_map_aligned_count(heap, span_count); + if (use_global_reserve) + atomic_store32_release(&_memory_global_lock, 0); + return span; +} + +//! Unmap memory pages for the given number of spans (or mark as unused if no +//! partial unmappings) +static void _rpmalloc_span_unmap(span_t *span) { + rpmalloc_assert((span->flags & SPAN_FLAG_MASTER) || + (span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + rpmalloc_assert(!(span->flags & SPAN_FLAG_MASTER) || + !(span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + + int is_master = !!(span->flags & SPAN_FLAG_MASTER); + span_t *master = + is_master ? 
span + : ((span_t *)pointer_offset( + span, -(intptr_t)((uintptr_t)span->offset_from_master * + _memory_span_size))); + rpmalloc_assert(is_master || (span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + rpmalloc_assert(master->flags & SPAN_FLAG_MASTER, "Span flag corrupted"); + + size_t span_count = span->span_count; + if (!is_master) { + // Directly unmap subspans (unless huge pages, in which case we defer and + // unmap entire page range with master) + rpmalloc_assert(span->align_offset == 0, "Span align offset corrupted"); + if (_memory_span_size >= _memory_page_size) + _rpmalloc_unmap(span, span_count * _memory_span_size, 0, 0); + } else { + // Special double flag to denote an unmapped master + // It must be kept in memory since span header must be used + span->flags |= + SPAN_FLAG_MASTER | SPAN_FLAG_SUBSPAN | SPAN_FLAG_UNMAPPED_MASTER; + _rpmalloc_stat_add(&_unmapped_master_spans, 1); + } + + if (atomic_add32(&master->remaining_spans, -(int32_t)span_count) <= 0) { + // Everything unmapped, unmap the master span with release flag to unmap the + // entire range of the super span + rpmalloc_assert(!!(master->flags & SPAN_FLAG_MASTER) && + !!(master->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + size_t unmap_count = master->span_count; + if (_memory_span_size < _memory_page_size) + unmap_count = master->total_spans; + _rpmalloc_stat_sub(&_master_spans, 1); + _rpmalloc_stat_sub(&_unmapped_master_spans, 1); + _rpmalloc_unmap(master, unmap_count * _memory_span_size, + master->align_offset, + (size_t)master->total_spans * _memory_span_size); + } +} + +//! Move the span (used for small or medium allocations) to the heap thread +//! 
cache +static void _rpmalloc_span_release_to_cache(heap_t *heap, span_t *span) { + rpmalloc_assert(heap == span->heap, "Span heap pointer corrupted"); + rpmalloc_assert(span->size_class < SIZE_CLASS_COUNT, + "Invalid span size class"); + rpmalloc_assert(span->span_count == 1, "Invalid span count"); +#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS + atomic_decr32(&heap->span_use[0].current); +#endif + _rpmalloc_stat_dec(&heap->size_class_use[span->size_class].spans_current); + if (!heap->finalize) { + _rpmalloc_stat_inc(&heap->span_use[0].spans_to_cache); + _rpmalloc_stat_inc(&heap->size_class_use[span->size_class].spans_to_cache); + if (heap->size_class[span->size_class].cache) + _rpmalloc_heap_cache_insert(heap, + heap->size_class[span->size_class].cache); + heap->size_class[span->size_class].cache = span; + } else { + _rpmalloc_span_unmap(span); + } +} + +//! Initialize a (partial) free list up to next system memory page, while +//! reserving the first block as allocated, returning number of blocks in list +static uint32_t free_list_partial_init(void **list, void **first_block, + void *page_start, void *block_start, + uint32_t block_count, + uint32_t block_size) { + rpmalloc_assert(block_count, "Internal failure"); + *first_block = block_start; + if (block_count > 1) { + void *free_block = pointer_offset(block_start, block_size); + void *block_end = + pointer_offset(block_start, (size_t)block_size * block_count); + // If block size is less than half a memory page, bound init to next memory + // page boundary + if (block_size < (_memory_page_size >> 1)) { + void *page_end = pointer_offset(page_start, _memory_page_size); + if (page_end < block_end) + block_end = page_end; + } + *list = free_block; + block_count = 2; + void *next_block = pointer_offset(free_block, block_size); + while (next_block < block_end) { + *((void **)free_block) = next_block; + free_block = next_block; + ++block_count; + next_block = pointer_offset(next_block, block_size); + } + *((void 
**)free_block) = 0; + } else { + *list = 0; + } + return block_count; +} + +//! Initialize an unused span (from cache or mapped) to be new active span, +//! putting the initial free list in heap class free list +static void *_rpmalloc_span_initialize_new(heap_t *heap, + heap_size_class_t *heap_size_class, + span_t *span, uint32_t class_idx) { + rpmalloc_assert(span->span_count == 1, "Internal failure"); + size_class_t *size_class = _memory_size_class + class_idx; + span->size_class = class_idx; + span->heap = heap; + span->flags &= ~SPAN_FLAG_ALIGNED_BLOCKS; + span->block_size = size_class->block_size; + span->block_count = size_class->block_count; + span->free_list = 0; + span->list_size = 0; + atomic_store_ptr_release(&span->free_list_deferred, 0); + + // Setup free list. Only initialize one system page worth of free blocks in + // list + void *block; + span->free_list_limit = + free_list_partial_init(&heap_size_class->free_list, &block, span, + pointer_offset(span, SPAN_HEADER_SIZE), + size_class->block_count, size_class->block_size); + // Link span as partial if there remains blocks to be initialized as free + // list, or full if fully initialized + if (span->free_list_limit < span->block_count) { + _rpmalloc_span_double_link_list_add(&heap_size_class->partial_span, span); + span->used_count = span->free_list_limit; + } else { +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->full_span[class_idx], span); +#endif + ++heap->full_span_count; + span->used_count = span->block_count; + } + return block; +} + +static void _rpmalloc_span_extract_free_list_deferred(span_t *span) { + // We need acquire semantics on the CAS operation since we are interested in + // the list size Refer to _rpmalloc_deallocate_defer_small_or_medium for + // further comments on this dependency + do { + span->free_list = + atomic_exchange_ptr_acquire(&span->free_list_deferred, INVALID_POINTER); + } while (span->free_list == INVALID_POINTER); + span->used_count -= 
span->list_size; + span->list_size = 0; + atomic_store_ptr_release(&span->free_list_deferred, 0); +} + +static int _rpmalloc_span_is_fully_utilized(span_t *span) { + rpmalloc_assert(span->free_list_limit <= span->block_count, + "Span free list corrupted"); + return !span->free_list && (span->free_list_limit >= span->block_count); +} + +static int _rpmalloc_span_finalize(heap_t *heap, size_t iclass, span_t *span, + span_t **list_head) { + void *free_list = heap->size_class[iclass].free_list; + span_t *class_span = (span_t *)((uintptr_t)free_list & _memory_span_mask); + if (span == class_span) { + // Adopt the heap class free list back into the span free list + void *block = span->free_list; + void *last_block = 0; + while (block) { + last_block = block; + block = *((void **)block); + } + uint32_t free_count = 0; + block = free_list; + while (block) { + ++free_count; + block = *((void **)block); + } + if (last_block) { + *((void **)last_block) = free_list; + } else { + span->free_list = free_list; + } + heap->size_class[iclass].free_list = 0; + span->used_count -= free_count; + } + // If this assert triggers you have memory leaks + rpmalloc_assert(span->list_size == span->used_count, "Memory leak detected"); + if (span->list_size == span->used_count) { + _rpmalloc_stat_dec(&heap->span_use[0].current); + _rpmalloc_stat_dec(&heap->size_class_use[iclass].spans_current); + // This function only used for spans in double linked lists + if (list_head) + _rpmalloc_span_double_link_list_remove(list_head, span); + _rpmalloc_span_unmap(span); + return 1; + } + return 0; +} + +//////////// +/// +/// Global cache +/// +////// + +#if ENABLE_GLOBAL_CACHE + +//! 
Finalize a global cache +static void _rpmalloc_global_cache_finalize(global_cache_t *cache) { + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + + for (size_t ispan = 0; ispan < cache->count; ++ispan) + _rpmalloc_span_unmap(cache->span[ispan]); + cache->count = 0; + + while (cache->overflow) { + span_t *span = cache->overflow; + cache->overflow = span->next; + _rpmalloc_span_unmap(span); + } + + atomic_store32_release(&cache->lock, 0); +} + +static void _rpmalloc_global_cache_insert_spans(span_t **span, + size_t span_count, + size_t count) { + const size_t cache_limit = + (span_count == 1) ? GLOBAL_CACHE_MULTIPLIER * MAX_THREAD_SPAN_CACHE + : GLOBAL_CACHE_MULTIPLIER * + (MAX_THREAD_SPAN_LARGE_CACHE - (span_count >> 1)); + + global_cache_t *cache = &_memory_span_cache[span_count - 1]; + + size_t insert_count = count; + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + +#if ENABLE_STATISTICS + cache->insert_count += count; +#endif + if ((cache->count + insert_count) > cache_limit) + insert_count = cache_limit - cache->count; + + memcpy(cache->span + cache->count, span, sizeof(span_t *) * insert_count); + cache->count += (uint32_t)insert_count; + +#if ENABLE_UNLIMITED_CACHE + while (insert_count < count) { +#else + // Enable unlimited cache if huge pages, or we will leak since it is unlikely + // that an entire huge page will be unmapped, and we're unable to partially + // decommit a huge page + while ((_memory_page_size > _memory_span_size) && (insert_count < count)) { +#endif + span_t *current_span = span[insert_count++]; + current_span->next = cache->overflow; + cache->overflow = current_span; + } + atomic_store32_release(&cache->lock, 0); + + span_t *keep = 0; + for (size_t ispan = insert_count; ispan < count; ++ispan) { + span_t *current_span = span[ispan]; + // Keep master spans that has remaining subspans to avoid dangling them + if ((current_span->flags & SPAN_FLAG_MASTER) && + 
(atomic_load32(&current_span->remaining_spans) > + (int32_t)current_span->span_count)) { + current_span->next = keep; + keep = current_span; + } else { + _rpmalloc_span_unmap(current_span); + } + } + + if (keep) { + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + + size_t islot = 0; + while (keep) { + for (; islot < cache->count; ++islot) { + span_t *current_span = cache->span[islot]; + if (!(current_span->flags & SPAN_FLAG_MASTER) || + ((current_span->flags & SPAN_FLAG_MASTER) && + (atomic_load32(&current_span->remaining_spans) <= + (int32_t)current_span->span_count))) { + _rpmalloc_span_unmap(current_span); + cache->span[islot] = keep; + break; + } + } + if (islot == cache->count) + break; + keep = keep->next; + } + + if (keep) { + span_t *tail = keep; + while (tail->next) + tail = tail->next; + tail->next = cache->overflow; + cache->overflow = keep; + } + + atomic_store32_release(&cache->lock, 0); + } +} + +static size_t _rpmalloc_global_cache_extract_spans(span_t **span, + size_t span_count, + size_t count) { + global_cache_t *cache = &_memory_span_cache[span_count - 1]; + + size_t extract_count = 0; + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + +#if ENABLE_STATISTICS + cache->extract_count += count; +#endif + size_t want = count - extract_count; + if (want > cache->count) + want = cache->count; + + memcpy(span + extract_count, cache->span + (cache->count - want), + sizeof(span_t *) * want); + cache->count -= (uint32_t)want; + extract_count += want; + + while ((extract_count < count) && cache->overflow) { + span_t *current_span = cache->overflow; + span[extract_count++] = current_span; + cache->overflow = current_span->next; + } + +#if ENABLE_ASSERTS + for (size_t ispan = 0; ispan < extract_count; ++ispan) { + rpmalloc_assert(span[ispan]->span_count == span_count, + "Global cache span count mismatch"); + } +#endif + + atomic_store32_release(&cache->lock, 0); + + return extract_count; +} + +#endif + +//////////// +/// +/// 
Heap control +/// +////// + +static void _rpmalloc_deallocate_huge(span_t *); + +//! Store the given spans as reserve in the given heap +static void _rpmalloc_heap_set_reserved_spans(heap_t *heap, span_t *master, + span_t *reserve, + size_t reserve_span_count) { + heap->span_reserve_master = master; + heap->span_reserve = reserve; + heap->spans_reserved = (uint32_t)reserve_span_count; +} + +//! Adopt the deferred span cache list, optionally extracting the first single +//! span for immediate re-use +static void _rpmalloc_heap_cache_adopt_deferred(heap_t *heap, + span_t **single_span) { + span_t *span = (span_t *)((void *)atomic_exchange_ptr_acquire( + &heap->span_free_deferred, 0)); + while (span) { + span_t *next_span = (span_t *)span->free_list; + rpmalloc_assert(span->heap == heap, "Span heap pointer corrupted"); + if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) { + rpmalloc_assert(heap->full_span_count, "Heap span counter corrupted"); + --heap->full_span_count; + _rpmalloc_stat_dec(&heap->span_use[0].spans_deferred); +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&heap->full_span[span->size_class], + span); +#endif + _rpmalloc_stat_dec(&heap->span_use[0].current); + _rpmalloc_stat_dec(&heap->size_class_use[span->size_class].spans_current); + if (single_span && !*single_span) + *single_span = span; + else + _rpmalloc_heap_cache_insert(heap, span); + } else { + if (span->size_class == SIZE_CLASS_HUGE) { + _rpmalloc_deallocate_huge(span); + } else { + rpmalloc_assert(span->size_class == SIZE_CLASS_LARGE, + "Span size class invalid"); + rpmalloc_assert(heap->full_span_count, "Heap span counter corrupted"); + --heap->full_span_count; +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&heap->large_huge_span, span); +#endif + uint32_t idx = span->span_count - 1; + _rpmalloc_stat_dec(&heap->span_use[idx].spans_deferred); + _rpmalloc_stat_dec(&heap->span_use[idx].current); + if (!idx && single_span && !*single_span) + 
*single_span = span; + else + _rpmalloc_heap_cache_insert(heap, span); + } + } + span = next_span; + } +} + +static void _rpmalloc_heap_unmap(heap_t *heap) { + if (!heap->master_heap) { + if ((heap->finalize > 1) && !atomic_load32(&heap->child_count)) { + span_t *span = (span_t *)((uintptr_t)heap & _memory_span_mask); + _rpmalloc_span_unmap(span); + } + } else { + if (atomic_decr32(&heap->master_heap->child_count) == 0) { + _rpmalloc_heap_unmap(heap->master_heap); + } + } +} + +static void _rpmalloc_heap_global_finalize(heap_t *heap) { + if (heap->finalize++ > 1) { + --heap->finalize; + return; + } + + _rpmalloc_heap_finalize(heap); + +#if ENABLE_THREAD_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); + span_cache->count = 0; + } +#endif + + if (heap->full_span_count) { + --heap->finalize; + return; + } + + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + if (heap->size_class[iclass].free_list || + heap->size_class[iclass].partial_span) { + --heap->finalize; + return; + } + } + // Heap is now completely free, unmap and remove from heap list + size_t list_idx = (size_t)heap->id % HEAP_ARRAY_SIZE; + heap_t *list_heap = _memory_heaps[list_idx]; + if (list_heap == heap) { + _memory_heaps[list_idx] = heap->next_heap; + } else { + while (list_heap->next_heap != heap) + list_heap = list_heap->next_heap; + list_heap->next_heap = heap->next_heap; + } + + _rpmalloc_heap_unmap(heap); +} + +//! Insert a single span into thread heap cache, releasing to global cache if +//! 
overflow +static void _rpmalloc_heap_cache_insert(heap_t *heap, span_t *span) { + if (UNEXPECTED(heap->finalize != 0)) { + _rpmalloc_span_unmap(span); + _rpmalloc_heap_global_finalize(heap); + return; + } +#if ENABLE_THREAD_CACHE + size_t span_count = span->span_count; + _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_to_cache); + if (span_count == 1) { + span_cache_t *span_cache = &heap->span_cache; + span_cache->span[span_cache->count++] = span; + if (span_cache->count == MAX_THREAD_SPAN_CACHE) { + const size_t remain_count = + MAX_THREAD_SPAN_CACHE - THREAD_SPAN_CACHE_TRANSFER; +#if ENABLE_GLOBAL_CACHE + _rpmalloc_stat_add64(&heap->thread_to_global, + THREAD_SPAN_CACHE_TRANSFER * _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_to_global, + THREAD_SPAN_CACHE_TRANSFER); + _rpmalloc_global_cache_insert_spans(span_cache->span + remain_count, + span_count, + THREAD_SPAN_CACHE_TRANSFER); +#else + for (size_t ispan = 0; ispan < THREAD_SPAN_CACHE_TRANSFER; ++ispan) + _rpmalloc_span_unmap(span_cache->span[remain_count + ispan]); +#endif + span_cache->count = remain_count; + } + } else { + size_t cache_idx = span_count - 2; + span_large_cache_t *span_cache = heap->span_large_cache + cache_idx; + span_cache->span[span_cache->count++] = span; + const size_t cache_limit = + (MAX_THREAD_SPAN_LARGE_CACHE - (span_count >> 1)); + if (span_cache->count == cache_limit) { + const size_t transfer_limit = 2 + (cache_limit >> 2); + const size_t transfer_count = + (THREAD_SPAN_LARGE_CACHE_TRANSFER <= transfer_limit + ? 
THREAD_SPAN_LARGE_CACHE_TRANSFER + : transfer_limit); + const size_t remain_count = cache_limit - transfer_count; +#if ENABLE_GLOBAL_CACHE + _rpmalloc_stat_add64(&heap->thread_to_global, + transfer_count * span_count * _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_to_global, + transfer_count); + _rpmalloc_global_cache_insert_spans(span_cache->span + remain_count, + span_count, transfer_count); +#else + for (size_t ispan = 0; ispan < transfer_count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[remain_count + ispan]); +#endif + span_cache->count = remain_count; + } + } +#else + (void)sizeof(heap); + _rpmalloc_span_unmap(span); +#endif +} + +//! Extract the given number of spans from the different cache levels +static span_t *_rpmalloc_heap_thread_cache_extract(heap_t *heap, + size_t span_count) { + span_t *span = 0; +#if ENABLE_THREAD_CACHE + span_cache_t *span_cache; + if (span_count == 1) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (span_count - 2)); + if (span_cache->count) { + _rpmalloc_stat_inc(&heap->span_use[span_count - 1].spans_from_cache); + return span_cache->span[--span_cache->count]; + } +#endif + return span; +} + +static span_t *_rpmalloc_heap_thread_cache_deferred_extract(heap_t *heap, + size_t span_count) { + span_t *span = 0; + if (span_count == 1) { + _rpmalloc_heap_cache_adopt_deferred(heap, &span); + } else { + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + span = _rpmalloc_heap_thread_cache_extract(heap, span_count); + } + return span; +} + +static span_t *_rpmalloc_heap_reserved_extract(heap_t *heap, + size_t span_count) { + if (heap->spans_reserved >= span_count) + return _rpmalloc_span_map(heap, span_count); + return 0; +} + +//! 
Extract a span from the global cache +static span_t *_rpmalloc_heap_global_cache_extract(heap_t *heap, + size_t span_count) { +#if ENABLE_GLOBAL_CACHE +#if ENABLE_THREAD_CACHE + span_cache_t *span_cache; + size_t wanted_count; + if (span_count == 1) { + span_cache = &heap->span_cache; + wanted_count = THREAD_SPAN_CACHE_TRANSFER; + } else { + span_cache = (span_cache_t *)(heap->span_large_cache + (span_count - 2)); + wanted_count = THREAD_SPAN_LARGE_CACHE_TRANSFER; + } + span_cache->count = _rpmalloc_global_cache_extract_spans( + span_cache->span, span_count, wanted_count); + if (span_cache->count) { + _rpmalloc_stat_add64(&heap->global_to_thread, + span_count * span_cache->count * _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_from_global, + span_cache->count); + return span_cache->span[--span_cache->count]; + } +#else + span_t *span = 0; + size_t count = _rpmalloc_global_cache_extract_spans(&span, span_count, 1); + if (count) { + _rpmalloc_stat_add64(&heap->global_to_thread, + span_count * count * _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[span_count - 1].spans_from_global, + count); + return span; + } +#endif +#endif + (void)sizeof(heap); + (void)sizeof(span_count); + return 0; +} + +static void _rpmalloc_inc_span_statistics(heap_t *heap, size_t span_count, + uint32_t class_idx) { + (void)sizeof(heap); + (void)sizeof(span_count); + (void)sizeof(class_idx); +#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS + uint32_t idx = (uint32_t)span_count - 1; + uint32_t current_count = + (uint32_t)atomic_incr32(&heap->span_use[idx].current); + if (current_count > (uint32_t)atomic_load32(&heap->span_use[idx].high)) + atomic_store32(&heap->span_use[idx].high, (int32_t)current_count); + _rpmalloc_stat_add_peak(&heap->size_class_use[class_idx].spans_current, 1, + heap->size_class_use[class_idx].spans_peak); +#endif +} + +//! Get a span from one of the cache levels (thread cache, reserved, global +//! 
cache) or fallback to mapping more memory +static span_t * +_rpmalloc_heap_extract_new_span(heap_t *heap, + heap_size_class_t *heap_size_class, + size_t span_count, uint32_t class_idx) { + span_t *span; +#if ENABLE_THREAD_CACHE + if (heap_size_class && heap_size_class->cache) { + span = heap_size_class->cache; + heap_size_class->cache = + (heap->span_cache.count + ? heap->span_cache.span[--heap->span_cache.count] + : 0); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } +#endif + (void)sizeof(class_idx); + // Allow 50% overhead to increase cache hits + size_t base_span_count = span_count; + size_t limit_span_count = + (span_count > 2) ? (span_count + (span_count >> 1)) : span_count; + if (limit_span_count > LARGE_CLASS_COUNT) + limit_span_count = LARGE_CLASS_COUNT; + do { + span = _rpmalloc_heap_thread_cache_extract(heap, span_count); + if (EXPECTED(span != 0)) { + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } + span = _rpmalloc_heap_thread_cache_deferred_extract(heap, span_count); + if (EXPECTED(span != 0)) { + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } + span = _rpmalloc_heap_global_cache_extract(heap, span_count); + if (EXPECTED(span != 0)) { + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_cache); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } + span = _rpmalloc_heap_reserved_extract(heap, span_count); + if (EXPECTED(span != 0)) { + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_from_reserved); + _rpmalloc_inc_span_statistics(heap, span_count, class_idx); + return span; + } + ++span_count; + } while (span_count <= limit_span_count); + // Final fallback, map in more virtual memory + span = _rpmalloc_span_map(heap, base_span_count); + 
_rpmalloc_inc_span_statistics(heap, base_span_count, class_idx); + _rpmalloc_stat_inc(&heap->size_class_use[class_idx].spans_map_calls); + return span; +} + +static void _rpmalloc_heap_initialize(heap_t *heap) { + _rpmalloc_memset_const(heap, 0, sizeof(heap_t)); + // Get a new heap ID + heap->id = 1 + atomic_incr32(&_memory_heap_id); + + // Link in heap in heap ID map + size_t list_idx = (size_t)heap->id % HEAP_ARRAY_SIZE; + heap->next_heap = _memory_heaps[list_idx]; + _memory_heaps[list_idx] = heap; +} + +static void _rpmalloc_heap_orphan(heap_t *heap, int first_class) { + heap->owner_thread = (uintptr_t)-1; +#if RPMALLOC_FIRST_CLASS_HEAPS + heap_t **heap_list = + (first_class ? &_memory_first_class_orphan_heaps : &_memory_orphan_heaps); +#else + (void)sizeof(first_class); + heap_t **heap_list = &_memory_orphan_heaps; +#endif + heap->next_orphan = *heap_list; + *heap_list = heap; +} + +//! Allocate a new heap from newly mapped memory pages +static heap_t *_rpmalloc_heap_allocate_new(void) { + // Map in pages for 16 heaps. If page size is greater than required size for + // this, map a page and use first part for heaps and remaining part for spans + // for allocations. 
Adds a lot of complexity, but saves a lot of memory on + // systems where page size > 64 spans (4MiB) + size_t heap_size = sizeof(heap_t); + size_t aligned_heap_size = 16 * ((heap_size + 15) / 16); + size_t request_heap_count = 16; + size_t heap_span_count = ((aligned_heap_size * request_heap_count) + + sizeof(span_t) + _memory_span_size - 1) / + _memory_span_size; + size_t block_size = _memory_span_size * heap_span_count; + size_t span_count = heap_span_count; + span_t *span = 0; + // If there are global reserved spans, use these first + if (_memory_global_reserve_count >= heap_span_count) { + span = _rpmalloc_global_get_reserved_spans(heap_span_count); + } + if (!span) { + if (_memory_page_size > block_size) { + span_count = _memory_page_size / _memory_span_size; + block_size = _memory_page_size; + // If using huge pages, make sure to grab enough heaps to avoid + // reallocating a huge page just to serve new heaps + size_t possible_heap_count = + (block_size - sizeof(span_t)) / aligned_heap_size; + if (possible_heap_count >= (request_heap_count * 16)) + request_heap_count *= 16; + else if (possible_heap_count < request_heap_count) + request_heap_count = possible_heap_count; + heap_span_count = ((aligned_heap_size * request_heap_count) + + sizeof(span_t) + _memory_span_size - 1) / + _memory_span_size; + } + + size_t align_offset = 0; + span = (span_t *)_rpmalloc_mmap(block_size, &align_offset); + if (!span) + return 0; + + // Master span will contain the heaps + _rpmalloc_stat_inc(&_master_spans); + _rpmalloc_span_initialize(span, span_count, heap_span_count, align_offset); + } + + size_t remain_size = _memory_span_size - sizeof(span_t); + heap_t *heap = (heap_t *)pointer_offset(span, sizeof(span_t)); + _rpmalloc_heap_initialize(heap); + + // Put extra heaps as orphans + size_t num_heaps = remain_size / aligned_heap_size; + if (num_heaps < request_heap_count) + num_heaps = request_heap_count; + atomic_store32(&heap->child_count, (int32_t)num_heaps - 1); + heap_t 
*extra_heap = (heap_t *)pointer_offset(heap, aligned_heap_size); + while (num_heaps > 1) { + _rpmalloc_heap_initialize(extra_heap); + extra_heap->master_heap = heap; + _rpmalloc_heap_orphan(extra_heap, 1); + extra_heap = (heap_t *)pointer_offset(extra_heap, aligned_heap_size); + --num_heaps; + } + + if (span_count > heap_span_count) { + // Cap reserved spans + size_t remain_count = span_count - heap_span_count; + size_t reserve_count = + (remain_count > _memory_heap_reserve_count ? _memory_heap_reserve_count + : remain_count); + span_t *remain_span = + (span_t *)pointer_offset(span, heap_span_count * _memory_span_size); + _rpmalloc_heap_set_reserved_spans(heap, span, remain_span, reserve_count); + + if (remain_count > reserve_count) { + // Set to global reserved spans + remain_span = (span_t *)pointer_offset(remain_span, + reserve_count * _memory_span_size); + reserve_count = remain_count - reserve_count; + _rpmalloc_global_set_reserved_spans(span, remain_span, reserve_count); + } + } + + return heap; +} + +static heap_t *_rpmalloc_heap_extract_orphan(heap_t **heap_list) { + heap_t *heap = *heap_list; + *heap_list = (heap ? heap->next_orphan : 0); + return heap; +} + +//! 
Allocate a new heap, potentially reusing a previously orphaned heap +static heap_t *_rpmalloc_heap_allocate(int first_class) { + heap_t *heap = 0; + while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) + _rpmalloc_spin(); + if (first_class == 0) + heap = _rpmalloc_heap_extract_orphan(&_memory_orphan_heaps); +#if RPMALLOC_FIRST_CLASS_HEAPS + if (!heap) + heap = _rpmalloc_heap_extract_orphan(&_memory_first_class_orphan_heaps); +#endif + if (!heap) + heap = _rpmalloc_heap_allocate_new(); + atomic_store32_release(&_memory_global_lock, 0); + if (heap) + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + return heap; +} + +static void _rpmalloc_heap_release(void *heapptr, int first_class, + int release_cache) { + heap_t *heap = (heap_t *)heapptr; + if (!heap) + return; + // Release thread cache spans back to global cache + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + if (release_cache || heap->finalize) { +#if ENABLE_THREAD_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + if (!span_cache->count) + continue; +#if ENABLE_GLOBAL_CACHE + if (heap->finalize) { + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); + } else { + _rpmalloc_stat_add64(&heap->thread_to_global, span_cache->count * + (iclass + 1) * + _memory_span_size); + _rpmalloc_stat_add(&heap->span_use[iclass].spans_to_global, + span_cache->count); + _rpmalloc_global_cache_insert_spans(span_cache->span, iclass + 1, + span_cache->count); + } +#else + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); +#endif + span_cache->count = 0; + } +#endif + } + + if (get_thread_heap_raw() == heap) + set_thread_heap(0); + +#if ENABLE_STATISTICS + atomic_decr32(&_memory_active_heaps); + 
rpmalloc_assert(atomic_load32(&_memory_active_heaps) >= 0, + "Still active heaps during finalization"); +#endif + + // If we are forcibly terminating with _exit the state of the + // lock atomic is unknown and it's best to just go ahead and exit + if (get_thread_id() != _rpmalloc_main_thread_id) { + while (!atomic_cas32_acquire(&_memory_global_lock, 1, 0)) + _rpmalloc_spin(); + } + _rpmalloc_heap_orphan(heap, first_class); + atomic_store32_release(&_memory_global_lock, 0); +} + +static void _rpmalloc_heap_release_raw(void *heapptr, int release_cache) { + _rpmalloc_heap_release(heapptr, 0, release_cache); +} + +static void _rpmalloc_heap_release_raw_fc(void *heapptr) { + _rpmalloc_heap_release_raw(heapptr, 1); +} + +static void _rpmalloc_heap_finalize(heap_t *heap) { + if (heap->spans_reserved) { + span_t *span = _rpmalloc_span_map(heap, heap->spans_reserved); + _rpmalloc_span_unmap(span); + heap->spans_reserved = 0; + } + + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + if (heap->size_class[iclass].cache) + _rpmalloc_span_unmap(heap->size_class[iclass].cache); + heap->size_class[iclass].cache = 0; + span_t *span = heap->size_class[iclass].partial_span; + while (span) { + span_t *next = span->next; + _rpmalloc_span_finalize(heap, iclass, span, + &heap->size_class[iclass].partial_span); + span = next; + } + // If class still has a free list it must be a full span + if (heap->size_class[iclass].free_list) { + span_t *class_span = + (span_t *)((uintptr_t)heap->size_class[iclass].free_list & + _memory_span_mask); + span_t **list = 0; +#if RPMALLOC_FIRST_CLASS_HEAPS + list = &heap->full_span[iclass]; +#endif + --heap->full_span_count; + if (!_rpmalloc_span_finalize(heap, iclass, class_span, list)) { + if (list) + _rpmalloc_span_double_link_list_remove(list, class_span); + _rpmalloc_span_double_link_list_add( + &heap->size_class[iclass].partial_span, class_span); + } + } + } + +#if ENABLE_THREAD_CACHE + 
for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); + span_cache->count = 0; + } +#endif + rpmalloc_assert(!atomic_load_ptr(&heap->span_free_deferred), + "Heaps still active during finalization"); +} + +//////////// +/// +/// Allocation entry points +/// +////// + +//! Pop first block from a free list +static void *free_list_pop(void **list) { + void *block = *list; + *list = *((void **)block); + return block; +} + +//! Allocate a small/medium sized memory block from the given heap +static void *_rpmalloc_allocate_from_heap_fallback( + heap_t *heap, heap_size_class_t *heap_size_class, uint32_t class_idx) { + span_t *span = heap_size_class->partial_span; + rpmalloc_assume(heap != 0); + if (EXPECTED(span != 0)) { + rpmalloc_assert(span->block_count == + _memory_size_class[span->size_class].block_count, + "Span block count corrupted"); + rpmalloc_assert(!_rpmalloc_span_is_fully_utilized(span), + "Internal failure"); + void *block; + if (span->free_list) { + // Span local free list is not empty, swap to size class free list + block = free_list_pop(&span->free_list); + heap_size_class->free_list = span->free_list; + span->free_list = 0; + } else { + // If the span did not fully initialize free list, link up another page + // worth of blocks + void *block_start = pointer_offset( + span, SPAN_HEADER_SIZE + + ((size_t)span->free_list_limit * span->block_size)); + span->free_list_limit += free_list_partial_init( + &heap_size_class->free_list, &block, + (void *)((uintptr_t)block_start & ~(_memory_page_size - 1)), + block_start, span->block_count - span->free_list_limit, + span->block_size); + } + rpmalloc_assert(span->free_list_limit <= span->block_count, + "Span block count corrupted"); + 
span->used_count = span->free_list_limit; + + // Swap in deferred free list if present + if (atomic_load_ptr(&span->free_list_deferred)) + _rpmalloc_span_extract_free_list_deferred(span); + + // If span is still not fully utilized keep it in partial list and early + // return block + if (!_rpmalloc_span_is_fully_utilized(span)) + return block; + + // The span is fully utilized, unlink from partial list and add to fully + // utilized list + _rpmalloc_span_double_link_list_pop_head(&heap_size_class->partial_span, + span); +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->full_span[class_idx], span); +#endif + ++heap->full_span_count; + return block; + } + + // Find a span in one of the cache levels + span = _rpmalloc_heap_extract_new_span(heap, heap_size_class, 1, class_idx); + if (EXPECTED(span != 0)) { + // Mark span as owned by this heap and set base data, return first block + return _rpmalloc_span_initialize_new(heap, heap_size_class, span, + class_idx); + } + + return 0; +} + +//! Allocate a small sized memory block from the given heap +static void *_rpmalloc_allocate_small(heap_t *heap, size_t size) { + rpmalloc_assert(heap, "No thread heap"); + // Small sizes have unique size classes + const uint32_t class_idx = + (uint32_t)((size + (SMALL_GRANULARITY - 1)) >> SMALL_GRANULARITY_SHIFT); + heap_size_class_t *heap_size_class = heap->size_class + class_idx; + _rpmalloc_stat_inc_alloc(heap, class_idx); + if (EXPECTED(heap_size_class->free_list != 0)) + return free_list_pop(&heap_size_class->free_list); + return _rpmalloc_allocate_from_heap_fallback(heap, heap_size_class, + class_idx); +} + +//! 
Allocate a medium sized memory block from the given heap +static void *_rpmalloc_allocate_medium(heap_t *heap, size_t size) { + rpmalloc_assert(heap, "No thread heap"); + // Calculate the size class index and do a dependent lookup of the final class + // index (in case of merged classes) + const uint32_t base_idx = + (uint32_t)(SMALL_CLASS_COUNT + + ((size - (SMALL_SIZE_LIMIT + 1)) >> MEDIUM_GRANULARITY_SHIFT)); + const uint32_t class_idx = _memory_size_class[base_idx].class_idx; + heap_size_class_t *heap_size_class = heap->size_class + class_idx; + _rpmalloc_stat_inc_alloc(heap, class_idx); + if (EXPECTED(heap_size_class->free_list != 0)) + return free_list_pop(&heap_size_class->free_list); + return _rpmalloc_allocate_from_heap_fallback(heap, heap_size_class, + class_idx); +} + +//! Allocate a large sized memory block from the given heap +static void *_rpmalloc_allocate_large(heap_t *heap, size_t size) { + rpmalloc_assert(heap, "No thread heap"); + // Calculate number of needed max sized spans (including header) + // Since this function is never called if size > LARGE_SIZE_LIMIT + // the span_count is guaranteed to be <= LARGE_CLASS_COUNT + size += SPAN_HEADER_SIZE; + size_t span_count = size >> _memory_span_size_shift; + if (size & (_memory_span_size - 1)) + ++span_count; + + // Find a span in one of the cache levels + span_t *span = + _rpmalloc_heap_extract_new_span(heap, 0, span_count, SIZE_CLASS_LARGE); + if (!span) + return span; + + // Mark span as owned by this heap and set base data + rpmalloc_assert(span->span_count >= span_count, "Internal failure"); + span->size_class = SIZE_CLASS_LARGE; + span->heap = heap; + +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); +#endif + ++heap->full_span_count; + + return pointer_offset(span, SPAN_HEADER_SIZE); +} + +//! 
Allocate a huge block by mapping memory pages directly +static void *_rpmalloc_allocate_huge(heap_t *heap, size_t size) { + rpmalloc_assert(heap, "No thread heap"); + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + size += SPAN_HEADER_SIZE; + size_t num_pages = size >> _memory_page_size_shift; + if (size & (_memory_page_size - 1)) + ++num_pages; + size_t align_offset = 0; + span_t *span = + (span_t *)_rpmalloc_mmap(num_pages * _memory_page_size, &align_offset); + if (!span) + return span; + + // Store page count in span_count + span->size_class = SIZE_CLASS_HUGE; + span->span_count = (uint32_t)num_pages; + span->align_offset = (uint32_t)align_offset; + span->heap = heap; + _rpmalloc_stat_add_peak(&_huge_pages_current, num_pages, _huge_pages_peak); + +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); +#endif + ++heap->full_span_count; + + return pointer_offset(span, SPAN_HEADER_SIZE); +} + +//! Allocate a block of the given size +static void *_rpmalloc_allocate(heap_t *heap, size_t size) { + _rpmalloc_stat_add64(&_allocation_counter, 1); + if (EXPECTED(size <= SMALL_SIZE_LIMIT)) + return _rpmalloc_allocate_small(heap, size); + else if (size <= _memory_medium_size_limit) + return _rpmalloc_allocate_medium(heap, size); + else if (size <= LARGE_SIZE_LIMIT) + return _rpmalloc_allocate_large(heap, size); + return _rpmalloc_allocate_huge(heap, size); +} + +static void *_rpmalloc_aligned_allocate(heap_t *heap, size_t alignment, + size_t size) { + if (alignment <= SMALL_GRANULARITY) + return _rpmalloc_allocate(heap, size); + +#if ENABLE_VALIDATE_ARGS + if ((size + alignment) < size) { + errno = EINVAL; + return 0; + } + if (alignment & (alignment - 1)) { + errno = EINVAL; + return 0; + } +#endif + + if ((alignment <= SPAN_HEADER_SIZE) && + ((size + SPAN_HEADER_SIZE) < _memory_medium_size_limit)) { + // If alignment is less or equal to span header size (which is power of + // two), and size aligned to span header size 
multiples is less than size + + // alignment, then use natural alignment of blocks to provide alignment + size_t multiple_size = size ? (size + (SPAN_HEADER_SIZE - 1)) & + ~(uintptr_t)(SPAN_HEADER_SIZE - 1) + : SPAN_HEADER_SIZE; + rpmalloc_assert(!(multiple_size % SPAN_HEADER_SIZE), + "Failed alignment calculation"); + if (multiple_size <= (size + alignment)) + return _rpmalloc_allocate(heap, multiple_size); + } + + void *ptr = 0; + size_t align_mask = alignment - 1; + if (alignment <= _memory_page_size) { + ptr = _rpmalloc_allocate(heap, size + alignment); + if ((uintptr_t)ptr & align_mask) { + ptr = (void *)(((uintptr_t)ptr & ~(uintptr_t)align_mask) + alignment); + // Mark as having aligned blocks + span_t *span = (span_t *)((uintptr_t)ptr & _memory_span_mask); + span->flags |= SPAN_FLAG_ALIGNED_BLOCKS; + } + return ptr; + } + + // Fallback to mapping new pages for this request. Since pointers passed + // to rpfree must be able to reach the start of the span by bitmasking of + // the address with the span size, the returned aligned pointer from this + // function must be within a span size of the start of the mapped area. + // In worst case this requires us to loop and map pages until we get a + // suitable memory address. 
It also means we can never align to span size + // or greater, since the span header will push alignment more than one + // span size away from span start (thus causing pointer mask to give us + // an invalid span start on free) + if (alignment & align_mask) { + errno = EINVAL; + return 0; + } + if (alignment >= _memory_span_size) { + errno = EINVAL; + return 0; + } + + size_t extra_pages = alignment / _memory_page_size; + + // Since each span has a header, we will at least need one extra memory page + size_t num_pages = 1 + (size / _memory_page_size); + if (size & (_memory_page_size - 1)) + ++num_pages; + + if (extra_pages > num_pages) + num_pages = 1 + extra_pages; + + size_t original_pages = num_pages; + size_t limit_pages = (_memory_span_size / _memory_page_size) * 2; + if (limit_pages < (original_pages * 2)) + limit_pages = original_pages * 2; + + size_t mapped_size, align_offset; + span_t *span; + +retry: + align_offset = 0; + mapped_size = num_pages * _memory_page_size; + + span = (span_t *)_rpmalloc_mmap(mapped_size, &align_offset); + if (!span) { + errno = ENOMEM; + return 0; + } + ptr = pointer_offset(span, SPAN_HEADER_SIZE); + + if ((uintptr_t)ptr & align_mask) + ptr = (void *)(((uintptr_t)ptr & ~(uintptr_t)align_mask) + alignment); + + if (((size_t)pointer_diff(ptr, span) >= _memory_span_size) || + (pointer_offset(ptr, size) > pointer_offset(span, mapped_size)) || + (((uintptr_t)ptr & _memory_span_mask) != (uintptr_t)span)) { + _rpmalloc_unmap(span, mapped_size, align_offset, mapped_size); + ++num_pages; + if (num_pages > limit_pages) { + errno = EINVAL; + return 0; + } + goto retry; + } + + // Store page count in span_count + span->size_class = SIZE_CLASS_HUGE; + span->span_count = (uint32_t)num_pages; + span->align_offset = (uint32_t)align_offset; + span->heap = heap; + _rpmalloc_stat_add_peak(&_huge_pages_current, num_pages, _huge_pages_peak); + +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_add(&heap->large_huge_span, span); 
+#endif + ++heap->full_span_count; + + _rpmalloc_stat_add64(&_allocation_counter, 1); + + return ptr; +} + +//////////// +/// +/// Deallocation entry points +/// +////// + +//! Deallocate the given small/medium memory block in the current thread local +//! heap +static void _rpmalloc_deallocate_direct_small_or_medium(span_t *span, + void *block) { + heap_t *heap = span->heap; + rpmalloc_assert(heap->owner_thread == get_thread_id() || + !heap->owner_thread || heap->finalize, + "Internal failure"); + // Add block to free list + if (UNEXPECTED(_rpmalloc_span_is_fully_utilized(span))) { + span->used_count = span->block_count; +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&heap->full_span[span->size_class], + span); +#endif + _rpmalloc_span_double_link_list_add( + &heap->size_class[span->size_class].partial_span, span); + --heap->full_span_count; + } + *((void **)block) = span->free_list; + --span->used_count; + span->free_list = block; + if (UNEXPECTED(span->used_count == span->list_size)) { + // If there are no used blocks it is guaranteed that no other external + // thread is accessing the span + if (span->used_count) { + // Make sure we have synchronized the deferred list and list size by using + // acquire semantics and guarantee that no external thread is accessing + // span concurrently + void *free_list; + do { + free_list = atomic_exchange_ptr_acquire(&span->free_list_deferred, + INVALID_POINTER); + } while (free_list == INVALID_POINTER); + atomic_store_ptr_release(&span->free_list_deferred, free_list); + } + _rpmalloc_span_double_link_list_remove( + &heap->size_class[span->size_class].partial_span, span); + _rpmalloc_span_release_to_cache(heap, span); + } +} + +static void _rpmalloc_deallocate_defer_free_span(heap_t *heap, span_t *span) { + if (span->size_class != SIZE_CLASS_HUGE) + _rpmalloc_stat_inc(&heap->span_use[span->span_count - 1].spans_deferred); + // This list does not need ABA protection, no mutable side state + do { + 
span->free_list = (void *)atomic_load_ptr(&heap->span_free_deferred); + } while (!atomic_cas_ptr(&heap->span_free_deferred, span, span->free_list)); +} + +//! Put the block in the deferred free list of the owning span +static void _rpmalloc_deallocate_defer_small_or_medium(span_t *span, + void *block) { + // The memory ordering here is a bit tricky, to avoid having to ABA protect + // the deferred free list to avoid desynchronization of list and list size + // we need to have acquire semantics on successful CAS of the pointer to + // guarantee the list_size variable validity + release semantics on pointer + // store + void *free_list; + do { + free_list = + atomic_exchange_ptr_acquire(&span->free_list_deferred, INVALID_POINTER); + } while (free_list == INVALID_POINTER); + *((void **)block) = free_list; + uint32_t free_count = ++span->list_size; + int all_deferred_free = (free_count == span->block_count); + atomic_store_ptr_release(&span->free_list_deferred, block); + if (all_deferred_free) { + // Span was completely freed by this block. Due to the INVALID_POINTER spin + // lock no other thread can reach this state simultaneously on this span. 
+ // Safe to move to owner heap deferred cache + _rpmalloc_deallocate_defer_free_span(span->heap, span); + } +} + +static void _rpmalloc_deallocate_small_or_medium(span_t *span, void *p) { + _rpmalloc_stat_inc_free(span->heap, span->size_class); + if (span->flags & SPAN_FLAG_ALIGNED_BLOCKS) { + // Realign pointer to block start + void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); + uint32_t block_offset = (uint32_t)pointer_diff(p, blocks_start); + p = pointer_offset(p, -(int32_t)(block_offset % span->block_size)); + } + // Check if block belongs to this heap or if deallocation should be deferred +#if RPMALLOC_FIRST_CLASS_HEAPS + int defer = + (span->heap->owner_thread && + (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#else + int defer = + ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#endif + if (!defer) + _rpmalloc_deallocate_direct_small_or_medium(span, p); + else + _rpmalloc_deallocate_defer_small_or_medium(span, p); +} + +//! 
Deallocate the given large memory block to the current heap +static void _rpmalloc_deallocate_large(span_t *span) { + rpmalloc_assert(span->size_class == SIZE_CLASS_LARGE, "Bad span size class"); + rpmalloc_assert(!(span->flags & SPAN_FLAG_MASTER) || + !(span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + rpmalloc_assert((span->flags & SPAN_FLAG_MASTER) || + (span->flags & SPAN_FLAG_SUBSPAN), + "Span flag corrupted"); + // We must always defer (unless finalizing) if from another heap since we + // cannot touch the list or counters of another heap +#if RPMALLOC_FIRST_CLASS_HEAPS + int defer = + (span->heap->owner_thread && + (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#else + int defer = + ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#endif + if (defer) { + _rpmalloc_deallocate_defer_free_span(span->heap, span); + return; + } + rpmalloc_assert(span->heap->full_span_count, "Heap span counter corrupted"); + --span->heap->full_span_count; +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&span->heap->large_huge_span, span); +#endif +#if ENABLE_ADAPTIVE_THREAD_CACHE || ENABLE_STATISTICS + // Decrease counter + size_t idx = span->span_count - 1; + atomic_decr32(&span->heap->span_use[idx].current); +#endif + heap_t *heap = span->heap; + rpmalloc_assert(heap, "No thread heap"); +#if ENABLE_THREAD_CACHE + const int set_as_reserved = + ((span->span_count > 1) && (heap->span_cache.count == 0) && + !heap->finalize && !heap->spans_reserved); +#else + const int set_as_reserved = + ((span->span_count > 1) && !heap->finalize && !heap->spans_reserved); +#endif + if (set_as_reserved) { + heap->span_reserve = span; + heap->spans_reserved = span->span_count; + if (span->flags & SPAN_FLAG_MASTER) { + heap->span_reserve_master = span; + } else { // SPAN_FLAG_SUBSPAN + span_t *master = (span_t *)pointer_offset( + span, + -(intptr_t)((size_t)span->offset_from_master * _memory_span_size)); + 
heap->span_reserve_master = master; + rpmalloc_assert(master->flags & SPAN_FLAG_MASTER, "Span flag corrupted"); + rpmalloc_assert(atomic_load32(&master->remaining_spans) >= + (int32_t)span->span_count, + "Master span count corrupted"); + } + _rpmalloc_stat_inc(&heap->span_use[idx].spans_to_reserved); + } else { + // Insert into cache list + _rpmalloc_heap_cache_insert(heap, span); + } +} + +//! Deallocate the given huge span +static void _rpmalloc_deallocate_huge(span_t *span) { + rpmalloc_assert(span->heap, "No span heap"); +#if RPMALLOC_FIRST_CLASS_HEAPS + int defer = + (span->heap->owner_thread && + (span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#else + int defer = + ((span->heap->owner_thread != get_thread_id()) && !span->heap->finalize); +#endif + if (defer) { + _rpmalloc_deallocate_defer_free_span(span->heap, span); + return; + } + rpmalloc_assert(span->heap->full_span_count, "Heap span counter corrupted"); + --span->heap->full_span_count; +#if RPMALLOC_FIRST_CLASS_HEAPS + _rpmalloc_span_double_link_list_remove(&span->heap->large_huge_span, span); +#endif + + // Oversized allocation, page count is stored in span_count + size_t num_pages = span->span_count; + _rpmalloc_unmap(span, num_pages * _memory_page_size, span->align_offset, + num_pages * _memory_page_size); + _rpmalloc_stat_sub(&_huge_pages_current, num_pages); +} + +//! 
Deallocate the given block +static void _rpmalloc_deallocate(void *p) { + _rpmalloc_stat_add64(&_deallocation_counter, 1); + // Grab the span (always at start of span, using span alignment) + span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); + if (UNEXPECTED(!span)) + return; + if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) + _rpmalloc_deallocate_small_or_medium(span, p); + else if (span->size_class == SIZE_CLASS_LARGE) + _rpmalloc_deallocate_large(span); + else + _rpmalloc_deallocate_huge(span); +} + +//////////// +/// +/// Reallocation entry points +/// +////// + +static size_t _rpmalloc_usable_size(void *p); + +//! Reallocate the given block to the given size +static void *_rpmalloc_reallocate(heap_t *heap, void *p, size_t size, + size_t oldsize, unsigned int flags) { + if (p) { + // Grab the span using guaranteed span alignment + span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); + if (EXPECTED(span->size_class < SIZE_CLASS_COUNT)) { + // Small/medium sized block + rpmalloc_assert(span->span_count == 1, "Span counter corrupted"); + void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); + uint32_t block_offset = (uint32_t)pointer_diff(p, blocks_start); + uint32_t block_idx = block_offset / span->block_size; + void *block = + pointer_offset(blocks_start, (size_t)block_idx * span->block_size); + if (!oldsize) + oldsize = + (size_t)((ptrdiff_t)span->block_size - pointer_diff(p, block)); + if ((size_t)span->block_size >= size) { + // Still fits in block, never mind trying to save memory, but preserve + // data if alignment changed + if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) + memmove(block, p, oldsize); + return block; + } + } else if (span->size_class == SIZE_CLASS_LARGE) { + // Large block + size_t total_size = size + SPAN_HEADER_SIZE; + size_t num_spans = total_size >> _memory_span_size_shift; + if (total_size & (_memory_span_size - 1)) + ++num_spans; + size_t current_spans = span->span_count; + void *block = 
pointer_offset(span, SPAN_HEADER_SIZE); + if (!oldsize) + oldsize = (current_spans * _memory_span_size) - + (size_t)pointer_diff(p, block) - SPAN_HEADER_SIZE; + if ((current_spans >= num_spans) && (total_size >= (oldsize / 2))) { + // Still fits in block, never mind trying to save memory, but preserve + // data if alignment changed + if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) + memmove(block, p, oldsize); + return block; + } + } else { + // Oversized block + size_t total_size = size + SPAN_HEADER_SIZE; + size_t num_pages = total_size >> _memory_page_size_shift; + if (total_size & (_memory_page_size - 1)) + ++num_pages; + // Page count is stored in span_count + size_t current_pages = span->span_count; + void *block = pointer_offset(span, SPAN_HEADER_SIZE); + if (!oldsize) + oldsize = (current_pages * _memory_page_size) - + (size_t)pointer_diff(p, block) - SPAN_HEADER_SIZE; + if ((current_pages >= num_pages) && (num_pages >= (current_pages / 2))) { + // Still fits in block, never mind trying to save memory, but preserve + // data if alignment changed + if ((p != block) && !(flags & RPMALLOC_NO_PRESERVE)) + memmove(block, p, oldsize); + return block; + } + } + } else { + oldsize = 0; + } + + if (!!(flags & RPMALLOC_GROW_OR_FAIL)) + return 0; + + // Size is greater than block size, need to allocate a new block and + // deallocate the old. Avoid hysteresis by overallocating if increase is small + // (below 37%) + size_t lower_bound = oldsize + (oldsize >> 2) + (oldsize >> 3); + size_t new_size = + (size > lower_bound) ? size : ((size > oldsize) ? lower_bound : size); + void *block = _rpmalloc_allocate(heap, new_size); + if (p && block) { + if (!(flags & RPMALLOC_NO_PRESERVE)) + memcpy(block, p, oldsize < new_size ? 
oldsize : new_size); + _rpmalloc_deallocate(p); + } + + return block; +} + +static void *_rpmalloc_aligned_reallocate(heap_t *heap, void *ptr, + size_t alignment, size_t size, + size_t oldsize, unsigned int flags) { + if (alignment <= SMALL_GRANULARITY) + return _rpmalloc_reallocate(heap, ptr, size, oldsize, flags); + + int no_alloc = !!(flags & RPMALLOC_GROW_OR_FAIL); + size_t usablesize = (ptr ? _rpmalloc_usable_size(ptr) : 0); + if ((usablesize >= size) && !((uintptr_t)ptr & (alignment - 1))) { + if (no_alloc || (size >= (usablesize / 2))) + return ptr; + } + // Aligned alloc marks span as having aligned blocks + void *block = + (!no_alloc ? _rpmalloc_aligned_allocate(heap, alignment, size) : 0); + if (EXPECTED(block != 0)) { + if (!(flags & RPMALLOC_NO_PRESERVE) && ptr) { + if (!oldsize) + oldsize = usablesize; + memcpy(block, ptr, oldsize < size ? oldsize : size); + } + _rpmalloc_deallocate(ptr); + } + return block; +} + +//////////// +/// +/// Initialization, finalization and utility +/// +////// + +//! Get the usable size of the given block +static size_t _rpmalloc_usable_size(void *p) { + // Grab the span using guaranteed span alignment + span_t *span = (span_t *)((uintptr_t)p & _memory_span_mask); + if (span->size_class < SIZE_CLASS_COUNT) { + // Small/medium block + void *blocks_start = pointer_offset(span, SPAN_HEADER_SIZE); + return span->block_size - + ((size_t)pointer_diff(p, blocks_start) % span->block_size); + } + if (span->size_class == SIZE_CLASS_LARGE) { + // Large block + size_t current_spans = span->span_count; + return (current_spans * _memory_span_size) - (size_t)pointer_diff(p, span); + } + // Oversized block, page count is stored in span_count + size_t current_pages = span->span_count; + return (current_pages * _memory_page_size) - (size_t)pointer_diff(p, span); +} + +//! 
Adjust and optimize the size class properties for the given class +static void _rpmalloc_adjust_size_class(size_t iclass) { + size_t block_size = _memory_size_class[iclass].block_size; + size_t block_count = (_memory_span_size - SPAN_HEADER_SIZE) / block_size; + + _memory_size_class[iclass].block_count = (uint16_t)block_count; + _memory_size_class[iclass].class_idx = (uint16_t)iclass; + + // Check if previous size classes can be merged + if (iclass >= SMALL_CLASS_COUNT) { + size_t prevclass = iclass; + while (prevclass > 0) { + --prevclass; + // A class can be merged if number of pages and number of blocks are equal + if (_memory_size_class[prevclass].block_count == + _memory_size_class[iclass].block_count) + _rpmalloc_memcpy_const(_memory_size_class + prevclass, + _memory_size_class + iclass, + sizeof(_memory_size_class[iclass])); + else + break; + } + } +} + +//! Initialize the allocator and setup global data +extern inline int rpmalloc_initialize(void) { + if (_rpmalloc_initialized) { + rpmalloc_thread_initialize(); + return 0; + } + return rpmalloc_initialize_config(0); +} + +int rpmalloc_initialize_config(const rpmalloc_config_t *config) { + if (_rpmalloc_initialized) { + rpmalloc_thread_initialize(); + return 0; + } + _rpmalloc_initialized = 1; + + if (config) + memcpy(&_memory_config, config, sizeof(rpmalloc_config_t)); + else + _rpmalloc_memset_const(&_memory_config, 0, sizeof(rpmalloc_config_t)); + + if (!_memory_config.memory_map || !_memory_config.memory_unmap) { + _memory_config.memory_map = _rpmalloc_mmap_os; + _memory_config.memory_unmap = _rpmalloc_unmap_os; + } + +#if PLATFORM_WINDOWS + SYSTEM_INFO system_info; + memset(&system_info, 0, sizeof(system_info)); + GetSystemInfo(&system_info); + _memory_map_granularity = system_info.dwAllocationGranularity; +#else + _memory_map_granularity = (size_t)sysconf(_SC_PAGESIZE); +#endif + +#if RPMALLOC_CONFIGURABLE + _memory_page_size = _memory_config.page_size; +#else + _memory_page_size = 0; +#endif + 
_memory_huge_pages = 0; + if (!_memory_page_size) { +#if PLATFORM_WINDOWS + _memory_page_size = system_info.dwPageSize; +#else + _memory_page_size = _memory_map_granularity; + if (_memory_config.enable_huge_pages) { +#if defined(__linux__) + size_t huge_page_size = 0; + FILE *meminfo = fopen("/proc/meminfo", "r"); + if (meminfo) { + char line[128]; + while (!huge_page_size && fgets(line, sizeof(line) - 1, meminfo)) { + line[sizeof(line) - 1] = 0; + if (strstr(line, "Hugepagesize:")) + huge_page_size = (size_t)strtol(line + 13, 0, 10) * 1024; + } + fclose(meminfo); + } + if (huge_page_size) { + _memory_huge_pages = 1; + _memory_page_size = huge_page_size; + _memory_map_granularity = huge_page_size; + } +#elif defined(__FreeBSD__) + int rc; + size_t sz = sizeof(rc); + + if (sysctlbyname("vm.pmap.pg_ps_enabled", &rc, &sz, NULL, 0) == 0 && + rc == 1) { + static size_t defsize = 2 * 1024 * 1024; + int nsize = 0; + size_t sizes[4] = {0}; + _memory_huge_pages = 1; + _memory_page_size = defsize; + if ((nsize = getpagesizes(sizes, 4)) >= 2) { + nsize--; + for (size_t csize = sizes[nsize]; nsize >= 0 && csize; + --nsize, csize = sizes[nsize]) { + //! Unlikely, but as a precaution.. 
+ rpmalloc_assert(!(csize & (csize - 1)) && !(csize % 1024), + "Invalid page size"); + if (defsize < csize) { + _memory_page_size = csize; + break; + } + } + } + _memory_map_granularity = _memory_page_size; + } +#elif defined(__APPLE__) || defined(__NetBSD__) + _memory_huge_pages = 1; + _memory_page_size = 2 * 1024 * 1024; + _memory_map_granularity = _memory_page_size; +#endif + } +#endif + } else { + if (_memory_config.enable_huge_pages) + _memory_huge_pages = 1; + } + +#if PLATFORM_WINDOWS + if (_memory_config.enable_huge_pages) { + HANDLE token = 0; + size_t large_page_minimum = GetLargePageMinimum(); + if (large_page_minimum) + OpenProcessToken(GetCurrentProcess(), + TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token); + if (token) { + LUID luid; + if (LookupPrivilegeValue(0, SE_LOCK_MEMORY_NAME, &luid)) { + TOKEN_PRIVILEGES token_privileges; + memset(&token_privileges, 0, sizeof(token_privileges)); + token_privileges.PrivilegeCount = 1; + token_privileges.Privileges[0].Luid = luid; + token_privileges.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED; + if (AdjustTokenPrivileges(token, FALSE, &token_privileges, 0, 0, 0)) { + if (GetLastError() == ERROR_SUCCESS) + _memory_huge_pages = 1; + } + } + CloseHandle(token); + } + if (_memory_huge_pages) { + if (large_page_minimum > _memory_page_size) + _memory_page_size = large_page_minimum; + if (large_page_minimum > _memory_map_granularity) + _memory_map_granularity = large_page_minimum; + } + } +#endif + + size_t min_span_size = 256; + size_t max_page_size; +#if UINTPTR_MAX > 0xFFFFFFFF + max_page_size = 4096ULL * 1024ULL * 1024ULL; +#else + max_page_size = 4 * 1024 * 1024; +#endif + if (_memory_page_size < min_span_size) + _memory_page_size = min_span_size; + if (_memory_page_size > max_page_size) + _memory_page_size = max_page_size; + _memory_page_size_shift = 0; + size_t page_size_bit = _memory_page_size; + while (page_size_bit != 1) { + ++_memory_page_size_shift; + page_size_bit >>= 1; + } + _memory_page_size = 
((size_t)1 << _memory_page_size_shift); + +#if RPMALLOC_CONFIGURABLE + if (!_memory_config.span_size) { + _memory_span_size = _memory_default_span_size; + _memory_span_size_shift = _memory_default_span_size_shift; + _memory_span_mask = _memory_default_span_mask; + } else { + size_t span_size = _memory_config.span_size; + if (span_size > (256 * 1024)) + span_size = (256 * 1024); + _memory_span_size = 4096; + _memory_span_size_shift = 12; + while (_memory_span_size < span_size) { + _memory_span_size <<= 1; + ++_memory_span_size_shift; + } + _memory_span_mask = ~(uintptr_t)(_memory_span_size - 1); + } +#endif + + _memory_span_map_count = + (_memory_config.span_map_count ? _memory_config.span_map_count + : DEFAULT_SPAN_MAP_COUNT); + if ((_memory_span_size * _memory_span_map_count) < _memory_page_size) + _memory_span_map_count = (_memory_page_size / _memory_span_size); + if ((_memory_page_size >= _memory_span_size) && + ((_memory_span_map_count * _memory_span_size) % _memory_page_size)) + _memory_span_map_count = (_memory_page_size / _memory_span_size); + _memory_heap_reserve_count = (_memory_span_map_count > DEFAULT_SPAN_MAP_COUNT) + ? 
DEFAULT_SPAN_MAP_COUNT + : _memory_span_map_count; + + _memory_config.page_size = _memory_page_size; + _memory_config.span_size = _memory_span_size; + _memory_config.span_map_count = _memory_span_map_count; + _memory_config.enable_huge_pages = _memory_huge_pages; + +#if ((defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD) || \ + defined(__TINYC__) + if (pthread_key_create(&_memory_thread_heap, _rpmalloc_heap_release_raw_fc)) + return -1; +#endif +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) + fls_key = FlsAlloc(&_rpmalloc_thread_destructor); +#endif + + // Setup all small and medium size classes + size_t iclass = 0; + _memory_size_class[iclass].block_size = SMALL_GRANULARITY; + _rpmalloc_adjust_size_class(iclass); + for (iclass = 1; iclass < SMALL_CLASS_COUNT; ++iclass) { + size_t size = iclass * SMALL_GRANULARITY; + _memory_size_class[iclass].block_size = (uint32_t)size; + _rpmalloc_adjust_size_class(iclass); + } + // At least two blocks per span, then fall back to large allocations + _memory_medium_size_limit = (_memory_span_size - SPAN_HEADER_SIZE) >> 1; + if (_memory_medium_size_limit > MEDIUM_SIZE_LIMIT) + _memory_medium_size_limit = MEDIUM_SIZE_LIMIT; + for (iclass = 0; iclass < MEDIUM_CLASS_COUNT; ++iclass) { + size_t size = SMALL_SIZE_LIMIT + ((iclass + 1) * MEDIUM_GRANULARITY); + if (size > _memory_medium_size_limit) { + _memory_medium_size_limit = + SMALL_SIZE_LIMIT + (iclass * MEDIUM_GRANULARITY); + break; + } + _memory_size_class[SMALL_CLASS_COUNT + iclass].block_size = (uint32_t)size; + _rpmalloc_adjust_size_class(SMALL_CLASS_COUNT + iclass); + } + + _memory_orphan_heaps = 0; +#if RPMALLOC_FIRST_CLASS_HEAPS + _memory_first_class_orphan_heaps = 0; +#endif +#if ENABLE_STATISTICS + atomic_store32(&_memory_active_heaps, 0); + atomic_store32(&_mapped_pages, 0); + _mapped_pages_peak = 0; + atomic_store32(&_master_spans, 0); + atomic_store32(&_mapped_total, 0); + atomic_store32(&_unmapped_total, 0); + 
atomic_store32(&_mapped_pages_os, 0); + atomic_store32(&_huge_pages_current, 0); + _huge_pages_peak = 0; +#endif + memset(_memory_heaps, 0, sizeof(_memory_heaps)); + atomic_store32_release(&_memory_global_lock, 0); + + rpmalloc_linker_reference(); + + // Initialize this thread + rpmalloc_thread_initialize(); + return 0; +} + +//! Finalize the allocator +void rpmalloc_finalize(void) { + rpmalloc_thread_finalize(1); + // rpmalloc_dump_statistics(stdout); + + if (_memory_global_reserve) { + atomic_add32(&_memory_global_reserve_master->remaining_spans, + -(int32_t)_memory_global_reserve_count); + _memory_global_reserve_master = 0; + _memory_global_reserve_count = 0; + _memory_global_reserve = 0; + } + atomic_store32_release(&_memory_global_lock, 0); + + // Free all thread caches and fully free spans + for (size_t list_idx = 0; list_idx < HEAP_ARRAY_SIZE; ++list_idx) { + heap_t *heap = _memory_heaps[list_idx]; + while (heap) { + heap_t *next_heap = heap->next_heap; + heap->finalize = 1; + _rpmalloc_heap_global_finalize(heap); + heap = next_heap; + } + } + +#if ENABLE_GLOBAL_CACHE + // Free global caches + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) + _rpmalloc_global_cache_finalize(&_memory_span_cache[iclass]); +#endif + +#if (defined(__APPLE__) || defined(__HAIKU__)) && ENABLE_PRELOAD + pthread_key_delete(_memory_thread_heap); +#endif +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) + FlsFree(fls_key); + fls_key = 0; +#endif +#if ENABLE_STATISTICS + // If you hit these asserts you probably have memory leaks (perhaps global + // scope data doing dynamic allocations) or double frees in your code + rpmalloc_assert(atomic_load32(&_mapped_pages) == 0, "Memory leak detected"); + rpmalloc_assert(atomic_load32(&_mapped_pages_os) == 0, + "Memory leak detected"); +#endif + + _rpmalloc_initialized = 0; +} + +//! 
Initialize thread, assign heap +extern inline void rpmalloc_thread_initialize(void) { + if (!get_thread_heap_raw()) { + heap_t *heap = _rpmalloc_heap_allocate(0); + if (heap) { + _rpmalloc_stat_inc(&_memory_active_heaps); + set_thread_heap(heap); +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) + FlsSetValue(fls_key, heap); +#endif + } + } +} + +//! Finalize thread, orphan heap +void rpmalloc_thread_finalize(int release_caches) { + heap_t *heap = get_thread_heap_raw(); + if (heap) + _rpmalloc_heap_release_raw(heap, release_caches); + set_thread_heap(0); +#if defined(_WIN32) && (!defined(BUILD_DYNAMIC_LINK) || !BUILD_DYNAMIC_LINK) + FlsSetValue(fls_key, 0); +#endif +} + +int rpmalloc_is_thread_initialized(void) { + return (get_thread_heap_raw() != 0) ? 1 : 0; +} + +const rpmalloc_config_t *rpmalloc_config(void) { return &_memory_config; } + +// Extern interface + +extern inline RPMALLOC_ALLOCATOR void *rpmalloc(size_t size) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return 0; + } +#endif + heap_t *heap = get_thread_heap(); + return _rpmalloc_allocate(heap, size); +} + +extern inline void rpfree(void *ptr) { _rpmalloc_deallocate(ptr); } + +extern inline RPMALLOC_ALLOCATOR void *rpcalloc(size_t num, size_t size) { + size_t total; +#if ENABLE_VALIDATE_ARGS +#if PLATFORM_WINDOWS + int err = SizeTMult(num, size, &total); + if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#else + int err = __builtin_umull_overflow(num, size, &total); + if (err || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#endif +#else + total = num * size; +#endif + heap_t *heap = get_thread_heap(); + void *block = _rpmalloc_allocate(heap, total); + if (block) + memset(block, 0, total); + return block; +} + +extern inline RPMALLOC_ALLOCATOR void *rprealloc(void *ptr, size_t size) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return ptr; + } +#endif + 
heap_t *heap = get_thread_heap(); + return _rpmalloc_reallocate(heap, ptr, size, 0, 0); +} + +extern RPMALLOC_ALLOCATOR void *rpaligned_realloc(void *ptr, size_t alignment, + size_t size, size_t oldsize, + unsigned int flags) { +#if ENABLE_VALIDATE_ARGS + if ((size + alignment < size) || (alignment > _memory_page_size)) { + errno = EINVAL; + return 0; + } +#endif + heap_t *heap = get_thread_heap(); + return _rpmalloc_aligned_reallocate(heap, ptr, alignment, size, oldsize, + flags); +} + +extern RPMALLOC_ALLOCATOR void *rpaligned_alloc(size_t alignment, size_t size) { + heap_t *heap = get_thread_heap(); + return _rpmalloc_aligned_allocate(heap, alignment, size); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpaligned_calloc(size_t alignment, size_t num, size_t size) { + size_t total; +#if ENABLE_VALIDATE_ARGS +#if PLATFORM_WINDOWS + int err = SizeTMult(num, size, &total); + if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#else + int err = __builtin_umull_overflow(num, size, &total); + if (err || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#endif +#else + total = num * size; +#endif + void *block = rpaligned_alloc(alignment, total); + if (block) + memset(block, 0, total); + return block; +} + +extern inline RPMALLOC_ALLOCATOR void *rpmemalign(size_t alignment, + size_t size) { + return rpaligned_alloc(alignment, size); +} + +extern inline int rpposix_memalign(void **memptr, size_t alignment, + size_t size) { + if (memptr) + *memptr = rpaligned_alloc(alignment, size); + else + return EINVAL; + return *memptr ? 0 : ENOMEM; +} + +extern inline size_t rpmalloc_usable_size(void *ptr) { + return (ptr ? 
_rpmalloc_usable_size(ptr) : 0); +} + +extern inline void rpmalloc_thread_collect(void) {} + +void rpmalloc_thread_statistics(rpmalloc_thread_statistics_t *stats) { + memset(stats, 0, sizeof(rpmalloc_thread_statistics_t)); + heap_t *heap = get_thread_heap_raw(); + if (!heap) + return; + + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + size_class_t *size_class = _memory_size_class + iclass; + span_t *span = heap->size_class[iclass].partial_span; + while (span) { + size_t free_count = span->list_size; + size_t block_count = size_class->block_count; + if (span->free_list_limit < block_count) + block_count = span->free_list_limit; + free_count += (block_count - span->used_count); + stats->sizecache += free_count * size_class->block_size; + span = span->next; + } + } + +#if ENABLE_THREAD_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + stats->spancache += span_cache->count * (iclass + 1) * _memory_span_size; + } +#endif + + span_t *deferred = (span_t *)atomic_load_ptr(&heap->span_free_deferred); + while (deferred) { + if (deferred->size_class != SIZE_CLASS_HUGE) + stats->spancache += (size_t)deferred->span_count * _memory_span_size; + deferred = (span_t *)deferred->free_list; + } + +#if ENABLE_STATISTICS + stats->thread_to_global = (size_t)atomic_load64(&heap->thread_to_global); + stats->global_to_thread = (size_t)atomic_load64(&heap->global_to_thread); + + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + stats->span_use[iclass].current = + (size_t)atomic_load32(&heap->span_use[iclass].current); + stats->span_use[iclass].peak = + (size_t)atomic_load32(&heap->span_use[iclass].high); + stats->span_use[iclass].to_global = + (size_t)atomic_load32(&heap->span_use[iclass].spans_to_global); + stats->span_use[iclass].from_global = + 
(size_t)atomic_load32(&heap->span_use[iclass].spans_from_global); + stats->span_use[iclass].to_cache = + (size_t)atomic_load32(&heap->span_use[iclass].spans_to_cache); + stats->span_use[iclass].from_cache = + (size_t)atomic_load32(&heap->span_use[iclass].spans_from_cache); + stats->span_use[iclass].to_reserved = + (size_t)atomic_load32(&heap->span_use[iclass].spans_to_reserved); + stats->span_use[iclass].from_reserved = + (size_t)atomic_load32(&heap->span_use[iclass].spans_from_reserved); + stats->span_use[iclass].map_calls = + (size_t)atomic_load32(&heap->span_use[iclass].spans_map_calls); + } + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + stats->size_use[iclass].alloc_current = + (size_t)atomic_load32(&heap->size_class_use[iclass].alloc_current); + stats->size_use[iclass].alloc_peak = + (size_t)heap->size_class_use[iclass].alloc_peak; + stats->size_use[iclass].alloc_total = + (size_t)atomic_load32(&heap->size_class_use[iclass].alloc_total); + stats->size_use[iclass].free_total = + (size_t)atomic_load32(&heap->size_class_use[iclass].free_total); + stats->size_use[iclass].spans_to_cache = + (size_t)atomic_load32(&heap->size_class_use[iclass].spans_to_cache); + stats->size_use[iclass].spans_from_cache = + (size_t)atomic_load32(&heap->size_class_use[iclass].spans_from_cache); + stats->size_use[iclass].spans_from_reserved = (size_t)atomic_load32( + &heap->size_class_use[iclass].spans_from_reserved); + stats->size_use[iclass].map_calls = + (size_t)atomic_load32(&heap->size_class_use[iclass].spans_map_calls); + } +#endif +} + +void rpmalloc_global_statistics(rpmalloc_global_statistics_t *stats) { + memset(stats, 0, sizeof(rpmalloc_global_statistics_t)); +#if ENABLE_STATISTICS + stats->mapped = (size_t)atomic_load32(&_mapped_pages) * _memory_page_size; + stats->mapped_peak = (size_t)_mapped_pages_peak * _memory_page_size; + stats->mapped_total = + (size_t)atomic_load32(&_mapped_total) * _memory_page_size; + stats->unmapped_total = + 
(size_t)atomic_load32(&_unmapped_total) * _memory_page_size; + stats->huge_alloc = + (size_t)atomic_load32(&_huge_pages_current) * _memory_page_size; + stats->huge_alloc_peak = (size_t)_huge_pages_peak * _memory_page_size; +#endif +#if ENABLE_GLOBAL_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + global_cache_t *cache = &_memory_span_cache[iclass]; + while (!atomic_cas32_acquire(&cache->lock, 1, 0)) + _rpmalloc_spin(); + uint32_t count = cache->count; +#if ENABLE_UNLIMITED_CACHE + span_t *current_span = cache->overflow; + while (current_span) { + ++count; + current_span = current_span->next; + } +#endif + atomic_store32_release(&cache->lock, 0); + stats->cached += count * (iclass + 1) * _memory_span_size; + } +#endif +} + +#if ENABLE_STATISTICS + +static void _memory_heap_dump_statistics(heap_t *heap, void *file) { + fprintf(file, "Heap %d stats:\n", heap->id); + fprintf(file, "Class CurAlloc PeakAlloc TotAlloc TotFree BlkSize " + "BlkCount SpansCur SpansPeak PeakAllocMiB ToCacheMiB " + "FromCacheMiB FromReserveMiB MmapCalls\n"); + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + if (!atomic_load32(&heap->size_class_use[iclass].alloc_total)) + continue; + fprintf( + file, + "%3u: %10u %10u %10u %10u %8u %8u %8d %9d %13zu %11zu %12zu %14zu " + "%9u\n", + (uint32_t)iclass, + atomic_load32(&heap->size_class_use[iclass].alloc_current), + heap->size_class_use[iclass].alloc_peak, + atomic_load32(&heap->size_class_use[iclass].alloc_total), + atomic_load32(&heap->size_class_use[iclass].free_total), + _memory_size_class[iclass].block_size, + _memory_size_class[iclass].block_count, + atomic_load32(&heap->size_class_use[iclass].spans_current), + heap->size_class_use[iclass].spans_peak, + ((size_t)heap->size_class_use[iclass].alloc_peak * + (size_t)_memory_size_class[iclass].block_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->size_class_use[iclass].spans_to_cache) * + _memory_span_size) / + (size_t)(1024 * 1024), + 
((size_t)atomic_load32(&heap->size_class_use[iclass].spans_from_cache) * + _memory_span_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32( + &heap->size_class_use[iclass].spans_from_reserved) * + _memory_span_size) / + (size_t)(1024 * 1024), + atomic_load32(&heap->size_class_use[iclass].spans_map_calls)); + } + fprintf(file, "Spans Current Peak Deferred PeakMiB Cached ToCacheMiB " + "FromCacheMiB ToReserveMiB FromReserveMiB ToGlobalMiB " + "FromGlobalMiB MmapCalls\n"); + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + if (!atomic_load32(&heap->span_use[iclass].high) && + !atomic_load32(&heap->span_use[iclass].spans_map_calls)) + continue; + fprintf( + file, + "%4u: %8d %8u %8u %8zu %7u %11zu %12zu %12zu %14zu %11zu %13zu %10u\n", + (uint32_t)(iclass + 1), atomic_load32(&heap->span_use[iclass].current), + atomic_load32(&heap->span_use[iclass].high), + atomic_load32(&heap->span_use[iclass].spans_deferred), + ((size_t)atomic_load32(&heap->span_use[iclass].high) * + (size_t)_memory_span_size * (iclass + 1)) / + (size_t)(1024 * 1024), +#if ENABLE_THREAD_CACHE + (unsigned int)(!iclass ? 
heap->span_cache.count + : heap->span_large_cache[iclass - 1].count), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_cache) * + (iclass + 1) * _memory_span_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_cache) * + (iclass + 1) * _memory_span_size) / + (size_t)(1024 * 1024), +#else + 0, (size_t)0, (size_t)0, +#endif + ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_reserved) * + (iclass + 1) * _memory_span_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_reserved) * + (iclass + 1) * _memory_span_size) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_to_global) * + (size_t)_memory_span_size * (iclass + 1)) / + (size_t)(1024 * 1024), + ((size_t)atomic_load32(&heap->span_use[iclass].spans_from_global) * + (size_t)_memory_span_size * (iclass + 1)) / + (size_t)(1024 * 1024), + atomic_load32(&heap->span_use[iclass].spans_map_calls)); + } + fprintf(file, "Full spans: %zu\n", heap->full_span_count); + fprintf(file, "ThreadToGlobalMiB GlobalToThreadMiB\n"); + fprintf( + file, "%17zu %17zu\n", + (size_t)atomic_load64(&heap->thread_to_global) / (size_t)(1024 * 1024), + (size_t)atomic_load64(&heap->global_to_thread) / (size_t)(1024 * 1024)); +} + +#endif + +void rpmalloc_dump_statistics(void *file) { +#if ENABLE_STATISTICS + for (size_t list_idx = 0; list_idx < HEAP_ARRAY_SIZE; ++list_idx) { + heap_t *heap = _memory_heaps[list_idx]; + while (heap) { + int need_dump = 0; + for (size_t iclass = 0; !need_dump && (iclass < SIZE_CLASS_COUNT); + ++iclass) { + if (!atomic_load32(&heap->size_class_use[iclass].alloc_total)) { + rpmalloc_assert( + !atomic_load32(&heap->size_class_use[iclass].free_total), + "Heap statistics counter mismatch"); + rpmalloc_assert( + !atomic_load32(&heap->size_class_use[iclass].spans_map_calls), + "Heap statistics counter mismatch"); + continue; + } + need_dump = 1; + } + for (size_t iclass = 0; !need_dump && (iclass < 
LARGE_CLASS_COUNT); + ++iclass) { + if (!atomic_load32(&heap->span_use[iclass].high) && + !atomic_load32(&heap->span_use[iclass].spans_map_calls)) + continue; + need_dump = 1; + } + if (need_dump) + _memory_heap_dump_statistics(heap, file); + heap = heap->next_heap; + } + } + fprintf(file, "Global stats:\n"); + size_t huge_current = + (size_t)atomic_load32(&_huge_pages_current) * _memory_page_size; + size_t huge_peak = (size_t)_huge_pages_peak * _memory_page_size; + fprintf(file, "HugeCurrentMiB HugePeakMiB\n"); + fprintf(file, "%14zu %11zu\n", huge_current / (size_t)(1024 * 1024), + huge_peak / (size_t)(1024 * 1024)); + +#if ENABLE_GLOBAL_CACHE + fprintf(file, "GlobalCacheMiB\n"); + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + global_cache_t *cache = _memory_span_cache + iclass; + size_t global_cache = (size_t)cache->count * iclass * _memory_span_size; + + size_t global_overflow_cache = 0; + span_t *span = cache->overflow; + while (span) { + global_overflow_cache += iclass * _memory_span_size; + span = span->next; + } + if (global_cache || global_overflow_cache || cache->insert_count || + cache->extract_count) + fprintf(file, + "%4zu: %8zuMiB (%8zuMiB overflow) %14zu insert %14zu extract\n", + iclass + 1, global_cache / (size_t)(1024 * 1024), + global_overflow_cache / (size_t)(1024 * 1024), + cache->insert_count, cache->extract_count); + } +#endif + + size_t mapped = (size_t)atomic_load32(&_mapped_pages) * _memory_page_size; + size_t mapped_os = + (size_t)atomic_load32(&_mapped_pages_os) * _memory_page_size; + size_t mapped_peak = (size_t)_mapped_pages_peak * _memory_page_size; + size_t mapped_total = + (size_t)atomic_load32(&_mapped_total) * _memory_page_size; + size_t unmapped_total = + (size_t)atomic_load32(&_unmapped_total) * _memory_page_size; + fprintf( + file, + "MappedMiB MappedOSMiB MappedPeakMiB MappedTotalMiB UnmappedTotalMiB\n"); + fprintf(file, "%9zu %11zu %13zu %14zu %16zu\n", + mapped / (size_t)(1024 * 1024), mapped_os / 
(size_t)(1024 * 1024), + mapped_peak / (size_t)(1024 * 1024), + mapped_total / (size_t)(1024 * 1024), + unmapped_total / (size_t)(1024 * 1024)); + + fprintf(file, "\n"); +#if 0 + int64_t allocated = atomic_load64(&_allocation_counter); + int64_t deallocated = atomic_load64(&_deallocation_counter); + fprintf(file, "Allocation count: %lli\n", allocated); + fprintf(file, "Deallocation count: %lli\n", deallocated); + fprintf(file, "Current allocations: %lli\n", (allocated - deallocated)); + fprintf(file, "Master spans: %d\n", atomic_load32(&_master_spans)); + fprintf(file, "Dangling master spans: %d\n", atomic_load32(&_unmapped_master_spans)); +#endif +#endif + (void)sizeof(file); +} + +#if RPMALLOC_FIRST_CLASS_HEAPS + +extern inline rpmalloc_heap_t *rpmalloc_heap_acquire(void) { + // Must be a pristine heap from newly mapped memory pages, or else memory + // blocks could already be allocated from the heap which would (wrongly) be + // released when heap is cleared with rpmalloc_heap_free_all(). Also heaps + // guaranteed to be pristine from the dedicated orphan list can be used. 
+ heap_t *heap = _rpmalloc_heap_allocate(1); + rpmalloc_assume(heap != NULL); + heap->owner_thread = 0; + _rpmalloc_stat_inc(&_memory_active_heaps); + return heap; +} + +extern inline void rpmalloc_heap_release(rpmalloc_heap_t *heap) { + if (heap) + _rpmalloc_heap_release(heap, 1, 1); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_alloc(rpmalloc_heap_t *heap, size_t size) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return 0; + } +#endif + return _rpmalloc_allocate(heap, size); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_aligned_alloc(rpmalloc_heap_t *heap, size_t alignment, + size_t size) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return 0; + } +#endif + return _rpmalloc_aligned_allocate(heap, alignment, size); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_calloc(rpmalloc_heap_t *heap, size_t num, size_t size) { + return rpmalloc_heap_aligned_calloc(heap, 0, num, size); +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_aligned_calloc(rpmalloc_heap_t *heap, size_t alignment, + size_t num, size_t size) { + size_t total; +#if ENABLE_VALIDATE_ARGS +#if PLATFORM_WINDOWS + int err = SizeTMult(num, size, &total); + if ((err != S_OK) || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#else + int err = __builtin_umull_overflow(num, size, &total); + if (err || (total >= MAX_ALLOC_SIZE)) { + errno = EINVAL; + return 0; + } +#endif +#else + total = num * size; +#endif + void *block = _rpmalloc_aligned_allocate(heap, alignment, total); + if (block) + memset(block, 0, total); + return block; +} + +extern inline RPMALLOC_ALLOCATOR void * +rpmalloc_heap_realloc(rpmalloc_heap_t *heap, void *ptr, size_t size, + unsigned int flags) { +#if ENABLE_VALIDATE_ARGS + if (size >= MAX_ALLOC_SIZE) { + errno = EINVAL; + return ptr; + } +#endif + return _rpmalloc_reallocate(heap, ptr, size, 0, flags); +} + +extern inline RPMALLOC_ALLOCATOR void * 
+rpmalloc_heap_aligned_realloc(rpmalloc_heap_t *heap, void *ptr, + size_t alignment, size_t size, + unsigned int flags) { +#if ENABLE_VALIDATE_ARGS + if ((size + alignment < size) || (alignment > _memory_page_size)) { + errno = EINVAL; + return 0; + } +#endif + return _rpmalloc_aligned_reallocate(heap, ptr, alignment, size, 0, flags); +} + +extern inline void rpmalloc_heap_free(rpmalloc_heap_t *heap, void *ptr) { + (void)sizeof(heap); + _rpmalloc_deallocate(ptr); +} + +extern inline void rpmalloc_heap_free_all(rpmalloc_heap_t *heap) { + span_t *span; + span_t *next_span; + + _rpmalloc_heap_cache_adopt_deferred(heap, 0); + + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + span = heap->size_class[iclass].partial_span; + while (span) { + next_span = span->next; + _rpmalloc_heap_cache_insert(heap, span); + span = next_span; + } + heap->size_class[iclass].partial_span = 0; + span = heap->full_span[iclass]; + while (span) { + next_span = span->next; + _rpmalloc_heap_cache_insert(heap, span); + span = next_span; + } + + span = heap->size_class[iclass].cache; + if (span) + _rpmalloc_heap_cache_insert(heap, span); + heap->size_class[iclass].cache = 0; + } + memset(heap->size_class, 0, sizeof(heap->size_class)); + memset(heap->full_span, 0, sizeof(heap->full_span)); + + span = heap->large_huge_span; + while (span) { + next_span = span->next; + if (UNEXPECTED(span->size_class == SIZE_CLASS_HUGE)) + _rpmalloc_deallocate_huge(span); + else + _rpmalloc_heap_cache_insert(heap, span); + span = next_span; + } + heap->large_huge_span = 0; + heap->full_span_count = 0; + +#if ENABLE_THREAD_CACHE + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + span_cache_t *span_cache; + if (!iclass) + span_cache = &heap->span_cache; + else + span_cache = (span_cache_t *)(heap->span_large_cache + (iclass - 1)); + if (!span_cache->count) + continue; +#if ENABLE_GLOBAL_CACHE + _rpmalloc_stat_add64(&heap->thread_to_global, + span_cache->count * (iclass + 1) * 
_memory_span_size); + _rpmalloc_stat_add(&heap->span_use[iclass].spans_to_global, + span_cache->count); + _rpmalloc_global_cache_insert_spans(span_cache->span, iclass + 1, + span_cache->count); +#else + for (size_t ispan = 0; ispan < span_cache->count; ++ispan) + _rpmalloc_span_unmap(span_cache->span[ispan]); +#endif + span_cache->count = 0; + } +#endif + +#if ENABLE_STATISTICS + for (size_t iclass = 0; iclass < SIZE_CLASS_COUNT; ++iclass) { + atomic_store32(&heap->size_class_use[iclass].alloc_current, 0); + atomic_store32(&heap->size_class_use[iclass].spans_current, 0); + } + for (size_t iclass = 0; iclass < LARGE_CLASS_COUNT; ++iclass) { + atomic_store32(&heap->span_use[iclass].current, 0); + } +#endif +} + +extern inline void rpmalloc_heap_thread_set_current(rpmalloc_heap_t *heap) { + heap_t *prev_heap = get_thread_heap_raw(); + if (prev_heap != heap) { + set_thread_heap(heap); + if (prev_heap) + rpmalloc_heap_release(prev_heap); + } +} + +extern inline rpmalloc_heap_t *rpmalloc_get_heap_for_ptr(void *ptr) { + // Grab the span, and then the heap from the span + span_t *span = (span_t *)((uintptr_t)ptr & _memory_span_mask); + if (span) { + return span->heap; + } + return 0; +} + +#endif + +#if ENABLE_PRELOAD || ENABLE_OVERRIDE + +#include "malloc.c" + +#endif + +void rpmalloc_linker_reference(void) { (void)sizeof(_rpmalloc_initialized); } diff --git a/llvm/lib/Support/rpmalloc/rpmalloc.h b/llvm/lib/Support/rpmalloc/rpmalloc.h index 3911c53b779b36..5b7fe1ff4286ba 100644 --- a/llvm/lib/Support/rpmalloc/rpmalloc.h +++ b/llvm/lib/Support/rpmalloc/rpmalloc.h @@ -1,428 +1,428 @@ -//===---------------------- rpmalloc.h ------------------*- C -*-=============// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. 
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This library provides a cross-platform lock free thread caching malloc
-// implementation in C11.
-//
-//===----------------------------------------------------------------------===//
-
-#pragma once
-
-#include <stddef.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#if defined(__clang__) || defined(__GNUC__)
-#define RPMALLOC_EXPORT __attribute__((visibility("default")))
-#define RPMALLOC_ALLOCATOR
-#if (defined(__clang_major__) && (__clang_major__ < 4)) || \
-    (defined(__GNUC__) && defined(ENABLE_PRELOAD) && ENABLE_PRELOAD)
-#define RPMALLOC_ATTRIB_MALLOC
-#define RPMALLOC_ATTRIB_ALLOC_SIZE(size)
-#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size)
-#else
-#define RPMALLOC_ATTRIB_MALLOC __attribute__((__malloc__))
-#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) __attribute__((alloc_size(size)))
-#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) \
-  __attribute__((alloc_size(count, size)))
-#endif
-#define RPMALLOC_CDECL
-#elif defined(_MSC_VER)
-#define RPMALLOC_EXPORT
-#define RPMALLOC_ALLOCATOR __declspec(allocator) __declspec(restrict)
-#define RPMALLOC_ATTRIB_MALLOC
-#define RPMALLOC_ATTRIB_ALLOC_SIZE(size)
-#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size)
-#define RPMALLOC_CDECL __cdecl
-#else
-#define RPMALLOC_EXPORT
-#define RPMALLOC_ALLOCATOR
-#define RPMALLOC_ATTRIB_MALLOC
-#define RPMALLOC_ATTRIB_ALLOC_SIZE(size)
-#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size)
-#define RPMALLOC_CDECL
-#endif
-
-//! Define RPMALLOC_CONFIGURABLE to enable configuring sizes. Will introduce
-// a very small overhead due to some size calculations not being compile time
-// constants
-#ifndef RPMALLOC_CONFIGURABLE
-#define RPMALLOC_CONFIGURABLE 0
-#endif
-
-//! Define RPMALLOC_FIRST_CLASS_HEAPS to enable heap based API (rpmalloc_heap_*
-//! functions).
-// Will introduce a very small overhead to track fully allocated spans in heaps
-#ifndef RPMALLOC_FIRST_CLASS_HEAPS
-#define RPMALLOC_FIRST_CLASS_HEAPS 0
-#endif
-
-//! Flag to rpaligned_realloc to not preserve content in reallocation
-#define RPMALLOC_NO_PRESERVE 1
-//! Flag to rpaligned_realloc to fail and return null pointer if grow cannot be
-//! done in-place,
-// in which case the original pointer is still valid (just like a call to
-// realloc which failes to allocate a new block).
-#define RPMALLOC_GROW_OR_FAIL 2
-
-typedef struct rpmalloc_global_statistics_t {
-  //! Current amount of virtual memory mapped, all of which might not have been
-  //! committed (only if ENABLE_STATISTICS=1)
-  size_t mapped;
-  //! Peak amount of virtual memory mapped, all of which might not have been
-  //! committed (only if ENABLE_STATISTICS=1)
-  size_t mapped_peak;
-  //! Current amount of memory in global caches for small and medium sizes
-  //! (<32KiB)
-  size_t cached;
-  //! Current amount of memory allocated in huge allocations, i.e larger than
-  //! LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1)
-  size_t huge_alloc;
-  //! Peak amount of memory allocated in huge allocations, i.e larger than
-  //! LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1)
-  size_t huge_alloc_peak;
-  //! Total amount of memory mapped since initialization (only if
-  //! ENABLE_STATISTICS=1)
-  size_t mapped_total;
-  //! Total amount of memory unmapped since initialization (only if
-  //! ENABLE_STATISTICS=1)
-  size_t unmapped_total;
-} rpmalloc_global_statistics_t;
-
-typedef struct rpmalloc_thread_statistics_t {
-  //! Current number of bytes available in thread size class caches for small
-  //! and medium sizes (<32KiB)
-  size_t sizecache;
-  //! Current number of bytes available in thread span caches for small and
-  //! medium sizes (<32KiB)
-  size_t spancache;
-  //! Total number of bytes transitioned from thread cache to global cache (only
-  //! if ENABLE_STATISTICS=1)
-  size_t thread_to_global;
-  //! Total number of bytes transitioned from global cache to thread cache (only
-  //! if ENABLE_STATISTICS=1)
-  size_t global_to_thread;
-  //! Per span count statistics (only if ENABLE_STATISTICS=1)
-  struct {
-    //! Currently used number of spans
-    size_t current;
-    //! High water mark of spans used
-    size_t peak;
-    //! Number of spans transitioned to global cache
-    size_t to_global;
-    //! Number of spans transitioned from global cache
-    size_t from_global;
-    //! Number of spans transitioned to thread cache
-    size_t to_cache;
-    //! Number of spans transitioned from thread cache
-    size_t from_cache;
-    //! Number of spans transitioned to reserved state
-    size_t to_reserved;
-    //! Number of spans transitioned from reserved state
-    size_t from_reserved;
-    //! Number of raw memory map calls (not hitting the reserve spans but
-    //! resulting in actual OS mmap calls)
-    size_t map_calls;
-  } span_use[64];
-  //! Per size class statistics (only if ENABLE_STATISTICS=1)
-  struct {
-    //! Current number of allocations
-    size_t alloc_current;
-    //! Peak number of allocations
-    size_t alloc_peak;
-    //! Total number of allocations
-    size_t alloc_total;
-    //! Total number of frees
-    size_t free_total;
-    //! Number of spans transitioned to cache
-    size_t spans_to_cache;
-    //! Number of spans transitioned from cache
-    size_t spans_from_cache;
-    //! Number of spans transitioned from reserved state
-    size_t spans_from_reserved;
-    //! Number of raw memory map calls (not hitting the reserve spans but
-    //! resulting in actual OS mmap calls)
-    size_t map_calls;
-  } size_use[128];
-} rpmalloc_thread_statistics_t;
-
-typedef struct rpmalloc_config_t {
-  //! Map memory pages for the given number of bytes. The returned address MUST
-  //! be
-  //  aligned to the rpmalloc span size, which will always be a power of two.
- // Optionally the function can store an alignment offset in the offset - // variable in case it performs alignment and the returned pointer is offset - // from the actual start of the memory region due to this alignment. The - // alignment offset will be passed to the memory unmap function. The - // alignment offset MUST NOT be larger than 65535 (storable in an uint16_t), - // if it is you must use natural alignment to shift it into 16 bits. If you - // set a memory_map function, you must also set a memory_unmap function or - // else the default implementation will be used for both. This function must - // be thread safe, it can be called by multiple threads simultaneously. - void *(*memory_map)(size_t size, size_t *offset); - //! Unmap the memory pages starting at address and spanning the given number - //! of bytes. - // If release is set to non-zero, the unmap is for an entire span range as - // returned by a previous call to memory_map and that the entire range should - // be released. The release argument holds the size of the entire span range. - // If release is set to 0, the unmap is a partial decommit of a subset of the - // mapped memory range. If you set a memory_unmap function, you must also set - // a memory_map function or else the default implementation will be used for - // both. This function must be thread safe, it can be called by multiple - // threads simultaneously. - void (*memory_unmap)(void *address, size_t size, size_t offset, - size_t release); - //! Called when an assert fails, if asserts are enabled. Will use the standard - //! assert() - // if this is not set. - void (*error_callback)(const char *message); - //! Called when a call to map memory pages fails (out of memory). If this - //! callback is - // not set or returns zero the library will return a null pointer in the - // allocation call. If this callback returns non-zero the map call will be - // retried. 
The argument passed is the number of bytes that was requested in - // the map call. Only used if the default system memory map function is used - // (memory_map callback is not set). - int (*map_fail_callback)(size_t size); - //! Size of memory pages. The page size MUST be a power of two. All memory - //! mapping - // requests to memory_map will be made with size set to a multiple of the - // page size. Used if RPMALLOC_CONFIGURABLE is defined to 1, otherwise system - // page size is used. - size_t page_size; - //! Size of a span of memory blocks. MUST be a power of two, and in - //! [4096,262144] - // range (unless 0 - set to 0 to use the default span size). Used if - // RPMALLOC_CONFIGURABLE is defined to 1. - size_t span_size; - //! Number of spans to map at each request to map new virtual memory blocks. - //! This can - // be used to minimize the system call overhead at the cost of virtual memory - // address space. The extra mapped pages will not be written until actually - // used, so physical committed memory should not be affected in the default - // implementation. Will be aligned to a multiple of spans that match memory - // page size in case of huge pages. - size_t span_map_count; - //! Enable use of large/huge pages. If this flag is set to non-zero and page - //! size is - // zero, the allocator will try to enable huge pages and auto detect the - // configuration. If this is set to non-zero and page_size is also non-zero, - // the allocator will assume huge pages have been configured and enabled - // prior to initializing the allocator. For Windows, see - // https://docs.microsoft.com/en-us/windows/desktop/memory/large-page-support - // For Linux, see https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt - int enable_huge_pages; - //! Respectively allocated pages and huge allocated pages names for systems - // supporting it to be able to distinguish among anonymous regions. 
- const char *page_name; - const char *huge_page_name; -} rpmalloc_config_t; - -//! Initialize allocator with default configuration -RPMALLOC_EXPORT int rpmalloc_initialize(void); - -//! Initialize allocator with given configuration -RPMALLOC_EXPORT int rpmalloc_initialize_config(const rpmalloc_config_t *config); - -//! Get allocator configuration -RPMALLOC_EXPORT const rpmalloc_config_t *rpmalloc_config(void); - -//! Finalize allocator -RPMALLOC_EXPORT void rpmalloc_finalize(void); - -//! Initialize allocator for calling thread -RPMALLOC_EXPORT void rpmalloc_thread_initialize(void); - -//! Finalize allocator for calling thread -RPMALLOC_EXPORT void rpmalloc_thread_finalize(int release_caches); - -//! Perform deferred deallocations pending for the calling thread heap -RPMALLOC_EXPORT void rpmalloc_thread_collect(void); - -//! Query if allocator is initialized for calling thread -RPMALLOC_EXPORT int rpmalloc_is_thread_initialized(void); - -//! Get per-thread statistics -RPMALLOC_EXPORT void -rpmalloc_thread_statistics(rpmalloc_thread_statistics_t *stats); - -//! Get global statistics -RPMALLOC_EXPORT void -rpmalloc_global_statistics(rpmalloc_global_statistics_t *stats); - -//! Dump all statistics in human readable format to file (should be a FILE*) -RPMALLOC_EXPORT void rpmalloc_dump_statistics(void *file); - -//! Allocate a memory block of at least the given size -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc(size_t size) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(1); - -//! Free the given memory block -RPMALLOC_EXPORT void rpfree(void *ptr); - -//! Allocate a memory block of at least the given size and zero initialize it -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpcalloc(size_t num, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE2(1, 2); - -//! 
Reallocate the given block to at least the given size -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rprealloc(void *ptr, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2); - -//! Reallocate the given block to at least the given size and alignment, -// with optional control flags (see RPMALLOC_NO_PRESERVE). -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpaligned_realloc(void *ptr, size_t alignment, size_t size, size_t oldsize, - unsigned int flags) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(3); - -//! Allocate a memory block of at least the given size and alignment. -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpaligned_alloc(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2); - -//! Allocate a memory block of at least the given size and alignment, and zero -//! initialize it. -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpaligned_calloc(size_t alignment, size_t num, - size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); - -//! Allocate a memory block of at least the given size and alignment. -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. 
A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmemalign(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2); - -//! Allocate a memory block of at least the given size and alignment. -// Alignment must be a power of two and a multiple of sizeof(void*), -// and should ideally be less than memory page size. A caveat of rpmalloc -// internals is that this must also be strictly less than the span size -// (default 64KiB) -RPMALLOC_EXPORT int rpposix_memalign(void **memptr, size_t alignment, - size_t size); - -//! Query the usable size of the given memory block (from given pointer to the -//! end of block) -RPMALLOC_EXPORT size_t rpmalloc_usable_size(void *ptr); - -//! Dummy empty function for forcing linker symbol inclusion -RPMALLOC_EXPORT void rpmalloc_linker_reference(void); - -#if RPMALLOC_FIRST_CLASS_HEAPS - -//! Heap type -typedef struct heap_t rpmalloc_heap_t; - -//! Acquire a new heap. Will reuse existing released heaps or allocate memory -//! for a new heap -// if none available. Heap API is implemented with the strict assumption that -// only one single thread will call heap functions for a given heap at any -// given time, no functions are thread safe. -RPMALLOC_EXPORT rpmalloc_heap_t *rpmalloc_heap_acquire(void); - -//! Release a heap (does NOT free the memory allocated by the heap, use -//! rpmalloc_heap_free_all before destroying the heap). -// Releasing a heap will enable it to be reused by other threads. Safe to pass -// a null pointer. -RPMALLOC_EXPORT void rpmalloc_heap_release(rpmalloc_heap_t *heap); - -//! Allocate a memory block of at least the given size using the given heap. -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_alloc(rpmalloc_heap_t *heap, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(2); - -//! 
Allocate a memory block of at least the given size using the given heap. The -//! returned -// block will have the requested alignment. Alignment must be a power of two -// and a multiple of sizeof(void*), and should ideally be less than memory page -// size. A caveat of rpmalloc internals is that this must also be strictly less -// than the span size (default 64KiB). -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_aligned_alloc(rpmalloc_heap_t *heap, size_t alignment, - size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(3); - -//! Allocate a memory block of at least the given size using the given heap and -//! zero initialize it. -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_calloc(rpmalloc_heap_t *heap, size_t num, - size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); - -//! Allocate a memory block of at least the given size using the given heap and -//! zero initialize it. The returned -// block will have the requested alignment. Alignment must either be zero, or a -// power of two and a multiple of sizeof(void*), and should ideally be less -// than memory page size. A caveat of rpmalloc internals is that this must also -// be strictly less than the span size (default 64KiB). -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_aligned_calloc(rpmalloc_heap_t *heap, size_t alignment, - size_t num, size_t size) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); - -//! Reallocate the given block to at least the given size. The memory block MUST -//! be allocated -// by the same heap given to this function. -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * -rpmalloc_heap_realloc(rpmalloc_heap_t *heap, void *ptr, size_t size, - unsigned int flags) RPMALLOC_ATTRIB_MALLOC - RPMALLOC_ATTRIB_ALLOC_SIZE(3); - -//! Reallocate the given block to at least the given size. The memory block MUST -//! be allocated -// by the same heap given to this function. The returned block will have the -// requested alignment. 
Alignment must be either zero, or a power of two and a -// multiple of sizeof(void*), and should ideally be less than memory page size. -// A caveat of rpmalloc internals is that this must also be strictly less than -// the span size (default 64KiB). -RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void *rpmalloc_heap_aligned_realloc( - rpmalloc_heap_t *heap, void *ptr, size_t alignment, size_t size, - unsigned int flags) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(4); - -//! Free the given memory block from the given heap. The memory block MUST be -//! allocated -// by the same heap given to this function. -RPMALLOC_EXPORT void rpmalloc_heap_free(rpmalloc_heap_t *heap, void *ptr); - -//! Free all memory allocated by the heap -RPMALLOC_EXPORT void rpmalloc_heap_free_all(rpmalloc_heap_t *heap); - -//! Set the given heap as the current heap for the calling thread. A heap MUST -//! only be current heap -// for a single thread, a heap can never be shared between multiple threads. -// The previous current heap for the calling thread is released to be reused by -// other threads. -RPMALLOC_EXPORT void rpmalloc_heap_thread_set_current(rpmalloc_heap_t *heap); - -//! Returns which heap the given pointer is allocated on -RPMALLOC_EXPORT rpmalloc_heap_t *rpmalloc_get_heap_for_ptr(void *ptr); - -#endif - -#ifdef __cplusplus -} -#endif +//===---------------------- rpmalloc.h ------------------*- C -*-=============// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This library provides a cross-platform lock free thread caching malloc +// implementation in C11. 
+// +//===----------------------------------------------------------------------===// + +#pragma once + +#include <stddef.h> + +#ifdef __cplusplus +extern "C" { +#endif + +#if defined(__clang__) || defined(__GNUC__) +#define RPMALLOC_EXPORT __attribute__((visibility("default"))) +#define RPMALLOC_ALLOCATOR +#if (defined(__clang_major__) && (__clang_major__ < 4)) || \ + (defined(__GNUC__) && defined(ENABLE_PRELOAD) && ENABLE_PRELOAD) +#define RPMALLOC_ATTRIB_MALLOC +#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) +#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) +#else +#define RPMALLOC_ATTRIB_MALLOC __attribute__((__malloc__)) +#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) __attribute__((alloc_size(size))) +#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) \ + __attribute__((alloc_size(count, size))) +#endif +#define RPMALLOC_CDECL +#elif defined(_MSC_VER) +#define RPMALLOC_EXPORT +#define RPMALLOC_ALLOCATOR __declspec(allocator) __declspec(restrict) +#define RPMALLOC_ATTRIB_MALLOC +#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) +#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) +#define RPMALLOC_CDECL __cdecl +#else +#define RPMALLOC_EXPORT +#define RPMALLOC_ALLOCATOR +#define RPMALLOC_ATTRIB_MALLOC +#define RPMALLOC_ATTRIB_ALLOC_SIZE(size) +#define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) +#define RPMALLOC_CDECL +#endif + +//! Define RPMALLOC_CONFIGURABLE to enable configuring sizes. Will introduce +// a very small overhead due to some size calculations not being compile time +// constants +#ifndef RPMALLOC_CONFIGURABLE +#define RPMALLOC_CONFIGURABLE 0 +#endif + +//! Define RPMALLOC_FIRST_CLASS_HEAPS to enable heap based API (rpmalloc_heap_* +//! functions). +// Will introduce a very small overhead to track fully allocated spans in heaps +#ifndef RPMALLOC_FIRST_CLASS_HEAPS +#define RPMALLOC_FIRST_CLASS_HEAPS 0 +#endif + +//! Flag to rpaligned_realloc to not preserve content in reallocation +#define RPMALLOC_NO_PRESERVE 1 +//! 
Flag to rpaligned_realloc to fail and return null pointer if grow cannot be +//! done in-place, +// in which case the original pointer is still valid (just like a call to +// realloc which failes to allocate a new block). +#define RPMALLOC_GROW_OR_FAIL 2 + +typedef struct rpmalloc_global_statistics_t { + //! Current amount of virtual memory mapped, all of which might not have been + //! committed (only if ENABLE_STATISTICS=1) + size_t mapped; + //! Peak amount of virtual memory mapped, all of which might not have been + //! committed (only if ENABLE_STATISTICS=1) + size_t mapped_peak; + //! Current amount of memory in global caches for small and medium sizes + //! (<32KiB) + size_t cached; + //! Current amount of memory allocated in huge allocations, i.e larger than + //! LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1) + size_t huge_alloc; + //! Peak amount of memory allocated in huge allocations, i.e larger than + //! LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1) + size_t huge_alloc_peak; + //! Total amount of memory mapped since initialization (only if + //! ENABLE_STATISTICS=1) + size_t mapped_total; + //! Total amount of memory unmapped since initialization (only if + //! ENABLE_STATISTICS=1) + size_t unmapped_total; +} rpmalloc_global_statistics_t; + +typedef struct rpmalloc_thread_statistics_t { + //! Current number of bytes available in thread size class caches for small + //! and medium sizes (<32KiB) + size_t sizecache; + //! Current number of bytes available in thread span caches for small and + //! medium sizes (<32KiB) + size_t spancache; + //! Total number of bytes transitioned from thread cache to global cache (only + //! if ENABLE_STATISTICS=1) + size_t thread_to_global; + //! Total number of bytes transitioned from global cache to thread cache (only + //! if ENABLE_STATISTICS=1) + size_t global_to_thread; + //! Per span count statistics (only if ENABLE_STATISTICS=1) + struct { + //! 
Currently used number of spans + size_t current; + //! High water mark of spans used + size_t peak; + //! Number of spans transitioned to global cache + size_t to_global; + //! Number of spans transitioned from global cache + size_t from_global; + //! Number of spans transitioned to thread cache + size_t to_cache; + //! Number of spans transitioned from thread cache + size_t from_cache; + //! Number of spans transitioned to reserved state + size_t to_reserved; + //! Number of spans transitioned from reserved state + size_t from_reserved; + //! Number of raw memory map calls (not hitting the reserve spans but + //! resulting in actual OS mmap calls) + size_t map_calls; + } span_use[64]; + //! Per size class statistics (only if ENABLE_STATISTICS=1) + struct { + //! Current number of allocations + size_t alloc_current; + //! Peak number of allocations + size_t alloc_peak; + //! Total number of allocations + size_t alloc_total; + //! Total number of frees + size_t free_total; + //! Number of spans transitioned to cache + size_t spans_to_cache; + //! Number of spans transitioned from cache + size_t spans_from_cache; + //! Number of spans transitioned from reserved state + size_t spans_from_reserved; + //! Number of raw memory map calls (not hitting the reserve spans but + //! resulting in actual OS mmap calls) + size_t map_calls; + } size_use[128]; +} rpmalloc_thread_statistics_t; + +typedef struct rpmalloc_config_t { + //! Map memory pages for the given number of bytes. The returned address MUST + //! be + // aligned to the rpmalloc span size, which will always be a power of two. + // Optionally the function can store an alignment offset in the offset + // variable in case it performs alignment and the returned pointer is offset + // from the actual start of the memory region due to this alignment. The + // alignment offset will be passed to the memory unmap function. 
The + // alignment offset MUST NOT be larger than 65535 (storable in an uint16_t), + // if it is you must use natural alignment to shift it into 16 bits. If you + // set a memory_map function, you must also set a memory_unmap function or + // else the default implementation will be used for both. This function must + // be thread safe, it can be called by multiple threads simultaneously. + void *(*memory_map)(size_t size, size_t *offset); + //! Unmap the memory pages starting at address and spanning the given number + //! of bytes. + // If release is set to non-zero, the unmap is for an entire span range as + // returned by a previous call to memory_map and that the entire range should + // be released. The release argument holds the size of the entire span range. + // If release is set to 0, the unmap is a partial decommit of a subset of the + // mapped memory range. If you set a memory_unmap function, you must also set + // a memory_map function or else the default implementation will be used for + // both. This function must be thread safe, it can be called by multiple + // threads simultaneously. + void (*memory_unmap)(void *address, size_t size, size_t offset, + size_t release); + //! Called when an assert fails, if asserts are enabled. Will use the standard + //! assert() + // if this is not set. + void (*error_callback)(const char *message); + //! Called when a call to map memory pages fails (out of memory). If this + //! callback is + // not set or returns zero the library will return a null pointer in the + // allocation call. If this callback returns non-zero the map call will be + // retried. The argument passed is the number of bytes that was requested in + // the map call. Only used if the default system memory map function is used + // (memory_map callback is not set). + int (*map_fail_callback)(size_t size); + //! Size of memory pages. The page size MUST be a power of two. All memory + //! 
mapping + // requests to memory_map will be made with size set to a multiple of the + // page size. Used if RPMALLOC_CONFIGURABLE is defined to 1, otherwise system + // page size is used. + size_t page_size; + //! Size of a span of memory blocks. MUST be a power of two, and in + //! [4096,262144] + // range (unless 0 - set to 0 to use the default span size). Used if + // RPMALLOC_CONFIGURABLE is defined to 1. + size_t span_size; + //! Number of spans to map at each request to map new virtual memory blocks. + //! This can + // be used to minimize the system call overhead at the cost of virtual memory + // address space. The extra mapped pages will not be written until actually + // used, so physical committed memory should not be affected in the default + // implementation. Will be aligned to a multiple of spans that match memory + // page size in case of huge pages. + size_t span_map_count; + //! Enable use of large/huge pages. If this flag is set to non-zero and page + //! size is + // zero, the allocator will try to enable huge pages and auto detect the + // configuration. If this is set to non-zero and page_size is also non-zero, + // the allocator will assume huge pages have been configured and enabled + // prior to initializing the allocator. For Windows, see + // https://docs.microsoft.com/en-us/windows/desktop/memory/large-page-support + // For Linux, see https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt + int enable_huge_pages; + //! Respectively allocated pages and huge allocated pages names for systems + // supporting it to be able to distinguish among anonymous regions. + const char *page_name; + const char *huge_page_name; +} rpmalloc_config_t; + +//! Initialize allocator with default configuration +RPMALLOC_EXPORT int rpmalloc_initialize(void); + +//! Initialize allocator with given configuration +RPMALLOC_EXPORT int rpmalloc_initialize_config(const rpmalloc_config_t *config); + +//! 
Get allocator configuration +RPMALLOC_EXPORT const rpmalloc_config_t *rpmalloc_config(void); + +//! Finalize allocator +RPMALLOC_EXPORT void rpmalloc_finalize(void); + +//! Initialize allocator for calling thread +RPMALLOC_EXPORT void rpmalloc_thread_initialize(void); + +//! Finalize allocator for calling thread +RPMALLOC_EXPORT void rpmalloc_thread_finalize(int release_caches); + +//! Perform deferred deallocations pending for the calling thread heap +RPMALLOC_EXPORT void rpmalloc_thread_collect(void); + +//! Query if allocator is initialized for calling thread +RPMALLOC_EXPORT int rpmalloc_is_thread_initialized(void); + +//! Get per-thread statistics +RPMALLOC_EXPORT void +rpmalloc_thread_statistics(rpmalloc_thread_statistics_t *stats); + +//! Get global statistics +RPMALLOC_EXPORT void +rpmalloc_global_statistics(rpmalloc_global_statistics_t *stats); + +//! Dump all statistics in human readable format to file (should be a FILE*) +RPMALLOC_EXPORT void rpmalloc_dump_statistics(void *file); + +//! Allocate a memory block of at least the given size +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc(size_t size) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(1); + +//! Free the given memory block +RPMALLOC_EXPORT void rpfree(void *ptr); + +//! Allocate a memory block of at least the given size and zero initialize it +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpcalloc(size_t num, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE2(1, 2); + +//! Reallocate the given block to at least the given size +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rprealloc(void *ptr, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2); + +//! Reallocate the given block to at least the given size and alignment, +// with optional control flags (see RPMALLOC_NO_PRESERVE). +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. 
A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpaligned_realloc(void *ptr, size_t alignment, size_t size, size_t oldsize, + unsigned int flags) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(3); + +//! Allocate a memory block of at least the given size and alignment. +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpaligned_alloc(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2); + +//! Allocate a memory block of at least the given size and alignment, and zero +//! initialize it. +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpaligned_calloc(size_t alignment, size_t num, + size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); + +//! Allocate a memory block of at least the given size and alignment. +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmemalign(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2); + +//! Allocate a memory block of at least the given size and alignment. +// Alignment must be a power of two and a multiple of sizeof(void*), +// and should ideally be less than memory page size. 
A caveat of rpmalloc +// internals is that this must also be strictly less than the span size +// (default 64KiB) +RPMALLOC_EXPORT int rpposix_memalign(void **memptr, size_t alignment, + size_t size); + +//! Query the usable size of the given memory block (from given pointer to the +//! end of block) +RPMALLOC_EXPORT size_t rpmalloc_usable_size(void *ptr); + +//! Dummy empty function for forcing linker symbol inclusion +RPMALLOC_EXPORT void rpmalloc_linker_reference(void); + +#if RPMALLOC_FIRST_CLASS_HEAPS + +//! Heap type +typedef struct heap_t rpmalloc_heap_t; + +//! Acquire a new heap. Will reuse existing released heaps or allocate memory +//! for a new heap +// if none available. Heap API is implemented with the strict assumption that +// only one single thread will call heap functions for a given heap at any +// given time, no functions are thread safe. +RPMALLOC_EXPORT rpmalloc_heap_t *rpmalloc_heap_acquire(void); + +//! Release a heap (does NOT free the memory allocated by the heap, use +//! rpmalloc_heap_free_all before destroying the heap). +// Releasing a heap will enable it to be reused by other threads. Safe to pass +// a null pointer. +RPMALLOC_EXPORT void rpmalloc_heap_release(rpmalloc_heap_t *heap); + +//! Allocate a memory block of at least the given size using the given heap. +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_alloc(rpmalloc_heap_t *heap, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(2); + +//! Allocate a memory block of at least the given size using the given heap. The +//! returned +// block will have the requested alignment. Alignment must be a power of two +// and a multiple of sizeof(void*), and should ideally be less than memory page +// size. A caveat of rpmalloc internals is that this must also be strictly less +// than the span size (default 64KiB). 
+RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_aligned_alloc(rpmalloc_heap_t *heap, size_t alignment, + size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(3); + +//! Allocate a memory block of at least the given size using the given heap and +//! zero initialize it. +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_calloc(rpmalloc_heap_t *heap, size_t num, + size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); + +//! Allocate a memory block of at least the given size using the given heap and +//! zero initialize it. The returned +// block will have the requested alignment. Alignment must either be zero, or a +// power of two and a multiple of sizeof(void*), and should ideally be less +// than memory page size. A caveat of rpmalloc internals is that this must also +// be strictly less than the span size (default 64KiB). +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_aligned_calloc(rpmalloc_heap_t *heap, size_t alignment, + size_t num, size_t size) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE2(2, 3); + +//! Reallocate the given block to at least the given size. The memory block MUST +//! be allocated +// by the same heap given to this function. +RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void * +rpmalloc_heap_realloc(rpmalloc_heap_t *heap, void *ptr, size_t size, + unsigned int flags) RPMALLOC_ATTRIB_MALLOC + RPMALLOC_ATTRIB_ALLOC_SIZE(3); + +//! Reallocate the given block to at least the given size. The memory block MUST +//! be allocated +// by the same heap given to this function. The returned block will have the +// requested alignment. Alignment must be either zero, or a power of two and a +// multiple of sizeof(void*), and should ideally be less than memory page size. +// A caveat of rpmalloc internals is that this must also be strictly less than +// the span size (default 64KiB). 
+RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void *rpmalloc_heap_aligned_realloc( + rpmalloc_heap_t *heap, void *ptr, size_t alignment, size_t size, + unsigned int flags) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(4); + +//! Free the given memory block from the given heap. The memory block MUST be +//! allocated +// by the same heap given to this function. +RPMALLOC_EXPORT void rpmalloc_heap_free(rpmalloc_heap_t *heap, void *ptr); + +//! Free all memory allocated by the heap +RPMALLOC_EXPORT void rpmalloc_heap_free_all(rpmalloc_heap_t *heap); + +//! Set the given heap as the current heap for the calling thread. A heap MUST +//! only be current heap +// for a single thread, a heap can never be shared between multiple threads. +// The previous current heap for the calling thread is released to be reused by +// other threads. +RPMALLOC_EXPORT void rpmalloc_heap_thread_set_current(rpmalloc_heap_t *heap); + +//! Returns which heap the given pointer is allocated on +RPMALLOC_EXPORT rpmalloc_heap_t *rpmalloc_get_heap_for_ptr(void *ptr); + +#endif + +#ifdef __cplusplus +} +#endif diff --git a/llvm/lib/Support/rpmalloc/rpnew.h b/llvm/lib/Support/rpmalloc/rpnew.h index d8303c6f95652f..a18f0799d56d1f 100644 --- a/llvm/lib/Support/rpmalloc/rpnew.h +++ b/llvm/lib/Support/rpmalloc/rpnew.h @@ -1,113 +1,113 @@ -//===-------------------------- rpnew.h -----------------*- C -*-=============// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -// -// This library provides a cross-platform lock free thread caching malloc -// implementation in C11. 
-// -//===----------------------------------------------------------------------===// - -#ifdef __cplusplus - -#include -#include - -#ifndef __CRTDECL -#define __CRTDECL -#endif - -extern void __CRTDECL operator delete(void *p) noexcept { rpfree(p); } - -extern void __CRTDECL operator delete[](void *p) noexcept { rpfree(p); } - -extern void *__CRTDECL operator new(std::size_t size) noexcept(false) { - return rpmalloc(size); -} - -extern void *__CRTDECL operator new[](std::size_t size) noexcept(false) { - return rpmalloc(size); -} - -extern void *__CRTDECL operator new(std::size_t size, - const std::nothrow_t &tag) noexcept { - (void)sizeof(tag); - return rpmalloc(size); -} - -extern void *__CRTDECL operator new[](std::size_t size, - const std::nothrow_t &tag) noexcept { - (void)sizeof(tag); - return rpmalloc(size); -} - -#if (__cplusplus >= 201402L || _MSC_VER >= 1916) - -extern void __CRTDECL operator delete(void *p, std::size_t size) noexcept { - (void)sizeof(size); - rpfree(p); -} - -extern void __CRTDECL operator delete[](void *p, std::size_t size) noexcept { - (void)sizeof(size); - rpfree(p); -} - -#endif - -#if (__cplusplus > 201402L || defined(__cpp_aligned_new)) - -extern void __CRTDECL operator delete(void *p, - std::align_val_t align) noexcept { - (void)sizeof(align); - rpfree(p); -} - -extern void __CRTDECL operator delete[](void *p, - std::align_val_t align) noexcept { - (void)sizeof(align); - rpfree(p); -} - -extern void __CRTDECL operator delete(void *p, std::size_t size, - std::align_val_t align) noexcept { - (void)sizeof(size); - (void)sizeof(align); - rpfree(p); -} - -extern void __CRTDECL operator delete[](void *p, std::size_t size, - std::align_val_t align) noexcept { - (void)sizeof(size); - (void)sizeof(align); - rpfree(p); -} - -extern void *__CRTDECL operator new(std::size_t size, - std::align_val_t align) noexcept(false) { - return rpaligned_alloc(static_cast(align), size); -} - -extern void *__CRTDECL operator new[](std::size_t size, - 
std::align_val_t align) noexcept(false) { - return rpaligned_alloc(static_cast(align), size); -} - -extern void *__CRTDECL operator new(std::size_t size, std::align_val_t align, - const std::nothrow_t &tag) noexcept { - (void)sizeof(tag); - return rpaligned_alloc(static_cast(align), size); -} - -extern void *__CRTDECL operator new[](std::size_t size, std::align_val_t align, - const std::nothrow_t &tag) noexcept { - (void)sizeof(tag); - return rpaligned_alloc(static_cast(align), size); -} - -#endif - -#endif +//===-------------------------- rpnew.h -----------------*- C -*-=============// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This library provides a cross-platform lock free thread caching malloc +// implementation in C11. +// +//===----------------------------------------------------------------------===// + +#ifdef __cplusplus + +#include +#include + +#ifndef __CRTDECL +#define __CRTDECL +#endif + +extern void __CRTDECL operator delete(void *p) noexcept { rpfree(p); } + +extern void __CRTDECL operator delete[](void *p) noexcept { rpfree(p); } + +extern void *__CRTDECL operator new(std::size_t size) noexcept(false) { + return rpmalloc(size); +} + +extern void *__CRTDECL operator new[](std::size_t size) noexcept(false) { + return rpmalloc(size); +} + +extern void *__CRTDECL operator new(std::size_t size, + const std::nothrow_t &tag) noexcept { + (void)sizeof(tag); + return rpmalloc(size); +} + +extern void *__CRTDECL operator new[](std::size_t size, + const std::nothrow_t &tag) noexcept { + (void)sizeof(tag); + return rpmalloc(size); +} + +#if (__cplusplus >= 201402L || _MSC_VER >= 1916) + +extern void __CRTDECL operator delete(void *p, std::size_t size) noexcept { + (void)sizeof(size); + rpfree(p); +} + 
+extern void __CRTDECL operator delete[](void *p, std::size_t size) noexcept { + (void)sizeof(size); + rpfree(p); +} + +#endif + +#if (__cplusplus > 201402L || defined(__cpp_aligned_new)) + +extern void __CRTDECL operator delete(void *p, + std::align_val_t align) noexcept { + (void)sizeof(align); + rpfree(p); +} + +extern void __CRTDECL operator delete[](void *p, + std::align_val_t align) noexcept { + (void)sizeof(align); + rpfree(p); +} + +extern void __CRTDECL operator delete(void *p, std::size_t size, + std::align_val_t align) noexcept { + (void)sizeof(size); + (void)sizeof(align); + rpfree(p); +} + +extern void __CRTDECL operator delete[](void *p, std::size_t size, + std::align_val_t align) noexcept { + (void)sizeof(size); + (void)sizeof(align); + rpfree(p); +} + +extern void *__CRTDECL operator new(std::size_t size, + std::align_val_t align) noexcept(false) { + return rpaligned_alloc(static_cast(align), size); +} + +extern void *__CRTDECL operator new[](std::size_t size, + std::align_val_t align) noexcept(false) { + return rpaligned_alloc(static_cast(align), size); +} + +extern void *__CRTDECL operator new(std::size_t size, std::align_val_t align, + const std::nothrow_t &tag) noexcept { + (void)sizeof(tag); + return rpaligned_alloc(static_cast(align), size); +} + +extern void *__CRTDECL operator new[](std::size_t size, std::align_val_t align, + const std::nothrow_t &tag) noexcept { + (void)sizeof(tag); + return rpaligned_alloc(static_cast(align), size); +} + +#endif + +#endif diff --git a/llvm/lib/Target/DirectX/DXILFinalizeLinkage.cpp b/llvm/lib/Target/DirectX/DXILFinalizeLinkage.cpp index d315d9bd16f439..d32dda2a67c951 100644 --- a/llvm/lib/Target/DirectX/DXILFinalizeLinkage.cpp +++ b/llvm/lib/Target/DirectX/DXILFinalizeLinkage.cpp @@ -1,65 +1,65 @@ -//===- DXILFinalizeLinkage.cpp - Finalize linkage of functions ------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. 
-// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -#include "DXILFinalizeLinkage.h" -#include "DirectX.h" -#include "llvm/Analysis/DXILResource.h" -#include "llvm/IR/Function.h" -#include "llvm/IR/GlobalValue.h" -#include "llvm/IR/Metadata.h" -#include "llvm/IR/Module.h" - -#define DEBUG_TYPE "dxil-finalize-linkage" - -using namespace llvm; - -static bool finalizeLinkage(Module &M) { - SmallPtrSet EntriesAndExports; - - // Find all entry points and export functions - for (Function &EF : M.functions()) { - if (!EF.hasFnAttribute("hlsl.shader") && !EF.hasFnAttribute("hlsl.export")) - continue; - EntriesAndExports.insert(&EF); - } - - for (Function &F : M.functions()) { - if (F.getLinkage() == GlobalValue::ExternalLinkage && - !EntriesAndExports.contains(&F)) { - F.setLinkage(GlobalValue::InternalLinkage); - } - } - - return false; -} - -PreservedAnalyses DXILFinalizeLinkage::run(Module &M, - ModuleAnalysisManager &AM) { - if (finalizeLinkage(M)) - return PreservedAnalyses::none(); - return PreservedAnalyses::all(); -} - -bool DXILFinalizeLinkageLegacy::runOnModule(Module &M) { - return finalizeLinkage(M); -} - -void DXILFinalizeLinkageLegacy::getAnalysisUsage(AnalysisUsage &AU) const { - AU.addPreserved(); -} - -char DXILFinalizeLinkageLegacy::ID = 0; - -INITIALIZE_PASS_BEGIN(DXILFinalizeLinkageLegacy, DEBUG_TYPE, - "DXIL Finalize Linkage", false, false) -INITIALIZE_PASS_END(DXILFinalizeLinkageLegacy, DEBUG_TYPE, - "DXIL Finalize Linkage", false, false) - -ModulePass *llvm::createDXILFinalizeLinkageLegacyPass() { - return new DXILFinalizeLinkageLegacy(); -} +//===- DXILFinalizeLinkage.cpp - Finalize linkage of functions ------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "DXILFinalizeLinkage.h" +#include "DirectX.h" +#include "llvm/Analysis/DXILResource.h" +#include "llvm/IR/Function.h" +#include "llvm/IR/GlobalValue.h" +#include "llvm/IR/Metadata.h" +#include "llvm/IR/Module.h" + +#define DEBUG_TYPE "dxil-finalize-linkage" + +using namespace llvm; + +static bool finalizeLinkage(Module &M) { + SmallPtrSet EntriesAndExports; + + // Find all entry points and export functions + for (Function &EF : M.functions()) { + if (!EF.hasFnAttribute("hlsl.shader") && !EF.hasFnAttribute("hlsl.export")) + continue; + EntriesAndExports.insert(&EF); + } + + for (Function &F : M.functions()) { + if (F.getLinkage() == GlobalValue::ExternalLinkage && + !EntriesAndExports.contains(&F)) { + F.setLinkage(GlobalValue::InternalLinkage); + } + } + + return false; +} + +PreservedAnalyses DXILFinalizeLinkage::run(Module &M, + ModuleAnalysisManager &AM) { + if (finalizeLinkage(M)) + return PreservedAnalyses::none(); + return PreservedAnalyses::all(); +} + +bool DXILFinalizeLinkageLegacy::runOnModule(Module &M) { + return finalizeLinkage(M); +} + +void DXILFinalizeLinkageLegacy::getAnalysisUsage(AnalysisUsage &AU) const { + AU.addPreserved(); +} + +char DXILFinalizeLinkageLegacy::ID = 0; + +INITIALIZE_PASS_BEGIN(DXILFinalizeLinkageLegacy, DEBUG_TYPE, + "DXIL Finalize Linkage", false, false) +INITIALIZE_PASS_END(DXILFinalizeLinkageLegacy, DEBUG_TYPE, + "DXIL Finalize Linkage", false, false) + +ModulePass *llvm::createDXILFinalizeLinkageLegacyPass() { + return new DXILFinalizeLinkageLegacy(); +} diff --git a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp index 8ea31401121bce..9844fd394aa4c5 100644 --- a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp +++ b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp @@ -1,38 +1,38 @@ -//===- 
DirectXTargetTransformInfo.cpp - DirectX TTI ---------------*- C++ -//-*-===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -/// -//===----------------------------------------------------------------------===// - -#include "DirectXTargetTransformInfo.h" -#include "llvm/IR/Intrinsics.h" -#include "llvm/IR/IntrinsicsDirectX.h" - -using namespace llvm; - -bool DirectXTTIImpl::isTargetIntrinsicWithScalarOpAtArg(Intrinsic::ID ID, - unsigned ScalarOpdIdx) { - switch (ID) { - case Intrinsic::dx_wave_readlane: - return ScalarOpdIdx == 1; - default: - return false; - } -} - -bool DirectXTTIImpl::isTargetIntrinsicTriviallyScalarizable( - Intrinsic::ID ID) const { - switch (ID) { - case Intrinsic::dx_frac: - case Intrinsic::dx_rsqrt: - case Intrinsic::dx_wave_readlane: - return true; - default: - return false; - } -} +//===- DirectXTargetTransformInfo.cpp - DirectX TTI ---------------*- C++ +//-*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +/// +//===----------------------------------------------------------------------===// + +#include "DirectXTargetTransformInfo.h" +#include "llvm/IR/Intrinsics.h" +#include "llvm/IR/IntrinsicsDirectX.h" + +using namespace llvm; + +bool DirectXTTIImpl::isTargetIntrinsicWithScalarOpAtArg(Intrinsic::ID ID, + unsigned ScalarOpdIdx) { + switch (ID) { + case Intrinsic::dx_wave_readlane: + return ScalarOpdIdx == 1; + default: + return false; + } +} + +bool DirectXTTIImpl::isTargetIntrinsicTriviallyScalarizable( + Intrinsic::ID ID) const { + switch (ID) { + case Intrinsic::dx_frac: + case Intrinsic::dx_rsqrt: + case Intrinsic::dx_wave_readlane: + return true; + default: + return false; + } +} diff --git a/llvm/test/CodeGen/DirectX/atan2.ll b/llvm/test/CodeGen/DirectX/atan2.ll index 9d86f87f3ed50e..b2c650d1162655 100644 --- a/llvm/test/CodeGen/DirectX/atan2.ll +++ b/llvm/test/CodeGen/DirectX/atan2.ll @@ -1,87 +1,87 @@ -; RUN: opt -S -dxil-intrinsic-expansion -mtriple=dxil-pc-shadermodel6.3-library %s | FileCheck %s --check-prefixes=CHECK,EXPCHECK -; RUN: opt -S -dxil-intrinsic-expansion -scalarizer -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s | FileCheck %s --check-prefixes=CHECK,DOPCHECK - -; Make sure correct dxil expansions for atan2 are generated for float and half. 
- -define noundef float @atan2_float(float noundef %y, float noundef %x) { -entry: -; CHECK: [[DIV:%.+]] = fdiv float %y, %x -; EXPCHECK: [[ATAN:%.+]] = call float @llvm.atan.f32(float [[DIV]]) -; DOPCHECK: [[ATAN:%.+]] = call float @dx.op.unary.f32(i32 17, float [[DIV]]) -; CHECK-DAG: [[ADD_PI:%.+]] = fadd float [[ATAN]], 0x400921FB60000000 -; CHECK-DAG: [[SUB_PI:%.+]] = fsub float [[ATAN]], 0x400921FB60000000 -; CHECK-DAG: [[X_LT_0:%.+]] = fcmp olt float %x, 0.000000e+00 -; CHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq float %x, 0.000000e+00 -; CHECK-DAG: [[Y_GE_0:%.+]] = fcmp oge float %y, 0.000000e+00 -; CHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt float %y, 0.000000e+00 -; CHECK: [[XLT0_AND_YGE0:%.+]] = and i1 [[X_LT_0]], [[Y_GE_0]] -; CHECK: [[SELECT_ADD_PI:%.+]] = select i1 [[XLT0_AND_YGE0]], float [[ADD_PI]], float [[ATAN]] -; CHECK: [[XLT0_AND_YLT0:%.+]] = and i1 [[X_LT_0]], [[Y_LT_0]] -; CHECK: [[SELECT_SUB_PI:%.+]] = select i1 [[XLT0_AND_YLT0]], float [[SUB_PI]], float [[SELECT_ADD_PI]] -; CHECK: [[XEQ0_AND_YLT0:%.+]] = and i1 [[X_EQ_0]], [[Y_LT_0]] -; CHECK: [[SELECT_NEGHPI:%.+]] = select i1 [[XEQ0_AND_YLT0]], float 0xBFF921FB60000000, float [[SELECT_SUB_PI]] -; CHECK: [[XEQ0_AND_YGE0:%.+]] = and i1 [[X_EQ_0]], [[Y_GE_0]] -; CHECK: [[SELECT_HPI:%.+]] = select i1 [[XEQ0_AND_YGE0]], float 0x3FF921FB60000000, float [[SELECT_NEGHPI]] -; CHECK: ret float [[SELECT_HPI]] - %elt.atan2 = call float @llvm.atan2.f32(float %y, float %x) - ret float %elt.atan2 -} - -define noundef half @atan2_half(half noundef %y, half noundef %x) { -entry: -; CHECK: [[DIV:%.+]] = fdiv half %y, %x -; EXPCHECK: [[ATAN:%.+]] = call half @llvm.atan.f16(half [[DIV]]) -; DOPCHECK: [[ATAN:%.+]] = call half @dx.op.unary.f16(i32 17, half [[DIV]]) -; CHECK-DAG: [[ADD_PI:%.+]] = fadd half [[ATAN]], 0xH4248 -; CHECK-DAG: [[SUB_PI:%.+]] = fsub half [[ATAN]], 0xH4248 -; CHECK-DAG: [[X_LT_0:%.+]] = fcmp olt half %x, 0xH0000 -; CHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq half %x, 0xH0000 -; CHECK-DAG: [[Y_GE_0:%.+]] = 
fcmp oge half %y, 0xH0000 -; CHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt half %y, 0xH0000 -; CHECK: [[XLT0_AND_YGE0:%.+]] = and i1 [[X_LT_0]], [[Y_GE_0]] -; CHECK: [[SELECT_ADD_PI:%.+]] = select i1 [[XLT0_AND_YGE0]], half [[ADD_PI]], half [[ATAN]] -; CHECK: [[XLT0_AND_YLT0:%.+]] = and i1 [[X_LT_0]], [[Y_LT_0]] -; CHECK: [[SELECT_SUB_PI:%.+]] = select i1 [[XLT0_AND_YLT0]], half [[SUB_PI]], half [[SELECT_ADD_PI]] -; CHECK: [[XEQ0_AND_YLT0:%.+]] = and i1 [[X_EQ_0]], [[Y_LT_0]] -; CHECK: [[SELECT_NEGHPI:%.+]] = select i1 [[XEQ0_AND_YLT0]], half 0xHBE48, half [[SELECT_SUB_PI]] -; CHECK: [[XEQ0_AND_YGE0:%.+]] = and i1 [[X_EQ_0]], [[Y_GE_0]] -; CHECK: [[SELECT_HPI:%.+]] = select i1 [[XEQ0_AND_YGE0]], half 0xH3E48, half [[SELECT_NEGHPI]] -; CHECK: ret half [[SELECT_HPI]] - %elt.atan2 = call half @llvm.atan2.f16(half %y, half %x) - ret half %elt.atan2 -} - -define noundef <4 x float> @atan2_float4(<4 x float> noundef %y, <4 x float> noundef %x) { -entry: -; Just Expansion, no scalarization or lowering: -; EXPCHECK: [[DIV:%.+]] = fdiv <4 x float> %y, %x -; EXPCHECK: [[ATAN:%.+]] = call <4 x float> @llvm.atan.v4f32(<4 x float> [[DIV]]) -; EXPCHECK-DAG: [[ADD_PI:%.+]] = fadd <4 x float> [[ATAN]], -; EXPCHECK-DAG: [[SUB_PI:%.+]] = fsub <4 x float> [[ATAN]], -; EXPCHECK-DAG: [[X_LT_0:%.+]] = fcmp olt <4 x float> %x, zeroinitializer -; EXPCHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq <4 x float> %x, zeroinitializer -; EXPCHECK-DAG: [[Y_GE_0:%.+]] = fcmp oge <4 x float> %y, zeroinitializer -; EXPCHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt <4 x float> %y, zeroinitializer -; EXPCHECK: [[XLT0_AND_YGE0:%.+]] = and <4 x i1> [[X_LT_0]], [[Y_GE_0]] -; EXPCHECK: [[SELECT_ADD_PI:%.+]] = select <4 x i1> [[XLT0_AND_YGE0]], <4 x float> [[ADD_PI]], <4 x float> [[ATAN]] -; EXPCHECK: [[XLT0_AND_YLT0:%.+]] = and <4 x i1> [[X_LT_0]], [[Y_LT_0]] -; EXPCHECK: [[SELECT_SUB_PI:%.+]] = select <4 x i1> [[XLT0_AND_YLT0]], <4 x float> [[SUB_PI]], <4 x float> [[SELECT_ADD_PI]] -; EXPCHECK: [[XEQ0_AND_YLT0:%.+]] = and <4 x i1> 
[[X_EQ_0]], [[Y_LT_0]] -; EXPCHECK: [[SELECT_NEGHPI:%.+]] = select <4 x i1> [[XEQ0_AND_YLT0]], <4 x float> , <4 x float> [[SELECT_SUB_PI]] -; EXPCHECK: [[XEQ0_AND_YGE0:%.+]] = and <4 x i1> [[X_EQ_0]], [[Y_GE_0]] -; EXPCHECK: [[SELECT_HPI:%.+]] = select <4 x i1> [[XEQ0_AND_YGE0]], <4 x float> , <4 x float> [[SELECT_NEGHPI]] -; EXPCHECK: ret <4 x float> [[SELECT_HPI]] - -; Scalarization occurs after expansion, so atan scalarization is tested separately. -; Expansion, scalarization and lowering: -; Just make sure this expands to exactly 4 scalar DXIL atan (OpCode=17) calls. -; DOPCHECK-COUNT-4: call float @dx.op.unary.f32(i32 17, float %{{.*}}) -; DOPCHECK-NOT: call float @dx.op.unary.f32(i32 17, - - %elt.atan2 = call <4 x float> @llvm.atan2.v4f32(<4 x float> %y, <4 x float> %x) - ret <4 x float> %elt.atan2 -} - -declare half @llvm.atan2.f16(half, half) -declare float @llvm.atan2.f32(float, float) -declare <4 x float> @llvm.atan2.v4f32(<4 x float>, <4 x float>) +; RUN: opt -S -dxil-intrinsic-expansion -mtriple=dxil-pc-shadermodel6.3-library %s | FileCheck %s --check-prefixes=CHECK,EXPCHECK +; RUN: opt -S -dxil-intrinsic-expansion -scalarizer -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s | FileCheck %s --check-prefixes=CHECK,DOPCHECK + +; Make sure correct dxil expansions for atan2 are generated for float and half. 
+ +define noundef float @atan2_float(float noundef %y, float noundef %x) { +entry: +; CHECK: [[DIV:%.+]] = fdiv float %y, %x +; EXPCHECK: [[ATAN:%.+]] = call float @llvm.atan.f32(float [[DIV]]) +; DOPCHECK: [[ATAN:%.+]] = call float @dx.op.unary.f32(i32 17, float [[DIV]]) +; CHECK-DAG: [[ADD_PI:%.+]] = fadd float [[ATAN]], 0x400921FB60000000 +; CHECK-DAG: [[SUB_PI:%.+]] = fsub float [[ATAN]], 0x400921FB60000000 +; CHECK-DAG: [[X_LT_0:%.+]] = fcmp olt float %x, 0.000000e+00 +; CHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq float %x, 0.000000e+00 +; CHECK-DAG: [[Y_GE_0:%.+]] = fcmp oge float %y, 0.000000e+00 +; CHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt float %y, 0.000000e+00 +; CHECK: [[XLT0_AND_YGE0:%.+]] = and i1 [[X_LT_0]], [[Y_GE_0]] +; CHECK: [[SELECT_ADD_PI:%.+]] = select i1 [[XLT0_AND_YGE0]], float [[ADD_PI]], float [[ATAN]] +; CHECK: [[XLT0_AND_YLT0:%.+]] = and i1 [[X_LT_0]], [[Y_LT_0]] +; CHECK: [[SELECT_SUB_PI:%.+]] = select i1 [[XLT0_AND_YLT0]], float [[SUB_PI]], float [[SELECT_ADD_PI]] +; CHECK: [[XEQ0_AND_YLT0:%.+]] = and i1 [[X_EQ_0]], [[Y_LT_0]] +; CHECK: [[SELECT_NEGHPI:%.+]] = select i1 [[XEQ0_AND_YLT0]], float 0xBFF921FB60000000, float [[SELECT_SUB_PI]] +; CHECK: [[XEQ0_AND_YGE0:%.+]] = and i1 [[X_EQ_0]], [[Y_GE_0]] +; CHECK: [[SELECT_HPI:%.+]] = select i1 [[XEQ0_AND_YGE0]], float 0x3FF921FB60000000, float [[SELECT_NEGHPI]] +; CHECK: ret float [[SELECT_HPI]] + %elt.atan2 = call float @llvm.atan2.f32(float %y, float %x) + ret float %elt.atan2 +} + +define noundef half @atan2_half(half noundef %y, half noundef %x) { +entry: +; CHECK: [[DIV:%.+]] = fdiv half %y, %x +; EXPCHECK: [[ATAN:%.+]] = call half @llvm.atan.f16(half [[DIV]]) +; DOPCHECK: [[ATAN:%.+]] = call half @dx.op.unary.f16(i32 17, half [[DIV]]) +; CHECK-DAG: [[ADD_PI:%.+]] = fadd half [[ATAN]], 0xH4248 +; CHECK-DAG: [[SUB_PI:%.+]] = fsub half [[ATAN]], 0xH4248 +; CHECK-DAG: [[X_LT_0:%.+]] = fcmp olt half %x, 0xH0000 +; CHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq half %x, 0xH0000 +; CHECK-DAG: [[Y_GE_0:%.+]] = 
fcmp oge half %y, 0xH0000 +; CHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt half %y, 0xH0000 +; CHECK: [[XLT0_AND_YGE0:%.+]] = and i1 [[X_LT_0]], [[Y_GE_0]] +; CHECK: [[SELECT_ADD_PI:%.+]] = select i1 [[XLT0_AND_YGE0]], half [[ADD_PI]], half [[ATAN]] +; CHECK: [[XLT0_AND_YLT0:%.+]] = and i1 [[X_LT_0]], [[Y_LT_0]] +; CHECK: [[SELECT_SUB_PI:%.+]] = select i1 [[XLT0_AND_YLT0]], half [[SUB_PI]], half [[SELECT_ADD_PI]] +; CHECK: [[XEQ0_AND_YLT0:%.+]] = and i1 [[X_EQ_0]], [[Y_LT_0]] +; CHECK: [[SELECT_NEGHPI:%.+]] = select i1 [[XEQ0_AND_YLT0]], half 0xHBE48, half [[SELECT_SUB_PI]] +; CHECK: [[XEQ0_AND_YGE0:%.+]] = and i1 [[X_EQ_0]], [[Y_GE_0]] +; CHECK: [[SELECT_HPI:%.+]] = select i1 [[XEQ0_AND_YGE0]], half 0xH3E48, half [[SELECT_NEGHPI]] +; CHECK: ret half [[SELECT_HPI]] + %elt.atan2 = call half @llvm.atan2.f16(half %y, half %x) + ret half %elt.atan2 +} + +define noundef <4 x float> @atan2_float4(<4 x float> noundef %y, <4 x float> noundef %x) { +entry: +; Just Expansion, no scalarization or lowering: +; EXPCHECK: [[DIV:%.+]] = fdiv <4 x float> %y, %x +; EXPCHECK: [[ATAN:%.+]] = call <4 x float> @llvm.atan.v4f32(<4 x float> [[DIV]]) +; EXPCHECK-DAG: [[ADD_PI:%.+]] = fadd <4 x float> [[ATAN]], +; EXPCHECK-DAG: [[SUB_PI:%.+]] = fsub <4 x float> [[ATAN]], +; EXPCHECK-DAG: [[X_LT_0:%.+]] = fcmp olt <4 x float> %x, zeroinitializer +; EXPCHECK-DAG: [[X_EQ_0:%.+]] = fcmp oeq <4 x float> %x, zeroinitializer +; EXPCHECK-DAG: [[Y_GE_0:%.+]] = fcmp oge <4 x float> %y, zeroinitializer +; EXPCHECK-DAG: [[Y_LT_0:%.+]] = fcmp olt <4 x float> %y, zeroinitializer +; EXPCHECK: [[XLT0_AND_YGE0:%.+]] = and <4 x i1> [[X_LT_0]], [[Y_GE_0]] +; EXPCHECK: [[SELECT_ADD_PI:%.+]] = select <4 x i1> [[XLT0_AND_YGE0]], <4 x float> [[ADD_PI]], <4 x float> [[ATAN]] +; EXPCHECK: [[XLT0_AND_YLT0:%.+]] = and <4 x i1> [[X_LT_0]], [[Y_LT_0]] +; EXPCHECK: [[SELECT_SUB_PI:%.+]] = select <4 x i1> [[XLT0_AND_YLT0]], <4 x float> [[SUB_PI]], <4 x float> [[SELECT_ADD_PI]] +; EXPCHECK: [[XEQ0_AND_YLT0:%.+]] = and <4 x i1> 
[[X_EQ_0]], [[Y_LT_0]] +; EXPCHECK: [[SELECT_NEGHPI:%.+]] = select <4 x i1> [[XEQ0_AND_YLT0]], <4 x float> , <4 x float> [[SELECT_SUB_PI]] +; EXPCHECK: [[XEQ0_AND_YGE0:%.+]] = and <4 x i1> [[X_EQ_0]], [[Y_GE_0]] +; EXPCHECK: [[SELECT_HPI:%.+]] = select <4 x i1> [[XEQ0_AND_YGE0]], <4 x float> , <4 x float> [[SELECT_NEGHPI]] +; EXPCHECK: ret <4 x float> [[SELECT_HPI]] + +; Scalarization occurs after expansion, so atan scalarization is tested separately. +; Expansion, scalarization and lowering: +; Just make sure this expands to exactly 4 scalar DXIL atan (OpCode=17) calls. +; DOPCHECK-COUNT-4: call float @dx.op.unary.f32(i32 17, float %{{.*}}) +; DOPCHECK-NOT: call float @dx.op.unary.f32(i32 17, + + %elt.atan2 = call <4 x float> @llvm.atan2.v4f32(<4 x float> %y, <4 x float> %x) + ret <4 x float> %elt.atan2 +} + +declare half @llvm.atan2.f16(half, half) +declare float @llvm.atan2.f32(float, float) +declare <4 x float> @llvm.atan2.v4f32(<4 x float>, <4 x float>) diff --git a/llvm/test/CodeGen/DirectX/atan2_error.ll b/llvm/test/CodeGen/DirectX/atan2_error.ll index 372934098b7cab..9b66f9f1dd45a7 100644 --- a/llvm/test/CodeGen/DirectX/atan2_error.ll +++ b/llvm/test/CodeGen/DirectX/atan2_error.ll @@ -1,11 +1,11 @@ -; RUN: not opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s 2>&1 | FileCheck %s - -; DXIL operation atan does not support double overload type -; CHECK: in function atan2_double -; CHECK-SAME: Cannot create ATan operation: Invalid overload type - -define noundef double @atan2_double(double noundef %a, double noundef %b) #0 { -entry: - %1 = call double @llvm.atan2.f64(double %a, double %b) - ret double %1 -} +; RUN: not opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s 2>&1 | FileCheck %s + +; DXIL operation atan does not support double overload type +; CHECK: in function atan2_double +; CHECK-SAME: Cannot create ATan operation: Invalid overload type + +define noundef double 
@atan2_double(double noundef %a, double noundef %b) #0 { +entry: + %1 = call double @llvm.atan2.f64(double %a, double %b) + ret double %1 +} diff --git a/llvm/test/CodeGen/DirectX/cross.ll b/llvm/test/CodeGen/DirectX/cross.ll index 6ec3ec4d3594af..6153cf7cddc9d5 100644 --- a/llvm/test/CodeGen/DirectX/cross.ll +++ b/llvm/test/CodeGen/DirectX/cross.ll @@ -1,56 +1,56 @@ -; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s - -; Make sure dxil operation function calls for cross are generated for half/float. - -declare <3 x half> @llvm.dx.cross.v3f16(<3 x half>, <3 x half>) -declare <3 x float> @llvm.dx.cross.v3f32(<3 x float>, <3 x float>) - -define noundef <3 x half> @test_cross_half3(<3 x half> noundef %p0, <3 x half> noundef %p1) { -entry: - ; CHECK: %x0 = extractelement <3 x half> %p0, i64 0 - ; CHECK: %x1 = extractelement <3 x half> %p0, i64 1 - ; CHECK: %x2 = extractelement <3 x half> %p0, i64 2 - ; CHECK: %y0 = extractelement <3 x half> %p1, i64 0 - ; CHECK: %y1 = extractelement <3 x half> %p1, i64 1 - ; CHECK: %y2 = extractelement <3 x half> %p1, i64 2 - ; CHECK: %0 = fmul half %x1, %y2 - ; CHECK: %1 = fmul half %x2, %y1 - ; CHECK: %hlsl.cross1 = fsub half %0, %1 - ; CHECK: %2 = fmul half %x2, %y0 - ; CHECK: %3 = fmul half %x0, %y2 - ; CHECK: %hlsl.cross2 = fsub half %2, %3 - ; CHECK: %4 = fmul half %x0, %y1 - ; CHECK: %5 = fmul half %x1, %y0 - ; CHECK: %hlsl.cross3 = fsub half %4, %5 - ; CHECK: %6 = insertelement <3 x half> undef, half %hlsl.cross1, i64 0 - ; CHECK: %7 = insertelement <3 x half> %6, half %hlsl.cross2, i64 1 - ; CHECK: %8 = insertelement <3 x half> %7, half %hlsl.cross3, i64 2 - ; CHECK: ret <3 x half> %8 - %hlsl.cross = call <3 x half> @llvm.dx.cross.v3f16(<3 x half> %p0, <3 x half> %p1) - ret <3 x half> %hlsl.cross -} - -define noundef <3 x float> @test_cross_float3(<3 x float> noundef %p0, <3 x float> noundef %p1) { -entry: - ; CHECK: %x0 = extractelement <3 x float> %p0, i64 0 - ; CHECK: %x1 = extractelement <3 x float> %p0, i64 1 - 
; CHECK: %x2 = extractelement <3 x float> %p0, i64 2 - ; CHECK: %y0 = extractelement <3 x float> %p1, i64 0 - ; CHECK: %y1 = extractelement <3 x float> %p1, i64 1 - ; CHECK: %y2 = extractelement <3 x float> %p1, i64 2 - ; CHECK: %0 = fmul float %x1, %y2 - ; CHECK: %1 = fmul float %x2, %y1 - ; CHECK: %hlsl.cross1 = fsub float %0, %1 - ; CHECK: %2 = fmul float %x2, %y0 - ; CHECK: %3 = fmul float %x0, %y2 - ; CHECK: %hlsl.cross2 = fsub float %2, %3 - ; CHECK: %4 = fmul float %x0, %y1 - ; CHECK: %5 = fmul float %x1, %y0 - ; CHECK: %hlsl.cross3 = fsub float %4, %5 - ; CHECK: %6 = insertelement <3 x float> undef, float %hlsl.cross1, i64 0 - ; CHECK: %7 = insertelement <3 x float> %6, float %hlsl.cross2, i64 1 - ; CHECK: %8 = insertelement <3 x float> %7, float %hlsl.cross3, i64 2 - ; CHECK: ret <3 x float> %8 - %hlsl.cross = call <3 x float> @llvm.dx.cross.v3f32(<3 x float> %p0, <3 x float> %p1) - ret <3 x float> %hlsl.cross -} +; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s + +; Make sure dxil operation function calls for cross are generated for half/float. 
+ +declare <3 x half> @llvm.dx.cross.v3f16(<3 x half>, <3 x half>) +declare <3 x float> @llvm.dx.cross.v3f32(<3 x float>, <3 x float>) + +define noundef <3 x half> @test_cross_half3(<3 x half> noundef %p0, <3 x half> noundef %p1) { +entry: + ; CHECK: %x0 = extractelement <3 x half> %p0, i64 0 + ; CHECK: %x1 = extractelement <3 x half> %p0, i64 1 + ; CHECK: %x2 = extractelement <3 x half> %p0, i64 2 + ; CHECK: %y0 = extractelement <3 x half> %p1, i64 0 + ; CHECK: %y1 = extractelement <3 x half> %p1, i64 1 + ; CHECK: %y2 = extractelement <3 x half> %p1, i64 2 + ; CHECK: %0 = fmul half %x1, %y2 + ; CHECK: %1 = fmul half %x2, %y1 + ; CHECK: %hlsl.cross1 = fsub half %0, %1 + ; CHECK: %2 = fmul half %x2, %y0 + ; CHECK: %3 = fmul half %x0, %y2 + ; CHECK: %hlsl.cross2 = fsub half %2, %3 + ; CHECK: %4 = fmul half %x0, %y1 + ; CHECK: %5 = fmul half %x1, %y0 + ; CHECK: %hlsl.cross3 = fsub half %4, %5 + ; CHECK: %6 = insertelement <3 x half> undef, half %hlsl.cross1, i64 0 + ; CHECK: %7 = insertelement <3 x half> %6, half %hlsl.cross2, i64 1 + ; CHECK: %8 = insertelement <3 x half> %7, half %hlsl.cross3, i64 2 + ; CHECK: ret <3 x half> %8 + %hlsl.cross = call <3 x half> @llvm.dx.cross.v3f16(<3 x half> %p0, <3 x half> %p1) + ret <3 x half> %hlsl.cross +} + +define noundef <3 x float> @test_cross_float3(<3 x float> noundef %p0, <3 x float> noundef %p1) { +entry: + ; CHECK: %x0 = extractelement <3 x float> %p0, i64 0 + ; CHECK: %x1 = extractelement <3 x float> %p0, i64 1 + ; CHECK: %x2 = extractelement <3 x float> %p0, i64 2 + ; CHECK: %y0 = extractelement <3 x float> %p1, i64 0 + ; CHECK: %y1 = extractelement <3 x float> %p1, i64 1 + ; CHECK: %y2 = extractelement <3 x float> %p1, i64 2 + ; CHECK: %0 = fmul float %x1, %y2 + ; CHECK: %1 = fmul float %x2, %y1 + ; CHECK: %hlsl.cross1 = fsub float %0, %1 + ; CHECK: %2 = fmul float %x2, %y0 + ; CHECK: %3 = fmul float %x0, %y2 + ; CHECK: %hlsl.cross2 = fsub float %2, %3 + ; CHECK: %4 = fmul float %x0, %y1 + ; CHECK: %5 = fmul float 
%x1, %y0 + ; CHECK: %hlsl.cross3 = fsub float %4, %5 + ; CHECK: %6 = insertelement <3 x float> undef, float %hlsl.cross1, i64 0 + ; CHECK: %7 = insertelement <3 x float> %6, float %hlsl.cross2, i64 1 + ; CHECK: %8 = insertelement <3 x float> %7, float %hlsl.cross3, i64 2 + ; CHECK: ret <3 x float> %8 + %hlsl.cross = call <3 x float> @llvm.dx.cross.v3f32(<3 x float> %p0, <3 x float> %p1) + ret <3 x float> %hlsl.cross +} diff --git a/llvm/test/CodeGen/DirectX/finalize_linkage.ll b/llvm/test/CodeGen/DirectX/finalize_linkage.ll index 0ee8a5f44593ba..b6da9f6cb3926a 100644 --- a/llvm/test/CodeGen/DirectX/finalize_linkage.ll +++ b/llvm/test/CodeGen/DirectX/finalize_linkage.ll @@ -1,64 +1,64 @@ -; RUN: opt -S -dxil-finalize-linkage -mtriple=dxil-unknown-shadermodel6.5-compute %s | FileCheck %s -; RUN: llc %s --filetype=asm -o - | FileCheck %s --check-prefixes=CHECK-LLC - -target triple = "dxilv1.5-pc-shadermodel6.5-compute" - -; DXILFinalizeLinkage changes linkage of all functions that are not -; entry points or exported function to internal. 
- -; CHECK: define internal void @"?f1@@YAXXZ"() -define void @"?f1@@YAXXZ"() #0 { -entry: - ret void -} - -; CHECK: define internal void @"?f2@@YAXXZ"() -define void @"?f2@@YAXXZ"() #0 { -entry: - ret void -} - -; CHECK: define internal void @"?f3@@YAXXZ"() -define void @"?f3@@YAXXZ"() #0 { -entry: - ret void -} - -; CHECK: define internal void @"?foo@@YAXXZ"() -define void @"?foo@@YAXXZ"() #0 { -entry: - call void @"?f2@@YAXXZ"() #3 - ret void -} - -; Exported function - do not change linkage -; CHECK: define void @"?bar@@YAXXZ"() -define void @"?bar@@YAXXZ"() #1 { -entry: - call void @"?f3@@YAXXZ"() #3 - ret void -} - -; CHECK: define internal void @"?main@@YAXXZ"() #0 -define internal void @"?main@@YAXXZ"() #0 { -entry: - call void @"?foo@@YAXXZ"() #3 - call void @"?bar@@YAXXZ"() #3 - ret void -} - -; Entry point function - do not change linkage -; CHECK: define void @main() #2 -define void @main() #2 { -entry: - call void @"?main@@YAXXZ"() - ret void -} - -attributes #0 = { convergent noinline nounwind optnone} -attributes #1 = { convergent noinline nounwind optnone "hlsl.export"} -attributes #2 = { convergent "hlsl.numthreads"="4,1,1" "hlsl.shader"="compute"} -attributes #3 = { convergent } - -; Make sure "hlsl.export" attribute is stripped by llc -; CHECK-LLC-NOT: "hlsl.export" +; RUN: opt -S -dxil-finalize-linkage -mtriple=dxil-unknown-shadermodel6.5-compute %s | FileCheck %s +; RUN: llc %s --filetype=asm -o - | FileCheck %s --check-prefixes=CHECK-LLC + +target triple = "dxilv1.5-pc-shadermodel6.5-compute" + +; DXILFinalizeLinkage changes linkage of all functions that are not +; entry points or exported function to internal. 
+ +; CHECK: define internal void @"?f1@@YAXXZ"() +define void @"?f1@@YAXXZ"() #0 { +entry: + ret void +} + +; CHECK: define internal void @"?f2@@YAXXZ"() +define void @"?f2@@YAXXZ"() #0 { +entry: + ret void +} + +; CHECK: define internal void @"?f3@@YAXXZ"() +define void @"?f3@@YAXXZ"() #0 { +entry: + ret void +} + +; CHECK: define internal void @"?foo@@YAXXZ"() +define void @"?foo@@YAXXZ"() #0 { +entry: + call void @"?f2@@YAXXZ"() #3 + ret void +} + +; Exported function - do not change linkage +; CHECK: define void @"?bar@@YAXXZ"() +define void @"?bar@@YAXXZ"() #1 { +entry: + call void @"?f3@@YAXXZ"() #3 + ret void +} + +; CHECK: define internal void @"?main@@YAXXZ"() #0 +define internal void @"?main@@YAXXZ"() #0 { +entry: + call void @"?foo@@YAXXZ"() #3 + call void @"?bar@@YAXXZ"() #3 + ret void +} + +; Entry point function - do not change linkage +; CHECK: define void @main() #2 +define void @main() #2 { +entry: + call void @"?main@@YAXXZ"() + ret void +} + +attributes #0 = { convergent noinline nounwind optnone} +attributes #1 = { convergent noinline nounwind optnone "hlsl.export"} +attributes #2 = { convergent "hlsl.numthreads"="4,1,1" "hlsl.shader"="compute"} +attributes #3 = { convergent } + +; Make sure "hlsl.export" attribute is stripped by llc +; CHECK-LLC-NOT: "hlsl.export" diff --git a/llvm/test/CodeGen/DirectX/normalize.ll b/llvm/test/CodeGen/DirectX/normalize.ll index 2aba9d5f74d78e..de106be1243712 100644 --- a/llvm/test/CodeGen/DirectX/normalize.ll +++ b/llvm/test/CodeGen/DirectX/normalize.ll @@ -1,112 +1,112 @@ -; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s --check-prefixes=CHECK,EXPCHECK -; RUN: opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library < %s | FileCheck %s --check-prefixes=CHECK,DOPCHECK - -; Make sure dxil operation function calls for normalize are generated for half/float. 
- -declare half @llvm.dx.normalize.f16(half) -declare <2 x half> @llvm.dx.normalize.v2f16(<2 x half>) -declare <3 x half> @llvm.dx.normalize.v3f16(<3 x half>) -declare <4 x half> @llvm.dx.normalize.v4f16(<4 x half>) - -declare float @llvm.dx.normalize.f32(float) -declare <2 x float> @llvm.dx.normalize.v2f32(<2 x float>) -declare <3 x float> @llvm.dx.normalize.v3f32(<3 x float>) -declare <4 x float> @llvm.dx.normalize.v4f32(<4 x float>) - -define noundef half @test_normalize_half(half noundef %p0) { -entry: - ; CHECK: fdiv half %p0, %p0 - %hlsl.normalize = call half @llvm.dx.normalize.f16(half %p0) - ret half %hlsl.normalize -} - -define noundef <2 x half> @test_normalize_half2(<2 x half> noundef %p0) { -entry: - ; EXPCHECK: [[doth2:%.*]] = call half @llvm.dx.dot2.v2f16(<2 x half> %{{.*}}, <2 x half> %{{.*}}) - ; DOPCHECK: [[doth2:%.*]] = call half @dx.op.dot2.f16(i32 54, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth2]]) - ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth2]]) - ; CHECK: [[splatinserth2:%.*]] = insertelement <2 x half> poison, half [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] = shufflevector <2 x half> [[splatinserth2]], <2 x half> poison, <2 x i32> zeroinitializer - ; CHECK: fmul <2 x half> %p0, [[splat]] - - %hlsl.normalize = call <2 x half> @llvm.dx.normalize.v2f16(<2 x half> %p0) - ret <2 x half> %hlsl.normalize -} - -define noundef <3 x half> @test_normalize_half3(<3 x half> noundef %p0) { -entry: - ; EXPCHECK: [[doth3:%.*]] = call half @llvm.dx.dot3.v3f16(<3 x half> %{{.*}}, <3 x half> %{{.*}}) - ; DOPCHECK: [[doth3:%.*]] = call half @dx.op.dot3.f16(i32 55, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth3]]) - ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth3]]) - ; CHECK: [[splatinserth3:%.*]] = insertelement <3 x half> poison, half 
[[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <3 x half> [[splatinserth3]], <3 x half> poison, <3 x i32> zeroinitializer - ; CHECK: fmul <3 x half> %p0, %.splat - - %hlsl.normalize = call <3 x half> @llvm.dx.normalize.v3f16(<3 x half> %p0) - ret <3 x half> %hlsl.normalize -} - -define noundef <4 x half> @test_normalize_half4(<4 x half> noundef %p0) { -entry: - ; EXPCHECK: [[doth4:%.*]] = call half @llvm.dx.dot4.v4f16(<4 x half> %{{.*}}, <4 x half> %{{.*}}) - ; DOPCHECK: [[doth4:%.*]] = call half @dx.op.dot4.f16(i32 56, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth4]]) - ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth4]]) - ; CHECK: [[splatinserth4:%.*]] = insertelement <4 x half> poison, half [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <4 x half> [[splatinserth4]], <4 x half> poison, <4 x i32> zeroinitializer - ; CHECK: fmul <4 x half> %p0, %.splat - - %hlsl.normalize = call <4 x half> @llvm.dx.normalize.v4f16(<4 x half> %p0) - ret <4 x half> %hlsl.normalize -} - -define noundef float @test_normalize_float(float noundef %p0) { -entry: - ; CHECK: fdiv float %p0, %p0 - %hlsl.normalize = call float @llvm.dx.normalize.f32(float %p0) - ret float %hlsl.normalize -} - -define noundef <2 x float> @test_normalize_float2(<2 x float> noundef %p0) { -entry: - ; EXPCHECK: [[dotf2:%.*]] = call float @llvm.dx.dot2.v2f32(<2 x float> %{{.*}}, <2 x float> %{{.*}}) - ; DOPCHECK: [[dotf2:%.*]] = call float @dx.op.dot2.f32(i32 54, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf2]]) - ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf2]]) - ; CHECK: [[splatinsertf2:%.*]] = insertelement <2 x float> poison, float [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <2 x float> [[splatinsertf2]], <2 x float> poison, <2 x i32> zeroinitializer - ; 
CHECK: fmul <2 x float> %p0, %.splat - - %hlsl.normalize = call <2 x float> @llvm.dx.normalize.v2f32(<2 x float> %p0) - ret <2 x float> %hlsl.normalize -} - -define noundef <3 x float> @test_normalize_float3(<3 x float> noundef %p0) { -entry: - ; EXPCHECK: [[dotf3:%.*]] = call float @llvm.dx.dot3.v3f32(<3 x float> %{{.*}}, <3 x float> %{{.*}}) - ; DOPCHECK: [[dotf3:%.*]] = call float @dx.op.dot3.f32(i32 55, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf3]]) - ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf3]]) - ; CHECK: [[splatinsertf3:%.*]] = insertelement <3 x float> poison, float [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <3 x float> [[splatinsertf3]], <3 x float> poison, <3 x i32> zeroinitializer - ; CHECK: fmul <3 x float> %p0, %.splat - - %hlsl.normalize = call <3 x float> @llvm.dx.normalize.v3f32(<3 x float> %p0) - ret <3 x float> %hlsl.normalize -} - -define noundef <4 x float> @test_normalize_float4(<4 x float> noundef %p0) { -entry: - ; EXPCHECK: [[dotf4:%.*]] = call float @llvm.dx.dot4.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}) - ; DOPCHECK: [[dotf4:%.*]] = call float @dx.op.dot4.f32(i32 56, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) - ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf4]]) - ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf4]]) - ; CHECK: [[splatinsertf4:%.*]] = insertelement <4 x float> poison, float [[rsqrt]], i64 0 - ; CHECK: [[splat:%.*]] shufflevector <4 x float> [[splatinsertf4]], <4 x float> poison, <4 x i32> zeroinitializer - ; CHECK: fmul <4 x float> %p0, %.splat - - %hlsl.normalize = call <4 x float> @llvm.dx.normalize.v4f32(<4 x float> %p0) - ret <4 x float> %hlsl.normalize -} +; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s --check-prefixes=CHECK,EXPCHECK +; RUN: opt -S -dxil-intrinsic-expansion -dxil-op-lower 
-mtriple=dxil-pc-shadermodel6.3-library < %s | FileCheck %s --check-prefixes=CHECK,DOPCHECK + +; Make sure dxil operation function calls for normalize are generated for half/float. + +declare half @llvm.dx.normalize.f16(half) +declare <2 x half> @llvm.dx.normalize.v2f16(<2 x half>) +declare <3 x half> @llvm.dx.normalize.v3f16(<3 x half>) +declare <4 x half> @llvm.dx.normalize.v4f16(<4 x half>) + +declare float @llvm.dx.normalize.f32(float) +declare <2 x float> @llvm.dx.normalize.v2f32(<2 x float>) +declare <3 x float> @llvm.dx.normalize.v3f32(<3 x float>) +declare <4 x float> @llvm.dx.normalize.v4f32(<4 x float>) + +define noundef half @test_normalize_half(half noundef %p0) { +entry: + ; CHECK: fdiv half %p0, %p0 + %hlsl.normalize = call half @llvm.dx.normalize.f16(half %p0) + ret half %hlsl.normalize +} + +define noundef <2 x half> @test_normalize_half2(<2 x half> noundef %p0) { +entry: + ; EXPCHECK: [[doth2:%.*]] = call half @llvm.dx.dot2.v2f16(<2 x half> %{{.*}}, <2 x half> %{{.*}}) + ; DOPCHECK: [[doth2:%.*]] = call half @dx.op.dot2.f16(i32 54, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth2]]) + ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth2]]) + ; CHECK: [[splatinserth2:%.*]] = insertelement <2 x half> poison, half [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] = shufflevector <2 x half> [[splatinserth2]], <2 x half> poison, <2 x i32> zeroinitializer + ; CHECK: fmul <2 x half> %p0, [[splat]] + + %hlsl.normalize = call <2 x half> @llvm.dx.normalize.v2f16(<2 x half> %p0) + ret <2 x half> %hlsl.normalize +} + +define noundef <3 x half> @test_normalize_half3(<3 x half> noundef %p0) { +entry: + ; EXPCHECK: [[doth3:%.*]] = call half @llvm.dx.dot3.v3f16(<3 x half> %{{.*}}, <3 x half> %{{.*}}) + ; DOPCHECK: [[doth3:%.*]] = call half @dx.op.dot3.f16(i32 55, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call half 
@llvm.dx.rsqrt.f16(half [[doth3]]) + ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth3]]) + ; CHECK: [[splatinserth3:%.*]] = insertelement <3 x half> poison, half [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <3 x half> [[splatinserth3]], <3 x half> poison, <3 x i32> zeroinitializer + ; CHECK: fmul <3 x half> %p0, %.splat + + %hlsl.normalize = call <3 x half> @llvm.dx.normalize.v3f16(<3 x half> %p0) + ret <3 x half> %hlsl.normalize +} + +define noundef <4 x half> @test_normalize_half4(<4 x half> noundef %p0) { +entry: + ; EXPCHECK: [[doth4:%.*]] = call half @llvm.dx.dot4.v4f16(<4 x half> %{{.*}}, <4 x half> %{{.*}}) + ; DOPCHECK: [[doth4:%.*]] = call half @dx.op.dot4.f16(i32 56, half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call half @llvm.dx.rsqrt.f16(half [[doth4]]) + ; DOPCHECK: [[rsqrt:%.*]] = call half @dx.op.unary.f16(i32 25, half [[doth4]]) + ; CHECK: [[splatinserth4:%.*]] = insertelement <4 x half> poison, half [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <4 x half> [[splatinserth4]], <4 x half> poison, <4 x i32> zeroinitializer + ; CHECK: fmul <4 x half> %p0, %.splat + + %hlsl.normalize = call <4 x half> @llvm.dx.normalize.v4f16(<4 x half> %p0) + ret <4 x half> %hlsl.normalize +} + +define noundef float @test_normalize_float(float noundef %p0) { +entry: + ; CHECK: fdiv float %p0, %p0 + %hlsl.normalize = call float @llvm.dx.normalize.f32(float %p0) + ret float %hlsl.normalize +} + +define noundef <2 x float> @test_normalize_float2(<2 x float> noundef %p0) { +entry: + ; EXPCHECK: [[dotf2:%.*]] = call float @llvm.dx.dot2.v2f32(<2 x float> %{{.*}}, <2 x float> %{{.*}}) + ; DOPCHECK: [[dotf2:%.*]] = call float @dx.op.dot2.f32(i32 54, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf2]]) + ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf2]]) + ; CHECK: 
[[splatinsertf2:%.*]] = insertelement <2 x float> poison, float [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <2 x float> [[splatinsertf2]], <2 x float> poison, <2 x i32> zeroinitializer + ; CHECK: fmul <2 x float> %p0, %.splat + + %hlsl.normalize = call <2 x float> @llvm.dx.normalize.v2f32(<2 x float> %p0) + ret <2 x float> %hlsl.normalize +} + +define noundef <3 x float> @test_normalize_float3(<3 x float> noundef %p0) { +entry: + ; EXPCHECK: [[dotf3:%.*]] = call float @llvm.dx.dot3.v3f32(<3 x float> %{{.*}}, <3 x float> %{{.*}}) + ; DOPCHECK: [[dotf3:%.*]] = call float @dx.op.dot3.f32(i32 55, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf3]]) + ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf3]]) + ; CHECK: [[splatinsertf3:%.*]] = insertelement <3 x float> poison, float [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <3 x float> [[splatinsertf3]], <3 x float> poison, <3 x i32> zeroinitializer + ; CHECK: fmul <3 x float> %p0, %.splat + + %hlsl.normalize = call <3 x float> @llvm.dx.normalize.v3f32(<3 x float> %p0) + ret <3 x float> %hlsl.normalize +} + +define noundef <4 x float> @test_normalize_float4(<4 x float> noundef %p0) { +entry: + ; EXPCHECK: [[dotf4:%.*]] = call float @llvm.dx.dot4.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}) + ; DOPCHECK: [[dotf4:%.*]] = call float @dx.op.dot4.f32(i32 56, float %{{.*}}, float %{{.*}}, float %{{.*}}, float %{{.*}}) + ; EXPCHECK: [[rsqrt:%.*]] = call float @llvm.dx.rsqrt.f32(float [[dotf4]]) + ; DOPCHECK: [[rsqrt:%.*]] = call float @dx.op.unary.f32(i32 25, float [[dotf4]]) + ; CHECK: [[splatinsertf4:%.*]] = insertelement <4 x float> poison, float [[rsqrt]], i64 0 + ; CHECK: [[splat:%.*]] shufflevector <4 x float> [[splatinsertf4]], <4 x float> poison, <4 x i32> zeroinitializer + ; CHECK: fmul <4 x float> %p0, %.splat + + %hlsl.normalize = call <4 x float> @llvm.dx.normalize.v4f32(<4 x float> 
%p0) + ret <4 x float> %hlsl.normalize +} diff --git a/llvm/test/CodeGen/DirectX/normalize_error.ll b/llvm/test/CodeGen/DirectX/normalize_error.ll index 35a91c0cdc24df..3041d2ecdd923a 100644 --- a/llvm/test/CodeGen/DirectX/normalize_error.ll +++ b/llvm/test/CodeGen/DirectX/normalize_error.ll @@ -1,10 +1,10 @@ -; RUN: not opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s 2>&1 | FileCheck %s - -; DXIL operation normalize does not support double overload type -; CHECK: Cannot create Dot2 operation: Invalid overload type - -define noundef <2 x double> @test_normalize_double2(<2 x double> noundef %p0) { -entry: - %hlsl.normalize = call <2 x double> @llvm.dx.normalize.v2f32(<2 x double> %p0) - ret <2 x double> %hlsl.normalize -} +; RUN: not opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library %s 2>&1 | FileCheck %s + +; DXIL operation normalize does not support double overload type +; CHECK: Cannot create Dot2 operation: Invalid overload type + +define noundef <2 x double> @test_normalize_double2(<2 x double> noundef %p0) { +entry: + %hlsl.normalize = call <2 x double> @llvm.dx.normalize.v2f32(<2 x double> %p0) + ret <2 x double> %hlsl.normalize +} diff --git a/llvm/test/CodeGen/DirectX/step.ll b/llvm/test/CodeGen/DirectX/step.ll index 1c9894026c62ec..6a9b5bf71da899 100644 --- a/llvm/test/CodeGen/DirectX/step.ll +++ b/llvm/test/CodeGen/DirectX/step.ll @@ -1,78 +1,78 @@ -; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s --check-prefix=CHECK -; RUN: opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library < %s | FileCheck %s --check-prefix=CHECK - -; Make sure dxil operation function calls for step are generated for half/float. 
- -declare half @llvm.dx.step.f16(half, half) -declare <2 x half> @llvm.dx.step.v2f16(<2 x half>, <2 x half>) -declare <3 x half> @llvm.dx.step.v3f16(<3 x half>, <3 x half>) -declare <4 x half> @llvm.dx.step.v4f16(<4 x half>, <4 x half>) - -declare float @llvm.dx.step.f32(float, float) -declare <2 x float> @llvm.dx.step.v2f32(<2 x float>, <2 x float>) -declare <3 x float> @llvm.dx.step.v3f32(<3 x float>, <3 x float>) -declare <4 x float> @llvm.dx.step.v4f32(<4 x float>, <4 x float>) - -define noundef half @test_step_half(half noundef %p0, half noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt half %p1, %p0 - ; CHECK: %1 = select i1 %0, half 0xH0000, half 0xH3C00 - %hlsl.step = call half @llvm.dx.step.f16(half %p0, half %p1) - ret half %hlsl.step -} - -define noundef <2 x half> @test_step_half2(<2 x half> noundef %p0, <2 x half> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <2 x half> %p1, %p0 - ; CHECK: %1 = select <2 x i1> %0, <2 x half> zeroinitializer, <2 x half> - %hlsl.step = call <2 x half> @llvm.dx.step.v2f16(<2 x half> %p0, <2 x half> %p1) - ret <2 x half> %hlsl.step -} - -define noundef <3 x half> @test_step_half3(<3 x half> noundef %p0, <3 x half> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <3 x half> %p1, %p0 - ; CHECK: %1 = select <3 x i1> %0, <3 x half> zeroinitializer, <3 x half> - %hlsl.step = call <3 x half> @llvm.dx.step.v3f16(<3 x half> %p0, <3 x half> %p1) - ret <3 x half> %hlsl.step -} - -define noundef <4 x half> @test_step_half4(<4 x half> noundef %p0, <4 x half> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <4 x half> %p1, %p0 - ; CHECK: %1 = select <4 x i1> %0, <4 x half> zeroinitializer, <4 x half> - %hlsl.step = call <4 x half> @llvm.dx.step.v4f16(<4 x half> %p0, <4 x half> %p1) - ret <4 x half> %hlsl.step -} - -define noundef float @test_step_float(float noundef %p0, float noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt float %p1, %p0 - ; CHECK: %1 = select i1 %0, float 0.000000e+00, float 1.000000e+00 - %hlsl.step = call float 
@llvm.dx.step.f32(float %p0, float %p1) - ret float %hlsl.step -} - -define noundef <2 x float> @test_step_float2(<2 x float> noundef %p0, <2 x float> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <2 x float> %p1, %p0 - ; CHECK: %1 = select <2 x i1> %0, <2 x float> zeroinitializer, <2 x float> - %hlsl.step = call <2 x float> @llvm.dx.step.v2f32(<2 x float> %p0, <2 x float> %p1) - ret <2 x float> %hlsl.step -} - -define noundef <3 x float> @test_step_float3(<3 x float> noundef %p0, <3 x float> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <3 x float> %p1, %p0 - ; CHECK: %1 = select <3 x i1> %0, <3 x float> zeroinitializer, <3 x float> - %hlsl.step = call <3 x float> @llvm.dx.step.v3f32(<3 x float> %p0, <3 x float> %p1) - ret <3 x float> %hlsl.step -} - -define noundef <4 x float> @test_step_float4(<4 x float> noundef %p0, <4 x float> noundef %p1) { -entry: - ; CHECK: %0 = fcmp olt <4 x float> %p1, %p0 - ; CHECK: %1 = select <4 x i1> %0, <4 x float> zeroinitializer, <4 x float> - %hlsl.step = call <4 x float> @llvm.dx.step.v4f32(<4 x float> %p0, <4 x float> %p1) - ret <4 x float> %hlsl.step -} +; RUN: opt -S -dxil-intrinsic-expansion < %s | FileCheck %s --check-prefix=CHECK +; RUN: opt -S -dxil-intrinsic-expansion -dxil-op-lower -mtriple=dxil-pc-shadermodel6.3-library < %s | FileCheck %s --check-prefix=CHECK + +; Make sure dxil operation function calls for step are generated for half/float. 
+ +declare half @llvm.dx.step.f16(half, half) +declare <2 x half> @llvm.dx.step.v2f16(<2 x half>, <2 x half>) +declare <3 x half> @llvm.dx.step.v3f16(<3 x half>, <3 x half>) +declare <4 x half> @llvm.dx.step.v4f16(<4 x half>, <4 x half>) + +declare float @llvm.dx.step.f32(float, float) +declare <2 x float> @llvm.dx.step.v2f32(<2 x float>, <2 x float>) +declare <3 x float> @llvm.dx.step.v3f32(<3 x float>, <3 x float>) +declare <4 x float> @llvm.dx.step.v4f32(<4 x float>, <4 x float>) + +define noundef half @test_step_half(half noundef %p0, half noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt half %p1, %p0 + ; CHECK: %1 = select i1 %0, half 0xH0000, half 0xH3C00 + %hlsl.step = call half @llvm.dx.step.f16(half %p0, half %p1) + ret half %hlsl.step +} + +define noundef <2 x half> @test_step_half2(<2 x half> noundef %p0, <2 x half> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <2 x half> %p1, %p0 + ; CHECK: %1 = select <2 x i1> %0, <2 x half> zeroinitializer, <2 x half> + %hlsl.step = call <2 x half> @llvm.dx.step.v2f16(<2 x half> %p0, <2 x half> %p1) + ret <2 x half> %hlsl.step +} + +define noundef <3 x half> @test_step_half3(<3 x half> noundef %p0, <3 x half> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <3 x half> %p1, %p0 + ; CHECK: %1 = select <3 x i1> %0, <3 x half> zeroinitializer, <3 x half> + %hlsl.step = call <3 x half> @llvm.dx.step.v3f16(<3 x half> %p0, <3 x half> %p1) + ret <3 x half> %hlsl.step +} + +define noundef <4 x half> @test_step_half4(<4 x half> noundef %p0, <4 x half> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <4 x half> %p1, %p0 + ; CHECK: %1 = select <4 x i1> %0, <4 x half> zeroinitializer, <4 x half> + %hlsl.step = call <4 x half> @llvm.dx.step.v4f16(<4 x half> %p0, <4 x half> %p1) + ret <4 x half> %hlsl.step +} + +define noundef float @test_step_float(float noundef %p0, float noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt float %p1, %p0 + ; CHECK: %1 = select i1 %0, float 0.000000e+00, float 1.000000e+00 + %hlsl.step = call float 
@llvm.dx.step.f32(float %p0, float %p1) + ret float %hlsl.step +} + +define noundef <2 x float> @test_step_float2(<2 x float> noundef %p0, <2 x float> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <2 x float> %p1, %p0 + ; CHECK: %1 = select <2 x i1> %0, <2 x float> zeroinitializer, <2 x float> + %hlsl.step = call <2 x float> @llvm.dx.step.v2f32(<2 x float> %p0, <2 x float> %p1) + ret <2 x float> %hlsl.step +} + +define noundef <3 x float> @test_step_float3(<3 x float> noundef %p0, <3 x float> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <3 x float> %p1, %p0 + ; CHECK: %1 = select <3 x i1> %0, <3 x float> zeroinitializer, <3 x float> + %hlsl.step = call <3 x float> @llvm.dx.step.v3f32(<3 x float> %p0, <3 x float> %p1) + ret <3 x float> %hlsl.step +} + +define noundef <4 x float> @test_step_float4(<4 x float> noundef %p0, <4 x float> noundef %p1) { +entry: + ; CHECK: %0 = fcmp olt <4 x float> %p1, %p0 + ; CHECK: %1 = select <4 x i1> %0, <4 x float> zeroinitializer, <4 x float> + %hlsl.step = call <4 x float> @llvm.dx.step.v4f32(<4 x float> %p0, <4 x float> %p1) + ret <4 x float> %hlsl.step +} diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/atan2.ll b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/atan2.ll index bdbfc133efa29b..a0306bae4a22de 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/atan2.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/atan2.ll @@ -1,49 +1,49 @@ -; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 -; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 - -define noundef float @atan2_float(float noundef %a, float noundef %b) { -entry: -; CHECK: %[[#arg0:]] = 
OpFunctionParameter %[[#]] -; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] -; CHECK: %[[#]] = OpExtInst %[[#float_32]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] - %elt.atan2 = call float @llvm.atan2.f32(float %a, float %b) - ret float %elt.atan2 -} - -define noundef half @atan2_half(half noundef %a, half noundef %b) { -entry: -; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] -; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] -; CHECK: %[[#]] = OpExtInst %[[#float_16]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] - %elt.atan2 = call half @llvm.atan2.f16(half %a, half %b) - ret half %elt.atan2 -} - -define noundef <4 x float> @atan2_float4(<4 x float> noundef %a, <4 x float> noundef %b) { -entry: - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] - %elt.atan2 = call <4 x float> @llvm.atan2.v4f32(<4 x float> %a, <4 x float> %b) - ret <4 x float> %elt.atan2 -} - -define noundef <4 x half> @atan2_half4(<4 x half> noundef %a, <4 x half> noundef %b) { -entry: - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] - %elt.atan2 = call <4 x half> @llvm.atan2.v4f16(<4 x half> %a, <4 x half> %b) - ret <4 x half> %elt.atan2 -} - -declare half @llvm.atan2.f16(half, half) -declare float @llvm.atan2.f32(float, float) -declare <4 x half> @llvm.atan2.v4f16(<4 x half>, <4 x half>) -declare <4 x float> @llvm.atan2.v4f32(<4 x float>, <4 x float>) +; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = 
OpTypeFloat 16 +; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 +; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 + +define noundef float @atan2_float(float noundef %a, float noundef %b) { +entry: +; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] +; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] +; CHECK: %[[#]] = OpExtInst %[[#float_32]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] + %elt.atan2 = call float @llvm.atan2.f32(float %a, float %b) + ret float %elt.atan2 +} + +define noundef half @atan2_half(half noundef %a, half noundef %b) { +entry: +; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] +; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] +; CHECK: %[[#]] = OpExtInst %[[#float_16]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] + %elt.atan2 = call half @llvm.atan2.f16(half %a, half %b) + ret half %elt.atan2 +} + +define noundef <4 x float> @atan2_float4(<4 x float> noundef %a, <4 x float> noundef %b) { +entry: + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] + %elt.atan2 = call <4 x float> @llvm.atan2.v4f32(<4 x float> %a, <4 x float> %b) + ret <4 x float> %elt.atan2 +} + +define noundef <4 x half> @atan2_half4(<4 x half> noundef %a, <4 x half> noundef %b) { +entry: + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Atan2 %[[#arg0]] %[[#arg1]] + %elt.atan2 = call <4 x half> @llvm.atan2.v4f16(<4 x half> %a, <4 x half> %b) + ret <4 x half> %elt.atan2 +} + +declare half @llvm.atan2.f16(half, half) +declare float @llvm.atan2.f32(float, float) +declare <4 x half> @llvm.atan2.v4f16(<4 x half>, <4 x half>) +declare <4 x float> @llvm.atan2.v4f32(<4 x float>, <4 x float>) diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/cross.ll 
b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/cross.ll index 2e0eb8c429ac27..7c06c14bb968d1 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/cross.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/cross.ll @@ -1,33 +1,33 @@ -; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; Make sure SPIRV operation function calls for cross are lowered correctly. - -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec3_float_16:]] = OpTypeVector %[[#float_16]] 3 -; CHECK-DAG: %[[#vec3_float_32:]] = OpTypeVector %[[#float_32]] 3 - -define noundef <3 x half> @cross_half4(<3 x half> noundef %a, <3 x half> noundef %b) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec3_float_16]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec3_float_16]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec3_float_16]] - ; CHECK: %[[#]] = OpExtInst %[[#vec3_float_16]] %[[#op_ext_glsl]] Cross %[[#arg0]] %[[#arg1]] - %hlsl.cross = call <3 x half> @llvm.spv.cross.v4f16(<3 x half> %a, <3 x half> %b) - ret <3 x half> %hlsl.cross -} - -define noundef <3 x float> @cross_float4(<3 x float> noundef %a, <3 x float> noundef %b) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec3_float_32]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec3_float_32]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec3_float_32]] - ; CHECK: %[[#]] = OpExtInst %[[#vec3_float_32]] %[[#op_ext_glsl]] Cross %[[#arg0]] %[[#arg1]] - %hlsl.cross = call <3 x float> @llvm.spv.cross.v4f32(<3 x float> %a, <3 x float> %b) - ret <3 x float> %hlsl.cross -} - -declare <3 x half> @llvm.spv.cross.v4f16(<3 x half>, <3 x half>) -declare <3 x float> @llvm.spv.cross.v4f32(<3 x float>, <3 x float>) +; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: 
%if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; Make sure SPIRV operation function calls for cross are lowered correctly. + +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 +; CHECK-DAG: %[[#vec3_float_16:]] = OpTypeVector %[[#float_16]] 3 +; CHECK-DAG: %[[#vec3_float_32:]] = OpTypeVector %[[#float_32]] 3 + +define noundef <3 x half> @cross_half4(<3 x half> noundef %a, <3 x half> noundef %b) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec3_float_16]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec3_float_16]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec3_float_16]] + ; CHECK: %[[#]] = OpExtInst %[[#vec3_float_16]] %[[#op_ext_glsl]] Cross %[[#arg0]] %[[#arg1]] + %hlsl.cross = call <3 x half> @llvm.spv.cross.v4f16(<3 x half> %a, <3 x half> %b) + ret <3 x half> %hlsl.cross +} + +define noundef <3 x float> @cross_float4(<3 x float> noundef %a, <3 x float> noundef %b) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec3_float_32]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec3_float_32]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec3_float_32]] + ; CHECK: %[[#]] = OpExtInst %[[#vec3_float_32]] %[[#op_ext_glsl]] Cross %[[#arg0]] %[[#arg1]] + %hlsl.cross = call <3 x float> @llvm.spv.cross.v4f32(<3 x float> %a, <3 x float> %b) + ret <3 x float> %hlsl.cross +} + +declare <3 x half> @llvm.spv.cross.v4f16(<3 x half>, <3 x half>) +declare <3 x float> @llvm.spv.cross.v4f32(<3 x float>, <3 x float>) diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/length.ll b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/length.ll index b4a9d8e0664b7e..df1ef3a7287c3b 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/length.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/length.ll @@ -1,29 +1,29 @@ -; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if 
spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; Make sure SPIRV operation function calls for length are lowered correctly. - -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 -; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 - -define noundef half @length_half4(<4 x half> noundef %a) { -entry: - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#]] = OpExtInst %[[#float_16]] %[[#op_ext_glsl]] Length %[[#arg0]] - %hlsl.length = call half @llvm.spv.length.v4f16(<4 x half> %a) - ret half %hlsl.length -} - -define noundef float @length_float4(<4 x float> noundef %a) { -entry: - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] - ; CHECK: %[[#]] = OpExtInst %[[#float_32]] %[[#op_ext_glsl]] Length %[[#arg0]] - %hlsl.length = call float @llvm.spv.length.v4f32(<4 x float> %a) - ret float %hlsl.length -} - -declare half @llvm.spv.length.v4f16(<4 x half>) -declare float @llvm.spv.length.v4f32(<4 x float>) +; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; Make sure SPIRV operation function calls for length are lowered correctly. 
+ +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 +; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 +; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 + +define noundef half @length_half4(<4 x half> noundef %a) { +entry: + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#]] = OpExtInst %[[#float_16]] %[[#op_ext_glsl]] Length %[[#arg0]] + %hlsl.length = call half @llvm.spv.length.v4f16(<4 x half> %a) + ret half %hlsl.length +} + +define noundef float @length_float4(<4 x float> noundef %a) { +entry: + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#]] + ; CHECK: %[[#]] = OpExtInst %[[#float_32]] %[[#op_ext_glsl]] Length %[[#arg0]] + %hlsl.length = call float @llvm.spv.length.v4f32(<4 x float> %a) + ret float %hlsl.length +} + +declare half @llvm.spv.length.v4f16(<4 x half>) +declare float @llvm.spv.length.v4f32(<4 x float>) diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/normalize.ll b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/normalize.ll index fa73b9c2a4d3ab..4659b5146e4327 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/normalize.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/normalize.ll @@ -1,31 +1,31 @@ -; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; Make sure SPIRV operation function calls for normalize are lowered correctly. 
- -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 -; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 - -define noundef <4 x half> @normalize_half4(<4 x half> noundef %a) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec4_float_16]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_16]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Normalize %[[#arg0]] - %hlsl.normalize = call <4 x half> @llvm.spv.normalize.v4f16(<4 x half> %a) - ret <4 x half> %hlsl.normalize -} - -define noundef <4 x float> @normalize_float4(<4 x float> noundef %a) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec4_float_32]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_32]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Normalize %[[#arg0]] - %hlsl.normalize = call <4 x float> @llvm.spv.normalize.v4f32(<4 x float> %a) - ret <4 x float> %hlsl.normalize -} - -declare <4 x half> @llvm.spv.normalize.v4f16(<4 x half>) -declare <4 x float> @llvm.spv.normalize.v4f32(<4 x float>) +; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; Make sure SPIRV operation function calls for normalize are lowered correctly. 
+ +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 +; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 +; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 + +define noundef <4 x half> @normalize_half4(<4 x half> noundef %a) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec4_float_16]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_16]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Normalize %[[#arg0]] + %hlsl.normalize = call <4 x half> @llvm.spv.normalize.v4f16(<4 x half> %a) + ret <4 x half> %hlsl.normalize +} + +define noundef <4 x float> @normalize_float4(<4 x float> noundef %a) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec4_float_32]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_32]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Normalize %[[#arg0]] + %hlsl.normalize = call <4 x float> @llvm.spv.normalize.v4f32(<4 x float> %a) + ret <4 x float> %hlsl.normalize +} + +declare <4 x half> @llvm.spv.normalize.v4f16(<4 x half>) +declare <4 x float> @llvm.spv.normalize.v4f32(<4 x float>) diff --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/step.ll b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/step.ll index bb50d8c790f8ad..7c0ee9398d15fc 100644 --- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/step.ll +++ b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/step.ll @@ -1,33 +1,33 @@ -; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} - -; Make sure SPIRV operation function calls for step are lowered correctly. 
- -; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" -; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 -; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 -; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 -; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 - -define noundef <4 x half> @step_half4(<4 x half> noundef %a, <4 x half> noundef %b) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec4_float_16]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_16]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec4_float_16]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Step %[[#arg0]] %[[#arg1]] - %hlsl.step = call <4 x half> @llvm.spv.step.v4f16(<4 x half> %a, <4 x half> %b) - ret <4 x half> %hlsl.step -} - -define noundef <4 x float> @step_float4(<4 x float> noundef %a, <4 x float> noundef %b) { -entry: - ; CHECK: %[[#]] = OpFunction %[[#vec4_float_32]] None %[[#]] - ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_32]] - ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec4_float_32]] - ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Step %[[#arg0]] %[[#arg1]] - %hlsl.step = call <4 x float> @llvm.spv.step.v4f32(<4 x float> %a, <4 x float> %b) - ret <4 x float> %hlsl.step -} - -declare <4 x half> @llvm.spv.step.v4f16(<4 x half>, <4 x half>) -declare <4 x float> @llvm.spv.step.v4f32(<4 x float>, <4 x float>) +; RUN: llc -O0 -mtriple=spirv-unknown-unknown %s -o - | FileCheck %s +; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - -filetype=obj | spirv-val %} + +; Make sure SPIRV operation function calls for step are lowered correctly. 
+ +; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450" +; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32 +; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16 +; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4 +; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4 + +define noundef <4 x half> @step_half4(<4 x half> noundef %a, <4 x half> noundef %b) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec4_float_16]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_16]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec4_float_16]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Step %[[#arg0]] %[[#arg1]] + %hlsl.step = call <4 x half> @llvm.spv.step.v4f16(<4 x half> %a, <4 x half> %b) + ret <4 x half> %hlsl.step +} + +define noundef <4 x float> @step_float4(<4 x float> noundef %a, <4 x float> noundef %b) { +entry: + ; CHECK: %[[#]] = OpFunction %[[#vec4_float_32]] None %[[#]] + ; CHECK: %[[#arg0:]] = OpFunctionParameter %[[#vec4_float_32]] + ; CHECK: %[[#arg1:]] = OpFunctionParameter %[[#vec4_float_32]] + ; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Step %[[#arg0]] %[[#arg1]] + %hlsl.step = call <4 x float> @llvm.spv.step.v4f32(<4 x float> %a, <4 x float> %b) + ret <4 x float> %hlsl.step +} + +declare <4 x half> @llvm.spv.step.v4f16(<4 x half>, <4 x half>) +declare <4 x float> @llvm.spv.step.v4f32(<4 x float>, <4 x float>) diff --git a/llvm/test/Demangle/ms-placeholder-return-type.test b/llvm/test/Demangle/ms-placeholder-return-type.test index 18038e636c8d5a..a656400fe140fb 100644 --- a/llvm/test/Demangle/ms-placeholder-return-type.test +++ b/llvm/test/Demangle/ms-placeholder-return-type.test @@ -1,18 +1,18 @@ -; RUN: llvm-undname < %s | FileCheck %s - -; CHECK-NOT: Invalid mangled name - -?TestNonTemplateAuto@@YA@XZ -; CHECK: __cdecl TestNonTemplateAuto(void) - -??$AutoT@X@@YA?A_PXZ -; CHECK: auto __cdecl AutoT(void) - -??$AutoT@X@@YA?B_PXZ -; CHECK: auto const
__cdecl AutoT(void) - -??$AutoT@X@@YA?A_TXZ -; CHECK: decltype(auto) __cdecl AutoT(void) - -??$AutoT@X@@YA?B_TXZ -; CHECK: decltype(auto) const __cdecl AutoT(void) +; RUN: llvm-undname < %s | FileCheck %s + +; CHECK-NOT: Invalid mangled name + +?TestNonTemplateAuto@@YA@XZ +; CHECK: __cdecl TestNonTemplateAuto(void) + +??$AutoT@X@@YA?A_PXZ +; CHECK: auto __cdecl AutoT(void) + +??$AutoT@X@@YA?B_PXZ +; CHECK: auto const __cdecl AutoT(void) + +??$AutoT@X@@YA?A_TXZ +; CHECK: decltype(auto) __cdecl AutoT(void) + +??$AutoT@X@@YA?B_TXZ +; CHECK: decltype(auto) const __cdecl AutoT(void) diff --git a/llvm/test/FileCheck/dos-style-eol.txt b/llvm/test/FileCheck/dos-style-eol.txt index 4252aad4d3e7bf..52184f465c3fdf 100644 --- a/llvm/test/FileCheck/dos-style-eol.txt +++ b/llvm/test/FileCheck/dos-style-eol.txt @@ -1,11 +1,11 @@ -// Test for using FileCheck on DOS style end-of-line -// This test was deliberately committed with DOS style end of line. -// Don't change line endings! -// RUN: FileCheck -input-file %s %s -// RUN: FileCheck --strict-whitespace -input-file %s %s - -LINE 1 -; CHECK: {{^}}LINE 1{{$}} - -LINE 2 +// Test for using FileCheck on DOS style end-of-line +// This test was deliberately committed with DOS style end of line. +// Don't change line endings!
+// RUN: FileCheck -input-file %s %s +// RUN: FileCheck --strict-whitespace -input-file %s %s + +LINE 1 +; CHECK: {{^}}LINE 1{{$}} + +LINE 2 ; CHECK: {{^}}LINE 2{{$}} \ No newline at end of file diff --git a/llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri b/llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri index 72d23d041ae807..857c4ff87b6cf8 100644 --- a/llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri +++ b/llvm/test/tools/llvm-ar/Inputs/mri-crlf.mri @@ -1,4 +1,4 @@ -; this file intentionally has crlf line endings -create crlf.a -addmod foo.txt -end +; this file intentionally has crlf line endings +create crlf.a +addmod foo.txt +end diff --git a/llvm/test/tools/llvm-cvtres/Inputs/languages.rc b/llvm/test/tools/llvm-cvtres/Inputs/languages.rc index 081b3a77bebc10..82031d0e208395 100644 --- a/llvm/test/tools/llvm-cvtres/Inputs/languages.rc +++ b/llvm/test/tools/llvm-cvtres/Inputs/languages.rc @@ -1,36 +1,36 @@ -#include "windows.h" - -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US -randomdat RCDATA -{ - "this is a random bit of data that means nothing\0", - 0x23a9, - 0x140e, - 194292, -} - -LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED -randomdat RCDATA -{ - "zhe4 shi4 yi1ge4 sui2ji1 de shu4ju4, zhe4 yi4wei4zhe shen2me\0", - 0x23a9, - 0x140e, - 194292, -} - -LANGUAGE LANG_GERMAN, SUBLANG_GERMAN_LUXEMBOURG -randomdat RCDATA -{ - "Dies ist ein zufälliges Bit von Daten, die nichts bedeutet\0", - 0x23a9, - 0x140e, - 194292, -} - -LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED -myaccelerators ACCELERATORS -{ - "^C", 999, VIRTKEY, ALT - "D", 1100, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} +#include "windows.h" + +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US +randomdat RCDATA +{ + "this is a random bit of data that means nothing\0", + 0x23a9, + 0x140e, + 194292, +} + +LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED +randomdat RCDATA +{ + "zhe4 shi4 yi1ge4 sui2ji1 de shu4ju4, zhe4 yi4wei4zhe shen2me\0", + 0x23a9, + 0x140e, + 194292, +} + +LANGUAGE LANG_GERMAN, 
SUBLANG_GERMAN_LUXEMBOURG +randomdat RCDATA +{ + "Dies ist ein zufälliges Bit von Daten, die nichts bedeutet\0", + 0x23a9, + 0x140e, + 194292, +} + +LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED +myaccelerators ACCELERATORS +{ + "^C", 999, VIRTKEY, ALT + "D", 1100, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} diff --git a/llvm/test/tools/llvm-cvtres/Inputs/test_resource.rc b/llvm/test/tools/llvm-cvtres/Inputs/test_resource.rc index 5ca097baa0f736..494849f57a0a9e 100644 --- a/llvm/test/tools/llvm-cvtres/Inputs/test_resource.rc +++ b/llvm/test/tools/llvm-cvtres/Inputs/test_resource.rc @@ -1,50 +1,50 @@ -#include "windows.h" - -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US - -myaccelerators ACCELERATORS -{ - "^C", 999, VIRTKEY, ALT - "D", 1100, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} - -cursor BITMAP "cursor_small.bmp" -okay BITMAP "okay_small.bmp" - -14432 MENU -LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED -{ - MENUITEM "yu", 100 - MENUITEM "shala", 101 - MENUITEM "kaoya", 102 -} - -testdialog DIALOG 10, 10, 200, 300 -STYLE WS_POPUP | WS_BORDER -CAPTION "Test" -{ - CTEXT "Continue:", 1, 10, 10, 230, 14 - PUSHBUTTON "&OK", 2, 66, 134, 161, 13 -} - -12 ACCELERATORS -{ - "X", 164, VIRTKEY, ALT - "H", 5678, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} - -"eat" MENU -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_AUS -{ - MENUITEM "fish", 100 - MENUITEM "salad", 101 - MENUITEM "duck", 102 -} - - -myresource stringarray { - "this is a user defined resource\0", - "it contains many strings\0", +#include "windows.h" + +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US + +myaccelerators ACCELERATORS +{ + "^C", 999, VIRTKEY, ALT + "D", 1100, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} + +cursor BITMAP "cursor_small.bmp" +okay BITMAP "okay_small.bmp" + +14432 MENU +LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED +{ + MENUITEM "yu", 100 + MENUITEM "shala", 101 + MENUITEM "kaoya", 102 +} + +testdialog DIALOG 10, 10, 200, 300 +STYLE WS_POPUP 
| WS_BORDER +CAPTION "Test" +{ + CTEXT "Continue:", 1, 10, 10, 230, 14 + PUSHBUTTON "&OK", 2, 66, 134, 161, 13 +} + +12 ACCELERATORS +{ + "X", 164, VIRTKEY, ALT + "H", 5678, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} + +"eat" MENU +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_AUS +{ + MENUITEM "fish", 100 + MENUITEM "salad", 101 + MENUITEM "duck", 102 +} + + +myresource stringarray { + "this is a user defined resource\0", + "it contains many strings\0", } \ No newline at end of file diff --git a/llvm/test/tools/llvm-rc/Inputs/dialog-with-menu.rc b/llvm/test/tools/llvm-rc/Inputs/dialog-with-menu.rc index bb79dca399c219..c700b587af6483 100644 --- a/llvm/test/tools/llvm-rc/Inputs/dialog-with-menu.rc +++ b/llvm/test/tools/llvm-rc/Inputs/dialog-with-menu.rc @@ -1,16 +1,16 @@ -101 DIALOG 0, 0, 362, 246 -STYLE 0x40l | 0x0004l | 0x0008l | 0x0800l | 0x00020000l | - 0x00010000l | 0x80000000l | 0x10000000l | 0x02000000l | 0x00C00000l | - 0x00080000l | 0x00040000l -CAPTION "MakeNSISW" -MENU 104 -FONT 8, "MS Shell Dlg" -BEGIN - CONTROL "",202,"RichEdit20A",0x0004l | 0x0040l | - 0x0100l | 0x0800l | 0x00008000 | - 0x00010000l | 0x00800000l | 0x00200000l,7,22,348,190 - CONTROL "",-1,"Static",0x00000010l,7,220,346,1 - LTEXT "",200,7,230,200,12,0x08000000l - DEFPUSHBUTTON "Test &Installer",203,230,226,60,15,0x08000000l | 0x00010000l - PUSHBUTTON "&Close",2,296,226,49,15,0x00010000l -END +101 DIALOG 0, 0, 362, 246 +STYLE 0x40l | 0x0004l | 0x0008l | 0x0800l | 0x00020000l | + 0x00010000l | 0x80000000l | 0x10000000l | 0x02000000l | 0x00C00000l | + 0x00080000l | 0x00040000l +CAPTION "MakeNSISW" +MENU 104 +FONT 8, "MS Shell Dlg" +BEGIN + CONTROL "",202,"RichEdit20A",0x0004l | 0x0040l | + 0x0100l | 0x0800l | 0x00008000 | + 0x00010000l | 0x00800000l | 0x00200000l,7,22,348,190 + CONTROL "",-1,"Static",0x00000010l,7,220,346,1 + LTEXT "",200,7,230,200,12,0x08000000l + DEFPUSHBUTTON "Test &Installer",203,230,226,60,15,0x08000000l | 0x00010000l + PUSHBUTTON 
"&Close",2,296,226,49,15,0x00010000l +END diff --git a/llvm/test/tools/llvm-readobj/COFF/Inputs/resources/test_resource.rc b/llvm/test/tools/llvm-readobj/COFF/Inputs/resources/test_resource.rc index fd616520dbe1b3..6ad56bc02d73ca 100644 --- a/llvm/test/tools/llvm-readobj/COFF/Inputs/resources/test_resource.rc +++ b/llvm/test/tools/llvm-readobj/COFF/Inputs/resources/test_resource.rc @@ -1,44 +1,44 @@ -#include "windows.h" - -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US - -myaccelerators ACCELERATORS -{ - "^C", 999, VIRTKEY, ALT - "D", 1100, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} - -cursor BITMAP "cursor_small.bmp" -okay BITMAP "okay_small.bmp" - -14432 MENU -LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED -{ - MENUITEM "yu", 100 - MENUITEM "shala", 101 - MENUITEM "kaoya", 102 -} - -testdialog DIALOG 10, 10, 200, 300 -STYLE WS_POPUP | WS_BORDER -CAPTION "Test" -{ - CTEXT "Continue:", 1, 10, 10, 230, 14 - PUSHBUTTON "&OK", 2, 66, 134, 161, 13 -} - -12 ACCELERATORS -{ - "X", 164, VIRTKEY, ALT - "H", 5678, VIRTKEY, CONTROL, SHIFT - "^R", 444, ASCII, NOINVERT -} - -"eat" MENU -LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_AUS -{ - MENUITEM "fish", 100 - MENUITEM "salad", 101 - MENUITEM "duck", 102 -} +#include "windows.h" + +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US + +myaccelerators ACCELERATORS +{ + "^C", 999, VIRTKEY, ALT + "D", 1100, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} + +cursor BITMAP "cursor_small.bmp" +okay BITMAP "okay_small.bmp" + +14432 MENU +LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED +{ + MENUITEM "yu", 100 + MENUITEM "shala", 101 + MENUITEM "kaoya", 102 +} + +testdialog DIALOG 10, 10, 200, 300 +STYLE WS_POPUP | WS_BORDER +CAPTION "Test" +{ + CTEXT "Continue:", 1, 10, 10, 230, 14 + PUSHBUTTON "&OK", 2, 66, 134, 161, 13 +} + +12 ACCELERATORS +{ + "X", 164, VIRTKEY, ALT + "H", 5678, VIRTKEY, CONTROL, SHIFT + "^R", 444, ASCII, NOINVERT +} + +"eat" MENU +LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_AUS +{ + MENUITEM "fish", 100 + 
MENUITEM "salad", 101 + MENUITEM "duck", 102 +} diff --git a/llvm/unittests/Support/ModRefTest.cpp b/llvm/unittests/Support/ModRefTest.cpp index 35107e50b32db7..f77e7e39e14eab 100644 --- a/llvm/unittests/Support/ModRefTest.cpp +++ b/llvm/unittests/Support/ModRefTest.cpp @@ -1,27 +1,27 @@ -//===- llvm/unittest/Support/ModRefTest.cpp - ModRef tests ----------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -#include "llvm/Support/ModRef.h" -#include "llvm/ADT/SmallString.h" -#include "llvm/Support/raw_ostream.h" -#include "gtest/gtest.h" -#include - -using namespace llvm; - -namespace { - -// Verify that printing a MemoryEffects does not end with a ,. -TEST(ModRefTest, PrintMemoryEffects) { - std::string S; - raw_string_ostream OS(S); - OS << MemoryEffects::none(); - EXPECT_EQ(S, "ArgMem: NoModRef, InaccessibleMem: NoModRef, Other: NoModRef"); -} - -} // namespace +//===- llvm/unittest/Support/ModRefTest.cpp - ModRef tests ----------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "llvm/Support/ModRef.h" +#include "llvm/ADT/SmallString.h" +#include "llvm/Support/raw_ostream.h" +#include "gtest/gtest.h" +#include + +using namespace llvm; + +namespace { + +// Verify that printing a MemoryEffects does not end with a ,. 
+TEST(ModRefTest, PrintMemoryEffects) { + std::string S; + raw_string_ostream OS(S); + OS << MemoryEffects::none(); + EXPECT_EQ(S, "ArgMem: NoModRef, InaccessibleMem: NoModRef, Other: NoModRef"); +} + +} // namespace diff --git a/llvm/utils/LLVMVisualizers/llvm.natvis b/llvm/utils/LLVMVisualizers/llvm.natvis index d83ae8013c51e2..03ca2d33a80ba6 100644 --- a/llvm/utils/LLVMVisualizers/llvm.natvis +++ b/llvm/utils/LLVMVisualizers/llvm.natvis @@ -1,408 +1,408 @@ - - - - - empty - {(value_type*)BeginX,[Size]} - {Size} elements - Uninitialized - - Size - Capacity - - Size - (value_type*)BeginX - - - - - - {U.VAL} - Cannot visualize APInts longer than 64 bits - - - {Data,[Length]} - {Length} elements - Uninitialized - - Length - - Length - Data - - - - - {(const char*)BeginX,[Size]s8} - (const char*)BeginX,[Size] - - Size - Capacity - - Size - (char*)BeginX - - - - - - {First,[Last - First]s8} - - - - {Data,[Length]s8} - Data,[Length]s8 - - Length - - Length - Data - - - - - - {($T1)*(intptr_t *)Data} - - - - - - {($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask)} - {($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask)} - {$T6::IntMask}: {($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask)} [{($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask)}] - - ($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask) - ($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask) - - - - - {($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask)} - {((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask)} - {$T5::IntMask}: {($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask)} [{((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask)}] - - ($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask) - ((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask) - - - - - - {($T4)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask)} - - - {($T5)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask)} - - Unexpected index in PointerUnion: 
{(*(intptr_t *)Val.Value.Data>>$T2::InfoTy::IntShift) & $T2::InfoTy::IntMask} - - "$T4",s8b - - ($T4)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask) - - "$T5",s8b - - ($T5)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask) - - - - - - {{ empty }} - {{ head={Head} }} - - - Head - Next - this - - - - - - empty - RefPtr [1 ref] {*Obj} - RefPtr [{Obj->RefCount} refs] {*Obj} - - Obj->RefCount - Obj - - - - - {{ [Small Mode] size={NumNonEmpty}, capacity={CurArraySize} }} - {{ [Big Mode] size={NumNonEmpty}, capacity={CurArraySize} }} - - NumNonEmpty - CurArraySize - - NumNonEmpty - ($T1*)CurArray - - - - - - empty - {{ size={NumEntries}, buckets={NumBuckets} }} - - NumEntries - NumBuckets - - NumBuckets - Buckets - - - - - - {{ size={NumItems}, buckets={NumBuckets} }} - - NumItems - NumBuckets - - NumBuckets - (MapEntryTy**)TheTable - - - - - - empty - ({this+1,s8}, {second}) - - this+1,s - second - - - - - {Data} - - - - None - {Storage.value} - - Storage.value - - - - - Error - {*((storage_type *)TStorage.buffer)} - - *((storage_type *)TStorage.buffer) - *((error_type *)ErrorStorage.buffer) - - - - - - - {{little endian value = {*(($T1*)(unsigned char *)Value.buffer)} }} - - (unsigned char *)Value.buffer,1 - (unsigned char *)Value.buffer,2 - (unsigned char *)Value.buffer,4 - (unsigned char *)Value.buffer,8 - - - - - - {{ big endian value = {*(unsigned char *)Value.buffer} }} - {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 8) - | ($T1)(*((unsigned char *)Value.buffer+1))} }} - {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 24) - | (($T1)(*((unsigned char *)Value.buffer+1)) << 16) - | (($T1)(*((unsigned char *)Value.buffer+2)) << 8) - | ($T1)(*((unsigned char *)Value.buffer+3))} }} - {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 56) - | (($T1)(*((unsigned char *)Value.buffer+1)) << 48) - | (($T1)(*((unsigned char *)Value.buffer+2)) << 40) - | (($T1)(*((unsigned char *)Value.buffer+3)) << 32) - | 
(($T1)(*((unsigned char *)Value.buffer+4)) << 24) - | (($T1)(*((unsigned char *)Value.buffer+5)) << 16) - | (($T1)(*((unsigned char *)Value.buffer+6)) << 8) - | ($T1)(*((unsigned char *)Value.buffer+7))} }} - - (unsigned char *)Value.buffer,1 - (unsigned char *)Value.buffer,2 - (unsigned char *)Value.buffer,4 - (unsigned char *)Value.buffer,8 - - - - - {ID} - - ID - - SubclassData - - *ContainedTys - - {NumContainedTys - 1} - - - NumContainedTys - 1 - ContainedTys + 1 - - - - SubclassData == 1 - - (SubclassData & llvm::StructType::SCDB_HasBody) != 0 - (SubclassData & llvm::StructType::SCDB_Packed) != 0 - (SubclassData & llvm::StructType::SCDB_IsLiteral) != 0 - (SubclassData & llvm::StructType::SCDB_IsSized) != 0 - - {NumContainedTys} - - - NumContainedTys - ContainedTys - - - - - *ContainedTys - ((llvm::ArrayType*)this)->NumElements - - *ContainedTys - ((llvm::VectorType*)this)->ElementQuantity - - *ContainedTys - ((llvm::VectorType*)this)->ElementQuantity - - SubclassData - *ContainedTys - - Context - - - - - $(Type) {*Value} - - - - $(Type) {(llvm::ISD::NodeType)this->NodeType} - - - NumOperands - OperandList - - - - - - i{Val.BitWidth} {Val.VAL} - - - - {IDAndSubclassData >> 8}bit integer type - - - - $(Type) {*VTy} {this->getName()} {SubclassData} - $(Type) {*VTy} anon {SubclassData} - - (Instruction*)this - (User*)this - - UseList - Next - Prev.Value & 3 == 3 ? 
(User*)(this + 1) : (User*)(this + 2) - - - - - - - Val - - - - - - - $(Type) {*VTy} {this->getName()} {SubclassData} - $(Type) {*VTy} anon {SubclassData} - - (Value*)this,nd - *VTy - - NumUserOperands - (llvm::Use*)this - NumUserOperands - - - NumUserOperands - *((llvm::Use**)this - 1) - - - - - - {getOpcodeName(SubclassID - InstructionVal)} - - (User*)this,nd - - - - - {this->getName()} {(LinkageTypes)Linkage} {(VisibilityTypes)Visibility} {(DLLStorageClassTypes)DllStorageClass} {(llvm::GlobalValue::ThreadLocalMode) ThreadLocal} - - - - - - - this - Next - this - - - - - - - pImpl - - - - - {ModuleID,s8} {TargetTriple} - - - - $(Type) {PassID} {Kind} - - + + + + + empty + {(value_type*)BeginX,[Size]} + {Size} elements + Uninitialized + + Size + Capacity + + Size + (value_type*)BeginX + + + + + + {U.VAL} + Cannot visualize APInts longer than 64 bits + + + {Data,[Length]} + {Length} elements + Uninitialized + + Length + + Length + Data + + + + + {(const char*)BeginX,[Size]s8} + (const char*)BeginX,[Size] + + Size + Capacity + + Size + (char*)BeginX + + + + + + {First,[Last - First]s8} + + + + {Data,[Length]s8} + Data,[Length]s8 + + Length + + Length + Data + + + + + + {($T1)*(intptr_t *)Data} + + + + + + {($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask)} + {($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask)} + {$T6::IntMask}: {($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask)} [{($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask)}] + + ($T1)(*(intptr_t *)Value.Data & $T6::PointerBitMask) + ($T4)((*(intptr_t *)Value.Data >> $T6::IntShift) & $T6::IntMask) + + + + + {($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask)} + {((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask)} + {$T5::IntMask}: {($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask)} [{((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask)}] + + ($T1)(*(intptr_t *)Value.Data & $T5::PointerBitMask) + ((*(intptr_t *)Value.Data >> $T5::IntShift) & $T5::IntMask) 
+ + + + + + {($T4)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask)} + + + {($T5)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask)} + + Unexpected index in PointerUnion: {(*(intptr_t *)Val.Value.Data>>$T2::InfoTy::IntShift) & $T2::InfoTy::IntMask} + + "$T4",s8b + + ($T4)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask) + + "$T5",s8b + + ($T5)(*(intptr_t *)Val.Value.Data & $T2::InfoTy::PointerBitMask) + + + + + + {{ empty }} + {{ head={Head} }} + + + Head + Next + this + + + + + + empty + RefPtr [1 ref] {*Obj} + RefPtr [{Obj->RefCount} refs] {*Obj} + + Obj->RefCount + Obj + + + + + {{ [Small Mode] size={NumNonEmpty}, capacity={CurArraySize} }} + {{ [Big Mode] size={NumNonEmpty}, capacity={CurArraySize} }} + + NumNonEmpty + CurArraySize + + NumNonEmpty + ($T1*)CurArray + + + + + + empty + {{ size={NumEntries}, buckets={NumBuckets} }} + + NumEntries + NumBuckets + + NumBuckets + Buckets + + + + + + {{ size={NumItems}, buckets={NumBuckets} }} + + NumItems + NumBuckets + + NumBuckets + (MapEntryTy**)TheTable + + + + + + empty + ({this+1,s8}, {second}) + + this+1,s + second + + + + + {Data} + + + + None + {Storage.value} + + Storage.value + + + + + Error + {*((storage_type *)TStorage.buffer)} + + *((storage_type *)TStorage.buffer) + *((error_type *)ErrorStorage.buffer) + + + + + + + {{little endian value = {*(($T1*)(unsigned char *)Value.buffer)} }} + + (unsigned char *)Value.buffer,1 + (unsigned char *)Value.buffer,2 + (unsigned char *)Value.buffer,4 + (unsigned char *)Value.buffer,8 + + + + + + {{ big endian value = {*(unsigned char *)Value.buffer} }} + {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 8) + | ($T1)(*((unsigned char *)Value.buffer+1))} }} + {{ big endian value = {(($T1)(*(unsigned char *)Value.buffer) << 24) + | (($T1)(*((unsigned char *)Value.buffer+1)) << 16) + | (($T1)(*((unsigned char *)Value.buffer+2)) << 8) + | ($T1)(*((unsigned char *)Value.buffer+3))} }} + {{ big endian value = {(($T1)(*(unsigned char 
*)Value.buffer) << 56) + | (($T1)(*((unsigned char *)Value.buffer+1)) << 48) + | (($T1)(*((unsigned char *)Value.buffer+2)) << 40) + | (($T1)(*((unsigned char *)Value.buffer+3)) << 32) + | (($T1)(*((unsigned char *)Value.buffer+4)) << 24) + | (($T1)(*((unsigned char *)Value.buffer+5)) << 16) + | (($T1)(*((unsigned char *)Value.buffer+6)) << 8) + | ($T1)(*((unsigned char *)Value.buffer+7))} }} + + (unsigned char *)Value.buffer,1 + (unsigned char *)Value.buffer,2 + (unsigned char *)Value.buffer,4 + (unsigned char *)Value.buffer,8 + + + + + {ID} + + ID + + SubclassData + + *ContainedTys + + {NumContainedTys - 1} + + + NumContainedTys - 1 + ContainedTys + 1 + + + + SubclassData == 1 + + (SubclassData & llvm::StructType::SCDB_HasBody) != 0 + (SubclassData & llvm::StructType::SCDB_Packed) != 0 + (SubclassData & llvm::StructType::SCDB_IsLiteral) != 0 + (SubclassData & llvm::StructType::SCDB_IsSized) != 0 + + {NumContainedTys} + + + NumContainedTys + ContainedTys + + + + + *ContainedTys + ((llvm::ArrayType*)this)->NumElements + + *ContainedTys + ((llvm::VectorType*)this)->ElementQuantity + + *ContainedTys + ((llvm::VectorType*)this)->ElementQuantity + + SubclassData + *ContainedTys + + Context + + + + + $(Type) {*Value} + + + + $(Type) {(llvm::ISD::NodeType)this->NodeType} + + + NumOperands + OperandList + + + + + + i{Val.BitWidth} {Val.VAL} + + + + {IDAndSubclassData >> 8}bit integer type + + + + $(Type) {*VTy} {this->getName()} {SubclassData} + $(Type) {*VTy} anon {SubclassData} + + (Instruction*)this + (User*)this + + UseList + Next + Prev.Value & 3 == 3 ? 
(User*)(this + 1) : (User*)(this + 2) + + + + + + + Val + + + + + + + $(Type) {*VTy} {this->getName()} {SubclassData} + $(Type) {*VTy} anon {SubclassData} + + (Value*)this,nd + *VTy + + NumUserOperands + (llvm::Use*)this - NumUserOperands + + + NumUserOperands + *((llvm::Use**)this - 1) + + + + + + {getOpcodeName(SubclassID - InstructionVal)} + + (User*)this,nd + + + + + {this->getName()} {(LinkageTypes)Linkage} {(VisibilityTypes)Visibility} {(DLLStorageClassTypes)DllStorageClass} {(llvm::GlobalValue::ThreadLocalMode) ThreadLocal} + + + + + + + this + Next + this + + + + + + + pImpl + + + + + {ModuleID,s8} {TargetTriple} + + + + $(Type) {PassID} {Kind} + + diff --git a/llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.dos b/llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.dos index 7a0560654c5c70..0f25621c787ed3 100644 --- a/llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.dos +++ b/llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.dos @@ -1,3 +1,3 @@ -In this file, the -sequence "\r\n" -terminates lines. +In this file, the +sequence "\r\n" +terminates lines. diff --git a/llvm/utils/release/build_llvm_release.bat b/llvm/utils/release/build_llvm_release.bat index dd041d7d384ec4..3718673ae7a28d 100755 --- a/llvm/utils/release/build_llvm_release.bat +++ b/llvm/utils/release/build_llvm_release.bat @@ -1,515 +1,515 @@ -@echo off -setlocal enabledelayedexpansion - -goto begin - -:usage -echo Script for building the LLVM installer on Windows, -echo used for the releases at https://github.com/llvm/llvm-project/releases -echo. -echo Usage: build_llvm_release.bat --version ^ [--x86,--x64, --arm64] [--skip-checkout] [--local-python] -echo.
-echo Options: -echo --version: [required] version to build -echo --help: display this help -echo --x86: build and test x86 variant -echo --x64: build and test x64 variant -echo --arm64: build and test arm64 variant -echo --skip-checkout: use local git checkout instead of downloading src.zip -echo --local-python: use installed Python and does not try to use a specific version (3.10) -echo. -echo Note: At least one variant to build is required. -echo. -echo Example: build_llvm_release.bat --version 15.0.0 --x86 --x64 -exit /b 1 - -:begin - -::============================================================================== -:: parse args -set version= -set help= -set x86= -set x64= -set arm64= -set skip-checkout= -set local-python= -call :parse_args %* - -if "%help%" NEQ "" goto usage - -if "%version%" == "" ( - echo --version option is required - echo ============================= - goto usage -) - -if "%arm64%" == "" if "%x64%" == "" if "%x86%" == "" ( - echo nothing to build! - echo choose one or several variants from: --x86 --x64 --arm64 - exit /b 1 -) - -::============================================================================== -:: check prerequisites -REM Note: -REM 7zip versions 21.x and higher will try to extract the symlinks in -REM llvm's git archive, which requires running as administrator. - -REM Check 7-zip version and/or administrator permissions. -for /f "delims=" %%i in ('7z.exe ^| findstr /r "2[1-9].[0-9][0-9]"') do set version_7z=%%i -if not "%version_7z%"=="" ( - REM Unique temporary filename to use by the 'mklink' command. - set "link_name=%temp%\%username%_%random%_%random%.tmp" - - REM As the 'mklink' requires elevated permissions, the symbolic link - REM creation will fail if the script is not running as administrator. - mklink /d "!link_name!" . 1>nul 2>nul - if errorlevel 1 ( - echo. - echo Script requires administrator permissions, or a 7-zip version 20.x or older. 
- echo Current version is "%version_7z%" - exit /b 1 - ) else ( - REM Remove the temporary symbolic link. - rd "!link_name!" - ) -) - -REM Prerequisites: -REM -REM Visual Studio 2019, CMake, Ninja, GNUWin32, SWIG, Python 3, -REM NSIS with the strlen_8192 patch, -REM Perl (for the OpenMP run-time). -REM -REM -REM For LLDB, SWIG version 4.1.1 should be used. -REM - -:: Detect Visual Studio -set vsinstall= -set vswhere=%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe - -if "%VSINSTALLDIR%" NEQ "" ( - echo using enabled Visual Studio installation - set "vsinstall=%VSINSTALLDIR%" -) else ( - echo using vswhere to detect Visual Studio installation - FOR /F "delims=" %%r IN ('^""%vswhere%" -nologo -latest -products "*" -all -property installationPath^"') DO set vsinstall=%%r -) -set "vsdevcmd=%vsinstall%\Common7\Tools\VsDevCmd.bat" - -if not exist "%vsdevcmd%" ( - echo Can't find any installation of Visual Studio - exit /b 1 -) -echo Using VS devcmd: %vsdevcmd% - -::============================================================================== -:: start echoing what we do -@echo on - -set python32_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python310-32 -set python64_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python310 -set pythonarm64_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python311-arm64 - -set revision=llvmorg-%version% -set package_version=%version% -set build_dir=%cd%\llvm_package_%package_version% - -echo Revision: %revision% -echo Package version: %package_version% -echo Build dir: %build_dir% -echo. - -if exist %build_dir% ( - echo Build directory already exists: %build_dir% - exit /b 1 -) -mkdir %build_dir% -cd %build_dir% || exit /b 1 - -if "%skip-checkout%" == "true" ( - echo Using local source - set llvm_src=%~dp0..\..\.. 
-) else ( - echo Checking out %revision% - curl -L https://github.com/llvm/llvm-project/archive/%revision%.zip -o src.zip || exit /b 1 - 7z x src.zip || exit /b 1 - mv llvm-project-* llvm-project || exit /b 1 - set llvm_src=%build_dir%\llvm-project -) - -curl -O https://gitlab.gnome.org/GNOME/libxml2/-/archive/v2.9.12/libxml2-v2.9.12.tar.gz || exit /b 1 -tar zxf libxml2-v2.9.12.tar.gz - -REM Setting CMAKE_CL_SHOWINCLUDES_PREFIX to work around PR27226. -REM Common flags for all builds. -set common_compiler_flags=-DLIBXML_STATIC -set common_cmake_flags=^ - -DCMAKE_BUILD_TYPE=Release ^ - -DLLVM_ENABLE_ASSERTIONS=OFF ^ - -DLLVM_INSTALL_TOOLCHAIN_ONLY=ON ^ - -DLLVM_TARGETS_TO_BUILD="AArch64;ARM;X86" ^ - -DLLVM_BUILD_LLVM_C_DYLIB=ON ^ - -DCMAKE_INSTALL_UCRT_LIBRARIES=ON ^ - -DPython3_FIND_REGISTRY=NEVER ^ - -DPACKAGE_VERSION=%package_version% ^ - -DLLDB_RELOCATABLE_PYTHON=1 ^ - -DLLDB_EMBED_PYTHON_HOME=OFF ^ - -DCMAKE_CL_SHOWINCLUDES_PREFIX="Note: including file: " ^ - -DLLVM_ENABLE_LIBXML2=FORCE_ON ^ - -DLLDB_ENABLE_LIBXML2=OFF ^ - -DCLANG_ENABLE_LIBXML2=OFF ^ - -DCMAKE_C_FLAGS="%common_compiler_flags%" ^ - -DCMAKE_CXX_FLAGS="%common_compiler_flags%" ^ - -DLLVM_ENABLE_RPMALLOC=ON ^ - -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;compiler-rt;lldb;openmp" - -set cmake_profile_flags="" - -REM Preserve original path -set OLDPATH=%PATH% - -REM Build the 32-bits and/or 64-bits binaries. -if "%x86%" == "true" call :do_build_32 || exit /b 1 -if "%x64%" == "true" call :do_build_64 || exit /b 1 -if "%arm64%" == "true" call :do_build_arm64 || exit /b 1 -exit /b 0 - -::============================================================================== -:: Build 32-bits binaries. 
-::============================================================================== -:do_build_32 -call :set_environment %python32_dir% || exit /b 1 -call "%vsdevcmd%" -arch=x86 || exit /b 1 -@echo on -mkdir build32_stage0 -cd build32_stage0 -call :do_build_libxml || exit /b 1 - -REM Stage0 binaries directory; used in stage1. -set "stage0_bin_dir=%build_dir%/build32_stage0/bin" -set cmake_flags=^ - %common_cmake_flags% ^ - -DLLVM_ENABLE_RPMALLOC=OFF ^ - -DLLDB_TEST_COMPILER=%stage0_bin_dir%/clang.exe ^ - -DPYTHON_HOME=%PYTHONHOME% ^ - -DPython3_ROOT_DIR=%PYTHONHOME% ^ - -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ - -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib - -cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 -ninja || ninja || ninja || exit /b 1 -REM ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 -REM ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 -ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 -ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 -REM ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 -cd.. - -REM CMake expects the paths that specifies the compiler and linker to be -REM with forward slash. 
-set all_cmake_flags=^ - %cmake_flags% ^ - -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ - -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ - -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe -set cmake_flags=%all_cmake_flags:\=/% - -mkdir build32 -cd build32 -cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 -ninja || ninja || ninja || exit /b 1 -REM ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 -REM ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 -ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 -ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 -REM ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 -ninja package || exit /b 1 -cd .. - -exit /b 0 -::============================================================================== - -::============================================================================== -:: Build 64-bits binaries. -::============================================================================== -:do_build_64 -call :set_environment %python64_dir% || exit /b 1 -call "%vsdevcmd%" -arch=amd64 || exit /b 1 -@echo on -mkdir build64_stage0 -cd build64_stage0 -call :do_build_libxml || exit /b 1 - -REM Stage0 binaries directory; used in stage1. 
-set "stage0_bin_dir=%build_dir%/build64_stage0/bin" -set cmake_flags=^ - %common_cmake_flags% ^ - -DLLDB_TEST_COMPILER=%stage0_bin_dir%/clang.exe ^ - -DPYTHON_HOME=%PYTHONHOME% ^ - -DPython3_ROOT_DIR=%PYTHONHOME% ^ - -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ - -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib - -cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 -ninja || ninja || ninja || exit /b 1 -ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 -ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 -ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 -ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 -ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 -ninja check-clangd || ninja check-clangd || ninja check-clangd || exit /b 1 -cd.. - -REM CMake expects the paths that specifies the compiler and linker to be -REM with forward slash. -set all_cmake_flags=^ - %cmake_flags% ^ - -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ - -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ - -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe -set cmake_flags=%all_cmake_flags:\=/% - - -mkdir build64 -cd build64 -call :do_generate_profile || exit /b 1 -cmake -GNinja %cmake_flags% %cmake_profile_flags% %llvm_src%\llvm || exit /b 1 -ninja || ninja || ninja || exit /b 1 -ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 -ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 -ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 -ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 -ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 -ninja check-clangd || ninja check-clangd || ninja check-clangd || exit /b 1 -ninja package || 
exit /b 1 - -:: generate tarball with install toolchain only off -set filename=clang+llvm-%version%-x86_64-pc-windows-msvc -cmake -GNinja %cmake_flags% %cmake_profile_flags% -DLLVM_INSTALL_TOOLCHAIN_ONLY=OFF ^ - -DCMAKE_INSTALL_PREFIX=%build_dir%/%filename% ..\llvm-project\llvm || exit /b 1 -ninja install || exit /b 1 -:: check llvm_config is present & returns something -%build_dir%/%filename%/bin/llvm-config.exe --bindir || exit /b 1 -cd .. -7z a -ttar -so %filename%.tar %filename% | 7z a -txz -si %filename%.tar.xz - -exit /b 0 -::============================================================================== - -::============================================================================== -:: Build arm64 binaries. -::============================================================================== -:do_build_arm64 -call :set_environment %pythonarm64_dir% || exit /b 1 -call "%vsdevcmd%" -host_arch=x64 -arch=arm64 || exit /b 1 -@echo on -mkdir build_arm64_stage0 -cd build_arm64_stage0 -call :do_build_libxml || exit /b 1 - -REM Stage0 binaries directory; used in stage1. -set "stage0_bin_dir=%build_dir%/build_arm64_stage0/bin" -set cmake_flags=^ - %common_cmake_flags% ^ - -DCLANG_DEFAULT_LINKER=lld ^ - -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ - -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib ^ - -DPython3_ROOT_DIR=%PYTHONHOME% ^ - -DCOMPILER_RT_BUILD_PROFILE=OFF ^ - -DCOMPILER_RT_BUILD_SANITIZERS=OFF - -REM We need to build stage0 compiler-rt with clang-cl (msvc lacks some builtins). -cmake -GNinja %cmake_flags% ^ - -DCMAKE_C_COMPILER=clang-cl.exe ^ - -DCMAKE_CXX_COMPILER=clang-cl.exe ^ - %llvm_src%\llvm || exit /b 1 -ninja || exit /b 1 -::ninja check-llvm || exit /b 1 -::ninja check-clang || exit /b 1 -::ninja check-lld || exit /b 1 -::ninja check-sanitizer || exit /b 1 -::ninja check-clang-tools || exit /b 1 -::ninja check-clangd || exit /b 1 -cd.. - -REM CMake expects the paths that specifies the compiler and linker to be -REM with forward slash. 
-REM CPACK_SYSTEM_NAME is set to have a correct name for installer generated. -set all_cmake_flags=^ - %cmake_flags% ^ - -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ - -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ - -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ - -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe ^ - -DCPACK_SYSTEM_NAME=woa64 -set cmake_flags=%all_cmake_flags:\=/% - -mkdir build_arm64 -cd build_arm64 -cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 -ninja || exit /b 1 -REM Check but do not fail on errors. -ninja check-lldb -::ninja check-llvm || exit /b 1 -::ninja check-clang || exit /b 1 -::ninja check-lld || exit /b 1 -::ninja check-sanitizer || exit /b 1 -::ninja check-clang-tools || exit /b 1 -::ninja check-clangd || exit /b 1 -ninja package || exit /b 1 -cd .. - -exit /b 0 -::============================================================================== -:: -::============================================================================== -:: Set PATH and some environment variables. -::============================================================================== -:set_environment -REM Restore original path -set PATH=%OLDPATH% - -set python_dir=%1 - -REM Set Python environment -if "%local-python%" == "true" ( - FOR /F "delims=" %%i IN ('where python.exe ^| head -1') DO set python_exe=%%i - set PYTHONHOME=!python_exe:~0,-11! -) else ( - %python_dir%/python.exe --version || exit /b 1 - set PYTHONHOME=%python_dir% -) -set PATH=%PYTHONHOME%;%PATH% - -set "VSCMD_START_DIR=%build_dir%" - -exit /b 0 - -::============================================================================= - -::============================================================================== -:: Build libxml. 
-::============================================================================== -:do_build_libxml -mkdir libxmlbuild -cd libxmlbuild -cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install ^ - -DBUILD_SHARED_LIBS=OFF -DLIBXML2_WITH_C14N=OFF -DLIBXML2_WITH_CATALOG=OFF ^ - -DLIBXML2_WITH_DEBUG=OFF -DLIBXML2_WITH_DOCB=OFF -DLIBXML2_WITH_FTP=OFF ^ - -DLIBXML2_WITH_HTML=OFF -DLIBXML2_WITH_HTTP=OFF -DLIBXML2_WITH_ICONV=OFF ^ - -DLIBXML2_WITH_ICU=OFF -DLIBXML2_WITH_ISO8859X=OFF -DLIBXML2_WITH_LEGACY=OFF ^ - -DLIBXML2_WITH_LZMA=OFF -DLIBXML2_WITH_MEM_DEBUG=OFF -DLIBXML2_WITH_MODULES=OFF ^ - -DLIBXML2_WITH_OUTPUT=ON -DLIBXML2_WITH_PATTERN=OFF -DLIBXML2_WITH_PROGRAMS=OFF ^ - -DLIBXML2_WITH_PUSH=OFF -DLIBXML2_WITH_PYTHON=OFF -DLIBXML2_WITH_READER=OFF ^ - -DLIBXML2_WITH_REGEXPS=OFF -DLIBXML2_WITH_RUN_DEBUG=OFF -DLIBXML2_WITH_SAX1=OFF ^ - -DLIBXML2_WITH_SCHEMAS=OFF -DLIBXML2_WITH_SCHEMATRON=OFF -DLIBXML2_WITH_TESTS=OFF ^ - -DLIBXML2_WITH_THREADS=ON -DLIBXML2_WITH_THREAD_ALLOC=OFF -DLIBXML2_WITH_TREE=ON ^ - -DLIBXML2_WITH_VALID=OFF -DLIBXML2_WITH_WRITER=OFF -DLIBXML2_WITH_XINCLUDE=OFF ^ - -DLIBXML2_WITH_XPATH=OFF -DLIBXML2_WITH_XPTR=OFF -DLIBXML2_WITH_ZLIB=OFF ^ - -DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded ^ - ../../libxml2-v2.9.12 || exit /b 1 -ninja install || exit /b 1 -set libxmldir=%cd%\install -set "libxmldir=%libxmldir:\=/%" -cd .. -exit /b 0 - -::============================================================================== -:: Generate a PGO profile. -::============================================================================== -:do_generate_profile -REM Build Clang with instrumentation. -mkdir instrument -cd instrument -cmake -GNinja %cmake_flags% -DLLVM_TARGETS_TO_BUILD=Native ^ - -DLLVM_BUILD_INSTRUMENTED=IR %llvm_src%\llvm || exit /b 1 -ninja clang || ninja clang || ninja clang || exit /b 1 -set instrumented_clang=%cd:\=/%/bin/clang-cl.exe -cd .. -REM Use that to build part of llvm to generate a profile. 
-mkdir train -cd train -cmake -GNinja %cmake_flags% ^ - -DCMAKE_C_COMPILER=%instrumented_clang% ^ - -DCMAKE_CXX_COMPILER=%instrumented_clang% ^ - -DLLVM_ENABLE_PROJECTS=clang ^ - -DLLVM_TARGETS_TO_BUILD=Native ^ - %llvm_src%\llvm || exit /b 1 -REM Drop profiles generated from running cmake; those are not representative. -del ..\instrument\profiles\*.profraw -ninja tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/Sema.cpp.obj -cd .. -set profile=%cd:\=/%/profile.profdata -%stage0_bin_dir%\llvm-profdata merge -output=%profile% instrument\profiles\*.profraw || exit /b 1 -set common_compiler_flags=%common_compiler_flags% -Wno-backend-plugin -set cmake_profile_flags=-DLLVM_PROFDATA_FILE=%profile% ^ - -DCMAKE_C_FLAGS="%common_compiler_flags%" ^ - -DCMAKE_CXX_FLAGS="%common_compiler_flags%" -exit /b 0 - -::============================================================================= -:: Parse command line arguments. -:: The format for the arguments is: -:: Boolean: --option -:: Value: --option<separator>value -:: with <separator> being: space, colon, semicolon or equal sign -:: -:: Command line usage example: -:: my-batch-file.bat --build --type=release --version 123 -:: It will create 3 variables: -:: 'build' with the value 'true' -:: 'type' with the value 'release' -:: 'version' with the value '123' -:: -:: Usage: -:: set "build=" -:: set "type=" -:: set "version=" -:: -:: REM Parse arguments. -:: call :parse_args %* -:: -:: if defined build ( -:: ... -:: ) -:: if %type%=='release' ( -:: ... -:: ) -:: if %version%=='123' ( -:: ... -:: ) -::============================================================================= -:parse_args - set "arg_name=" - :parse_args_start - if "%1" == "" ( - :: Set a seen boolean argument. - if "%arg_name%" neq "" ( - set "%arg_name%=true" - ) - goto :parse_args_done - ) - set aux=%1 - if "%aux:~0,2%" == "--" ( - :: Set a seen boolean argument. 
- if "%arg_name%" neq "" ( - set "%arg_name%=true" - ) - set "arg_name=%aux:~2,250%" - ) else ( - set "%arg_name%=%1" - set "arg_name=" - ) - shift - goto :parse_args_start - -:parse_args_done -exit /b 0 +@echo off +setlocal enabledelayedexpansion + +goto begin + +:usage +echo Script for building the LLVM installer on Windows, +echo used for the releases at https://github.com/llvm/llvm-project/releases +echo. +echo Usage: build_llvm_release.bat --version ^<version^> [--x86,--x64, --arm64] [--skip-checkout] [--local-python] +echo. +echo Options: +echo --version: [required] version to build +echo --help: display this help +echo --x86: build and test x86 variant +echo --x64: build and test x64 variant +echo --arm64: build and test arm64 variant +echo --skip-checkout: use local git checkout instead of downloading src.zip +echo --local-python: use installed Python and does not try to use a specific version (3.10) +echo. +echo Note: At least one variant to build is required. +echo. +echo Example: build_llvm_release.bat --version 15.0.0 --x86 --x64 +exit /b 1 + +:begin + +::============================================================================== +:: parse args +set version= +set help= +set x86= +set x64= +set arm64= +set skip-checkout= +set local-python= +call :parse_args %* + +if "%help%" NEQ "" goto usage + +if "%version%" == "" ( + echo --version option is required + echo ============================= + goto usage +) + +if "%arm64%" == "" if "%x64%" == "" if "%x86%" == "" ( + echo nothing to build! + echo choose one or several variants from: --x86 --x64 --arm64 + exit /b 1 +) + +::============================================================================== +:: check prerequisites +REM Note: +REM 7zip versions 21.x and higher will try to extract the symlinks in +REM llvm's git archive, which requires running as administrator. + +REM Check 7-zip version and/or administrator permissions. 
+for /f "delims=" %%i in ('7z.exe ^| findstr /r "2[1-9].[0-9][0-9]"') do set version_7z=%%i +if not "%version_7z%"=="" ( + REM Unique temporary filename to use by the 'mklink' command. + set "link_name=%temp%\%username%_%random%_%random%.tmp" + + REM As the 'mklink' requires elevated permissions, the symbolic link + REM creation will fail if the script is not running as administrator. + mklink /d "!link_name!" . 1>nul 2>nul + if errorlevel 1 ( + echo. + echo Script requires administrator permissions, or a 7-zip version 20.x or older. + echo Current version is "%version_7z%" + exit /b 1 + ) else ( + REM Remove the temporary symbolic link. + rd "!link_name!" + ) +) + +REM Prerequisites: +REM +REM Visual Studio 2019, CMake, Ninja, GNUWin32, SWIG, Python 3, +REM NSIS with the strlen_8192 patch, +REM Perl (for the OpenMP run-time). +REM +REM +REM For LLDB, SWIG version 4.1.1 should be used. +REM + +:: Detect Visual Studio +set vsinstall= +set vswhere=%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe + +if "%VSINSTALLDIR%" NEQ "" ( + echo using enabled Visual Studio installation + set "vsinstall=%VSINSTALLDIR%" +) else ( + echo using vswhere to detect Visual Studio installation + FOR /F "delims=" %%r IN ('^""%vswhere%" -nologo -latest -products "*" -all -property installationPath^"') DO set vsinstall=%%r +) +set "vsdevcmd=%vsinstall%\Common7\Tools\VsDevCmd.bat" + +if not exist "%vsdevcmd%" ( + echo Can't find any installation of Visual Studio + exit /b 1 +) +echo Using VS devcmd: %vsdevcmd% + +::============================================================================== +:: start echoing what we do +@echo on + +set python32_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python310-32 +set python64_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python310 +set pythonarm64_dir=C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python311-arm64 + +set revision=llvmorg-%version% +set package_version=%version% +set 
build_dir=%cd%\llvm_package_%package_version% + +echo Revision: %revision% +echo Package version: %package_version% +echo Build dir: %build_dir% +echo. + +if exist %build_dir% ( + echo Build directory already exists: %build_dir% + exit /b 1 +) +mkdir %build_dir% +cd %build_dir% || exit /b 1 + +if "%skip-checkout%" == "true" ( + echo Using local source + set llvm_src=%~dp0..\..\.. +) else ( + echo Checking out %revision% + curl -L https://github.com/llvm/llvm-project/archive/%revision%.zip -o src.zip || exit /b 1 + 7z x src.zip || exit /b 1 + mv llvm-project-* llvm-project || exit /b 1 + set llvm_src=%build_dir%\llvm-project +) + +curl -O https://gitlab.gnome.org/GNOME/libxml2/-/archive/v2.9.12/libxml2-v2.9.12.tar.gz || exit /b 1 +tar zxf libxml2-v2.9.12.tar.gz + +REM Setting CMAKE_CL_SHOWINCLUDES_PREFIX to work around PR27226. +REM Common flags for all builds. +set common_compiler_flags=-DLIBXML_STATIC +set common_cmake_flags=^ + -DCMAKE_BUILD_TYPE=Release ^ + -DLLVM_ENABLE_ASSERTIONS=OFF ^ + -DLLVM_INSTALL_TOOLCHAIN_ONLY=ON ^ + -DLLVM_TARGETS_TO_BUILD="AArch64;ARM;X86" ^ + -DLLVM_BUILD_LLVM_C_DYLIB=ON ^ + -DCMAKE_INSTALL_UCRT_LIBRARIES=ON ^ + -DPython3_FIND_REGISTRY=NEVER ^ + -DPACKAGE_VERSION=%package_version% ^ + -DLLDB_RELOCATABLE_PYTHON=1 ^ + -DLLDB_EMBED_PYTHON_HOME=OFF ^ + -DCMAKE_CL_SHOWINCLUDES_PREFIX="Note: including file: " ^ + -DLLVM_ENABLE_LIBXML2=FORCE_ON ^ + -DLLDB_ENABLE_LIBXML2=OFF ^ + -DCLANG_ENABLE_LIBXML2=OFF ^ + -DCMAKE_C_FLAGS="%common_compiler_flags%" ^ + -DCMAKE_CXX_FLAGS="%common_compiler_flags%" ^ + -DLLVM_ENABLE_RPMALLOC=ON ^ + -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;compiler-rt;lldb;openmp" + +set cmake_profile_flags="" + +REM Preserve original path +set OLDPATH=%PATH% + +REM Build the 32-bits and/or 64-bits binaries. 
+if "%x86%" == "true" call :do_build_32 || exit /b 1 +if "%x64%" == "true" call :do_build_64 || exit /b 1 +if "%arm64%" == "true" call :do_build_arm64 || exit /b 1 +exit /b 0 + +::============================================================================== +:: Build 32-bits binaries. +::============================================================================== +:do_build_32 +call :set_environment %python32_dir% || exit /b 1 +call "%vsdevcmd%" -arch=x86 || exit /b 1 +@echo on +mkdir build32_stage0 +cd build32_stage0 +call :do_build_libxml || exit /b 1 + +REM Stage0 binaries directory; used in stage1. +set "stage0_bin_dir=%build_dir%/build32_stage0/bin" +set cmake_flags=^ + %common_cmake_flags% ^ + -DLLVM_ENABLE_RPMALLOC=OFF ^ + -DLLDB_TEST_COMPILER=%stage0_bin_dir%/clang.exe ^ + -DPYTHON_HOME=%PYTHONHOME% ^ + -DPython3_ROOT_DIR=%PYTHONHOME% ^ + -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ + -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib + +cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 +ninja || ninja || ninja || exit /b 1 +REM ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 +REM ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 +ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 +ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 +REM ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 +cd.. + +REM CMake expects the paths that specifies the compiler and linker to be +REM with forward slash. 
+set all_cmake_flags=^ + %cmake_flags% ^ + -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ + -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ + -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ + -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ + -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe +set cmake_flags=%all_cmake_flags:\=/% + +mkdir build32 +cd build32 +cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 +ninja || ninja || ninja || exit /b 1 +REM ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 +REM ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 +ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 +ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 +REM ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 +ninja package || exit /b 1 +cd .. + +exit /b 0 +::============================================================================== + +::============================================================================== +:: Build 64-bits binaries. +::============================================================================== +:do_build_64 +call :set_environment %python64_dir% || exit /b 1 +call "%vsdevcmd%" -arch=amd64 || exit /b 1 +@echo on +mkdir build64_stage0 +cd build64_stage0 +call :do_build_libxml || exit /b 1 + +REM Stage0 binaries directory; used in stage1. 
+set "stage0_bin_dir=%build_dir%/build64_stage0/bin" +set cmake_flags=^ + %common_cmake_flags% ^ + -DLLDB_TEST_COMPILER=%stage0_bin_dir%/clang.exe ^ + -DPYTHON_HOME=%PYTHONHOME% ^ + -DPython3_ROOT_DIR=%PYTHONHOME% ^ + -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ + -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib + +cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 +ninja || ninja || ninja || exit /b 1 +ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 +ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 +ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 +ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 +ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 +ninja check-clangd || ninja check-clangd || ninja check-clangd || exit /b 1 +cd.. + +REM CMake expects the paths that specifies the compiler and linker to be +REM with forward slash. +set all_cmake_flags=^ + %cmake_flags% ^ + -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ + -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ + -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ + -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ + -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe +set cmake_flags=%all_cmake_flags:\=/% + + +mkdir build64 +cd build64 +call :do_generate_profile || exit /b 1 +cmake -GNinja %cmake_flags% %cmake_profile_flags% %llvm_src%\llvm || exit /b 1 +ninja || ninja || ninja || exit /b 1 +ninja check-llvm || ninja check-llvm || ninja check-llvm || exit /b 1 +ninja check-clang || ninja check-clang || ninja check-clang || exit /b 1 +ninja check-lld || ninja check-lld || ninja check-lld || exit /b 1 +ninja check-sanitizer || ninja check-sanitizer || ninja check-sanitizer || exit /b 1 +ninja check-clang-tools || ninja check-clang-tools || ninja check-clang-tools || exit /b 1 +ninja check-clangd || ninja check-clangd || ninja check-clangd || exit /b 1 +ninja package || 
exit /b 1 + +:: generate tarball with install toolchain only off +set filename=clang+llvm-%version%-x86_64-pc-windows-msvc +cmake -GNinja %cmake_flags% %cmake_profile_flags% -DLLVM_INSTALL_TOOLCHAIN_ONLY=OFF ^ + -DCMAKE_INSTALL_PREFIX=%build_dir%/%filename% ..\llvm-project\llvm || exit /b 1 +ninja install || exit /b 1 +:: check llvm_config is present & returns something +%build_dir%/%filename%/bin/llvm-config.exe --bindir || exit /b 1 +cd .. +7z a -ttar -so %filename%.tar %filename% | 7z a -txz -si %filename%.tar.xz + +exit /b 0 +::============================================================================== + +::============================================================================== +:: Build arm64 binaries. +::============================================================================== +:do_build_arm64 +call :set_environment %pythonarm64_dir% || exit /b 1 +call "%vsdevcmd%" -host_arch=x64 -arch=arm64 || exit /b 1 +@echo on +mkdir build_arm64_stage0 +cd build_arm64_stage0 +call :do_build_libxml || exit /b 1 + +REM Stage0 binaries directory; used in stage1. +set "stage0_bin_dir=%build_dir%/build_arm64_stage0/bin" +set cmake_flags=^ + %common_cmake_flags% ^ + -DCLANG_DEFAULT_LINKER=lld ^ + -DLIBXML2_INCLUDE_DIR=%libxmldir%/include/libxml2 ^ + -DLIBXML2_LIBRARIES=%libxmldir%/lib/libxml2s.lib ^ + -DPython3_ROOT_DIR=%PYTHONHOME% ^ + -DCOMPILER_RT_BUILD_PROFILE=OFF ^ + -DCOMPILER_RT_BUILD_SANITIZERS=OFF + +REM We need to build stage0 compiler-rt with clang-cl (msvc lacks some builtins). +cmake -GNinja %cmake_flags% ^ + -DCMAKE_C_COMPILER=clang-cl.exe ^ + -DCMAKE_CXX_COMPILER=clang-cl.exe ^ + %llvm_src%\llvm || exit /b 1 +ninja || exit /b 1 +::ninja check-llvm || exit /b 1 +::ninja check-clang || exit /b 1 +::ninja check-lld || exit /b 1 +::ninja check-sanitizer || exit /b 1 +::ninja check-clang-tools || exit /b 1 +::ninja check-clangd || exit /b 1 +cd.. + +REM CMake expects the paths that specifies the compiler and linker to be +REM with forward slash. 
+REM CPACK_SYSTEM_NAME is set to have a correct name for installer generated. +set all_cmake_flags=^ + %cmake_flags% ^ + -DCMAKE_C_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ + -DCMAKE_CXX_COMPILER=%stage0_bin_dir%/clang-cl.exe ^ + -DCMAKE_LINKER=%stage0_bin_dir%/lld-link.exe ^ + -DCMAKE_AR=%stage0_bin_dir%/llvm-lib.exe ^ + -DCMAKE_RC=%stage0_bin_dir%/llvm-windres.exe ^ + -DCPACK_SYSTEM_NAME=woa64 +set cmake_flags=%all_cmake_flags:\=/% + +mkdir build_arm64 +cd build_arm64 +cmake -GNinja %cmake_flags% %llvm_src%\llvm || exit /b 1 +ninja || exit /b 1 +REM Check but do not fail on errors. +ninja check-lldb +::ninja check-llvm || exit /b 1 +::ninja check-clang || exit /b 1 +::ninja check-lld || exit /b 1 +::ninja check-sanitizer || exit /b 1 +::ninja check-clang-tools || exit /b 1 +::ninja check-clangd || exit /b 1 +ninja package || exit /b 1 +cd .. + +exit /b 0 +::============================================================================== +:: +::============================================================================== +:: Set PATH and some environment variables. +::============================================================================== +:set_environment +REM Restore original path +set PATH=%OLDPATH% + +set python_dir=%1 + +REM Set Python environment +if "%local-python%" == "true" ( + FOR /F "delims=" %%i IN ('where python.exe ^| head -1') DO set python_exe=%%i + set PYTHONHOME=!python_exe:~0,-11! +) else ( + %python_dir%/python.exe --version || exit /b 1 + set PYTHONHOME=%python_dir% +) +set PATH=%PYTHONHOME%;%PATH% + +set "VSCMD_START_DIR=%build_dir%" + +exit /b 0 + +::============================================================================= + +::============================================================================== +:: Build libxml. 
+::============================================================================== +:do_build_libxml +mkdir libxmlbuild +cd libxmlbuild +cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install ^ + -DBUILD_SHARED_LIBS=OFF -DLIBXML2_WITH_C14N=OFF -DLIBXML2_WITH_CATALOG=OFF ^ + -DLIBXML2_WITH_DEBUG=OFF -DLIBXML2_WITH_DOCB=OFF -DLIBXML2_WITH_FTP=OFF ^ + -DLIBXML2_WITH_HTML=OFF -DLIBXML2_WITH_HTTP=OFF -DLIBXML2_WITH_ICONV=OFF ^ + -DLIBXML2_WITH_ICU=OFF -DLIBXML2_WITH_ISO8859X=OFF -DLIBXML2_WITH_LEGACY=OFF ^ + -DLIBXML2_WITH_LZMA=OFF -DLIBXML2_WITH_MEM_DEBUG=OFF -DLIBXML2_WITH_MODULES=OFF ^ + -DLIBXML2_WITH_OUTPUT=ON -DLIBXML2_WITH_PATTERN=OFF -DLIBXML2_WITH_PROGRAMS=OFF ^ + -DLIBXML2_WITH_PUSH=OFF -DLIBXML2_WITH_PYTHON=OFF -DLIBXML2_WITH_READER=OFF ^ + -DLIBXML2_WITH_REGEXPS=OFF -DLIBXML2_WITH_RUN_DEBUG=OFF -DLIBXML2_WITH_SAX1=OFF ^ + -DLIBXML2_WITH_SCHEMAS=OFF -DLIBXML2_WITH_SCHEMATRON=OFF -DLIBXML2_WITH_TESTS=OFF ^ + -DLIBXML2_WITH_THREADS=ON -DLIBXML2_WITH_THREAD_ALLOC=OFF -DLIBXML2_WITH_TREE=ON ^ + -DLIBXML2_WITH_VALID=OFF -DLIBXML2_WITH_WRITER=OFF -DLIBXML2_WITH_XINCLUDE=OFF ^ + -DLIBXML2_WITH_XPATH=OFF -DLIBXML2_WITH_XPTR=OFF -DLIBXML2_WITH_ZLIB=OFF ^ + -DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded ^ + ../../libxml2-v2.9.12 || exit /b 1 +ninja install || exit /b 1 +set libxmldir=%cd%\install +set "libxmldir=%libxmldir:\=/%" +cd .. +exit /b 0 + +::============================================================================== +:: Generate a PGO profile. +::============================================================================== +:do_generate_profile +REM Build Clang with instrumentation. +mkdir instrument +cd instrument +cmake -GNinja %cmake_flags% -DLLVM_TARGETS_TO_BUILD=Native ^ + -DLLVM_BUILD_INSTRUMENTED=IR %llvm_src%\llvm || exit /b 1 +ninja clang || ninja clang || ninja clang || exit /b 1 +set instrumented_clang=%cd:\=/%/bin/clang-cl.exe +cd .. +REM Use that to build part of llvm to generate a profile. 
+mkdir train +cd train +cmake -GNinja %cmake_flags% ^ + -DCMAKE_C_COMPILER=%instrumented_clang% ^ + -DCMAKE_CXX_COMPILER=%instrumented_clang% ^ + -DLLVM_ENABLE_PROJECTS=clang ^ + -DLLVM_TARGETS_TO_BUILD=Native ^ + %llvm_src%\llvm || exit /b 1 +REM Drop profiles generated from running cmake; those are not representative. +del ..\instrument\profiles\*.profraw +ninja tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/Sema.cpp.obj +cd .. +set profile=%cd:\=/%/profile.profdata +%stage0_bin_dir%\llvm-profdata merge -output=%profile% instrument\profiles\*.profraw || exit /b 1 +set common_compiler_flags=%common_compiler_flags% -Wno-backend-plugin +set cmake_profile_flags=-DLLVM_PROFDATA_FILE=%profile% ^ + -DCMAKE_C_FLAGS="%common_compiler_flags%" ^ + -DCMAKE_CXX_FLAGS="%common_compiler_flags%" +exit /b 0 + +::============================================================================= +:: Parse command line arguments. +:: The format for the arguments is: +:: Boolean: --option +:: Value: --option<separator>value +:: with <separator> being: space, colon, semicolon or equal sign +:: +:: Command line usage example: +:: my-batch-file.bat --build --type=release --version 123 +:: It will create 3 variables: +:: 'build' with the value 'true' +:: 'type' with the value 'release' +:: 'version' with the value '123' +:: +:: Usage: +:: set "build=" +:: set "type=" +:: set "version=" +:: +:: REM Parse arguments. +:: call :parse_args %* +:: +:: if defined build ( +:: ... +:: ) +:: if %type%=='release' ( +:: ... +:: ) +:: if %version%=='123' ( +:: ... +:: ) +::============================================================================= +:parse_args + set "arg_name=" + :parse_args_start + if "%1" == "" ( + :: Set a seen boolean argument. + if "%arg_name%" neq "" ( + set "%arg_name%=true" + ) + goto :parse_args_done + ) + set aux=%1 + if "%aux:~0,2%" == "--" ( + :: Set a seen boolean argument. 
+ if "%arg_name%" neq "" ( + set "%arg_name%=true" + ) + set "arg_name=%aux:~2,250%" + ) else ( + set "%arg_name%=%1" + set "arg_name=" + ) + shift + goto :parse_args_start + +:parse_args_done +exit /b 0 diff --git a/openmp/runtime/doc/doxygen/config b/openmp/runtime/doc/doxygen/config index 04c966766ba6ef..8d79dc143cc1a0 100644 --- a/openmp/runtime/doc/doxygen/config +++ b/openmp/runtime/doc/doxygen/config @@ -1,1822 +1,1822 @@ -# Doxyfile 1.8.2 - -# This file describes the settings to be used by the documentation system -# doxygen (www.doxygen.org) for a project. -# -# All text after a hash (#) is considered a comment and will be ignored. -# The format is: -# TAG = value [value, ...] -# For lists items can also be appended using: -# TAG += value [value, ...] -# Values that contain spaces should be placed between quotes (" "). - -#--------------------------------------------------------------------------- -# Project related configuration options -#--------------------------------------------------------------------------- - -# This tag specifies the encoding used for all characters in the config file -# that follow. The default is UTF-8 which is also the encoding used for all -# text before the first occurrence of this tag. Doxygen uses libiconv (or the -# iconv built into libc) for the transcoding. See -# http://www.gnu.org/software/libiconv for the list of possible encodings. - -DOXYFILE_ENCODING = UTF-8 - -# The PROJECT_NAME tag is a single word (or sequence of words) that should -# identify the project. Note that if you do not use Doxywizard you need -# to put quotes around the project name if it contains spaces. - -PROJECT_NAME = "LLVM OpenMP* Runtime Library" - -# The PROJECT_NUMBER tag can be used to enter a project or revision number. -# This could be handy for archiving the generated documentation or -# if some version control system is used. 
- -PROJECT_NUMBER = - -# Using the PROJECT_BRIEF tag one can provide an optional one line description -# for a project that appears at the top of each page and should give viewer -# a quick idea about the purpose of the project. Keep the description short. - -PROJECT_BRIEF = - -# With the PROJECT_LOGO tag one can specify an logo or icon that is -# included in the documentation. The maximum height of the logo should not -# exceed 55 pixels and the maximum width should not exceed 200 pixels. -# Doxygen will copy the logo to the output directory. - -PROJECT_LOGO = - -# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) -# base path where the generated documentation will be put. -# If a relative path is entered, it will be relative to the location -# where doxygen was started. If left blank the current directory will be used. - -OUTPUT_DIRECTORY = doc/doxygen/generated - -# If the CREATE_SUBDIRS tag is set to YES, then doxygen will create -# 4096 sub-directories (in 2 levels) under the output directory of each output -# format and will distribute the generated files over these directories. -# Enabling this option can be useful when feeding doxygen a huge amount of -# source files, where putting all generated files in the same directory would -# otherwise cause performance problems for the file system. - -CREATE_SUBDIRS = NO - -# The OUTPUT_LANGUAGE tag is used to specify the language in which all -# documentation generated by doxygen is written. Doxygen will use this -# information to generate all constant output in the proper language. 
-# The default language is English, other supported languages are: -# Afrikaans, Arabic, Brazilian, Catalan, Chinese, Chinese-Traditional, -# Croatian, Czech, Danish, Dutch, Esperanto, Farsi, Finnish, French, German, -# Greek, Hungarian, Italian, Japanese, Japanese-en (Japanese with English -# messages), Korean, Korean-en, Lithuanian, Norwegian, Macedonian, Persian, -# Polish, Portuguese, Romanian, Russian, Serbian, Serbian-Cyrillic, Slovak, -# Slovene, Spanish, Swedish, Ukrainian, and Vietnamese. - -OUTPUT_LANGUAGE = English - -# If the BRIEF_MEMBER_DESC tag is set to YES (the default) Doxygen will -# include brief member descriptions after the members that are listed in -# the file and class documentation (similar to JavaDoc). -# Set to NO to disable this. - -BRIEF_MEMBER_DESC = YES - -# If the REPEAT_BRIEF tag is set to YES (the default) Doxygen will prepend -# the brief description of a member or function before the detailed description. -# Note: if both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the -# brief descriptions will be completely suppressed. - -REPEAT_BRIEF = YES - -# This tag implements a quasi-intelligent brief description abbreviator -# that is used to form the text in various listings. Each string -# in this list, if found as the leading text of the brief description, will be -# stripped from the text and the result after processing the whole list, is -# used as the annotated text. Otherwise, the brief description is used as-is. -# If left blank, the following values are used ("$name" is automatically -# replaced with the name of the entity): "The $name class" "The $name widget" -# "The $name file" "is" "provides" "specifies" "contains" -# "represents" "a" "an" "the" - -ABBREVIATE_BRIEF = - -# If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then -# Doxygen will generate a detailed section even if there is only a brief -# description. 
- -ALWAYS_DETAILED_SEC = NO - -# If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all -# inherited members of a class in the documentation of that class as if those -# members were ordinary class members. Constructors, destructors and assignment -# operators of the base classes will not be shown. - -INLINE_INHERITED_MEMB = NO - -# If the FULL_PATH_NAMES tag is set to YES then Doxygen will prepend the full -# path before files name in the file list and in the header files. If set -# to NO the shortest path that makes the file name unique will be used. - -FULL_PATH_NAMES = NO - -# If the FULL_PATH_NAMES tag is set to YES then the STRIP_FROM_PATH tag -# can be used to strip a user-defined part of the path. Stripping is -# only done if one of the specified strings matches the left-hand part of -# the path. The tag can be used to show relative paths in the file list. -# If left blank the directory from which doxygen is run is used as the -# path to strip. Note that you specify absolute paths here, but also -# relative paths, which will be relative from the directory where doxygen is -# started. - -STRIP_FROM_PATH = - -# The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of -# the path mentioned in the documentation of a class, which tells -# the reader which header file to include in order to use a class. -# If left blank only the name of the header file containing the class -# definition is used. Otherwise one should specify the include paths that -# are normally passed to the compiler using the -I flag. - -STRIP_FROM_INC_PATH = - -# If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter -# (but less readable) file names. This can be useful if your file system -# doesn't support long names like on DOS, Mac, or CD-ROM. - -SHORT_NAMES = NO - -# If the JAVADOC_AUTOBRIEF tag is set to YES then Doxygen -# will interpret the first line (until the first dot) of a JavaDoc-style -# comment as the brief description. 
If set to NO, the JavaDoc -# comments will behave just like regular Qt-style comments -# (thus requiring an explicit @brief command for a brief description.) - -JAVADOC_AUTOBRIEF = NO - -# If the QT_AUTOBRIEF tag is set to YES then Doxygen will -# interpret the first line (until the first dot) of a Qt-style -# comment as the brief description. If set to NO, the comments -# will behave just like regular Qt-style comments (thus requiring -# an explicit \brief command for a brief description.) - -QT_AUTOBRIEF = NO - -# The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make Doxygen -# treat a multi-line C++ special comment block (i.e. a block of //! or /// -# comments) as a brief description. This used to be the default behaviour. -# The new default is to treat a multi-line C++ comment block as a detailed -# description. Set this tag to YES if you prefer the old behaviour instead. - -MULTILINE_CPP_IS_BRIEF = NO - -# If the INHERIT_DOCS tag is set to YES (the default) then an undocumented -# member inherits the documentation from any documented member that it -# re-implements. - -INHERIT_DOCS = YES - -# If the SEPARATE_MEMBER_PAGES tag is set to YES, then doxygen will produce -# a new page for each member. If set to NO, the documentation of a member will -# be part of the file/class/namespace that contains it. - -SEPARATE_MEMBER_PAGES = NO - -# The TAB_SIZE tag can be used to set the number of spaces in a tab. -# Doxygen uses this value to replace tabs by spaces in code fragments. - -TAB_SIZE = 8 - -# This tag can be used to specify a number of aliases that acts -# as commands in the documentation. An alias has the form "name=value". -# For example adding "sideeffect=\par Side Effects:\n" will allow you to -# put the command \sideeffect (or @sideeffect) in the documentation, which -# will result in a user-defined paragraph with heading "Side Effects:". -# You can put \n's in the value part of an alias to insert newlines. 
- -ALIASES = "other=*" - -# This tag can be used to specify a number of word-keyword mappings (TCL only). -# A mapping has the form "name=value". For example adding -# "class=itcl::class" will allow you to use the command class in the -# itcl::class meaning. - -TCL_SUBST = - -# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C -# sources only. Doxygen will then generate output that is more tailored for C. -# For instance, some of the names that are used will be different. The list -# of all members will be omitted, etc. - -OPTIMIZE_OUTPUT_FOR_C = NO - -# Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java -# sources only. Doxygen will then generate output that is more tailored for -# Java. For instance, namespaces will be presented as packages, qualified -# scopes will look different, etc. - -OPTIMIZE_OUTPUT_JAVA = NO - -# Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran -# sources only. Doxygen will then generate output that is more tailored for -# Fortran. - -OPTIMIZE_FOR_FORTRAN = NO - -# Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL -# sources. Doxygen will then generate output that is tailored for -# VHDL. - -OPTIMIZE_OUTPUT_VHDL = NO - -# Doxygen selects the parser to use depending on the extension of the files it -# parses. With this tag you can assign which parser to use for a given -# extension. Doxygen has a built-in mapping, but you can override or extend it -# using this tag. The format is ext=language, where ext is a file extension, -# and language is one of the parsers supported by doxygen: IDL, Java, -# Javascript, CSharp, C, C++, D, PHP, Objective-C, Python, Fortran, VHDL, C, -# C++. For instance to make doxygen treat .inc files as Fortran files (default -# is PHP), and .f files as C (default is Fortran), use: inc=Fortran f=C. Note -# that for custom extensions you also need to set FILE_PATTERNS otherwise the -# files are not read by doxygen. 
- -EXTENSION_MAPPING = - -# If MARKDOWN_SUPPORT is enabled (the default) then doxygen pre-processes all -# comments according to the Markdown format, which allows for more readable -# documentation. See http://daringfireball.net/projects/markdown/ for details. -# The output of markdown processing is further processed by doxygen, so you -# can mix doxygen, HTML, and XML commands with Markdown formatting. -# Disable only in case of backward compatibilities issues. - -MARKDOWN_SUPPORT = YES - -# When enabled doxygen tries to link words that correspond to documented classes, -# or namespaces to their corresponding documentation. Such a link can be -# prevented in individual cases by by putting a % sign in front of the word or -# globally by setting AUTOLINK_SUPPORT to NO. - -AUTOLINK_SUPPORT = YES - -# If you use STL classes (i.e. std::string, std::vector, etc.) but do not want -# to include (a tag file for) the STL sources as input, then you should -# set this tag to YES in order to let doxygen match functions declarations and -# definitions whose arguments contain STL classes (e.g. func(std::string); v.s. -# func(std::string) {}). This also makes the inheritance and collaboration -# diagrams that involve STL classes more complete and accurate. - -BUILTIN_STL_SUPPORT = NO - -# If you use Microsoft's C++/CLI language, you should set this option to YES to -# enable parsing support. - -CPP_CLI_SUPPORT = NO - -# Set the SIP_SUPPORT tag to YES if your project consists of sip sources only. -# Doxygen will parse them like normal C++ but will assume all classes use public -# instead of private inheritance when no explicit protection keyword is present. - -SIP_SUPPORT = NO - -# For Microsoft's IDL there are propget and propput attributes to -# indicate getter and setter methods for a property. Setting this -# option to YES (the default) will make doxygen replace the get and -# set methods by a property in the documentation. 
This will only work -# if the methods are indeed getting or setting a simple type. If this -# is not the case, or you want to show the methods anyway, you should -# set this option to NO. - -IDL_PROPERTY_SUPPORT = YES - -# If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC -# tag is set to YES, then doxygen will reuse the documentation of the first -# member in the group (if any) for the other members of the group. By default -# all members of a group must be documented explicitly. - -DISTRIBUTE_GROUP_DOC = NO - -# Set the SUBGROUPING tag to YES (the default) to allow class member groups of -# the same type (for instance a group of public functions) to be put as a -# subgroup of that type (e.g. under the Public Functions section). Set it to -# NO to prevent subgrouping. Alternatively, this can be done per class using -# the \nosubgrouping command. - -SUBGROUPING = YES - -# When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and -# unions are shown inside the group in which they are included (e.g. using -# @ingroup) instead of on a separate page (for HTML and Man pages) or -# section (for LaTeX and RTF). - -INLINE_GROUPED_CLASSES = NO - -# When the INLINE_SIMPLE_STRUCTS tag is set to YES, structs, classes, and -# unions with only public data fields will be shown inline in the documentation -# of the scope in which they are defined (i.e. file, namespace, or group -# documentation), provided this scope is documented. If set to NO (the default), -# structs, classes, and unions are shown on a separate page (for HTML and Man -# pages) or section (for LaTeX and RTF). - -INLINE_SIMPLE_STRUCTS = NO - -# When TYPEDEF_HIDES_STRUCT is enabled, a typedef of a struct, union, or enum -# is documented as struct, union, or enum with the name of the typedef. So -# typedef struct TypeS {} TypeT, will appear in the documentation as a struct -# with name TypeT. When disabled the typedef will appear as a member of a file, -# namespace, or class. 
And the struct will be named TypeS. This can typically -# be useful for C code in case the coding convention dictates that all compound -# types are typedef'ed and only the typedef is referenced, never the tag name. - -TYPEDEF_HIDES_STRUCT = NO - -# The SYMBOL_CACHE_SIZE determines the size of the internal cache use to -# determine which symbols to keep in memory and which to flush to disk. -# When the cache is full, less often used symbols will be written to disk. -# For small to medium size projects (<1000 input files) the default value is -# probably good enough. For larger projects a too small cache size can cause -# doxygen to be busy swapping symbols to and from disk most of the time -# causing a significant performance penalty. -# If the system has enough physical memory increasing the cache will improve the -# performance by keeping more symbols in memory. Note that the value works on -# a logarithmic scale so increasing the size by one will roughly double the -# memory usage. The cache size is given by this formula: -# 2^(16+SYMBOL_CACHE_SIZE). The valid range is 0..9, the default is 0, -# corresponding to a cache size of 2^16 = 65536 symbols. - -SYMBOL_CACHE_SIZE = 0 - -# Similar to the SYMBOL_CACHE_SIZE the size of the symbol lookup cache can be -# set using LOOKUP_CACHE_SIZE. This cache is used to resolve symbols given -# their name and scope. Since this can be an expensive process and often the -# same symbol appear multiple times in the code, doxygen keeps a cache of -# pre-resolved symbols. If the cache is too small doxygen will become slower. -# If the cache is too large, memory is wasted. The cache size is given by this -# formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range is 0..9, the default is 0, -# corresponding to a cache size of 2^16 = 65536 symbols. 
- -LOOKUP_CACHE_SIZE = 0 - -#--------------------------------------------------------------------------- -# Build related configuration options -#--------------------------------------------------------------------------- - -# If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in -# documentation are documented, even if no documentation was available. -# Private class members and static file members will be hidden unless -# the EXTRACT_PRIVATE and EXTRACT_STATIC tags are set to YES - -EXTRACT_ALL = NO - -# If the EXTRACT_PRIVATE tag is set to YES all private members of a class -# will be included in the documentation. - -EXTRACT_PRIVATE = YES - -# If the EXTRACT_PACKAGE tag is set to YES all members with package or internal -# scope will be included in the documentation. - -EXTRACT_PACKAGE = NO - -# If the EXTRACT_STATIC tag is set to YES all static members of a file -# will be included in the documentation. - -EXTRACT_STATIC = YES - -# If the EXTRACT_LOCAL_CLASSES tag is set to YES classes (and structs) -# defined locally in source files will be included in the documentation. -# If set to NO only classes defined in header files are included. - -EXTRACT_LOCAL_CLASSES = YES - -# This flag is only useful for Objective-C code. When set to YES local -# methods, which are defined in the implementation section but not in -# the interface are included in the documentation. -# If set to NO (the default) only methods in the interface are included. - -EXTRACT_LOCAL_METHODS = NO - -# If this flag is set to YES, the members of anonymous namespaces will be -# extracted and appear in the documentation as a namespace called -# 'anonymous_namespace{file}', where file will be replaced with the base -# name of the file that contains the anonymous namespace. By default -# anonymous namespaces are hidden. 
- -EXTRACT_ANON_NSPACES = NO - -# If the HIDE_UNDOC_MEMBERS tag is set to YES, Doxygen will hide all -# undocumented members of documented classes, files or namespaces. -# If set to NO (the default) these members will be included in the -# various overviews, but no documentation section is generated. -# This option has no effect if EXTRACT_ALL is enabled. - -HIDE_UNDOC_MEMBERS = YES - -# If the HIDE_UNDOC_CLASSES tag is set to YES, Doxygen will hide all -# undocumented classes that are normally visible in the class hierarchy. -# If set to NO (the default) these classes will be included in the various -# overviews. This option has no effect if EXTRACT_ALL is enabled. - -HIDE_UNDOC_CLASSES = YES - -# If the HIDE_FRIEND_COMPOUNDS tag is set to YES, Doxygen will hide all -# friend (class|struct|union) declarations. -# If set to NO (the default) these declarations will be included in the -# documentation. - -HIDE_FRIEND_COMPOUNDS = NO - -# If the HIDE_IN_BODY_DOCS tag is set to YES, Doxygen will hide any -# documentation blocks found inside the body of a function. -# If set to NO (the default) these blocks will be appended to the -# function's detailed documentation block. - -HIDE_IN_BODY_DOCS = NO - -# The INTERNAL_DOCS tag determines if documentation -# that is typed after a \internal command is included. If the tag is set -# to NO (the default) then the documentation will be excluded. -# Set it to YES to include the internal documentation. - -INTERNAL_DOCS = NO - -# If the CASE_SENSE_NAMES tag is set to NO then Doxygen will only generate -# file names in lower-case letters. If set to YES upper-case letters are also -# allowed. This is useful if you have classes or files whose names only differ -# in case and if your file system supports case sensitive file names. Windows -# and Mac users are advised to set this option to NO. 
- -CASE_SENSE_NAMES = YES - -# If the HIDE_SCOPE_NAMES tag is set to NO (the default) then Doxygen -# will show members with their full class and namespace scopes in the -# documentation. If set to YES the scope will be hidden. - -HIDE_SCOPE_NAMES = NO - -# If the SHOW_INCLUDE_FILES tag is set to YES (the default) then Doxygen -# will put a list of the files that are included by a file in the documentation -# of that file. - -SHOW_INCLUDE_FILES = YES - -# If the FORCE_LOCAL_INCLUDES tag is set to YES then Doxygen -# will list include files with double quotes in the documentation -# rather than with sharp brackets. - -FORCE_LOCAL_INCLUDES = NO - -# If the INLINE_INFO tag is set to YES (the default) then a tag [inline] -# is inserted in the documentation for inline members. - -INLINE_INFO = YES - -# If the SORT_MEMBER_DOCS tag is set to YES (the default) then doxygen -# will sort the (detailed) documentation of file and class members -# alphabetically by member name. If set to NO the members will appear in -# declaration order. - -SORT_MEMBER_DOCS = YES - -# If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the -# brief documentation of file, namespace and class members alphabetically -# by member name. If set to NO (the default) the members will appear in -# declaration order. - -SORT_BRIEF_DOCS = NO - -# If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen -# will sort the (brief and detailed) documentation of class members so that -# constructors and destructors are listed first. If set to NO (the default) -# the constructors will appear in the respective orders defined by -# SORT_MEMBER_DOCS and SORT_BRIEF_DOCS. -# This tag will be ignored for brief docs if SORT_BRIEF_DOCS is set to NO -# and ignored for detailed docs if SORT_MEMBER_DOCS is set to NO. - -SORT_MEMBERS_CTORS_1ST = NO - -# If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the -# hierarchy of group names into alphabetical order. 
If set to NO (the default) -# the group names will appear in their defined order. - -SORT_GROUP_NAMES = NO - -# If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be -# sorted by fully-qualified names, including namespaces. If set to -# NO (the default), the class list will be sorted only by class name, -# not including the namespace part. -# Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES. -# Note: This option applies only to the class list, not to the -# alphabetical list. - -SORT_BY_SCOPE_NAME = NO - -# If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to -# do proper type resolution of all parameters of a function it will reject a -# match between the prototype and the implementation of a member function even -# if there is only one candidate or it is obvious which candidate to choose -# by doing a simple string match. By disabling STRICT_PROTO_MATCHING doxygen -# will still accept a match between prototype and implementation in such cases. - -STRICT_PROTO_MATCHING = NO - -# The GENERATE_TODOLIST tag can be used to enable (YES) or -# disable (NO) the todo list. This list is created by putting \todo -# commands in the documentation. - -GENERATE_TODOLIST = YES - -# The GENERATE_TESTLIST tag can be used to enable (YES) or -# disable (NO) the test list. This list is created by putting \test -# commands in the documentation. - -GENERATE_TESTLIST = YES - -# The GENERATE_BUGLIST tag can be used to enable (YES) or -# disable (NO) the bug list. This list is created by putting \bug -# commands in the documentation. - -GENERATE_BUGLIST = YES - -# The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or -# disable (NO) the deprecated list. This list is created by putting -# \deprecated commands in the documentation. - -GENERATE_DEPRECATEDLIST= YES - -# The ENABLED_SECTIONS tag can be used to enable conditional -# documentation sections, marked by \if sectionname ... \endif. 
- -ENABLED_SECTIONS = - -# The MAX_INITIALIZER_LINES tag determines the maximum number of lines -# the initial value of a variable or macro consists of for it to appear in -# the documentation. If the initializer consists of more lines than specified -# here it will be hidden. Use a value of 0 to hide initializers completely. -# The appearance of the initializer of individual variables and macros in the -# documentation can be controlled using \showinitializer or \hideinitializer -# command in the documentation regardless of this setting. - -MAX_INITIALIZER_LINES = 30 - -# Set the SHOW_USED_FILES tag to NO to disable the list of files generated -# at the bottom of the documentation of classes and structs. If set to YES the -# list will mention the files that were used to generate the documentation. - -SHOW_USED_FILES = YES - -# Set the SHOW_FILES tag to NO to disable the generation of the Files page. -# This will remove the Files entry from the Quick Index and from the -# Folder Tree View (if specified). The default is YES. - -# We probably will want this, but we have no file documentation yet so it's simpler to remove -# it for now. -SHOW_FILES = NO - -# Set the SHOW_NAMESPACES tag to NO to disable the generation of the -# Namespaces page. -# This will remove the Namespaces entry from the Quick Index -# and from the Folder Tree View (if specified). The default is YES. - -SHOW_NAMESPACES = YES - -# The FILE_VERSION_FILTER tag can be used to specify a program or script that -# doxygen should invoke to get the current version for each file (typically from -# the version control system). Doxygen will invoke the program by executing (via -# popen()) the command <command> <input-file>, where <command> is the value of -# the FILE_VERSION_FILTER tag, and <input-file> is the name of an input file -# provided by doxygen. Whatever the program writes to standard output -# is used as the file version. See the manual for examples. 
- -FILE_VERSION_FILTER = - -# The LAYOUT_FILE tag can be used to specify a layout file which will be parsed -# by doxygen. The layout file controls the global structure of the generated -# output files in an output format independent way. To create the layout file -# that represents doxygen's defaults, run doxygen with the -l option. -# You can optionally specify a file name after the option, if omitted -# DoxygenLayout.xml will be used as the name of the layout file. - -LAYOUT_FILE = - -# The CITE_BIB_FILES tag can be used to specify one or more bib files -# containing the references data. This must be a list of .bib files. The -# .bib extension is automatically appended if omitted. Using this command -# requires the bibtex tool to be installed. See also -# http://en.wikipedia.org/wiki/BibTeX for more info. For LaTeX the style -# of the bibliography can be controlled using LATEX_BIB_STYLE. To use this -# feature you need bibtex and perl available in the search path. - -CITE_BIB_FILES = - -#--------------------------------------------------------------------------- -# configuration options related to warning and progress messages -#--------------------------------------------------------------------------- - -# The QUIET tag can be used to turn on/off the messages that are generated -# by doxygen. Possible values are YES and NO. If left blank NO is used. - -QUIET = NO - -# The WARNINGS tag can be used to turn on/off the warning messages that are -# generated by doxygen. Possible values are YES and NO. If left blank -# NO is used. - -WARNINGS = YES - -# If WARN_IF_UNDOCUMENTED is set to YES, then doxygen will generate warnings -# for undocumented members. If EXTRACT_ALL is set to YES then this flag will -# automatically be disabled. 
- -WARN_IF_UNDOCUMENTED = YES - -# If WARN_IF_DOC_ERROR is set to YES, doxygen will generate warnings for -# potential errors in the documentation, such as not documenting some -# parameters in a documented function, or documenting parameters that -# don't exist or using markup commands wrongly. - -WARN_IF_DOC_ERROR = YES - -# The WARN_NO_PARAMDOC option can be enabled to get warnings for -# functions that are documented, but have no documentation for their parameters -# or return value. If set to NO (the default) doxygen will only warn about -# wrong or incomplete parameter documentation, but not about the absence of -# documentation. - -WARN_NO_PARAMDOC = NO - -# The WARN_FORMAT tag determines the format of the warning messages that -# doxygen can produce. The string should contain the $file, $line, and $text -# tags, which will be replaced by the file and line number from which the -# warning originated and the warning text. Optionally the format may contain -# $version, which will be replaced by the version of the file (if it could -# be obtained via FILE_VERSION_FILTER) - -WARN_FORMAT = - -# The WARN_LOGFILE tag can be used to specify a file to which warning -# and error messages should be written. If left blank the output is written -# to stderr. - -WARN_LOGFILE = - -#--------------------------------------------------------------------------- -# configuration options related to the input files -#--------------------------------------------------------------------------- - -# The INPUT tag can be used to specify the files and/or directories that contain -# documented source files. You may enter file names like "myfile.cpp" or -# directories like "/usr/src/myproject". Separate the files or directories -# with spaces. - -INPUT = src doc/doxygen/libomp_interface.h -# The ittnotify code also has doxygen documentation, but if we include it here -# it takes over from us! 
-# src/thirdparty/ittnotify - -# This tag can be used to specify the character encoding of the source files -# that doxygen parses. Internally doxygen uses the UTF-8 encoding, which is -# also the default input encoding. Doxygen uses libiconv (or the iconv built -# into libc) for the transcoding. See http://www.gnu.org/software/libiconv for -# the list of possible encodings. - -INPUT_ENCODING = UTF-8 - -# If the value of the INPUT tag contains directories, you can use the -# FILE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp -# and *.h) to filter out the source-files in the directories. If left -# blank the following patterns are tested: -# *.c *.cc *.cxx *.cpp *.c++ *.d *.java *.ii *.ixx *.ipp *.i++ *.inl *.h *.hh -# *.hxx *.hpp *.h++ *.idl *.odl *.cs *.php *.php3 *.inc *.m *.mm *.dox *.py -# *.f90 *.f *.for *.vhd *.vhdl - -FILE_PATTERNS = *.c *.h *.cpp -# We may also want to include the asm files with appropriate ifdef to ensure -# doxygen doesn't see the content, just the documentation... - -# The RECURSIVE tag can be used to turn specify whether or not subdirectories -# should be searched for input files as well. Possible values are YES and NO. -# If left blank NO is used. - -# Only look in the one directory. -RECURSIVE = NO - -# The EXCLUDE tag can be used to specify files and/or directories that should be -# excluded from the INPUT source files. This way you can easily exclude a -# subdirectory from a directory tree whose root is specified with the INPUT tag. -# Note that relative paths are relative to the directory from which doxygen is -# run. - -EXCLUDE = src/test-touch.c - -# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or -# directories that are symbolic links (a Unix file system feature) are excluded -# from the input. 
- -EXCLUDE_SYMLINKS = NO - -# If the value of the INPUT tag contains directories, you can use the -# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude -# certain files from those directories. Note that the wildcards are matched -# against the file with absolute path, so to exclude all test directories -# for example use the pattern */test/* - -EXCLUDE_PATTERNS = - -# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names -# (namespaces, classes, functions, etc.) that should be excluded from the -# output. The symbol name can be a fully qualified name, a word, or if the -# wildcard * is used, a substring. Examples: ANamespace, AClass, -# AClass::ANamespace, ANamespace::*Test - -EXCLUDE_SYMBOLS = - -# The EXAMPLE_PATH tag can be used to specify one or more files or -# directories that contain example code fragments that are included (see -# the \include command). - -EXAMPLE_PATH = - -# If the value of the EXAMPLE_PATH tag contains directories, you can use the -# EXAMPLE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp -# and *.h) to filter out the source-files in the directories. If left -# blank all files are included. - -EXAMPLE_PATTERNS = - -# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be -# searched for input files to be used with the \include or \dontinclude -# commands irrespective of the value of the RECURSIVE tag. -# Possible values are YES and NO. If left blank NO is used. - -EXAMPLE_RECURSIVE = NO - -# The IMAGE_PATH tag can be used to specify one or more files or -# directories that contain image that are included in the documentation (see -# the \image command). - -IMAGE_PATH = - -# The INPUT_FILTER tag can be used to specify a program that doxygen should -# invoke to filter for each input file. Doxygen will invoke the filter program -# by executing (via popen()) the command <filter> <input-file>, where <filter> -# is the value of the INPUT_FILTER tag, and <input-file> is the name of an -# input file.
Doxygen will then use the output that the filter program writes -# to standard output. -# If FILTER_PATTERNS is specified, this tag will be -# ignored. - -INPUT_FILTER = - -# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern -# basis. -# Doxygen will compare the file name with each pattern and apply the -# filter if there is a match. -# The filters are a list of the form: -# pattern=filter (like *.cpp=my_cpp_filter). See INPUT_FILTER for further -# info on how filters are used. If FILTER_PATTERNS is empty or if -# non of the patterns match the file name, INPUT_FILTER is applied. - -FILTER_PATTERNS = - -# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using -# INPUT_FILTER) will be used to filter the input files when producing source -# files to browse (i.e. when SOURCE_BROWSER is set to YES). - -FILTER_SOURCE_FILES = NO - -# The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file -# pattern. A pattern will override the setting for FILTER_PATTERN (if any) -# and it is also possible to disable source filtering for a specific pattern -# using *.ext= (so without naming a filter). This option only has effect when -# FILTER_SOURCE_FILES is enabled. - -FILTER_SOURCE_PATTERNS = - -#--------------------------------------------------------------------------- -# configuration options related to source browsing -#--------------------------------------------------------------------------- - -# If the SOURCE_BROWSER tag is set to YES then a list of source files will -# be generated. Documented entities will be cross-referenced with these sources. -# Note: To get rid of all source code in the generated output, make sure also -# VERBATIM_HEADERS is set to NO. - -SOURCE_BROWSER = YES - -# Setting the INLINE_SOURCES tag to YES will include the body -# of functions and classes directly in the documentation. 
- -INLINE_SOURCES = NO - -# Setting the STRIP_CODE_COMMENTS tag to YES (the default) will instruct -# doxygen to hide any special comment blocks from generated source code -# fragments. Normal C, C++ and Fortran comments will always remain visible. - -STRIP_CODE_COMMENTS = YES - -# If the REFERENCED_BY_RELATION tag is set to YES -# then for each documented function all documented -# functions referencing it will be listed. - -REFERENCED_BY_RELATION = YES - -# If the REFERENCES_RELATION tag is set to YES -# then for each documented function all documented entities -# called/used by that function will be listed. - -REFERENCES_RELATION = NO - -# If the REFERENCES_LINK_SOURCE tag is set to YES (the default) -# and SOURCE_BROWSER tag is set to YES, then the hyperlinks from -# functions in REFERENCES_RELATION and REFERENCED_BY_RELATION lists will -# link to the source code. -# Otherwise they will link to the documentation. - -REFERENCES_LINK_SOURCE = YES - -# If the USE_HTAGS tag is set to YES then the references to source code -# will point to the HTML generated by the htags(1) tool instead of doxygen -# built-in source browser. The htags tool is part of GNU's global source -# tagging system (see http://www.gnu.org/software/global/global.html). You -# will need version 4.8.6 or higher. - -USE_HTAGS = NO - -# If the VERBATIM_HEADERS tag is set to YES (the default) then Doxygen -# will generate a verbatim copy of the header file for each class for -# which an include is specified. Set to NO to disable this. - -VERBATIM_HEADERS = YES - -#--------------------------------------------------------------------------- -# configuration options related to the alphabetical class index -#--------------------------------------------------------------------------- - -# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index -# of all compounds will be generated. Enable this if the project -# contains a lot of classes, structs, unions or interfaces. 
- -ALPHABETICAL_INDEX = YES - -# If the alphabetical index is enabled (see ALPHABETICAL_INDEX) then -# the COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns -# in which this list will be split (can be a number in the range [1..20]) - -COLS_IN_ALPHA_INDEX = 5 - -# In case all classes in a project start with a common prefix, all -# classes will be put under the same header in the alphabetical index. -# The IGNORE_PREFIX tag can be used to specify one or more prefixes that -# should be ignored while generating the index headers. - -IGNORE_PREFIX = - -#--------------------------------------------------------------------------- -# configuration options related to the HTML output -#--------------------------------------------------------------------------- - -# If the GENERATE_HTML tag is set to YES (the default) Doxygen will -# generate HTML output. - -GENERATE_HTML = YES - -# The HTML_OUTPUT tag is used to specify where the HTML docs will be put. -# If a relative path is entered the value of OUTPUT_DIRECTORY will be -# put in front of it. If left blank `html' will be used as the default path. - -HTML_OUTPUT = - -# The HTML_FILE_EXTENSION tag can be used to specify the file extension for -# each generated HTML page (for example: .htm,.php,.asp). If it is left blank -# doxygen will generate files with .html extension. - -HTML_FILE_EXTENSION = .html - -# The HTML_HEADER tag can be used to specify a personal HTML header for -# each generated HTML page. If it is left blank doxygen will generate a -# standard header. Note that when using a custom header you are responsible -# for the proper inclusion of any scripts and style sheets that doxygen -# needs, which is dependent on the configuration options used. -# It is advised to generate a default header using "doxygen -w html -# header.html footer.html stylesheet.css YourConfigFile" and then modify -# that header. 
Note that the header is subject to change so you typically -# have to redo this when upgrading to a newer version of doxygen or when -# changing the value of configuration settings such as GENERATE_TREEVIEW! - -HTML_HEADER = - -# The HTML_FOOTER tag can be used to specify a personal HTML footer for -# each generated HTML page. If it is left blank doxygen will generate a -# standard footer. - -HTML_FOOTER = - -# The HTML_STYLESHEET tag can be used to specify a user-defined cascading -# style sheet that is used by each HTML page. It can be used to -# fine-tune the look of the HTML output. If left blank doxygen will -# generate a default style sheet. Note that it is recommended to use -# HTML_EXTRA_STYLESHEET instead of this one, as it is more robust and this -# tag will in the future become obsolete. - -HTML_STYLESHEET = - -# The HTML_EXTRA_STYLESHEET tag can be used to specify an additional -# user-defined cascading style sheet that is included after the standard -# style sheets created by doxygen. Using this option one can overrule -# certain style aspects. This is preferred over using HTML_STYLESHEET -# since it does not replace the standard style sheet and is therefor more -# robust against future updates. Doxygen will copy the style sheet file to -# the output directory. - -HTML_EXTRA_STYLESHEET = - -# The HTML_EXTRA_FILES tag can be used to specify one or more extra images or -# other source files which should be copied to the HTML output directory. Note -# that these files will be copied to the base HTML output directory. Use the -# $relpath$ marker in the HTML_HEADER and/or HTML_FOOTER files to load these -# files. In the HTML_STYLESHEET file, use the file name only. Also note that -# the files will be copied as-is; there are no commands or markers available. - -HTML_EXTRA_FILES = - -# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. -# Doxygen will adjust the colors in the style sheet and background images -# according to this color. 
Hue is specified as an angle on a colorwheel, -# see http://en.wikipedia.org/wiki/Hue for more information. -# For instance the value 0 represents red, 60 is yellow, 120 is green, -# 180 is cyan, 240 is blue, 300 purple, and 360 is red again. -# The allowed range is 0 to 359. - -HTML_COLORSTYLE_HUE = 220 - -# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of -# the colors in the HTML output. For a value of 0 the output will use -# grayscales only. A value of 255 will produce the most vivid colors. - -HTML_COLORSTYLE_SAT = 100 - -# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to -# the luminance component of the colors in the HTML output. Values below -# 100 gradually make the output lighter, whereas values above 100 make -# the output darker. The value divided by 100 is the actual gamma applied, -# so 80 represents a gamma of 0.8, The value 220 represents a gamma of 2.2, -# and 100 does not change the gamma. - -HTML_COLORSTYLE_GAMMA = 80 - -# If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML -# page will contain the date and time when the page was generated. Setting -# this to NO can help when comparing the output of multiple runs. - -HTML_TIMESTAMP = NO - -# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML -# documentation will contain sections that can be hidden and shown after the -# page has loaded. - -HTML_DYNAMIC_SECTIONS = NO - -# With HTML_INDEX_NUM_ENTRIES one can control the preferred number of -# entries shown in the various tree structured indices initially; the user -# can expand and collapse entries dynamically later on. Doxygen will expand -# the tree to such a level that at most the specified number of entries are -# visible (unless a fully collapsed tree already exceeds this amount). -# So setting the number of entries 1 will produce a full collapsed tree by -# default. 
0 is a special value representing an infinite number of entries -# and will result in a full expanded tree by default. - -HTML_INDEX_NUM_ENTRIES = 100 - -# If the GENERATE_DOCSET tag is set to YES, additional index files -# will be generated that can be used as input for Apple's Xcode 3 -# integrated development environment, introduced with OSX 10.5 (Leopard). -# To create a documentation set, doxygen will generate a Makefile in the -# HTML output directory. Running make will produce the docset in that -# directory and running "make install" will install the docset in -# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find -# it at startup. -# See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html -# for more information. - -GENERATE_DOCSET = NO - -# When GENERATE_DOCSET tag is set to YES, this tag determines the name of the -# feed. A documentation feed provides an umbrella under which multiple -# documentation sets from a single provider (such as a company or product suite) -# can be grouped. - -DOCSET_FEEDNAME = "Doxygen generated docs" - -# When GENERATE_DOCSET tag is set to YES, this tag specifies a string that -# should uniquely identify the documentation set bundle. This should be a -# reverse domain-name style string, e.g. com.mycompany.MyDocSet. Doxygen -# will append .docset to the name. - -DOCSET_BUNDLE_ID = org.doxygen.Project - -# When GENERATE_PUBLISHER_ID tag specifies a string that should uniquely -# identify the documentation publisher. This should be a reverse domain-name -# style string, e.g. com.mycompany.MyDocSet.documentation. - -DOCSET_PUBLISHER_ID = org.doxygen.Publisher - -# The GENERATE_PUBLISHER_NAME tag identifies the documentation publisher. 
- -DOCSET_PUBLISHER_NAME = Publisher - -# If the GENERATE_HTMLHELP tag is set to YES, additional index files -# will be generated that can be used as input for tools like the -# Microsoft HTML help workshop to generate a compiled HTML help file (.chm) -# of the generated HTML documentation. - -GENERATE_HTMLHELP = NO - -# If the GENERATE_HTMLHELP tag is set to YES, the CHM_FILE tag can -# be used to specify the file name of the resulting .chm file. You -# can add a path in front of the file if the result should not be -# written to the html output directory. - -CHM_FILE = - -# If the GENERATE_HTMLHELP tag is set to YES, the HHC_LOCATION tag can -# be used to specify the location (absolute path including file name) of -# the HTML help compiler (hhc.exe). If non-empty doxygen will try to run -# the HTML help compiler on the generated index.hhp. - -HHC_LOCATION = - -# If the GENERATE_HTMLHELP tag is set to YES, the GENERATE_CHI flag -# controls if a separate .chi index file is generated (YES) or that -# it should be included in the main .chm file (NO). - -GENERATE_CHI = NO - -# If the GENERATE_HTMLHELP tag is set to YES, the CHM_INDEX_ENCODING -# is used to encode HtmlHelp index (hhk), content (hhc) and project file -# content. - -CHM_INDEX_ENCODING = - -# If the GENERATE_HTMLHELP tag is set to YES, the BINARY_TOC flag -# controls whether a binary table of contents is generated (YES) or a -# normal table of contents (NO) in the .chm file. - -BINARY_TOC = NO - -# The TOC_EXPAND flag can be set to YES to add extra items for group members -# to the contents of the HTML help documentation and to the tree view. - -TOC_EXPAND = NO - -# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and -# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated -# that can be used as input for Qt's qhelpgenerator to generate a -# Qt Compressed Help (.qch) of the generated HTML documentation. 
- -GENERATE_QHP = NO - -# If the QHG_LOCATION tag is specified, the QCH_FILE tag can -# be used to specify the file name of the resulting .qch file. -# The path specified is relative to the HTML output folder. - -QCH_FILE = - -# The QHP_NAMESPACE tag specifies the namespace to use when generating -# Qt Help Project output. For more information please see -# http://doc.trolltech.com/qthelpproject.html#namespace - -QHP_NAMESPACE = org.doxygen.Project - -# The QHP_VIRTUAL_FOLDER tag specifies the namespace to use when generating -# Qt Help Project output. For more information please see -# http://doc.trolltech.com/qthelpproject.html#virtual-folders - -QHP_VIRTUAL_FOLDER = doc - -# If QHP_CUST_FILTER_NAME is set, it specifies the name of a custom filter to -# add. For more information please see -# http://doc.trolltech.com/qthelpproject.html#custom-filters - -QHP_CUST_FILTER_NAME = - -# The QHP_CUST_FILT_ATTRS tag specifies the list of the attributes of the -# custom filter to add. For more information please see -# -# Qt Help Project / Custom Filters. - -QHP_CUST_FILTER_ATTRS = - -# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this -# project's -# filter section matches. -# -# Qt Help Project / Filter Attributes. - -QHP_SECT_FILTER_ATTRS = - -# If the GENERATE_QHP tag is set to YES, the QHG_LOCATION tag can -# be used to specify the location of Qt's qhelpgenerator. -# If non-empty doxygen will try to run qhelpgenerator on the generated -# .qhp file. - -QHG_LOCATION = - -# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files -# will be generated, which together with the HTML files, form an Eclipse help -# plugin. To install this plugin and make it available under the help contents -# menu in Eclipse, the contents of the directory containing the HTML and XML -# files needs to be copied into the plugins directory of eclipse. The name of -# the directory within the plugins directory should be the same as -# the ECLIPSE_DOC_ID value. 
After copying Eclipse needs to be restarted before -# the help appears. - -GENERATE_ECLIPSEHELP = NO - -# A unique identifier for the eclipse help plugin. When installing the plugin -# the directory name containing the HTML and XML files should also have -# this name. - -ECLIPSE_DOC_ID = org.doxygen.Project - -# The DISABLE_INDEX tag can be used to turn on/off the condensed index (tabs) -# at top of each HTML page. The value NO (the default) enables the index and -# the value YES disables it. Since the tabs have the same information as the -# navigation tree you can set this option to NO if you already set -# GENERATE_TREEVIEW to YES. - -DISABLE_INDEX = NO - -# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index -# structure should be generated to display hierarchical information. -# If the tag value is set to YES, a side panel will be generated -# containing a tree-like index structure (just like the one that -# is generated for HTML Help). For this to work a browser that supports -# JavaScript, DHTML, CSS and frames is required (i.e. any modern browser). -# Windows users are probably better off using the HTML help feature. -# Since the tree basically has the same information as the tab index you -# could consider to set DISABLE_INDEX to NO when enabling this option. - -GENERATE_TREEVIEW = NO - -# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values -# (range [0,1..20]) that doxygen will group on one line in the generated HTML -# documentation. Note that a value of 0 will completely suppress the enum -# values from appearing in the overview section. - -ENUM_VALUES_PER_LINE = 4 - -# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be -# used to set the initial width (in pixels) of the frame in which the tree -# is shown. - -TREEVIEW_WIDTH = 250 - -# When the EXT_LINKS_IN_WINDOW option is set to YES doxygen will open -# links to external symbols imported via tag files in a separate window. 
- -EXT_LINKS_IN_WINDOW = NO - -# Use this tag to change the font size of Latex formulas included -# as images in the HTML documentation. The default is 10. Note that -# when you change the font size after a successful doxygen run you need -# to manually remove any form_*.png images from the HTML output directory -# to force them to be regenerated. - -FORMULA_FONTSIZE = 10 - -# Use the FORMULA_TRANPARENT tag to determine whether or not the images -# generated for formulas are transparent PNGs. Transparent PNGs are -# not supported properly for IE 6.0, but are supported on all modern browsers. -# Note that when changing this option you need to delete any form_*.png files -# in the HTML output before the changes have effect. - -FORMULA_TRANSPARENT = YES - -# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax -# (see http://www.mathjax.org) which uses client side Javascript for the -# rendering instead of using prerendered bitmaps. Use this if you do not -# have LaTeX installed or if you want to formulas look prettier in the HTML -# output. When enabled you may also need to install MathJax separately and -# configure the path to it using the MATHJAX_RELPATH option. - -USE_MATHJAX = NO - -# When MathJax is enabled you need to specify the location relative to the -# HTML output directory using the MATHJAX_RELPATH option. The destination -# directory should contain the MathJax.js script. For instance, if the mathjax -# directory is located at the same level as the HTML output directory, then -# MATHJAX_RELPATH should be ../mathjax. The default value points to -# the MathJax Content Delivery Network so you can quickly see the result without -# installing MathJax. -# However, it is strongly recommended to install a local -# copy of MathJax from http://www.mathjax.org before deployment. 
- -MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest - -# The MATHJAX_EXTENSIONS tag can be used to specify one or MathJax extension -# names that should be enabled during MathJax rendering. - -MATHJAX_EXTENSIONS = - -# When the SEARCHENGINE tag is enabled doxygen will generate a search box -# for the HTML output. The underlying search engine uses javascript -# and DHTML and should work on any modern browser. Note that when using -# HTML help (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets -# (GENERATE_DOCSET) there is already a search function so this one should -# typically be disabled. For large projects the javascript based search engine -# can be slow, then enabling SERVER_BASED_SEARCH may provide a better solution. - -SEARCHENGINE = YES - -# When the SERVER_BASED_SEARCH tag is enabled the search engine will be -# implemented using a PHP enabled web server instead of at the web client -# using Javascript. Doxygen will generate the search PHP script and index -# file to put on the web server. The advantage of the server -# based approach is that it scales better to large projects and allows -# full text search. The disadvantages are that it is more difficult to setup -# and does not have live searching capabilities. - -SERVER_BASED_SEARCH = NO - -#--------------------------------------------------------------------------- -# configuration options related to the LaTeX output -#--------------------------------------------------------------------------- - -# If the GENERATE_LATEX tag is set to YES (the default) Doxygen will -# generate Latex output. - -GENERATE_LATEX = YES - -# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put. -# If a relative path is entered the value of OUTPUT_DIRECTORY will be -# put in front of it. If left blank `latex' will be used as the default path. - -LATEX_OUTPUT = - -# The LATEX_CMD_NAME tag can be used to specify the LaTeX command name to be -# invoked. 
If left blank `latex' will be used as the default command name. -# Note that when enabling USE_PDFLATEX this option is only used for -# generating bitmaps for formulas in the HTML output, but not in the -# Makefile that is written to the output directory. - -LATEX_CMD_NAME = latex - -# The MAKEINDEX_CMD_NAME tag can be used to specify the command name to -# generate index for LaTeX. If left blank `makeindex' will be used as the -# default command name. - -MAKEINDEX_CMD_NAME = makeindex - -# If the COMPACT_LATEX tag is set to YES Doxygen generates more compact -# LaTeX documents. This may be useful for small projects and may help to -# save some trees in general. - -COMPACT_LATEX = NO - -# The PAPER_TYPE tag can be used to set the paper type that is used -# by the printer. Possible values are: a4, letter, legal and -# executive. If left blank a4wide will be used. - -PAPER_TYPE = a4wide - -# The EXTRA_PACKAGES tag can be to specify one or more names of LaTeX -# packages that should be included in the LaTeX output. - -EXTRA_PACKAGES = - -# The LATEX_HEADER tag can be used to specify a personal LaTeX header for -# the generated latex document. The header should contain everything until -# the first chapter. If it is left blank doxygen will generate a -# standard header. Notice: only use this tag if you know what you are doing! - -LATEX_HEADER = doc/doxygen/header.tex - -# The LATEX_FOOTER tag can be used to specify a personal LaTeX footer for -# the generated latex document. The footer should contain everything after -# the last chapter. If it is left blank doxygen will generate a -# standard footer. Notice: only use this tag if you know what you are doing! - -LATEX_FOOTER = - -# If the PDF_HYPERLINKS tag is set to YES, the LaTeX that is generated -# is prepared for conversion to pdf (using ps2pdf). The pdf file will -# contain links (just like the HTML output) instead of page references -# This makes the output suitable for online browsing using a pdf viewer. 
- -PDF_HYPERLINKS = YES - -# If the USE_PDFLATEX tag is set to YES, pdflatex will be used instead of -# plain latex in the generated Makefile. Set this option to YES to get a -# higher quality PDF documentation. - -USE_PDFLATEX = YES - -# If the LATEX_BATCHMODE tag is set to YES, doxygen will add the \\batchmode. -# command to the generated LaTeX files. This will instruct LaTeX to keep -# running if errors occur, instead of asking the user for help. -# This option is also used when generating formulas in HTML. - -LATEX_BATCHMODE = NO - -# If LATEX_HIDE_INDICES is set to YES then doxygen will not -# include the index chapters (such as File Index, Compound Index, etc.) -# in the output. - -LATEX_HIDE_INDICES = NO - -# If LATEX_SOURCE_CODE is set to YES then doxygen will include -# source code with syntax highlighting in the LaTeX output. -# Note that which sources are shown also depends on other settings -# such as SOURCE_BROWSER. - -LATEX_SOURCE_CODE = NO - -# The LATEX_BIB_STYLE tag can be used to specify the style to use for the -# bibliography, e.g. plainnat, or ieeetr. The default style is "plain". See -# http://en.wikipedia.org/wiki/BibTeX for more info. - -LATEX_BIB_STYLE = plain - -#--------------------------------------------------------------------------- -# configuration options related to the RTF output -#--------------------------------------------------------------------------- - -# If the GENERATE_RTF tag is set to YES Doxygen will generate RTF output -# The RTF output is optimized for Word 97 and may not look very pretty with -# other RTF readers or editors. - -GENERATE_RTF = NO - -# The RTF_OUTPUT tag is used to specify where the RTF docs will be put. -# If a relative path is entered the value of OUTPUT_DIRECTORY will be -# put in front of it. If left blank `rtf' will be used as the default path. - -RTF_OUTPUT = - -# If the COMPACT_RTF tag is set to YES Doxygen generates more compact -# RTF documents. 
This may be useful for small projects and may help to -# save some trees in general. - -COMPACT_RTF = NO - -# If the RTF_HYPERLINKS tag is set to YES, the RTF that is generated -# will contain hyperlink fields. The RTF file will -# contain links (just like the HTML output) instead of page references. -# This makes the output suitable for online browsing using WORD or other -# programs which support those fields. -# Note: wordpad (write) and others do not support links. - -RTF_HYPERLINKS = NO - -# Load style sheet definitions from file. Syntax is similar to doxygen's -# config file, i.e. a series of assignments. You only have to provide -# replacements, missing definitions are set to their default value. - -RTF_STYLESHEET_FILE = - -# Set optional variables used in the generation of an rtf document. -# Syntax is similar to doxygen's config file. - -RTF_EXTENSIONS_FILE = - -#--------------------------------------------------------------------------- -# configuration options related to the man page output -#--------------------------------------------------------------------------- - -# If the GENERATE_MAN tag is set to YES (the default) Doxygen will -# generate man pages - -GENERATE_MAN = NO - -# The MAN_OUTPUT tag is used to specify where the man pages will be put. -# If a relative path is entered the value of OUTPUT_DIRECTORY will be -# put in front of it. If left blank `man' will be used as the default path. - -MAN_OUTPUT = - -# The MAN_EXTENSION tag determines the extension that is added to -# the generated man pages (default is the subroutine's section .3) - -MAN_EXTENSION = - -# If the MAN_LINKS tag is set to YES and Doxygen generates man output, -# then it will generate one additional man file for each entity -# documented in the real man page(s). These additional files -# only source the real man page, but without them the man command -# would be unable to find the correct page. The default is NO. 
- -MAN_LINKS = NO - -#--------------------------------------------------------------------------- -# configuration options related to the XML output -#--------------------------------------------------------------------------- - -# If the GENERATE_XML tag is set to YES Doxygen will -# generate an XML file that captures the structure of -# the code including all documentation. - -GENERATE_XML = NO - -# The XML_OUTPUT tag is used to specify where the XML pages will be put. -# If a relative path is entered the value of OUTPUT_DIRECTORY will be -# put in front of it. If left blank `xml' will be used as the default path. - -XML_OUTPUT = xml - -# The XML_SCHEMA tag can be used to specify an XML schema, -# which can be used by a validating XML parser to check the -# syntax of the XML files. - -XML_SCHEMA = - -# The XML_DTD tag can be used to specify an XML DTD, -# which can be used by a validating XML parser to check the -# syntax of the XML files. - -XML_DTD = - -# If the XML_PROGRAMLISTING tag is set to YES Doxygen will -# dump the program listings (including syntax highlighting -# and cross-referencing information) to the XML output. Note that -# enabling this will significantly increase the size of the XML output. - -XML_PROGRAMLISTING = YES - -#--------------------------------------------------------------------------- -# configuration options for the AutoGen Definitions output -#--------------------------------------------------------------------------- - -# If the GENERATE_AUTOGEN_DEF tag is set to YES Doxygen will -# generate an AutoGen Definitions (see autogen.sf.net) file -# that captures the structure of the code including all -# documentation. Note that this feature is still experimental -# and incomplete at the moment. 
- -GENERATE_AUTOGEN_DEF = NO - -#--------------------------------------------------------------------------- -# configuration options related to the Perl module output -#--------------------------------------------------------------------------- - -# If the GENERATE_PERLMOD tag is set to YES Doxygen will -# generate a Perl module file that captures the structure of -# the code including all documentation. Note that this -# feature is still experimental and incomplete at the -# moment. - -GENERATE_PERLMOD = NO - -# If the PERLMOD_LATEX tag is set to YES Doxygen will generate -# the necessary Makefile rules, Perl scripts and LaTeX code to be able -# to generate PDF and DVI output from the Perl module output. - -PERLMOD_LATEX = NO - -# If the PERLMOD_PRETTY tag is set to YES the Perl module output will be -# nicely formatted so it can be parsed by a human reader. -# This is useful -# if you want to understand what is going on. -# On the other hand, if this -# tag is set to NO the size of the Perl module output will be much smaller -# and Perl will parse it just the same. - -PERLMOD_PRETTY = YES - -# The names of the make variables in the generated doxyrules.make file -# are prefixed with the string contained in PERLMOD_MAKEVAR_PREFIX. -# This is useful so different doxyrules.make files included by the same -# Makefile don't overwrite each other's variables. - -PERLMOD_MAKEVAR_PREFIX = - -#--------------------------------------------------------------------------- -# Configuration options related to the preprocessor -#--------------------------------------------------------------------------- - -# If the ENABLE_PREPROCESSING tag is set to YES (the default) Doxygen will -# evaluate all C-preprocessor directives found in the sources and include -# files. - -ENABLE_PREPROCESSING = YES - -# If the MACRO_EXPANSION tag is set to YES Doxygen will expand all macro -# names in the source code. If set to NO (the default) only conditional -# compilation will be performed. 
Macro expansion can be done in a controlled -# way by setting EXPAND_ONLY_PREDEF to YES. - -MACRO_EXPANSION = YES - -# If the EXPAND_ONLY_PREDEF and MACRO_EXPANSION tags are both set to YES -# then the macro expansion is limited to the macros specified with the -# PREDEFINED and EXPAND_AS_DEFINED tags. - -EXPAND_ONLY_PREDEF = YES - -# If the SEARCH_INCLUDES tag is set to YES (the default) the includes files -# pointed to by INCLUDE_PATH will be searched when a #include is found. - -SEARCH_INCLUDES = YES - -# The INCLUDE_PATH tag can be used to specify one or more directories that -# contain include files that are not input files but should be processed by -# the preprocessor. - -INCLUDE_PATH = - -# You can use the INCLUDE_FILE_PATTERNS tag to specify one or more wildcard -# patterns (like *.h and *.hpp) to filter out the header-files in the -# directories. If left blank, the patterns specified with FILE_PATTERNS will -# be used. - -INCLUDE_FILE_PATTERNS = - -# The PREDEFINED tag can be used to specify one or more macro names that -# are defined before the preprocessor is started (similar to the -D option of -# gcc). The argument of the tag is a list of macros of the form: name -# or name=definition (no spaces). If the definition and the = are -# omitted =1 is assumed. To prevent a macro definition from being -# undefined via #undef or recursively expanded use the := operator -# instead of the = operator. - -PREDEFINED = OMP_30_ENABLED=1, OMP_40_ENABLED=1, KMP_STATS_ENABLED=1 - -# If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then -# this tag can be used to specify a list of macro names that should be expanded. -# The macro definition that is found in the sources will be used. -# Use the PREDEFINED tag if you want to use a different macro definition that -# overrules the definition found in the source code. 
- -EXPAND_AS_DEFINED = - -# If the SKIP_FUNCTION_MACROS tag is set to YES (the default) then -# doxygen's preprocessor will remove all references to function-like macros -# that are alone on a line, have an all uppercase name, and do not end with a -# semicolon, because these will confuse the parser if not removed. - -SKIP_FUNCTION_MACROS = YES - -#--------------------------------------------------------------------------- -# Configuration::additions related to external references -#--------------------------------------------------------------------------- - -# The TAGFILES option can be used to specify one or more tagfiles. For each -# tag file the location of the external documentation should be added. The -# format of a tag file without this location is as follows: -# -# TAGFILES = file1 file2 ... -# Adding location for the tag files is done as follows: -# -# TAGFILES = file1=loc1 "file2 = loc2" ... -# where "loc1" and "loc2" can be relative or absolute paths -# or URLs. Note that each tag file must have a unique name (where the name does -# NOT include the path). If a tag file is not located in the directory in which -# doxygen is run, you must also specify the path to the tagfile here. - -TAGFILES = - -# When a file name is specified after GENERATE_TAGFILE, doxygen will create -# a tag file that is based on the input files it reads. - -GENERATE_TAGFILE = - -# If the ALLEXTERNALS tag is set to YES all external classes will be listed -# in the class index. If set to NO only the inherited external classes -# will be listed. - -ALLEXTERNALS = NO - -# If the EXTERNAL_GROUPS tag is set to YES all external groups will be listed -# in the modules index. If set to NO, only the current project's groups will -# be listed. - -EXTERNAL_GROUPS = YES - -# The PERL_PATH should be the absolute path and name of the perl script -# interpreter (i.e. the result of `which perl'). 
- -PERL_PATH = - -#--------------------------------------------------------------------------- -# Configuration options related to the dot tool -#--------------------------------------------------------------------------- - -# If the CLASS_DIAGRAMS tag is set to YES (the default) Doxygen will -# generate a inheritance diagram (in HTML, RTF and LaTeX) for classes with base -# or super classes. Setting the tag to NO turns the diagrams off. Note that -# this option also works with HAVE_DOT disabled, but it is recommended to -# install and use dot, since it yields more powerful graphs. - -CLASS_DIAGRAMS = YES - -# You can define message sequence charts within doxygen comments using the \msc -# command. Doxygen will then run the mscgen tool (see -# http://www.mcternan.me.uk/mscgen/) to produce the chart and insert it in the -# documentation. The MSCGEN_PATH tag allows you to specify the directory where -# the mscgen tool resides. If left empty the tool is assumed to be found in the -# default search path. - -MSCGEN_PATH = - -# If set to YES, the inheritance and collaboration graphs will hide -# inheritance and usage relations if the target is undocumented -# or is not a class. - -HIDE_UNDOC_RELATIONS = YES - -# If you set the HAVE_DOT tag to YES then doxygen will assume the dot tool is -# available from the path. This tool is part of Graphviz, a graph visualization -# toolkit from AT&T and Lucent Bell Labs. The other options in this section -# have no effect if this option is set to NO (the default) - -HAVE_DOT = NO - -# The DOT_NUM_THREADS specifies the number of dot invocations doxygen is -# allowed to run in parallel. When set to 0 (the default) doxygen will -# base this on the number of processors available in the system. You can set it -# explicitly to a value larger than 0 to get control over the balance -# between CPU load and processing speed. - -DOT_NUM_THREADS = 0 - -# By default doxygen will use the Helvetica font for all dot files that -# doxygen generates. 
When you want a differently looking font you can specify -# the font name using DOT_FONTNAME. You need to make sure dot is able to find -# the font, which can be done by putting it in a standard location or by setting -# the DOTFONTPATH environment variable or by setting DOT_FONTPATH to the -# directory containing the font. - -DOT_FONTNAME = Helvetica - -# The DOT_FONTSIZE tag can be used to set the size of the font of dot graphs. -# The default size is 10pt. - -DOT_FONTSIZE = 10 - -# By default doxygen will tell dot to use the Helvetica font. -# If you specify a different font using DOT_FONTNAME you can use DOT_FONTPATH to -# set the path where dot can find it. - -DOT_FONTPATH = - -# If the CLASS_GRAPH and HAVE_DOT tags are set to YES then doxygen -# will generate a graph for each documented class showing the direct and -# indirect inheritance relations. Setting this tag to YES will force the -# CLASS_DIAGRAMS tag to NO. - -CLASS_GRAPH = YES - -# If the COLLABORATION_GRAPH and HAVE_DOT tags are set to YES then doxygen -# will generate a graph for each documented class showing the direct and -# indirect implementation dependencies (inheritance, containment, and -# class references variables) of the class with other documented classes. - -COLLABORATION_GRAPH = NO - -# If the GROUP_GRAPHS and HAVE_DOT tags are set to YES then doxygen -# will generate a graph for groups, showing the direct groups dependencies - -GROUP_GRAPHS = YES - -# If the UML_LOOK tag is set to YES doxygen will generate inheritance and -# collaboration diagrams in a style similar to the OMG's Unified Modeling -# Language. - -UML_LOOK = NO - -# If the UML_LOOK tag is enabled, the fields and methods are shown inside -# the class node. If there are many fields or methods and many nodes the -# graph may become too big to be useful. The UML_LIMIT_NUM_FIELDS -# threshold limits the number of items for each type to make the size more -# manageable. Set this to 0 for no limit. 
Note that the threshold may be -# exceeded by 50% before the limit is enforced. - -UML_LIMIT_NUM_FIELDS = 10 - -# If set to YES, the inheritance and collaboration graphs will show the -# relations between templates and their instances. - -TEMPLATE_RELATIONS = YES - -# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDE_GRAPH, and HAVE_DOT -# tags are set to YES then doxygen will generate a graph for each documented -# file showing the direct and indirect include dependencies of the file with -# other documented files. - -INCLUDE_GRAPH = NO - -# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDED_BY_GRAPH, and -# HAVE_DOT tags are set to YES then doxygen will generate a graph for each -# documented header file showing the documented files that directly or -# indirectly include this file. - -INCLUDED_BY_GRAPH = NO - -# If the CALL_GRAPH and HAVE_DOT options are set to YES then -# doxygen will generate a call dependency graph for every global function -# or class method. Note that enabling this option will significantly increase -# the time of a run. So in most cases it will be better to enable call graphs -# for selected functions only using the \callgraph command. - -CALL_GRAPH = NO - -# If the CALLER_GRAPH and HAVE_DOT tags are set to YES then -# doxygen will generate a caller dependency graph for every global function -# or class method. Note that enabling this option will significantly increase -# the time of a run. So in most cases it will be better to enable caller -# graphs for selected functions only using the \callergraph command. - -CALLER_GRAPH = NO - -# If the GRAPHICAL_HIERARCHY and HAVE_DOT tags are set to YES then doxygen -# will generate a graphical hierarchy of all classes instead of a textual one. - -GRAPHICAL_HIERARCHY = YES - -# If the DIRECTORY_GRAPH and HAVE_DOT tags are set to YES -# then doxygen will show the dependencies a directory has on other directories -# in a graphical way. 
The dependency relations are determined by the #include -# relations between the files in the directories. - -DIRECTORY_GRAPH = YES - -# The DOT_IMAGE_FORMAT tag can be used to set the image format of the images -# generated by dot. Possible values are svg, png, jpg, or gif. -# If left blank png will be used. If you choose svg you need to set -# HTML_FILE_EXTENSION to xhtml in order to make the SVG files -# visible in IE 9+ (other browsers do not have this requirement). - -DOT_IMAGE_FORMAT = png - -# If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to -# enable generation of interactive SVG images that allow zooming and panning. -# Note that this requires a modern browser other than Internet Explorer. -# Tested and working are Firefox, Chrome, Safari, and Opera. For IE 9+ you -# need to set HTML_FILE_EXTENSION to xhtml in order to make the SVG files -# visible. Older versions of IE do not have SVG support. - -INTERACTIVE_SVG = NO - -# The tag DOT_PATH can be used to specify the path where the dot tool can be -# found. If left blank, it is assumed the dot tool can be found in the path. - -DOT_PATH = - -# The DOTFILE_DIRS tag can be used to specify one or more directories that -# contain dot files that are included in the documentation (see the -# \dotfile command). - -DOTFILE_DIRS = - -# The MSCFILE_DIRS tag can be used to specify one or more directories that -# contain msc files that are included in the documentation (see the -# \mscfile command). - -MSCFILE_DIRS = - -# The DOT_GRAPH_MAX_NODES tag can be used to set the maximum number of -# nodes that will be shown in the graph. If the number of nodes in a graph -# becomes larger than this value, doxygen will truncate the graph, which is -# visualized by representing a node as a red box. Note that doxygen if the -# number of direct children of the root node in a graph is already larger than -# DOT_GRAPH_MAX_NODES then the graph will not be shown at all. 
Also note -# that the size of a graph can be further restricted by MAX_DOT_GRAPH_DEPTH. - -DOT_GRAPH_MAX_NODES = 50 - -# The MAX_DOT_GRAPH_DEPTH tag can be used to set the maximum depth of the -# graphs generated by dot. A depth value of 3 means that only nodes reachable -# from the root by following a path via at most 3 edges will be shown. Nodes -# that lay further from the root node will be omitted. Note that setting this -# option to 1 or 2 may greatly reduce the computation time needed for large -# code bases. Also note that the size of a graph can be further restricted by -# DOT_GRAPH_MAX_NODES. Using a depth of 0 means no depth restriction. - -MAX_DOT_GRAPH_DEPTH = 0 - -# Set the DOT_TRANSPARENT tag to YES to generate images with a transparent -# background. This is disabled by default, because dot on Windows does not -# seem to support this out of the box. Warning: Depending on the platform used, -# enabling this option may lead to badly anti-aliased labels on the edges of -# a graph (i.e. they become hard to read). - -DOT_TRANSPARENT = NO - -# Set the DOT_MULTI_TARGETS tag to YES allow dot to generate multiple output -# files in one run (i.e. multiple -o and -T options on the command line). This -# makes dot run faster, but since only newer versions of dot (>1.8.10) -# support this, this feature is disabled by default. - -DOT_MULTI_TARGETS = NO - -# If the GENERATE_LEGEND tag is set to YES (the default) Doxygen will -# generate a legend page explaining the meaning of the various boxes and -# arrows in the dot generated graphs. - -GENERATE_LEGEND = YES - -# If the DOT_CLEANUP tag is set to YES (the default) Doxygen will -# remove the intermediate dot files that are used to generate -# the various graphs. - -DOT_CLEANUP = YES +# Doxyfile 1.8.2 + +# This file describes the settings to be used by the documentation system +# doxygen (www.doxygen.org) for a project. +# +# All text after a hash (#) is considered a comment and will be ignored.
+# The format is: +# TAG = value [value, ...] +# For lists items can also be appended using: +# TAG += value [value, ...] +# Values that contain spaces should be placed between quotes (" "). + +#--------------------------------------------------------------------------- +# Project related configuration options +#--------------------------------------------------------------------------- + +# This tag specifies the encoding used for all characters in the config file +# that follow. The default is UTF-8 which is also the encoding used for all +# text before the first occurrence of this tag. Doxygen uses libiconv (or the +# iconv built into libc) for the transcoding. See +# http://www.gnu.org/software/libiconv for the list of possible encodings. + +DOXYFILE_ENCODING = UTF-8 + +# The PROJECT_NAME tag is a single word (or sequence of words) that should +# identify the project. Note that if you do not use Doxywizard you need +# to put quotes around the project name if it contains spaces. + +PROJECT_NAME = "LLVM OpenMP* Runtime Library" + +# The PROJECT_NUMBER tag can be used to enter a project or revision number. +# This could be handy for archiving the generated documentation or +# if some version control system is used. + +PROJECT_NUMBER = + +# Using the PROJECT_BRIEF tag one can provide an optional one line description +# for a project that appears at the top of each page and should give viewer +# a quick idea about the purpose of the project. Keep the description short. + +PROJECT_BRIEF = + +# With the PROJECT_LOGO tag one can specify an logo or icon that is +# included in the documentation. The maximum height of the logo should not +# exceed 55 pixels and the maximum width should not exceed 200 pixels. +# Doxygen will copy the logo to the output directory. + +PROJECT_LOGO = + +# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) +# base path where the generated documentation will be put. 
+# If a relative path is entered, it will be relative to the location +# where doxygen was started. If left blank the current directory will be used. + +OUTPUT_DIRECTORY = doc/doxygen/generated + +# If the CREATE_SUBDIRS tag is set to YES, then doxygen will create +# 4096 sub-directories (in 2 levels) under the output directory of each output +# format and will distribute the generated files over these directories. +# Enabling this option can be useful when feeding doxygen a huge amount of +# source files, where putting all generated files in the same directory would +# otherwise cause performance problems for the file system. + +CREATE_SUBDIRS = NO + +# The OUTPUT_LANGUAGE tag is used to specify the language in which all +# documentation generated by doxygen is written. Doxygen will use this +# information to generate all constant output in the proper language. +# The default language is English, other supported languages are: +# Afrikaans, Arabic, Brazilian, Catalan, Chinese, Chinese-Traditional, +# Croatian, Czech, Danish, Dutch, Esperanto, Farsi, Finnish, French, German, +# Greek, Hungarian, Italian, Japanese, Japanese-en (Japanese with English +# messages), Korean, Korean-en, Lithuanian, Norwegian, Macedonian, Persian, +# Polish, Portuguese, Romanian, Russian, Serbian, Serbian-Cyrillic, Slovak, +# Slovene, Spanish, Swedish, Ukrainian, and Vietnamese. + +OUTPUT_LANGUAGE = English + +# If the BRIEF_MEMBER_DESC tag is set to YES (the default) Doxygen will +# include brief member descriptions after the members that are listed in +# the file and class documentation (similar to JavaDoc). +# Set to NO to disable this. + +BRIEF_MEMBER_DESC = YES + +# If the REPEAT_BRIEF tag is set to YES (the default) Doxygen will prepend +# the brief description of a member or function before the detailed description. +# Note: if both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the +# brief descriptions will be completely suppressed. 
+ +REPEAT_BRIEF = YES + +# This tag implements a quasi-intelligent brief description abbreviator +# that is used to form the text in various listings. Each string +# in this list, if found as the leading text of the brief description, will be +# stripped from the text and the result after processing the whole list, is +# used as the annotated text. Otherwise, the brief description is used as-is. +# If left blank, the following values are used ("$name" is automatically +# replaced with the name of the entity): "The $name class" "The $name widget" +# "The $name file" "is" "provides" "specifies" "contains" +# "represents" "a" "an" "the" + +ABBREVIATE_BRIEF = + +# If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then +# Doxygen will generate a detailed section even if there is only a brief +# description. + +ALWAYS_DETAILED_SEC = NO + +# If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all +# inherited members of a class in the documentation of that class as if those +# members were ordinary class members. Constructors, destructors and assignment +# operators of the base classes will not be shown. + +INLINE_INHERITED_MEMB = NO + +# If the FULL_PATH_NAMES tag is set to YES then Doxygen will prepend the full +# path before files name in the file list and in the header files. If set +# to NO the shortest path that makes the file name unique will be used. + +FULL_PATH_NAMES = NO + +# If the FULL_PATH_NAMES tag is set to YES then the STRIP_FROM_PATH tag +# can be used to strip a user-defined part of the path. Stripping is +# only done if one of the specified strings matches the left-hand part of +# the path. The tag can be used to show relative paths in the file list. +# If left blank the directory from which doxygen is run is used as the +# path to strip. Note that you specify absolute paths here, but also +# relative paths, which will be relative from the directory where doxygen is +# started. 
+ +STRIP_FROM_PATH = + +# The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of +# the path mentioned in the documentation of a class, which tells +# the reader which header file to include in order to use a class. +# If left blank only the name of the header file containing the class +# definition is used. Otherwise one should specify the include paths that +# are normally passed to the compiler using the -I flag. + +STRIP_FROM_INC_PATH = + +# If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter +# (but less readable) file names. This can be useful if your file system +# doesn't support long names like on DOS, Mac, or CD-ROM. + +SHORT_NAMES = NO + +# If the JAVADOC_AUTOBRIEF tag is set to YES then Doxygen +# will interpret the first line (until the first dot) of a JavaDoc-style +# comment as the brief description. If set to NO, the JavaDoc +# comments will behave just like regular Qt-style comments +# (thus requiring an explicit @brief command for a brief description.) + +JAVADOC_AUTOBRIEF = NO + +# If the QT_AUTOBRIEF tag is set to YES then Doxygen will +# interpret the first line (until the first dot) of a Qt-style +# comment as the brief description. If set to NO, the comments +# will behave just like regular Qt-style comments (thus requiring +# an explicit \brief command for a brief description.) + +QT_AUTOBRIEF = NO + +# The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make Doxygen +# treat a multi-line C++ special comment block (i.e. a block of //! or /// +# comments) as a brief description. This used to be the default behaviour. +# The new default is to treat a multi-line C++ comment block as a detailed +# description. Set this tag to YES if you prefer the old behaviour instead. + +MULTILINE_CPP_IS_BRIEF = NO + +# If the INHERIT_DOCS tag is set to YES (the default) then an undocumented +# member inherits the documentation from any documented member that it +# re-implements. 
+ +INHERIT_DOCS = YES + +# If the SEPARATE_MEMBER_PAGES tag is set to YES, then doxygen will produce +# a new page for each member. If set to NO, the documentation of a member will +# be part of the file/class/namespace that contains it. + +SEPARATE_MEMBER_PAGES = NO + +# The TAB_SIZE tag can be used to set the number of spaces in a tab. +# Doxygen uses this value to replace tabs by spaces in code fragments. + +TAB_SIZE = 8 + +# This tag can be used to specify a number of aliases that acts +# as commands in the documentation. An alias has the form "name=value". +# For example adding "sideeffect=\par Side Effects:\n" will allow you to +# put the command \sideeffect (or @sideeffect) in the documentation, which +# will result in a user-defined paragraph with heading "Side Effects:". +# You can put \n's in the value part of an alias to insert newlines. + +ALIASES = "other=*" + +# This tag can be used to specify a number of word-keyword mappings (TCL only). +# A mapping has the form "name=value". For example adding +# "class=itcl::class" will allow you to use the command class in the +# itcl::class meaning. + +TCL_SUBST = + +# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C +# sources only. Doxygen will then generate output that is more tailored for C. +# For instance, some of the names that are used will be different. The list +# of all members will be omitted, etc. + +OPTIMIZE_OUTPUT_FOR_C = NO + +# Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java +# sources only. Doxygen will then generate output that is more tailored for +# Java. For instance, namespaces will be presented as packages, qualified +# scopes will look different, etc. + +OPTIMIZE_OUTPUT_JAVA = NO + +# Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran +# sources only. Doxygen will then generate output that is more tailored for +# Fortran. 
+ +OPTIMIZE_FOR_FORTRAN = NO + +# Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL +# sources. Doxygen will then generate output that is tailored for +# VHDL. + +OPTIMIZE_OUTPUT_VHDL = NO + +# Doxygen selects the parser to use depending on the extension of the files it +# parses. With this tag you can assign which parser to use for a given +# extension. Doxygen has a built-in mapping, but you can override or extend it +# using this tag. The format is ext=language, where ext is a file extension, +# and language is one of the parsers supported by doxygen: IDL, Java, +# Javascript, CSharp, C, C++, D, PHP, Objective-C, Python, Fortran, VHDL, C, +# C++. For instance to make doxygen treat .inc files as Fortran files (default +# is PHP), and .f files as C (default is Fortran), use: inc=Fortran f=C. Note +# that for custom extensions you also need to set FILE_PATTERNS otherwise the +# files are not read by doxygen. + +EXTENSION_MAPPING = + +# If MARKDOWN_SUPPORT is enabled (the default) then doxygen pre-processes all +# comments according to the Markdown format, which allows for more readable +# documentation. See http://daringfireball.net/projects/markdown/ for details. +# The output of markdown processing is further processed by doxygen, so you +# can mix doxygen, HTML, and XML commands with Markdown formatting. +# Disable only in case of backward compatibilities issues. + +MARKDOWN_SUPPORT = YES + +# When enabled doxygen tries to link words that correspond to documented classes, +# or namespaces to their corresponding documentation. Such a link can be +# prevented in individual cases by by putting a % sign in front of the word or +# globally by setting AUTOLINK_SUPPORT to NO. + +AUTOLINK_SUPPORT = YES + +# If you use STL classes (i.e. std::string, std::vector, etc.) 
but do not want +# to include (a tag file for) the STL sources as input, then you should +# set this tag to YES in order to let doxygen match functions declarations and +# definitions whose arguments contain STL classes (e.g. func(std::string); v.s. +# func(std::string) {}). This also makes the inheritance and collaboration +# diagrams that involve STL classes more complete and accurate. + +BUILTIN_STL_SUPPORT = NO + +# If you use Microsoft's C++/CLI language, you should set this option to YES to +# enable parsing support. + +CPP_CLI_SUPPORT = NO + +# Set the SIP_SUPPORT tag to YES if your project consists of sip sources only. +# Doxygen will parse them like normal C++ but will assume all classes use public +# instead of private inheritance when no explicit protection keyword is present. + +SIP_SUPPORT = NO + +# For Microsoft's IDL there are propget and propput attributes to +# indicate getter and setter methods for a property. Setting this +# option to YES (the default) will make doxygen replace the get and +# set methods by a property in the documentation. This will only work +# if the methods are indeed getting or setting a simple type. If this +# is not the case, or you want to show the methods anyway, you should +# set this option to NO. + +IDL_PROPERTY_SUPPORT = YES + +# If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC +# tag is set to YES, then doxygen will reuse the documentation of the first +# member in the group (if any) for the other members of the group. By default +# all members of a group must be documented explicitly. + +DISTRIBUTE_GROUP_DOC = NO + +# Set the SUBGROUPING tag to YES (the default) to allow class member groups of +# the same type (for instance a group of public functions) to be put as a +# subgroup of that type (e.g. under the Public Functions section). Set it to +# NO to prevent subgrouping. Alternatively, this can be done per class using +# the \nosubgrouping command. 
+ +SUBGROUPING = YES + +# When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and +# unions are shown inside the group in which they are included (e.g. using +# @ingroup) instead of on a separate page (for HTML and Man pages) or +# section (for LaTeX and RTF). + +INLINE_GROUPED_CLASSES = NO + +# When the INLINE_SIMPLE_STRUCTS tag is set to YES, structs, classes, and +# unions with only public data fields will be shown inline in the documentation +# of the scope in which they are defined (i.e. file, namespace, or group +# documentation), provided this scope is documented. If set to NO (the default), +# structs, classes, and unions are shown on a separate page (for HTML and Man +# pages) or section (for LaTeX and RTF). + +INLINE_SIMPLE_STRUCTS = NO + +# When TYPEDEF_HIDES_STRUCT is enabled, a typedef of a struct, union, or enum +# is documented as struct, union, or enum with the name of the typedef. So +# typedef struct TypeS {} TypeT, will appear in the documentation as a struct +# with name TypeT. When disabled the typedef will appear as a member of a file, +# namespace, or class. And the struct will be named TypeS. This can typically +# be useful for C code in case the coding convention dictates that all compound +# types are typedef'ed and only the typedef is referenced, never the tag name. + +TYPEDEF_HIDES_STRUCT = NO + +# The SYMBOL_CACHE_SIZE determines the size of the internal cache use to +# determine which symbols to keep in memory and which to flush to disk. +# When the cache is full, less often used symbols will be written to disk. +# For small to medium size projects (<1000 input files) the default value is +# probably good enough. For larger projects a too small cache size can cause +# doxygen to be busy swapping symbols to and from disk most of the time +# causing a significant performance penalty. +# If the system has enough physical memory increasing the cache will improve the +# performance by keeping more symbols in memory. 
Note that the value works on +# a logarithmic scale so increasing the size by one will roughly double the +# memory usage. The cache size is given by this formula: +# 2^(16+SYMBOL_CACHE_SIZE). The valid range is 0..9, the default is 0, +# corresponding to a cache size of 2^16 = 65536 symbols. + +SYMBOL_CACHE_SIZE = 0 + +# Similar to the SYMBOL_CACHE_SIZE the size of the symbol lookup cache can be +# set using LOOKUP_CACHE_SIZE. This cache is used to resolve symbols given +# their name and scope. Since this can be an expensive process and often the +# same symbol appear multiple times in the code, doxygen keeps a cache of +# pre-resolved symbols. If the cache is too small doxygen will become slower. +# If the cache is too large, memory is wasted. The cache size is given by this +# formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range is 0..9, the default is 0, +# corresponding to a cache size of 2^16 = 65536 symbols. + +LOOKUP_CACHE_SIZE = 0 + +#--------------------------------------------------------------------------- +# Build related configuration options +#--------------------------------------------------------------------------- + +# If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in +# documentation are documented, even if no documentation was available. +# Private class members and static file members will be hidden unless +# the EXTRACT_PRIVATE and EXTRACT_STATIC tags are set to YES + +EXTRACT_ALL = NO + +# If the EXTRACT_PRIVATE tag is set to YES all private members of a class +# will be included in the documentation. + +EXTRACT_PRIVATE = YES + +# If the EXTRACT_PACKAGE tag is set to YES all members with package or internal +# scope will be included in the documentation. + +EXTRACT_PACKAGE = NO + +# If the EXTRACT_STATIC tag is set to YES all static members of a file +# will be included in the documentation. 
+ +EXTRACT_STATIC = YES + +# If the EXTRACT_LOCAL_CLASSES tag is set to YES classes (and structs) +# defined locally in source files will be included in the documentation. +# If set to NO only classes defined in header files are included. + +EXTRACT_LOCAL_CLASSES = YES + +# This flag is only useful for Objective-C code. When set to YES local +# methods, which are defined in the implementation section but not in +# the interface are included in the documentation. +# If set to NO (the default) only methods in the interface are included. + +EXTRACT_LOCAL_METHODS = NO + +# If this flag is set to YES, the members of anonymous namespaces will be +# extracted and appear in the documentation as a namespace called +# 'anonymous_namespace{file}', where file will be replaced with the base +# name of the file that contains the anonymous namespace. By default +# anonymous namespaces are hidden. + +EXTRACT_ANON_NSPACES = NO + +# If the HIDE_UNDOC_MEMBERS tag is set to YES, Doxygen will hide all +# undocumented members of documented classes, files or namespaces. +# If set to NO (the default) these members will be included in the +# various overviews, but no documentation section is generated. +# This option has no effect if EXTRACT_ALL is enabled. + +HIDE_UNDOC_MEMBERS = YES + +# If the HIDE_UNDOC_CLASSES tag is set to YES, Doxygen will hide all +# undocumented classes that are normally visible in the class hierarchy. +# If set to NO (the default) these classes will be included in the various +# overviews. This option has no effect if EXTRACT_ALL is enabled. + +HIDE_UNDOC_CLASSES = YES + +# If the HIDE_FRIEND_COMPOUNDS tag is set to YES, Doxygen will hide all +# friend (class|struct|union) declarations. +# If set to NO (the default) these declarations will be included in the +# documentation. + +HIDE_FRIEND_COMPOUNDS = NO + +# If the HIDE_IN_BODY_DOCS tag is set to YES, Doxygen will hide any +# documentation blocks found inside the body of a function. 
+# If set to NO (the default) these blocks will be appended to the +# function's detailed documentation block. + +HIDE_IN_BODY_DOCS = NO + +# The INTERNAL_DOCS tag determines if documentation +# that is typed after a \internal command is included. If the tag is set +# to NO (the default) then the documentation will be excluded. +# Set it to YES to include the internal documentation. + +INTERNAL_DOCS = NO + +# If the CASE_SENSE_NAMES tag is set to NO then Doxygen will only generate +# file names in lower-case letters. If set to YES upper-case letters are also +# allowed. This is useful if you have classes or files whose names only differ +# in case and if your file system supports case sensitive file names. Windows +# and Mac users are advised to set this option to NO. + +CASE_SENSE_NAMES = YES + +# If the HIDE_SCOPE_NAMES tag is set to NO (the default) then Doxygen +# will show members with their full class and namespace scopes in the +# documentation. If set to YES the scope will be hidden. + +HIDE_SCOPE_NAMES = NO + +# If the SHOW_INCLUDE_FILES tag is set to YES (the default) then Doxygen +# will put a list of the files that are included by a file in the documentation +# of that file. + +SHOW_INCLUDE_FILES = YES + +# If the FORCE_LOCAL_INCLUDES tag is set to YES then Doxygen +# will list include files with double quotes in the documentation +# rather than with sharp brackets. + +FORCE_LOCAL_INCLUDES = NO + +# If the INLINE_INFO tag is set to YES (the default) then a tag [inline] +# is inserted in the documentation for inline members. + +INLINE_INFO = YES + +# If the SORT_MEMBER_DOCS tag is set to YES (the default) then doxygen +# will sort the (detailed) documentation of file and class members +# alphabetically by member name. If set to NO the members will appear in +# declaration order. 
+ +SORT_MEMBER_DOCS = YES + +# If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the +# brief documentation of file, namespace and class members alphabetically +# by member name. If set to NO (the default) the members will appear in +# declaration order. + +SORT_BRIEF_DOCS = NO + +# If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen +# will sort the (brief and detailed) documentation of class members so that +# constructors and destructors are listed first. If set to NO (the default) +# the constructors will appear in the respective orders defined by +# SORT_MEMBER_DOCS and SORT_BRIEF_DOCS. +# This tag will be ignored for brief docs if SORT_BRIEF_DOCS is set to NO +# and ignored for detailed docs if SORT_MEMBER_DOCS is set to NO. + +SORT_MEMBERS_CTORS_1ST = NO + +# If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the +# hierarchy of group names into alphabetical order. If set to NO (the default) +# the group names will appear in their defined order. + +SORT_GROUP_NAMES = NO + +# If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be +# sorted by fully-qualified names, including namespaces. If set to +# NO (the default), the class list will be sorted only by class name, +# not including the namespace part. +# Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES. +# Note: This option applies only to the class list, not to the +# alphabetical list. + +SORT_BY_SCOPE_NAME = NO + +# If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to +# do proper type resolution of all parameters of a function it will reject a +# match between the prototype and the implementation of a member function even +# if there is only one candidate or it is obvious which candidate to choose +# by doing a simple string match. By disabling STRICT_PROTO_MATCHING doxygen +# will still accept a match between prototype and implementation in such cases. 
+ +STRICT_PROTO_MATCHING = NO + +# The GENERATE_TODOLIST tag can be used to enable (YES) or +# disable (NO) the todo list. This list is created by putting \todo +# commands in the documentation. + +GENERATE_TODOLIST = YES + +# The GENERATE_TESTLIST tag can be used to enable (YES) or +# disable (NO) the test list. This list is created by putting \test +# commands in the documentation. + +GENERATE_TESTLIST = YES + +# The GENERATE_BUGLIST tag can be used to enable (YES) or +# disable (NO) the bug list. This list is created by putting \bug +# commands in the documentation. + +GENERATE_BUGLIST = YES + +# The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or +# disable (NO) the deprecated list. This list is created by putting +# \deprecated commands in the documentation. + +GENERATE_DEPRECATEDLIST= YES + +# The ENABLED_SECTIONS tag can be used to enable conditional +# documentation sections, marked by \if sectionname ... \endif. + +ENABLED_SECTIONS = + +# The MAX_INITIALIZER_LINES tag determines the maximum number of lines +# the initial value of a variable or macro consists of for it to appear in +# the documentation. If the initializer consists of more lines than specified +# here it will be hidden. Use a value of 0 to hide initializers completely. +# The appearance of the initializer of individual variables and macros in the +# documentation can be controlled using \showinitializer or \hideinitializer +# command in the documentation regardless of this setting. + +MAX_INITIALIZER_LINES = 30 + +# Set the SHOW_USED_FILES tag to NO to disable the list of files generated +# at the bottom of the documentation of classes and structs. If set to YES the +# list will mention the files that were used to generate the documentation. + +SHOW_USED_FILES = YES + +# Set the SHOW_FILES tag to NO to disable the generation of the Files page. +# This will remove the Files entry from the Quick Index and from the +# Folder Tree View (if specified). The default is YES. 
+ +# We probably will want this, but we have no file documentation yet so it's simpler to remove +# it for now. +SHOW_FILES = NO + +# Set the SHOW_NAMESPACES tag to NO to disable the generation of the +# Namespaces page. +# This will remove the Namespaces entry from the Quick Index +# and from the Folder Tree View (if specified). The default is YES. + +SHOW_NAMESPACES = YES + +# The FILE_VERSION_FILTER tag can be used to specify a program or script that +# doxygen should invoke to get the current version for each file (typically from +# the version control system). Doxygen will invoke the program by executing (via +# popen()) the command <command> <input-file>, where <command> is the value of +# the FILE_VERSION_FILTER tag, and <input-file> is the name of an input file +# provided by doxygen. Whatever the program writes to standard output +# is used as the file version. See the manual for examples. + +FILE_VERSION_FILTER = + +# The LAYOUT_FILE tag can be used to specify a layout file which will be parsed +# by doxygen. The layout file controls the global structure of the generated +# output files in an output format independent way. To create the layout file +# that represents doxygen's defaults, run doxygen with the -l option. +# You can optionally specify a file name after the option, if omitted +# DoxygenLayout.xml will be used as the name of the layout file. + +LAYOUT_FILE = + +# The CITE_BIB_FILES tag can be used to specify one or more bib files +# containing the references data. This must be a list of .bib files. The +# .bib extension is automatically appended if omitted. Using this command +# requires the bibtex tool to be installed. See also +# http://en.wikipedia.org/wiki/BibTeX for more info. For LaTeX the style +# of the bibliography can be controlled using LATEX_BIB_STYLE. To use this +# feature you need bibtex and perl available in the search path.
+ +CITE_BIB_FILES = + +#--------------------------------------------------------------------------- +# configuration options related to warning and progress messages +#--------------------------------------------------------------------------- + +# The QUIET tag can be used to turn on/off the messages that are generated +# by doxygen. Possible values are YES and NO. If left blank NO is used. + +QUIET = NO + +# The WARNINGS tag can be used to turn on/off the warning messages that are +# generated by doxygen. Possible values are YES and NO. If left blank +# NO is used. + +WARNINGS = YES + +# If WARN_IF_UNDOCUMENTED is set to YES, then doxygen will generate warnings +# for undocumented members. If EXTRACT_ALL is set to YES then this flag will +# automatically be disabled. + +WARN_IF_UNDOCUMENTED = YES + +# If WARN_IF_DOC_ERROR is set to YES, doxygen will generate warnings for +# potential errors in the documentation, such as not documenting some +# parameters in a documented function, or documenting parameters that +# don't exist or using markup commands wrongly. + +WARN_IF_DOC_ERROR = YES + +# The WARN_NO_PARAMDOC option can be enabled to get warnings for +# functions that are documented, but have no documentation for their parameters +# or return value. If set to NO (the default) doxygen will only warn about +# wrong or incomplete parameter documentation, but not about the absence of +# documentation. + +WARN_NO_PARAMDOC = NO + +# The WARN_FORMAT tag determines the format of the warning messages that +# doxygen can produce. The string should contain the $file, $line, and $text +# tags, which will be replaced by the file and line number from which the +# warning originated and the warning text. 
Optionally the format may contain +# $version, which will be replaced by the version of the file (if it could +# be obtained via FILE_VERSION_FILTER) + +WARN_FORMAT = + +# The WARN_LOGFILE tag can be used to specify a file to which warning +# and error messages should be written. If left blank the output is written +# to stderr. + +WARN_LOGFILE = + +#--------------------------------------------------------------------------- +# configuration options related to the input files +#--------------------------------------------------------------------------- + +# The INPUT tag can be used to specify the files and/or directories that contain +# documented source files. You may enter file names like "myfile.cpp" or +# directories like "/usr/src/myproject". Separate the files or directories +# with spaces. + +INPUT = src doc/doxygen/libomp_interface.h +# The ittnotify code also has doxygen documentation, but if we include it here +# it takes over from us! +# src/thirdparty/ittnotify + +# This tag can be used to specify the character encoding of the source files +# that doxygen parses. Internally doxygen uses the UTF-8 encoding, which is +# also the default input encoding. Doxygen uses libiconv (or the iconv built +# into libc) for the transcoding. See http://www.gnu.org/software/libiconv for +# the list of possible encodings. + +INPUT_ENCODING = UTF-8 + +# If the value of the INPUT tag contains directories, you can use the +# FILE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp +# and *.h) to filter out the source-files in the directories. If left +# blank the following patterns are tested: +# *.c *.cc *.cxx *.cpp *.c++ *.d *.java *.ii *.ixx *.ipp *.i++ *.inl *.h *.hh +# *.hxx *.hpp *.h++ *.idl *.odl *.cs *.php *.php3 *.inc *.m *.mm *.dox *.py +# *.f90 *.f *.for *.vhd *.vhdl + +FILE_PATTERNS = *.c *.h *.cpp +# We may also want to include the asm files with appropriate ifdef to ensure +# doxygen doesn't see the content, just the documentation... 
+ +# The RECURSIVE tag can be used to specify whether or not subdirectories +# should be searched for input files as well. Possible values are YES and NO. +# If left blank NO is used. + +# Only look in the one directory. +RECURSIVE = NO + +# The EXCLUDE tag can be used to specify files and/or directories that should be +# excluded from the INPUT source files. This way you can easily exclude a +# subdirectory from a directory tree whose root is specified with the INPUT tag. +# Note that relative paths are relative to the directory from which doxygen is +# run. + +EXCLUDE = src/test-touch.c + +# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or +# directories that are symbolic links (a Unix file system feature) are excluded +# from the input. + +EXCLUDE_SYMLINKS = NO + +# If the value of the INPUT tag contains directories, you can use the +# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude +# certain files from those directories. Note that the wildcards are matched +# against the file with absolute path, so to exclude all test directories +# for example use the pattern */test/* + +EXCLUDE_PATTERNS = + +# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names +# (namespaces, classes, functions, etc.) that should be excluded from the +# output. The symbol name can be a fully qualified name, a word, or if the +# wildcard * is used, a substring. Examples: ANamespace, AClass, +# AClass::ANamespace, ANamespace::*Test + +EXCLUDE_SYMBOLS = + +# The EXAMPLE_PATH tag can be used to specify one or more files or +# directories that contain example code fragments that are included (see +# the \include command). + +EXAMPLE_PATH = + +# If the value of the EXAMPLE_PATH tag contains directories, you can use the +# EXAMPLE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp +# and *.h) to filter out the source-files in the directories. If left +# blank all files are included.
+ +EXAMPLE_PATTERNS = + +# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be +# searched for input files to be used with the \include or \dontinclude +# commands irrespective of the value of the RECURSIVE tag. +# Possible values are YES and NO. If left blank NO is used. + +EXAMPLE_RECURSIVE = NO + +# The IMAGE_PATH tag can be used to specify one or more files or +# directories that contain images that are included in the documentation (see +# the \image command). + +IMAGE_PATH = + +# The INPUT_FILTER tag can be used to specify a program that doxygen should +# invoke to filter for each input file. Doxygen will invoke the filter program +# by executing (via popen()) the command <filter> <input-file>, where <filter> +# is the value of the INPUT_FILTER tag, and <input-file> is the name of an +# input file. Doxygen will then use the output that the filter program writes +# to standard output. +# If FILTER_PATTERNS is specified, this tag will be +# ignored. + +INPUT_FILTER = + +# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern +# basis. +# Doxygen will compare the file name with each pattern and apply the +# filter if there is a match. +# The filters are a list of the form: +# pattern=filter (like *.cpp=my_cpp_filter). See INPUT_FILTER for further +# info on how filters are used. If FILTER_PATTERNS is empty or if +# none of the patterns match the file name, INPUT_FILTER is applied. + +FILTER_PATTERNS = + +# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using +# INPUT_FILTER) will be used to filter the input files when producing source +# files to browse (i.e. when SOURCE_BROWSER is set to YES). + +FILTER_SOURCE_FILES = NO + +# The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file +# pattern. A pattern will override the setting for FILTER_PATTERNS (if any) +# and it is also possible to disable source filtering for a specific pattern +# using *.ext= (so without naming a filter).
This option only has effect when +# FILTER_SOURCE_FILES is enabled. + +FILTER_SOURCE_PATTERNS = + +#--------------------------------------------------------------------------- +# configuration options related to source browsing +#--------------------------------------------------------------------------- + +# If the SOURCE_BROWSER tag is set to YES then a list of source files will +# be generated. Documented entities will be cross-referenced with these sources. +# Note: To get rid of all source code in the generated output, make sure also +# VERBATIM_HEADERS is set to NO. + +SOURCE_BROWSER = YES + +# Setting the INLINE_SOURCES tag to YES will include the body +# of functions and classes directly in the documentation. + +INLINE_SOURCES = NO + +# Setting the STRIP_CODE_COMMENTS tag to YES (the default) will instruct +# doxygen to hide any special comment blocks from generated source code +# fragments. Normal C, C++ and Fortran comments will always remain visible. + +STRIP_CODE_COMMENTS = YES + +# If the REFERENCED_BY_RELATION tag is set to YES +# then for each documented function all documented +# functions referencing it will be listed. + +REFERENCED_BY_RELATION = YES + +# If the REFERENCES_RELATION tag is set to YES +# then for each documented function all documented entities +# called/used by that function will be listed. + +REFERENCES_RELATION = NO + +# If the REFERENCES_LINK_SOURCE tag is set to YES (the default) +# and SOURCE_BROWSER tag is set to YES, then the hyperlinks from +# functions in REFERENCES_RELATION and REFERENCED_BY_RELATION lists will +# link to the source code. +# Otherwise they will link to the documentation. + +REFERENCES_LINK_SOURCE = YES + +# If the USE_HTAGS tag is set to YES then the references to source code +# will point to the HTML generated by the htags(1) tool instead of doxygen +# built-in source browser. The htags tool is part of GNU's global source +# tagging system (see http://www.gnu.org/software/global/global.html). 
You +# will need version 4.8.6 or higher. + +USE_HTAGS = NO + +# If the VERBATIM_HEADERS tag is set to YES (the default) then Doxygen +# will generate a verbatim copy of the header file for each class for +# which an include is specified. Set to NO to disable this. + +VERBATIM_HEADERS = YES + +#--------------------------------------------------------------------------- +# configuration options related to the alphabetical class index +#--------------------------------------------------------------------------- + +# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index +# of all compounds will be generated. Enable this if the project +# contains a lot of classes, structs, unions or interfaces. + +ALPHABETICAL_INDEX = YES + +# If the alphabetical index is enabled (see ALPHABETICAL_INDEX) then +# the COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns +# in which this list will be split (can be a number in the range [1..20]) + +COLS_IN_ALPHA_INDEX = 5 + +# In case all classes in a project start with a common prefix, all +# classes will be put under the same header in the alphabetical index. +# The IGNORE_PREFIX tag can be used to specify one or more prefixes that +# should be ignored while generating the index headers. + +IGNORE_PREFIX = + +#--------------------------------------------------------------------------- +# configuration options related to the HTML output +#--------------------------------------------------------------------------- + +# If the GENERATE_HTML tag is set to YES (the default) Doxygen will +# generate HTML output. + +GENERATE_HTML = YES + +# The HTML_OUTPUT tag is used to specify where the HTML docs will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `html' will be used as the default path. + +HTML_OUTPUT = + +# The HTML_FILE_EXTENSION tag can be used to specify the file extension for +# each generated HTML page (for example: .htm,.php,.asp). 
If it is left blank +# doxygen will generate files with .html extension. + +HTML_FILE_EXTENSION = .html + +# The HTML_HEADER tag can be used to specify a personal HTML header for +# each generated HTML page. If it is left blank doxygen will generate a +# standard header. Note that when using a custom header you are responsible +# for the proper inclusion of any scripts and style sheets that doxygen +# needs, which is dependent on the configuration options used. +# It is advised to generate a default header using "doxygen -w html +# header.html footer.html stylesheet.css YourConfigFile" and then modify +# that header. Note that the header is subject to change so you typically +# have to redo this when upgrading to a newer version of doxygen or when +# changing the value of configuration settings such as GENERATE_TREEVIEW! + +HTML_HEADER = + +# The HTML_FOOTER tag can be used to specify a personal HTML footer for +# each generated HTML page. If it is left blank doxygen will generate a +# standard footer. + +HTML_FOOTER = + +# The HTML_STYLESHEET tag can be used to specify a user-defined cascading +# style sheet that is used by each HTML page. It can be used to +# fine-tune the look of the HTML output. If left blank doxygen will +# generate a default style sheet. Note that it is recommended to use +# HTML_EXTRA_STYLESHEET instead of this one, as it is more robust and this +# tag will in the future become obsolete. + +HTML_STYLESHEET = + +# The HTML_EXTRA_STYLESHEET tag can be used to specify an additional +# user-defined cascading style sheet that is included after the standard +# style sheets created by doxygen. Using this option one can overrule +# certain style aspects. This is preferred over using HTML_STYLESHEET +# since it does not replace the standard style sheet and is therefor more +# robust against future updates. Doxygen will copy the style sheet file to +# the output directory. 
+ +HTML_EXTRA_STYLESHEET = + +# The HTML_EXTRA_FILES tag can be used to specify one or more extra images or +# other source files which should be copied to the HTML output directory. Note +# that these files will be copied to the base HTML output directory. Use the +# $relpath$ marker in the HTML_HEADER and/or HTML_FOOTER files to load these +# files. In the HTML_STYLESHEET file, use the file name only. Also note that +# the files will be copied as-is; there are no commands or markers available. + +HTML_EXTRA_FILES = + +# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. +# Doxygen will adjust the colors in the style sheet and background images +# according to this color. Hue is specified as an angle on a colorwheel, +# see http://en.wikipedia.org/wiki/Hue for more information. +# For instance the value 0 represents red, 60 is yellow, 120 is green, +# 180 is cyan, 240 is blue, 300 purple, and 360 is red again. +# The allowed range is 0 to 359. + +HTML_COLORSTYLE_HUE = 220 + +# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of +# the colors in the HTML output. For a value of 0 the output will use +# grayscales only. A value of 255 will produce the most vivid colors. + +HTML_COLORSTYLE_SAT = 100 + +# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to +# the luminance component of the colors in the HTML output. Values below +# 100 gradually make the output lighter, whereas values above 100 make +# the output darker. The value divided by 100 is the actual gamma applied, +# so 80 represents a gamma of 0.8, The value 220 represents a gamma of 2.2, +# and 100 does not change the gamma. + +HTML_COLORSTYLE_GAMMA = 80 + +# If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML +# page will contain the date and time when the page was generated. Setting +# this to NO can help when comparing the output of multiple runs. 
+ +HTML_TIMESTAMP = NO + +# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML +# documentation will contain sections that can be hidden and shown after the +# page has loaded. + +HTML_DYNAMIC_SECTIONS = NO + +# With HTML_INDEX_NUM_ENTRIES one can control the preferred number of +# entries shown in the various tree structured indices initially; the user +# can expand and collapse entries dynamically later on. Doxygen will expand +# the tree to such a level that at most the specified number of entries are +# visible (unless a fully collapsed tree already exceeds this amount). +# So setting the number of entries 1 will produce a full collapsed tree by +# default. 0 is a special value representing an infinite number of entries +# and will result in a full expanded tree by default. + +HTML_INDEX_NUM_ENTRIES = 100 + +# If the GENERATE_DOCSET tag is set to YES, additional index files +# will be generated that can be used as input for Apple's Xcode 3 +# integrated development environment, introduced with OSX 10.5 (Leopard). +# To create a documentation set, doxygen will generate a Makefile in the +# HTML output directory. Running make will produce the docset in that +# directory and running "make install" will install the docset in +# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find +# it at startup. +# See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html +# for more information. + +GENERATE_DOCSET = NO + +# When GENERATE_DOCSET tag is set to YES, this tag determines the name of the +# feed. A documentation feed provides an umbrella under which multiple +# documentation sets from a single provider (such as a company or product suite) +# can be grouped. + +DOCSET_FEEDNAME = "Doxygen generated docs" + +# When GENERATE_DOCSET tag is set to YES, this tag specifies a string that +# should uniquely identify the documentation set bundle. This should be a +# reverse domain-name style string, e.g. com.mycompany.MyDocSet. 
Doxygen +# will append .docset to the name. + +DOCSET_BUNDLE_ID = org.doxygen.Project + +# The DOCSET_PUBLISHER_ID tag specifies a string that should uniquely +# identify the documentation publisher. This should be a reverse domain-name +# style string, e.g. com.mycompany.MyDocSet.documentation. + +DOCSET_PUBLISHER_ID = org.doxygen.Publisher + +# The DOCSET_PUBLISHER_NAME tag identifies the documentation publisher. + +DOCSET_PUBLISHER_NAME = Publisher + +# If the GENERATE_HTMLHELP tag is set to YES, additional index files +# will be generated that can be used as input for tools like the +# Microsoft HTML help workshop to generate a compiled HTML help file (.chm) +# of the generated HTML documentation. + +GENERATE_HTMLHELP = NO + +# If the GENERATE_HTMLHELP tag is set to YES, the CHM_FILE tag can +# be used to specify the file name of the resulting .chm file. You +# can add a path in front of the file if the result should not be +# written to the html output directory. + +CHM_FILE = + +# If the GENERATE_HTMLHELP tag is set to YES, the HHC_LOCATION tag can +# be used to specify the location (absolute path including file name) of +# the HTML help compiler (hhc.exe). If non-empty doxygen will try to run +# the HTML help compiler on the generated index.hhp. + +HHC_LOCATION = + +# If the GENERATE_HTMLHELP tag is set to YES, the GENERATE_CHI flag +# controls if a separate .chi index file is generated (YES) or that +# it should be included in the main .chm file (NO). + +GENERATE_CHI = NO + +# If the GENERATE_HTMLHELP tag is set to YES, the CHM_INDEX_ENCODING +# is used to encode HtmlHelp index (hhk), content (hhc) and project file +# content. + +CHM_INDEX_ENCODING = + +# If the GENERATE_HTMLHELP tag is set to YES, the BINARY_TOC flag +# controls whether a binary table of contents is generated (YES) or a +# normal table of contents (NO) in the .chm file.
+ +BINARY_TOC = NO + +# The TOC_EXPAND flag can be set to YES to add extra items for group members +# to the contents of the HTML help documentation and to the tree view. + +TOC_EXPAND = NO + +# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and +# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated +# that can be used as input for Qt's qhelpgenerator to generate a +# Qt Compressed Help (.qch) of the generated HTML documentation. + +GENERATE_QHP = NO + +# If the QHG_LOCATION tag is specified, the QCH_FILE tag can +# be used to specify the file name of the resulting .qch file. +# The path specified is relative to the HTML output folder. + +QCH_FILE = + +# The QHP_NAMESPACE tag specifies the namespace to use when generating +# Qt Help Project output. For more information please see +# http://doc.trolltech.com/qthelpproject.html#namespace + +QHP_NAMESPACE = org.doxygen.Project + +# The QHP_VIRTUAL_FOLDER tag specifies the namespace to use when generating +# Qt Help Project output. For more information please see +# http://doc.trolltech.com/qthelpproject.html#virtual-folders + +QHP_VIRTUAL_FOLDER = doc + +# If QHP_CUST_FILTER_NAME is set, it specifies the name of a custom filter to +# add. For more information please see +# http://doc.trolltech.com/qthelpproject.html#custom-filters + +QHP_CUST_FILTER_NAME = + +# The QHP_CUST_FILT_ATTRS tag specifies the list of the attributes of the +# custom filter to add. For more information please see +# +# Qt Help Project / Custom Filters. + +QHP_CUST_FILTER_ATTRS = + +# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this +# project's +# filter section matches. +# +# Qt Help Project / Filter Attributes. + +QHP_SECT_FILTER_ATTRS = + +# If the GENERATE_QHP tag is set to YES, the QHG_LOCATION tag can +# be used to specify the location of Qt's qhelpgenerator. +# If non-empty doxygen will try to run qhelpgenerator on the generated +# .qhp file. 
+ +QHG_LOCATION = + +# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files +# will be generated, which together with the HTML files, form an Eclipse help +# plugin. To install this plugin and make it available under the help contents +# menu in Eclipse, the contents of the directory containing the HTML and XML +# files needs to be copied into the plugins directory of eclipse. The name of +# the directory within the plugins directory should be the same as +# the ECLIPSE_DOC_ID value. After copying Eclipse needs to be restarted before +# the help appears. + +GENERATE_ECLIPSEHELP = NO + +# A unique identifier for the eclipse help plugin. When installing the plugin +# the directory name containing the HTML and XML files should also have +# this name. + +ECLIPSE_DOC_ID = org.doxygen.Project + +# The DISABLE_INDEX tag can be used to turn on/off the condensed index (tabs) +# at top of each HTML page. The value NO (the default) enables the index and +# the value YES disables it. Since the tabs have the same information as the +# navigation tree you can set this option to NO if you already set +# GENERATE_TREEVIEW to YES. + +DISABLE_INDEX = NO + +# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index +# structure should be generated to display hierarchical information. +# If the tag value is set to YES, a side panel will be generated +# containing a tree-like index structure (just like the one that +# is generated for HTML Help). For this to work a browser that supports +# JavaScript, DHTML, CSS and frames is required (i.e. any modern browser). +# Windows users are probably better off using the HTML help feature. +# Since the tree basically has the same information as the tab index you +# could consider to set DISABLE_INDEX to NO when enabling this option. 
+ +GENERATE_TREEVIEW = NO + +# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values +# (range [0,1..20]) that doxygen will group on one line in the generated HTML +# documentation. Note that a value of 0 will completely suppress the enum +# values from appearing in the overview section. + +ENUM_VALUES_PER_LINE = 4 + +# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be +# used to set the initial width (in pixels) of the frame in which the tree +# is shown. + +TREEVIEW_WIDTH = 250 + +# When the EXT_LINKS_IN_WINDOW option is set to YES doxygen will open +# links to external symbols imported via tag files in a separate window. + +EXT_LINKS_IN_WINDOW = NO + +# Use this tag to change the font size of Latex formulas included +# as images in the HTML documentation. The default is 10. Note that +# when you change the font size after a successful doxygen run you need +# to manually remove any form_*.png images from the HTML output directory +# to force them to be regenerated. + +FORMULA_FONTSIZE = 10 + +# Use the FORMULA_TRANPARENT tag to determine whether or not the images +# generated for formulas are transparent PNGs. Transparent PNGs are +# not supported properly for IE 6.0, but are supported on all modern browsers. +# Note that when changing this option you need to delete any form_*.png files +# in the HTML output before the changes have effect. + +FORMULA_TRANSPARENT = YES + +# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax +# (see http://www.mathjax.org) which uses client side Javascript for the +# rendering instead of using prerendered bitmaps. Use this if you do not +# have LaTeX installed or if you want to formulas look prettier in the HTML +# output. When enabled you may also need to install MathJax separately and +# configure the path to it using the MATHJAX_RELPATH option. 
+ +USE_MATHJAX = NO + +# When MathJax is enabled you need to specify the location relative to the +# HTML output directory using the MATHJAX_RELPATH option. The destination +# directory should contain the MathJax.js script. For instance, if the mathjax +# directory is located at the same level as the HTML output directory, then +# MATHJAX_RELPATH should be ../mathjax. The default value points to +# the MathJax Content Delivery Network so you can quickly see the result without +# installing MathJax. +# However, it is strongly recommended to install a local +# copy of MathJax from http://www.mathjax.org before deployment. + +MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest + +# The MATHJAX_EXTENSIONS tag can be used to specify one or MathJax extension +# names that should be enabled during MathJax rendering. + +MATHJAX_EXTENSIONS = + +# When the SEARCHENGINE tag is enabled doxygen will generate a search box +# for the HTML output. The underlying search engine uses javascript +# and DHTML and should work on any modern browser. Note that when using +# HTML help (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets +# (GENERATE_DOCSET) there is already a search function so this one should +# typically be disabled. For large projects the javascript based search engine +# can be slow, then enabling SERVER_BASED_SEARCH may provide a better solution. + +SEARCHENGINE = YES + +# When the SERVER_BASED_SEARCH tag is enabled the search engine will be +# implemented using a PHP enabled web server instead of at the web client +# using Javascript. Doxygen will generate the search PHP script and index +# file to put on the web server. The advantage of the server +# based approach is that it scales better to large projects and allows +# full text search. The disadvantages are that it is more difficult to setup +# and does not have live searching capabilities. 
+ +SERVER_BASED_SEARCH = NO + +#--------------------------------------------------------------------------- +# configuration options related to the LaTeX output +#--------------------------------------------------------------------------- + +# If the GENERATE_LATEX tag is set to YES (the default) Doxygen will +# generate Latex output. + +GENERATE_LATEX = YES + +# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `latex' will be used as the default path. + +LATEX_OUTPUT = + +# The LATEX_CMD_NAME tag can be used to specify the LaTeX command name to be +# invoked. If left blank `latex' will be used as the default command name. +# Note that when enabling USE_PDFLATEX this option is only used for +# generating bitmaps for formulas in the HTML output, but not in the +# Makefile that is written to the output directory. + +LATEX_CMD_NAME = latex + +# The MAKEINDEX_CMD_NAME tag can be used to specify the command name to +# generate index for LaTeX. If left blank `makeindex' will be used as the +# default command name. + +MAKEINDEX_CMD_NAME = makeindex + +# If the COMPACT_LATEX tag is set to YES Doxygen generates more compact +# LaTeX documents. This may be useful for small projects and may help to +# save some trees in general. + +COMPACT_LATEX = NO + +# The PAPER_TYPE tag can be used to set the paper type that is used +# by the printer. Possible values are: a4, letter, legal and +# executive. If left blank a4wide will be used. + +PAPER_TYPE = a4wide + +# The EXTRA_PACKAGES tag can be to specify one or more names of LaTeX +# packages that should be included in the LaTeX output. + +EXTRA_PACKAGES = + +# The LATEX_HEADER tag can be used to specify a personal LaTeX header for +# the generated latex document. The header should contain everything until +# the first chapter. If it is left blank doxygen will generate a +# standard header. 
Notice: only use this tag if you know what you are doing! + +LATEX_HEADER = doc/doxygen/header.tex + +# The LATEX_FOOTER tag can be used to specify a personal LaTeX footer for +# the generated latex document. The footer should contain everything after +# the last chapter. If it is left blank doxygen will generate a +# standard footer. Notice: only use this tag if you know what you are doing! + +LATEX_FOOTER = + +# If the PDF_HYPERLINKS tag is set to YES, the LaTeX that is generated +# is prepared for conversion to pdf (using ps2pdf). The pdf file will +# contain links (just like the HTML output) instead of page references +# This makes the output suitable for online browsing using a pdf viewer. + +PDF_HYPERLINKS = YES + +# If the USE_PDFLATEX tag is set to YES, pdflatex will be used instead of +# plain latex in the generated Makefile. Set this option to YES to get a +# higher quality PDF documentation. + +USE_PDFLATEX = YES + +# If the LATEX_BATCHMODE tag is set to YES, doxygen will add the \\batchmode. +# command to the generated LaTeX files. This will instruct LaTeX to keep +# running if errors occur, instead of asking the user for help. +# This option is also used when generating formulas in HTML. + +LATEX_BATCHMODE = NO + +# If LATEX_HIDE_INDICES is set to YES then doxygen will not +# include the index chapters (such as File Index, Compound Index, etc.) +# in the output. + +LATEX_HIDE_INDICES = NO + +# If LATEX_SOURCE_CODE is set to YES then doxygen will include +# source code with syntax highlighting in the LaTeX output. +# Note that which sources are shown also depends on other settings +# such as SOURCE_BROWSER. + +LATEX_SOURCE_CODE = NO + +# The LATEX_BIB_STYLE tag can be used to specify the style to use for the +# bibliography, e.g. plainnat, or ieeetr. The default style is "plain". See +# http://en.wikipedia.org/wiki/BibTeX for more info. 
+ +LATEX_BIB_STYLE = plain + +#--------------------------------------------------------------------------- +# configuration options related to the RTF output +#--------------------------------------------------------------------------- + +# If the GENERATE_RTF tag is set to YES Doxygen will generate RTF output +# The RTF output is optimized for Word 97 and may not look very pretty with +# other RTF readers or editors. + +GENERATE_RTF = NO + +# The RTF_OUTPUT tag is used to specify where the RTF docs will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `rtf' will be used as the default path. + +RTF_OUTPUT = + +# If the COMPACT_RTF tag is set to YES Doxygen generates more compact +# RTF documents. This may be useful for small projects and may help to +# save some trees in general. + +COMPACT_RTF = NO + +# If the RTF_HYPERLINKS tag is set to YES, the RTF that is generated +# will contain hyperlink fields. The RTF file will +# contain links (just like the HTML output) instead of page references. +# This makes the output suitable for online browsing using WORD or other +# programs which support those fields. +# Note: wordpad (write) and others do not support links. + +RTF_HYPERLINKS = NO + +# Load style sheet definitions from file. Syntax is similar to doxygen's +# config file, i.e. a series of assignments. You only have to provide +# replacements, missing definitions are set to their default value. + +RTF_STYLESHEET_FILE = + +# Set optional variables used in the generation of an rtf document. +# Syntax is similar to doxygen's config file. 
+ +RTF_EXTENSIONS_FILE = + +#--------------------------------------------------------------------------- +# configuration options related to the man page output +#--------------------------------------------------------------------------- + +# If the GENERATE_MAN tag is set to YES (the default) Doxygen will +# generate man pages + +GENERATE_MAN = NO + +# The MAN_OUTPUT tag is used to specify where the man pages will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `man' will be used as the default path. + +MAN_OUTPUT = + +# The MAN_EXTENSION tag determines the extension that is added to +# the generated man pages (default is the subroutine's section .3) + +MAN_EXTENSION = + +# If the MAN_LINKS tag is set to YES and Doxygen generates man output, +# then it will generate one additional man file for each entity +# documented in the real man page(s). These additional files +# only source the real man page, but without them the man command +# would be unable to find the correct page. The default is NO. + +MAN_LINKS = NO + +#--------------------------------------------------------------------------- +# configuration options related to the XML output +#--------------------------------------------------------------------------- + +# If the GENERATE_XML tag is set to YES Doxygen will +# generate an XML file that captures the structure of +# the code including all documentation. + +GENERATE_XML = NO + +# The XML_OUTPUT tag is used to specify where the XML pages will be put. +# If a relative path is entered the value of OUTPUT_DIRECTORY will be +# put in front of it. If left blank `xml' will be used as the default path. + +XML_OUTPUT = xml + +# The XML_SCHEMA tag can be used to specify an XML schema, +# which can be used by a validating XML parser to check the +# syntax of the XML files. 
+ +XML_SCHEMA = + +# The XML_DTD tag can be used to specify an XML DTD, +# which can be used by a validating XML parser to check the +# syntax of the XML files. + +XML_DTD = + +# If the XML_PROGRAMLISTING tag is set to YES Doxygen will +# dump the program listings (including syntax highlighting +# and cross-referencing information) to the XML output. Note that +# enabling this will significantly increase the size of the XML output. + +XML_PROGRAMLISTING = YES + +#--------------------------------------------------------------------------- +# configuration options for the AutoGen Definitions output +#--------------------------------------------------------------------------- + +# If the GENERATE_AUTOGEN_DEF tag is set to YES Doxygen will +# generate an AutoGen Definitions (see autogen.sf.net) file +# that captures the structure of the code including all +# documentation. Note that this feature is still experimental +# and incomplete at the moment. + +GENERATE_AUTOGEN_DEF = NO + +#--------------------------------------------------------------------------- +# configuration options related to the Perl module output +#--------------------------------------------------------------------------- + +# If the GENERATE_PERLMOD tag is set to YES Doxygen will +# generate a Perl module file that captures the structure of +# the code including all documentation. Note that this +# feature is still experimental and incomplete at the +# moment. + +GENERATE_PERLMOD = NO + +# If the PERLMOD_LATEX tag is set to YES Doxygen will generate +# the necessary Makefile rules, Perl scripts and LaTeX code to be able +# to generate PDF and DVI output from the Perl module output. + +PERLMOD_LATEX = NO + +# If the PERLMOD_PRETTY tag is set to YES the Perl module output will be +# nicely formatted so it can be parsed by a human reader. +# This is useful +# if you want to understand what is going on. 
+# On the other hand, if this +# tag is set to NO the size of the Perl module output will be much smaller +# and Perl will parse it just the same. + +PERLMOD_PRETTY = YES + +# The names of the make variables in the generated doxyrules.make file +# are prefixed with the string contained in PERLMOD_MAKEVAR_PREFIX. +# This is useful so different doxyrules.make files included by the same +# Makefile don't overwrite each other's variables. + +PERLMOD_MAKEVAR_PREFIX = + +#--------------------------------------------------------------------------- +# Configuration options related to the preprocessor +#--------------------------------------------------------------------------- + +# If the ENABLE_PREPROCESSING tag is set to YES (the default) Doxygen will +# evaluate all C-preprocessor directives found in the sources and include +# files. + +ENABLE_PREPROCESSING = YES + +# If the MACRO_EXPANSION tag is set to YES Doxygen will expand all macro +# names in the source code. If set to NO (the default) only conditional +# compilation will be performed. Macro expansion can be done in a controlled +# way by setting EXPAND_ONLY_PREDEF to YES. + +MACRO_EXPANSION = YES + +# If the EXPAND_ONLY_PREDEF and MACRO_EXPANSION tags are both set to YES +# then the macro expansion is limited to the macros specified with the +# PREDEFINED and EXPAND_AS_DEFINED tags. + +EXPAND_ONLY_PREDEF = YES + +# If the SEARCH_INCLUDES tag is set to YES (the default) the includes files +# pointed to by INCLUDE_PATH will be searched when a #include is found. + +SEARCH_INCLUDES = YES + +# The INCLUDE_PATH tag can be used to specify one or more directories that +# contain include files that are not input files but should be processed by +# the preprocessor. + +INCLUDE_PATH = + +# You can use the INCLUDE_FILE_PATTERNS tag to specify one or more wildcard +# patterns (like *.h and *.hpp) to filter out the header-files in the +# directories. If left blank, the patterns specified with FILE_PATTERNS will +# be used. 
+ +INCLUDE_FILE_PATTERNS = + +# The PREDEFINED tag can be used to specify one or more macro names that +# are defined before the preprocessor is started (similar to the -D option of +# gcc). The argument of the tag is a list of macros of the form: name +# or name=definition (no spaces). If the definition and the = are +# omitted =1 is assumed. To prevent a macro definition from being +# undefined via #undef or recursively expanded use the := operator +# instead of the = operator. + +PREDEFINED = OMP_30_ENABLED=1, OMP_40_ENABLED=1, KMP_STATS_ENABLED=1 + +# If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then +# this tag can be used to specify a list of macro names that should be expanded. +# The macro definition that is found in the sources will be used. +# Use the PREDEFINED tag if you want to use a different macro definition that +# overrules the definition found in the source code. + +EXPAND_AS_DEFINED = + +# If the SKIP_FUNCTION_MACROS tag is set to YES (the default) then +# doxygen's preprocessor will remove all references to function-like macros +# that are alone on a line, have an all uppercase name, and do not end with a +# semicolon, because these will confuse the parser if not removed. + +SKIP_FUNCTION_MACROS = YES + +#--------------------------------------------------------------------------- +# Configuration::additions related to external references +#--------------------------------------------------------------------------- + +# The TAGFILES option can be used to specify one or more tagfiles. For each +# tag file the location of the external documentation should be added. The +# format of a tag file without this location is as follows: +# +# TAGFILES = file1 file2 ... +# Adding location for the tag files is done as follows: +# +# TAGFILES = file1=loc1 "file2 = loc2" ... +# where "loc1" and "loc2" can be relative or absolute paths +# or URLs. Note that each tag file must have a unique name (where the name does +# NOT include the path). 
If a tag file is not located in the directory in which +# doxygen is run, you must also specify the path to the tagfile here. + +TAGFILES = + +# When a file name is specified after GENERATE_TAGFILE, doxygen will create +# a tag file that is based on the input files it reads. + +GENERATE_TAGFILE = + +# If the ALLEXTERNALS tag is set to YES all external classes will be listed +# in the class index. If set to NO only the inherited external classes +# will be listed. + +ALLEXTERNALS = NO + +# If the EXTERNAL_GROUPS tag is set to YES all external groups will be listed +# in the modules index. If set to NO, only the current project's groups will +# be listed. + +EXTERNAL_GROUPS = YES + +# The PERL_PATH should be the absolute path and name of the perl script +# interpreter (i.e. the result of `which perl'). + +PERL_PATH = + +#--------------------------------------------------------------------------- +# Configuration options related to the dot tool +#--------------------------------------------------------------------------- + +# If the CLASS_DIAGRAMS tag is set to YES (the default) Doxygen will +# generate a inheritance diagram (in HTML, RTF and LaTeX) for classes with base +# or super classes. Setting the tag to NO turns the diagrams off. Note that +# this option also works with HAVE_DOT disabled, but it is recommended to +# install and use dot, since it yields more powerful graphs. + +CLASS_DIAGRAMS = YES + +# You can define message sequence charts within doxygen comments using the \msc +# command. Doxygen will then run the mscgen tool (see +# http://www.mcternan.me.uk/mscgen/) to produce the chart and insert it in the +# documentation. The MSCGEN_PATH tag allows you to specify the directory where +# the mscgen tool resides. If left empty the tool is assumed to be found in the +# default search path. 
+ +MSCGEN_PATH = + +# If set to YES, the inheritance and collaboration graphs will hide +# inheritance and usage relations if the target is undocumented +# or is not a class. + +HIDE_UNDOC_RELATIONS = YES + +# If you set the HAVE_DOT tag to YES then doxygen will assume the dot tool is +# available from the path. This tool is part of Graphviz, a graph visualization +# toolkit from AT&T and Lucent Bell Labs. The other options in this section +# have no effect if this option is set to NO (the default) + +HAVE_DOT = NO + +# The DOT_NUM_THREADS specifies the number of dot invocations doxygen is +# allowed to run in parallel. When set to 0 (the default) doxygen will +# base this on the number of processors available in the system. You can set it +# explicitly to a value larger than 0 to get control over the balance +# between CPU load and processing speed. + +DOT_NUM_THREADS = 0 + +# By default doxygen will use the Helvetica font for all dot files that +# doxygen generates. When you want a differently looking font you can specify +# the font name using DOT_FONTNAME. You need to make sure dot is able to find +# the font, which can be done by putting it in a standard location or by setting +# the DOTFONTPATH environment variable or by setting DOT_FONTPATH to the +# directory containing the font. + +DOT_FONTNAME = Helvetica + +# The DOT_FONTSIZE tag can be used to set the size of the font of dot graphs. +# The default size is 10pt. + +DOT_FONTSIZE = 10 + +# By default doxygen will tell dot to use the Helvetica font. +# If you specify a different font using DOT_FONTNAME you can use DOT_FONTPATH to +# set the path where dot can find it. + +DOT_FONTPATH = + +# If the CLASS_GRAPH and HAVE_DOT tags are set to YES then doxygen +# will generate a graph for each documented class showing the direct and +# indirect inheritance relations. Setting this tag to YES will force the +# CLASS_DIAGRAMS tag to NO. 
+ +CLASS_GRAPH = YES + +# If the COLLABORATION_GRAPH and HAVE_DOT tags are set to YES then doxygen +# will generate a graph for each documented class showing the direct and +# indirect implementation dependencies (inheritance, containment, and +# class references variables) of the class with other documented classes. + +COLLABORATION_GRAPH = NO + +# If the GROUP_GRAPHS and HAVE_DOT tags are set to YES then doxygen +# will generate a graph for groups, showing the direct groups dependencies + +GROUP_GRAPHS = YES + +# If the UML_LOOK tag is set to YES doxygen will generate inheritance and +# collaboration diagrams in a style similar to the OMG's Unified Modeling +# Language. + +UML_LOOK = NO + +# If the UML_LOOK tag is enabled, the fields and methods are shown inside +# the class node. If there are many fields or methods and many nodes the +# graph may become too big to be useful. The UML_LIMIT_NUM_FIELDS +# threshold limits the number of items for each type to make the size more +# manageable. Set this to 0 for no limit. Note that the threshold may be +# exceeded by 50% before the limit is enforced. + +UML_LIMIT_NUM_FIELDS = 10 + +# If set to YES, the inheritance and collaboration graphs will show the +# relations between templates and their instances. + +TEMPLATE_RELATIONS = YES + +# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDE_GRAPH, and HAVE_DOT +# tags are set to YES then doxygen will generate a graph for each documented +# file showing the direct and indirect include dependencies of the file with +# other documented files. + +INCLUDE_GRAPH = NO + +# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDED_BY_GRAPH, and +# HAVE_DOT tags are set to YES then doxygen will generate a graph for each +# documented header file showing the documented files that directly or +# indirectly include this file. 
+ +INCLUDED_BY_GRAPH = NO + +# If the CALL_GRAPH and HAVE_DOT options are set to YES then +# doxygen will generate a call dependency graph for every global function +# or class method. Note that enabling this option will significantly increase +# the time of a run. So in most cases it will be better to enable call graphs +# for selected functions only using the \callgraph command. + +CALL_GRAPH = NO + +# If the CALLER_GRAPH and HAVE_DOT tags are set to YES then +# doxygen will generate a caller dependency graph for every global function +# or class method. Note that enabling this option will significantly increase +# the time of a run. So in most cases it will be better to enable caller +# graphs for selected functions only using the \callergraph command. + +CALLER_GRAPH = NO + +# If the GRAPHICAL_HIERARCHY and HAVE_DOT tags are set to YES then doxygen +# will generate a graphical hierarchy of all classes instead of a textual one. + +GRAPHICAL_HIERARCHY = YES + +# If the DIRECTORY_GRAPH and HAVE_DOT tags are set to YES +# then doxygen will show the dependencies a directory has on other directories +# in a graphical way. The dependency relations are determined by the #include +# relations between the files in the directories. + +DIRECTORY_GRAPH = YES + +# The DOT_IMAGE_FORMAT tag can be used to set the image format of the images +# generated by dot. Possible values are svg, png, jpg, or gif. +# If left blank png will be used. If you choose svg you need to set +# HTML_FILE_EXTENSION to xhtml in order to make the SVG files +# visible in IE 9+ (other browsers do not have this requirement). + +DOT_IMAGE_FORMAT = png + +# If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to +# enable generation of interactive SVG images that allow zooming and panning. +# Note that this requires a modern browser other than Internet Explorer. +# Tested and working are Firefox, Chrome, Safari, and Opera. 
For IE 9+ you +# need to set HTML_FILE_EXTENSION to xhtml in order to make the SVG files +# visible. Older versions of IE do not have SVG support. + +INTERACTIVE_SVG = NO + +# The tag DOT_PATH can be used to specify the path where the dot tool can be +# found. If left blank, it is assumed the dot tool can be found in the path. + +DOT_PATH = + +# The DOTFILE_DIRS tag can be used to specify one or more directories that +# contain dot files that are included in the documentation (see the +# \dotfile command). + +DOTFILE_DIRS = + +# The MSCFILE_DIRS tag can be used to specify one or more directories that +# contain msc files that are included in the documentation (see the +# \mscfile command). + +MSCFILE_DIRS = + +# The DOT_GRAPH_MAX_NODES tag can be used to set the maximum number of +# nodes that will be shown in the graph. If the number of nodes in a graph +# becomes larger than this value, doxygen will truncate the graph, which is +# visualized by representing a node as a red box. Note that doxygen if the +# number of direct children of the root node in a graph is already larger than +# DOT_GRAPH_MAX_NODES then the graph will not be shown at all. Also note +# that the size of a graph can be further restricted by MAX_DOT_GRAPH_DEPTH. + +DOT_GRAPH_MAX_NODES = 50 + +# The MAX_DOT_GRAPH_DEPTH tag can be used to set the maximum depth of the +# graphs generated by dot. A depth value of 3 means that only nodes reachable +# from the root by following a path via at most 3 edges will be shown. Nodes +# that lay further from the root node will be omitted. Note that setting this +# option to 1 or 2 may greatly reduce the computation time needed for large +# code bases. Also note that the size of a graph can be further restricted by +# DOT_GRAPH_MAX_NODES. Using a depth of 0 means no depth restriction. + +MAX_DOT_GRAPH_DEPTH = 0 + +# Set the DOT_TRANSPARENT tag to YES to generate images with a transparent +# background. 
This is disabled by default, because dot on Windows does not +# seem to support this out of the box. Warning: Depending on the platform used, +# enabling this option may lead to badly anti-aliased labels on the edges of +# a graph (i.e. they become hard to read). + +DOT_TRANSPARENT = NO + +# Set the DOT_MULTI_TARGETS tag to YES allow dot to generate multiple output +# files in one run (i.e. multiple -o and -T options on the command line). This +# makes dot run faster, but since only newer versions of dot (>1.8.10) +# support this, this feature is disabled by default. + +DOT_MULTI_TARGETS = NO + +# If the GENERATE_LEGEND tag is set to YES (the default) Doxygen will +# generate a legend page explaining the meaning of the various boxes and +# arrows in the dot generated graphs. + +GENERATE_LEGEND = YES + +# If the DOT_CLEANUP tag is set to YES (the default) Doxygen will +# remove the intermediate dot files that are used to generate +# the various graphs. + +DOT_CLEANUP = YES diff --git a/pstl/CREDITS.txt b/pstl/CREDITS.txt index 4945fd5ad308be..174722510fdea4 100644 --- a/pstl/CREDITS.txt +++ b/pstl/CREDITS.txt @@ -1,21 +1,21 @@ -This file is a partial list of people who have contributed to the LLVM/pstl -(Parallel STL) project. If you have contributed a patch or made some other -contribution to LLVM/pstl, please submit a patch to this file to add yourself, -and it will be done! - -The list is sorted by surname and formatted to allow easy grepping and -beautification by scripts. The fields are: name (N), email (E), web-address -(W), PGP key ID and fingerprint (P), description (D), and snail-mail address -(S). - -N: Intel Corporation -W: http://www.intel.com -D: Created the initial implementation. - -N: Thomas Rodgers -E: trodgers at redhat.com -D: Identifier name transformation for inclusion in a Standard C++ library. - -N: Christopher Nelson -E: nadiasvertex at gmail.com -D: Add support for an OpenMP backend. 
+This file is a partial list of people who have contributed to the LLVM/pstl +(Parallel STL) project. If you have contributed a patch or made some other +contribution to LLVM/pstl, please submit a patch to this file to add yourself, +and it will be done! + +The list is sorted by surname and formatted to allow easy grepping and +beautification by scripts. The fields are: name (N), email (E), web-address +(W), PGP key ID and fingerprint (P), description (D), and snail-mail address +(S). + +N: Intel Corporation +W: http://www.intel.com +D: Created the initial implementation. + +N: Thomas Rodgers +E: trodgers at redhat.com +D: Identifier name transformation for inclusion in a Standard C++ library. + +N: Christopher Nelson +E: nadiasvertex at gmail.com +D: Add support for an OpenMP backend. From openmp-commits at lists.llvm.org Thu Oct 17 06:56:00 2024 From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits) Date: Thu, 17 Oct 2024 06:56:00 -0700 (PDT) Subject: [Openmp-commits] [openmp] [OpenMP] Fix missing gtid argument in __kmp_print_tdg_dot function (PR #111986) In-Reply-To: Message-ID: <67111770.170a0220.541aa.1a87@mx.google.com> https://github.com/shiltian approved this pull request. 
https://github.com/llvm/llvm-project/pull/111986 From openmp-commits at lists.llvm.org Thu Oct 17 07:01:34 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Thu, 17 Oct 2024 07:01:34 -0700 (PDT) Subject: [Openmp-commits] [openmp] af1e9c8 - [OpenMP] Fix missing gtid argument in __kmp_print_tdg_dot function (#111986) Message-ID: <671118be.050a0220.1044d1.3697@mx.google.com> Author: Josep Pinot Date: 2024-10-17T10:01:28-04:00 New Revision: af1e9c81f4ab06ab46db87e273ec6eef5a24ef27 URL: https://github.com/llvm/llvm-project/commit/af1e9c81f4ab06ab46db87e273ec6eef5a24ef27 DIFF: https://github.com/llvm/llvm-project/commit/af1e9c81f4ab06ab46db87e273ec6eef5a24ef27.diff LOG: [OpenMP] Fix missing gtid argument in __kmp_print_tdg_dot function (#111986) This patch modifies the signature of the `__kmp_print_tdg_dot` function in `kmp_tasking.cpp` to include the global thread ID (gtid) as an argument. The gtid is now correctly passed to the function. - Updated the function declaration to accept the gtid parameter. - Modified all calls to `__kmp_print_tdg_dot` to pass the correct gtid value. This change addresses issues encountered when compiling with `OMPX_TASKGRAPH` enabled. No functional changes are expected beyond successful compilation. 
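For readers skimming the log message, here is a minimal sketch of the pattern it describes. It is an illustration under assumptions, not the runtime's code: `tdg_info_sketch`, `print_tdg_dot_sketch`, and `end_record_sketch` are hypothetical stand-ins, and the string return value exists only so the behaviour can be checked. The point is that the trace message formats a thread ID, so the printing function has to receive `gtid` from its caller.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the runtime's TDG descriptor (assumption:
// only the ID field matters for this illustration).
struct tdg_info_sketch {
  int tdg_id;
};

// The trace text formats both the thread ID and the TDG ID, so gtid must
// be a parameter; without it the reference to gtid would not compile.
std::string print_tdg_dot_sketch(const tdg_info_sketch *tdg, int gtid) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "print_tdg_dot(enter): T#%d tdg_id=%d",
                gtid, tdg->tdg_id);
  return std::string(buf);
}

// The caller already knows its gtid and forwards it unchanged.
// dot_enabled is a hypothetical flag standing in for the runtime's
// "print the TDG to a dot file" setting.
std::string end_record_sketch(int gtid, const tdg_info_sketch *tdg,
                              bool dot_enabled) {
  if (dot_enabled)
    return print_tdg_dot_sketch(tdg, gtid);
  return std::string();
}
```

Calling `end_record_sketch(3, &tdg, true)` with a descriptor whose `tdg_id` is 7 yields the trace text for thread 3 and TDG 7; with the flag off, nothing is produced.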
Added:

Modified:
    openmp/runtime/src/kmp_tasking.cpp

Removed:

################################################################################
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index 7edaa8e127e52c..932799e133b45b 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -5491,7 +5491,8 @@ static kmp_tdg_info_t *__kmp_find_tdg(kmp_int32 tdg_id) {

 // __kmp_print_tdg_dot: prints the TDG to a dot file
 // tdg: ID of the TDG
-void __kmp_print_tdg_dot(kmp_tdg_info_t *tdg) {
+// gtid: Global Thread ID
+void __kmp_print_tdg_dot(kmp_tdg_info_t *tdg, kmp_int32 gtid) {
   kmp_int32 tdg_id = tdg->tdg_id;
   KA_TRACE(10, ("__kmp_print_tdg_dot(enter): T#%d tdg_id=%d \n", gtid, tdg_id));

@@ -5693,7 +5694,7 @@ void __kmp_end_record(kmp_int32 gtid, kmp_tdg_info_t *tdg) {
   KMP_ATOMIC_ST_RLX(&__kmp_tdg_task_id, 0);

   if (__kmp_tdg_dot)
-    __kmp_print_tdg_dot(tdg);
+    __kmp_print_tdg_dot(tdg, gtid);
 }

 // __kmpc_end_record_task: wrapper around __kmp_end_record to mark

From openmp-commits at lists.llvm.org Thu Oct 17 07:01:35 2024 From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits) Date: Thu, 17 Oct 2024 07:01:35 -0700 (PDT) Subject: [Openmp-commits] [openmp] [OpenMP] Fix missing gtid argument in __kmp_print_tdg_dot function (PR #111986) In-Reply-To: Message-ID: <671118bf.170a0220.19a061.190c@mx.google.com> https://github.com/shiltian closed https://github.com/llvm/llvm-project/pull/111986 From openmp-commits at lists.llvm.org Thu Oct 17 07:02:02 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Thu, 17 Oct 2024 07:02:02 -0700 (PDT) Subject: [Openmp-commits] [openmp] [OpenMP] Fix missing gtid argument in __kmp_print_tdg_dot function (PR #111986) In-Reply-To: Message-ID: <671118da.170a0220.17e28c.1a8c@mx.google.com> github-actions[bot] wrote: @jpinot Congratulations on having your first Pull Request (PR) merged into the LLVM Project!
Your changes will be combined with recent changes from other authors, then tested by our [build bots](https://lab.llvm.org/buildbot/). If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail [here](https://llvm.org/docs/MyFirstTypoFix.html#myfirsttypofix-issues-after-landing-your-pr). If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of [LLVM development](https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy). You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! https://github.com/llvm/llvm-project/pull/111986 From openmp-commits at lists.llvm.org Fri Oct 18 04:56:36 2024 From: openmp-commits at lists.llvm.org (=?UTF-8?Q?Martin_Storsj=C3=B6?= via Openmp-commits) Date: Fri, 18 Oct 2024 04:56:36 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <67124cf4.170a0220.93691.2b89@mx.google.com> mstorsjo wrote: This breaks a number of tests on Windows. Previously, to have tests working on Windows, one would do `git config --global core.autocrlf false` or similar, before checking out llvm-project - a number of test input files _need_ to be in LF form to work. This was brought up earlier already by @llvm-beanz in https://github.com/llvm/llvm-project/pull/86318#issuecomment-2093160376. 
Now after this change, due to the added `.gitattributes` which overrides the `core.autocrlf` setting, these files get checked out with CRLF newlines (as the native form for the platform). Based on the comment in the .gitattributes file, it seems like this is both known and intentional behaviour: ``` # Checkout as native, commit as LF except in specific circumstances * text=auto ``` While it was already stated that blanket checkouts with CRLF _will_ fail. Additionally, the old mechanism of getting working newlines in the files is suddenly broken. --- To make things worse, you won't notice this thing if you're updating an existing workdir - files that aren't touched aren't rewritten. (To trigger re-checkout of files to get `.gitattributes` applied, one can do something like `git rm -r subdir && git reset && git checkout subdir`.) So I guess most buildbots will keep chugging along fine, until someone pushes changes that touch those files. If running with a fresh checkout of llvm-project, this has a much bigger impact. --- For compiler-rt tests, a handful of the profile tests depend on the right line endings - I can push a local `.gitattributes` file to fix that (see https://github.com/mstorsjo/llvm-project/commit/99bec81c87dcd2b7a7970954882bc0e42239d381). But the Clang, clang-tools-extra and LLVM testsuites have _many_ tests that are broken - from an initial (unverified) test run, there are around 80 tests failing due to this - sorting that out is a much bigger task that I'm not volunteering to take on right now. CC @AaronBallman as the majority of those failing tests are in Clang. Can we revert this until we figure out these bits? Do we really want checkouts to default to having the majority of files with CRLF on Windows? I would expect that most people working on Windows don't really want this (this wasn't the case so far anyway)? And other than that, we do need to tag the files that rely on being in LF form so that things work on Windows with either setting. 
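As a rough illustration of what "tagging the files that rely on being in LF form" could look like, here is a minimal sketch. The file patterns (`*.test`) and file names (`example.test`, `example.cpp`) are hypothetical and are not taken from the LLVM tree; only the `text` and `eol` attributes and `git check-attr` are standard Git. It writes a `.gitattributes` into a scratch repository and asks Git which attributes apply to two paths.

```shell
set -eu

# Scratch repository so nothing touches a real checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical attribute file: a wildcard default plus one pattern that
# pins byte-exact test inputs to LF regardless of the user's settings.
cat > .gitattributes <<'EOF'
* text=auto
*.test text eol=lf
EOF

# check-attr reports the attributes Git would apply; the paths need not exist.
test_attr=$(git check-attr text eol -- example.test)
other_attr=$(git check-attr text eol -- example.cpp)
printf '%s\n%s\n' "$test_attr" "$other_attr"
```

Files matched only by the wildcard keep following the user's configuration, while the pinned pattern always resolves to `eol: lf`, which is the property the brittle tests need.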
https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Fri Oct 18 07:07:18 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Fri, 18 Oct 2024 07:07:18 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <67126b96.170a0220.107d28.5cd1@mx.google.com> ldrumm wrote: There are a couple of things to unpack here > a number of test input files need to be in LF form to work Which ones? Either there's a bug in a parser somewhere, or I missed some test files. In either case I'd like to fix the issue. I watched the buildbots quite closely last night and only noticed failures in ARM frame lowering - which isn't this, I think. > Now after this change, due to the added .gitattributes which overrides the core.autocrlf setting, these files get checked out with CRLF newlines (as the native form for the platform). It's my understanding that `text=auto` does not override `core.autocrlf`. As far as I can tell from the documentation it honours the user's configuration for `core.eol` in combination with `core.autocrlf` - from `git config --help`: > core.eol Sets the line ending type to use in the working directory for files that are marked as text (either by having the text attribute set, or by having text=auto and Git auto-detecting the contents as text). Alternatives are lf, crlf and native, which uses the platform’s native line ending. The default value is native. See gitattributes(5) for more information on end-of-line conversion. Note that this value is ignored if core.autocrlf is set to true or input. and > core.autocrlf Setting this variable to "true" is the same as setting the text attribute to "auto" on all files and core.eol to "crlf". Set to true if you want to have CRLF line endings in your working directory and the repository has LF line endings. 
This variable can be set to input, in which case no output conversion is performed. > Can we revert this until we figure out these bits Sure, but like I said, I'm happy to fix these broken cases. The old configuration was broken, but not in a controllable way, so I think it's reasonable to fix up the broken tests and move forward. Perhaps we also need clearer documentation? https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Fri Oct 18 08:36:39 2024 From: openmp-commits at lists.llvm.org (Aaron Ballman via Openmp-commits) Date: Fri, 18 Oct 2024 08:36:39 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <67128087.630a0220.21ca12.4da5@mx.google.com> AaronBallman wrote: This seems to have broken precommit CI on Windows: https://buildkite.com/llvm-project/github-pull-requests/builds/111165#0192a01b-d3ac-44ad-abff-e53ac4a206ab all of the failures look related to line endings, and I noticed that I got a ton of command line messages of the form: ``` warning: in the working copy of 'clang/include/clang/Basic/Attr.td', LF will be replaced by CRLF the next time Git touches it ``` > Can we revert this until we figure out these bits? Yes, please. > Do we really want checkouts to default to having the majority of files with CRLF on Windows? I would expect that most people working on Windows don't really want this (this wasn't the case so far anyway)? It certainly came as a surprise to me; my editors handle LF line endings just fine on Windows; it was very jarring to get hundreds of warnings from git that all seemed unactionable. 
https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Fri Oct 18 09:14:35 2024 From: openmp-commits at lists.llvm.org (=?UTF-8?Q?Martin_Storsj=C3=B6?= via Openmp-commits) Date: Fri, 18 Oct 2024 09:14:35 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <6712896b.050a0220.2e5518.693e@mx.google.com> mstorsjo wrote: > > a number of test input files need to be in LF form to work > > Which ones? A whole bunch of them. @AaronBallman's link to https://buildkite.com/llvm-project/github-pull-requests/builds/111165#0192a01b-d3ac-44ad-abff-e53ac4a206ab shows mostly what I saw. If including `clang-tools-extra` and `llvm` in the set of tests to run, I have failures in the following tests as well: ``` Clang Tools :: clang-move/move-used-helper-decls.cpp LLVM :: MC/ELF/warn-newline-in-escaped-string.s LLVM :: TableGen/x86-fold-tables.td LLVM :: tools/llvm-rc/tag-html.test lit :: shtest-shell.py ``` > Either there's a bug in a parser somewhere, In the vast majority of cases, it's not a bug in a parser, but a test that relies on the exact contents of the source files. E.g. for `tools/llvm-rc/tag-html.test` I would guess that the issue is that the test takes a text file (which now has variable line endings) and includes it in a binary resource file, and checks the output to match bitwise the expected output. 
In my nightly run of compiler-rt tests https://github.com/mstorsjo/llvm-mingw/actions/runs/11395494568/job/31717433112, I had the following failures: ``` Failed Tests (5): Profile-x86_64 :: instrprof-gcov-exceptions.test Profile-x86_64 :: instrprof-gcov-multiple-bbs-single-line.test Profile-x86_64 :: instrprof-gcov-one-line-function.test Profile-x86_64 :: instrprof-gcov-switch.test Profile-x86_64 :: instrprof-gcov-two-objects.test ``` I think the issue here may be something around testing the exact amount of whitespace somewhere or so. Mostly "brittle" tests that don't expect the source files to vary. > or I missed some test files. In either case I'd like to fix the issue. I watched the buildbots quite closely last night and only noticed failures in ARM frame lowering - which isn't this, I think. Did you do a test run on Windows? Even on Linux, I guess it should be possible to check out the code, forcing Git to prefer checking out text files as CRLF, so you could experience the same amount of fallout. > > Now after this change, due to the added .gitattributes which overrides the core.autocrlf setting, these files get checked out with CRLF newlines (as the native form for the platform). > > It's my understanding that `text=auto` does not override `core.autocrlf`. As far as I can tell from the documentation it honours the user's configuration for `core.eol` in combination with `core.autocrlf` - from `git config --help`: This doesn't match my experience. See https://github.com/mstorsjo/llvm-project/commit/inspect-newlines for a test github actions job that shows checking out the repo on Windows. First we check out an old branch, with default settings - we get CRLF. Then we set `core.autocrlf=false` and retry the above, we get a file with LF newlines. Then we check out the new version with gitattributes, and we notice how we now suddenly are getting CRLF again, despite our setting. 
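The sequence described above can probably be reproduced on any platform with a small scratch repository. This is only a sketch under stated assumptions: `core.eol=crlf` is set explicitly to stand in for the Windows native line ending, and `sample.txt` is a hypothetical file, not one from llvm-project. `git ls-files --eol` prints the line endings found in the index (`i/`) and in the working tree (`w/`).

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# The setup many Windows checkouts use, with core.eol=crlf standing in
# for the platform's native line ending (an assumption of this sketch).
git config core.autocrlf false
git config core.eol crlf

printf 'line one\nline two\n' > sample.txt
git add sample.txt
git commit -q -m "add sample"

# No attributes yet: with core.autocrlf=false Git performs no conversion.
rm sample.txt && git checkout -- sample.txt
before=$(git ls-files --eol sample.txt)

# Add the wildcard rule from the PR and force the file to be rewritten.
printf '* text=auto\n' > .gitattributes
rm sample.txt && git checkout -- sample.txt
after=$(git ls-files --eol sample.txt)

printf 'before: %s\nafter:  %s\n' "$before" "$after"
```

If this behaves as expected, the first line reports an LF working-tree file and the second a CRLF one, even though `core.autocrlf` stayed `false` throughout.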
See https://github.com/mstorsjo/llvm-project/actions/runs/11407224268 for the actual run log. Feel free to play around with such an action, to come up with a better way of checking it out while still getting LF newlines - but this is the way I have been doing it (which is scripted in a number of places), and I would expect that many others have the same setup. https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Fri Oct 18 09:19:45 2024 From: openmp-commits at lists.llvm.org (=?UTF-8?Q?Martin_Storsj=C3=B6?= via Openmp-commits) Date: Fri, 18 Oct 2024 09:19:45 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <67128aa1.170a0220.1adcba.88f6@mx.google.com> mstorsjo wrote: > This seems to have broken precommit CI on Windows: https://buildkite.com/llvm-project/github-pull-requests/builds/111165#0192a01b-d3ac-44ad-abff-e53ac4a206ab all of the failures look related to line endings, and I noticed that I got a ton of command line messages of the form: > > ``` > warning: in the working copy of 'clang/include/clang/Basic/Attr.td', LF will be replaced by CRLF the next time Git touches it > ``` Right, that probably relates to the odd situation where files are checked out in one form, but the git attributes indicate that they should be treated differently. Getting git to rewrite the files on disk to match it is kinda messy actually - the best way I've found today is `git rm -r . && git reset && git checkout .`. > > Do we really want checkouts to default to having the majority of files with CRLF on Windows? I would expect that most people working on Windows don't really want this (this wasn't the case so far anyway)? > > It certainly came as a surprise to me; my editors handle LF line endings just fine on Windows; Exactly - people haven't had a problem with _having_ LF newlines so far. 
The main problem probably has been to set up Git to get it in that form. I obviously don't mind fixing as many tests as possible to pass with any form of newlines, but having all checkouts of the repo actually have files in identical form, rather than in any fuzzy form, feels like a feature to me. So keeping the gitattributes, but in a form where it dictates checking out files in LF form (which has required setting `core.autocrlf=false` so far) would IMO be preferable. Would we get there by setting the wildcard rule in `.gitattributes` to `* text eol=lf`, or something along those lines? https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Fri Oct 18 09:57:20 2024 From: openmp-commits at lists.llvm.org (Arthur Eubanks via Openmp-commits) Date: Fri, 18 Oct 2024 09:57:20 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <67129370.a70a0220.343fda.7803@mx.google.com> aeubanks wrote: I believe Chrome is also seeing many test failures due to this (https://crbug.com/374115887), although I haven't yet confirmed it's due to this specific commit. https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Fri Oct 18 11:39:10 2024 From: openmp-commits at lists.llvm.org (Aaron Ballman via Openmp-commits) Date: Fri, 18 Oct 2024 11:39:10 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <6712ab4e.170a0220.15d325.7548@mx.google.com> AaronBallman wrote: I just had someone in my office hours also running into problems from this commit. I went to revert the changes myself and I cannot because of merge conflicts... due to line endings. @ldrumm -- can you revert these changes ASAP?
They're causing significant problems in practice, so best to get us back to green rather than fix forward. Thanks! https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Fri Oct 18 13:18:38 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Fri, 18 Oct 2024 13:18:38 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <6712c29e.170a0220.1e3c66.a692@mx.google.com> ldrumm wrote: On Fri Oct 18, 2024 at 7:39 PM BST, Aaron Ballman wrote: > @ldrumm -- can you revert these changes ASAP? They're causing > significant problems in practice, so best to get us back to green rather > than fix forward. Thanks! Reverted. https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Fri Oct 18 13:30:38 2024 From: openmp-commits at lists.llvm.org (Ye Luo via Openmp-commits) Date: Fri, 18 Oct 2024 13:30:38 -0700 (PDT) Subject: [Openmp-commits] [openmp] [OpenMP] Create versioned libgomp softlinks (PR #112973) Message-ID: https://github.com/ye-luo created https://github.com/llvm/llvm-project/pull/112973 Add libgomp.1.dylib for MacOS and libgomp.so.1 for Linux Linkers on Mac and Linux pick up versioned libgomp dynamic library files. The existing softlinks (libgomp.dylib for MacOS and libgomp.so for Linux) are insufficient. This helps alleviate the issue of mixing libgomp and libomp at runtime. >From 3635125a3f291072227a4b77df214ffb97a58b7d Mon Sep 17 00:00:00 2001 From: Ye Luo Date: Fri, 18 Oct 2024 15:20:22 -0500 Subject: [PATCH] [OpenMP] Create versioned libgomp softlinks. 
libgomp.1.dylib for MacOS or libgomp.so.1 for Linux
---
 openmp/runtime/src/CMakeLists.txt | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/openmp/runtime/src/CMakeLists.txt b/openmp/runtime/src/CMakeLists.txt
index 439cc20963a129..61c0bacc9f2062 100644
--- a/openmp/runtime/src/CMakeLists.txt
+++ b/openmp/runtime/src/CMakeLists.txt
@@ -253,6 +253,17 @@ if(NOT WIN32)
       libiomp5${LIBOMP_LIBRARY_SUFFIX}
     WORKING_DIRECTORY ${LIBOMP_LIBRARY_DIR}
   )
+  if(LIBOMP_ENABLE_SHARED)
+    if(APPLE)
+      set(VERSIONED_LIBGOMP_NAME libgomp.1${LIBOMP_LIBRARY_SUFFIX})
+    else()
+      set(VERSIONED_LIBGOMP_NAME libgomp${LIBOMP_LIBRARY_SUFFIX}.1)
+    endif()
+    add_custom_command(TARGET omp POST_BUILD
+      COMMAND ${CMAKE_COMMAND} -E create_symlink ${LIBOMP_LIB_FILE} ${VERSIONED_LIBGOMP_NAME}
+      WORKING_DIRECTORY ${LIBOMP_LIBRARY_DIR}
+    )
+  endif()
 endif()

 # Definitions for testing, for reuse when testing libomptarget-nvptx.
@@ -439,13 +450,18 @@ else()

   if(${LIBOMP_INSTALL_ALIASES})
     # Create aliases (symlinks) of the library for backwards compatibility
+    extend_path(outdir "${CMAKE_INSTALL_PREFIX}" "${OPENMP_INSTALL_LIBDIR}")
     set(LIBOMP_ALIASES "libgomp;libiomp5")
     foreach(alias IN LISTS LIBOMP_ALIASES)
-      extend_path(outdir "${CMAKE_INSTALL_PREFIX}" "${OPENMP_INSTALL_LIBDIR}")
       install(CODE "execute_process(COMMAND \"\${CMAKE_COMMAND}\" -E create_symlink \"${LIBOMP_LIB_FILE}\"
         \"${alias}${LIBOMP_LIBRARY_SUFFIX}\" WORKING_DIRECTORY
         \"\$ENV{DESTDIR}${outdir}\")")
     endforeach()
+    if(LIBOMP_ENABLE_SHARED)
+      install(CODE "execute_process(COMMAND \"\${CMAKE_COMMAND}\" -E create_symlink \"${LIBOMP_LIB_FILE}\"
+        \"${VERSIONED_LIBGOMP_NAME}\" WORKING_DIRECTORY
+        \"\$ENV{DESTDIR}${outdir}\")")
+    endif()
   endif()
 endif()
 install(

From openmp-commits at lists.llvm.org Mon Oct 21 09:52:16 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Mon, 21 Oct 2024 09:52:16 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl]
Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <671686c0.170a0220.1bb366.ceec@mx.google.com> zmodem wrote: Thanks for reverting! I must have missed this PR originally. I oppose letting Git change any line endings. It always ends like this. https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Mon Oct 21 14:46:35 2024 From: openmp-commits at lists.llvm.org (Ye Luo via Openmp-commits) Date: Mon, 21 Oct 2024 14:46:35 -0700 (PDT) Subject: [Openmp-commits] [openmp] [OpenMP] Create versioned libgomp softlinks (PR #112973) In-Reply-To: Message-ID: <6716cbbb.050a0220.155453.a77d@mx.google.com> ye-luo wrote: Ping @shiltian https://github.com/llvm/llvm-project/pull/112973 From openmp-commits at lists.llvm.org Tue Oct 22 03:54:45 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Tue, 22 Oct 2024 03:54:45 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <67178475.050a0220.52845.ceb5@mx.google.com> ldrumm wrote: >> It's my understanding that text=auto does not override core.autocrlf. As far as I can tell from the documentation it honours the user's configuration for core.eol in combination with core.autocrlf - from git config --help: > This doesn't match my experience. I think this is due to a subtlety of config. Setting `core.autocrlf` to `false` doesn't actually do anything since it's the default. In that case git is still in "no opinion" mode - which means it stores the input line endings and does no conversion.
However, once `text=auto` is set in a `.gitattributes`, it forces git to use the configured eol config:

```c
static int text_eol_is_crlf(void)
{
    if (auto_crlf == AUTO_CRLF_TRUE)
        return 1;
    else if (auto_crlf == AUTO_CRLF_INPUT)
        return 0;
    if (core_eol == EOL_CRLF)
        return 1;
    if (core_eol == EOL_UNSET && EOL_NATIVE == EOL_CRLF)
        return 1;
    return 0;
}

static enum eol output_eol(enum convert_crlf_action crlf_action)
{
    switch (crlf_action) {
    case CRLF_BINARY:
        return EOL_UNSET;
    case CRLF_TEXT_CRLF:
        return EOL_CRLF;
    case CRLF_TEXT_INPUT:
        return EOL_LF;
    case CRLF_UNDEFINED:
    case CRLF_AUTO_CRLF:
        return EOL_CRLF;
    case CRLF_AUTO_INPUT:
        return EOL_LF;
    case CRLF_TEXT:
    case CRLF_AUTO:
        /* fall through */
        return text_eol_is_crlf() ? EOL_CRLF : EOL_LF;
    }
    warning(_("illegal crlf_action %d"), (int)crlf_action);
    return core_eol;
}
```

`output_eol` is the git function that decides whether to write out a file with CRLF or LF endings. Notice that now we hit the `CRLF_AUTO` case, so it's `text_eol_is_crlf() ? EOL_CRLF : EOL_LF;`. `text_eol_is_crlf()` checks against `core.autocrlf`. Since you've stated that it's `false`, it then checks `core.eol`. So if I've read that correctly, `core.autocrlf=false` is a red herring and you should really set `core.eol=lf` if you want git to use `lf` on Windows.

> Would we get there by setting the wildcard rule in .gitattributes to * text eol=lf, or something along those lines?

This patch is about respecting local config, which is the exact opposite of that suggestion. It would be a way to solve the line-ending issue by fiat, not by co-operation, so I'm against it on principle. To be clear I very much don't like CRLF, but I also very much don't like it when someone forces me to use wrong-handed tools.
Windows users would be forced to use wrong-handed tools if we force line endings one way or another. https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Tue Oct 22 03:57:49 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Tue, 22 Oct 2024 03:57:49 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <6717852d.170a0220.1fea8d.be95@mx.google.com> ldrumm wrote: > It always ends like this. Ends like what? As far as I can see, all this has done is expose latent bugs in our testing and in clang's parser. https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Tue Oct 22 04:01:39 2024 From: openmp-commits at lists.llvm.org (=?UTF-8?Q?Martin_Storsj=C3=B6?= via Openmp-commits) Date: Tue, 22 Oct 2024 04:01:39 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <67178613.170a0220.1ba535.c1bb@mx.google.com> mstorsjo wrote: > > > It's my understanding that text=auto does not override core.autocrlf. As far as I can tell from the documentation it honours the user's configuration for core.eol in combination with core.autocrlf - from git config --help: > > > This doesn't match my experience. > > I think this is due to a subtlety of config. Setting `core.autocrlf` to `false` doesn't actually do anything since it's the default. It most definitely does something. Please have another look at https://github.com/mstorsjo/llvm-project/commit/inspect-newlines and the output of the log at https://github.com/mstorsjo/llvm-project/actions/runs/11407224268/job/31742748818. First we do a checkout without setting anything. We get files with CRLF.
Then we set `core.autocrlf` to `false`, which is the common way of dealing with this, then we do another checkout, and we get files with LF. Do you dispute the above? https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Tue Oct 22 04:07:50 2024 From: openmp-commits at lists.llvm.org (=?UTF-8?Q?Martin_Storsj=C3=B6?= via Openmp-commits) Date: Tue, 22 Oct 2024 04:07:50 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <67178786.050a0220.2800f7.d3b2@mx.google.com> mstorsjo wrote: > So if I've read that correctly, `core.autocrlf=false` is a red herring and you should really set `core.eol=lf` if you want git to use `lf` on windows. That perhaps may be the case, but all common docs and all common practices around this revolve around setting `core.autocrlf`, not setting `core.eol`. Before this discussion, I have never seen a guide recommending setting `core.eol`. While so far, all docs related to this say that it is `core.autocrlf` one should set: https://github.com/llvm/llvm-project/blob/llvmorg-19.1.2/clang/www/get_started.html#L151-L154 and https://github.com/llvm/llvm-project/blob/llvmorg-19.1.2/llvm/docs/GettingStarted.rst?plain=1#L37. > > Would we get there by setting the wildcard rule in .gitattrubtes to * text eof=lf, or something along those lines? > > This patch is about respecting local config, which is the exact opposite of that suggestion. It would be a way to solve the line-ending issue by fiat, not by co-operation, so I'm against it on principle. To be clear I very much don't like CRLF, but I also very much don't like it when someone forces me to use wrong-handed tools. Windows users would be forced to use wrong handed tools if we force line-endings one way or another But if every single Windows developer involved here say that they want LF, and they don't want any ambiguity about it? 
Is it more important to give hypothetical users the choice to pick what they like, at the cost of every single current developer who do not want that, and breaking every established setup routine? https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Tue Oct 22 04:10:23 2024 From: openmp-commits at lists.llvm.org (=?UTF-8?Q?Martin_Storsj=C3=B6?= via Openmp-commits) Date: Tue, 22 Oct 2024 04:10:23 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <6717881f.170a0220.3aa3c9.c1e5@mx.google.com> mstorsjo wrote: > I must have missed this PR originally. I oppose letting Git change any line endings. It always ends like this. Also just for context - the Clang precommit CI is allegedly still broken, because those buildbots happened to be restarted when we had these gitattributes in place, so all files are checked out with CRLF right now, and any incremental update on top doesn't change that, as long as those files aren't touched: https://discourse.llvm.org/t/windows-premerge-buildbot-broken-for-5-days/82571/6 https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Tue Oct 22 04:22:08 2024 From: openmp-commits at lists.llvm.org (=?UTF-8?Q?Martin_Storsj=C3=B6?= via Openmp-commits) Date: Tue, 22 Oct 2024 04:22:08 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <67178ae0.620a0220.3abbb3.d074@mx.google.com> mstorsjo wrote: > This patch is about respecting local config, which is the exact opposite of that suggestion. It would be a way to solve the line-ending issue by fiat, not by co-operation, so I'm against it on principle. 
To be clear I very much don't like CRLF, but I also very much don't like it when someone forces me to use wrong-handed tools. Windows users would be forced to use wrong handed tools if we force line-endings one way or another Also FWIW, I wouldn't that much mind letting users pick whichever form of line endings they would like, if all tests would have been cleaned up _before_ this, so that they pass regardless of the user choice or tool defaults - but alas, there still are >70 Clang tests failing. https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Tue Oct 22 04:23:28 2024 From: openmp-commits at lists.llvm.org (=?UTF-8?Q?Martin_Storsj=C3=B6?= via Openmp-commits) Date: Tue, 22 Oct 2024 04:23:28 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <67178b30.050a0220.28191c.d7c5@mx.google.com> mstorsjo wrote: > I think this is due to a subtly of config. Setting `core.autocrlf` to `false` doesn't actually do anything since it's the default. In Git for Windows, the default actually is `core.autocrlf` set to `true`. When manually installing, the installer wizard used to ask the user which way they want their defaults to be set, not sure if this still is the case. But in e.g. Github Actions runners, you get it set to `true` by default - see https://github.com/mstorsjo/llvm-project/actions/runs/11459095411/job/31882864878. 
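Since the effective default differs between installations, it may help to check where a given value of `core.autocrlf` actually comes from. The following is a small sketch in a scratch repository (the local `false` setting is only there so the example has something to report); `--show-origin` is a standard `git config` option that prefixes each value with the file it was read from.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Set a repository-local value so the key is guaranteed to exist here.
git config core.autocrlf false

# The value Git will actually use, prefixed with the config file it came from.
effective=$(git config --show-origin --get core.autocrlf)
printf '%s\n' "$effective"

# Every scope (system, global, local) that sets the key, in lookup order.
git config --show-origin --get-all core.autocrlf
```

On a machine where the installer or a CI image sets `core.autocrlf=true` system-wide, the second command would be expected to list that entry ahead of the local override.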
https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Tue Oct 22 04:43:55 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Tue, 22 Oct 2024 04:43:55 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <67178ffb.a70a0220.3a4b10.dc6d@mx.google.com> ldrumm wrote: > if all tests would have been cleaned up before this That was most certainly my intention, and I saw green before merging, so I must've looked in the wrong place https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Tue Oct 22 04:52:28 2024 From: openmp-commits at lists.llvm.org (=?UTF-8?Q?Martin_Storsj=C3=B6?= via Openmp-commits) Date: Tue, 22 Oct 2024 04:52:28 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <671791fc.050a0220.332edd.d8a3@mx.google.com> mstorsjo wrote: > > if all tests would have been cleaned up before this > > That was most certainly my intention, and I saw green before merging, so I must've looked in the wrong place Ah, right - as we've seen that the CI runner normally only updates an existing checkout, where changes to gitattributes like these don't really take effect, I guess this can be understood. (Plus there are a number of tests in less frequently executed testsuites, like compiler-rt, clang-tools-extra, and in `llvm/utils/lit/tests`, that don't necessarily get included in each normal run in CI.) On that topic - some of the scripts that orchestrate that premerge testing lives in `.ci/generate-buildkite-pipeline-premerge`. By editing that script, it should be possible to trigger it to run all possible testsuites, not just the ones currently touched. 
And I'm wondering if there's anything we could add there temporarily (both right now, to flush the current checkouts on the premerge cluster, and when testing PRs like this one), to force it to check files out again?

https://github.com/llvm/llvm-project/pull/86318

From openmp-commits at lists.llvm.org Tue Oct 22 07:51:56 2024
From: openmp-commits at lists.llvm.org (Aaron Ballman via Openmp-commits)
Date: Tue, 22 Oct 2024 07:51:56 -0700 (PDT)
Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318)
In-Reply-To:
Message-ID: <6717bc0c.170a0220.dd0bf.f9e4@mx.google.com>

AaronBallman wrote:

> But if every single Windows developer involved here says that they want LF, and they don't want any ambiguity about it? Is it more important to give hypothetical users the choice to pick what they like, at the cost of every single current developer who does not want that, and breaking every established setup routine?

+1; I was [on the fence](https://github.com/llvm/llvm-project/pull/86318#pullrequestreview-2023166237) about the changes because what happened here has [happened before](https://github.com/llvm/llvm-project/pull/86318#issuecomment-2427230026).

The amount of churn is already pretty high -- please make sure the original commit, fixes, and reverts get added to https://github.com/llvm/llvm-project/blob/main/.git-blame-ignore-revs. At the end of the day, we have a number of tests and files which are sensitive to line endings and we have a lot of existing clones of the repo which have been set up to work properly with the current setup, so this is a risky change.

Has there been a recent RFC asking if the community wants to go down this path? If not, we should run one before attempting further changes (aside from fixing anything up that still needs fixing, if anything).

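
For readers of the archive, the point about tests being sensitive to line endings can be illustrated with a small C++ sketch. It only shows why a byte-exact comparison stops matching once a reference file is checked out with CRLF; `matches_exactly` and `normalize_crlf` are hypothetical helpers written for this note, not LLVM or lit APIs, and the affected tests may well fail for other reasons.

```cpp
#include <cstddef>
#include <string>

// Byte-exact comparison, the kind of check that is sensitive to line endings.
bool matches_exactly(const std::string &expected, const std::string &actual) {
  return expected == actual;
}

// Drops every '\r' that is immediately followed by '\n'; a lone '\r' is kept.
std::string normalize_crlf(const std::string &text) {
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    const bool is_crlf =
        text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n';
    if (!is_crlf)
      out.push_back(text[i]);
  }
  return out;
}
```

A reference text with LF endings no longer compares equal to the same text with CRLF endings unless one side is normalised first, which is roughly what pinning individual files to LF in `.gitattributes` avoids having to do.
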
https://github.com/llvm/llvm-project/pull/86318

From openmp-commits at lists.llvm.org Tue Oct 22 07:54:17 2024
From: openmp-commits at lists.llvm.org (via Openmp-commits)
Date: Tue, 22 Oct 2024 07:54:17 -0700 (PDT)
Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318)
In-Reply-To:
Message-ID: <6717bc99.170a0220.3c6f67.5df9@mx.google.com>

ldrumm wrote:

Yes. An RFC makes sense. None of us here speak for every Windows developer. I will submit one to Discourse once I iron out the kinks and am ready to try again.

https://github.com/llvm/llvm-project/pull/86318

From openmp-commits at lists.llvm.org Tue Oct 22 08:44:57 2024
From: openmp-commits at lists.llvm.org (=?UTF-8?Q?Martin_Storsj=C3=B6?= via Openmp-commits)
Date: Tue, 22 Oct 2024 08:44:57 -0700 (PDT)
Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318)
In-Reply-To:
Message-ID: <6717c879.630a0220.36c8fc.38ed@mx.google.com>

mstorsjo wrote:

> The amount of churn is already pretty high -- please make sure the original commit, fixes, and reverts get added to https://github.com/llvm/llvm-project/blob/main/.git-blame-ignore-revs. At the end of the day, we have a number of tests and files which are sensitive to line endings and we have a lot of existing clones of the repo which have been set up to work properly with the current setup, so this is a risky change.

FWIW, while I'm not a fan of checking things out with CRLF per se (or making `core.autocrlf` no longer have precedence), I totally don't mind fixing tests (or in most cases, adding some individual files to `.gitattributes` to mark them as needing to be checked out with LF newlines always), and I don't consider that part churn. We have a number of such files marked that way from before, but we clearly do need to add a few more.
But I haven't checked all the commits that have gone in to try to fix up tests that were failing due to CRLF - in case some of them really should be reverted once we're back to checking things out with LF. https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Tue Oct 22 09:07:35 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Tue, 22 Oct 2024 09:07:35 -0700 (PDT) Subject: [Openmp-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [openmp] [pstl] Finally formalise our defacto line-ending policy (PR #86318) In-Reply-To: Message-ID: <6717cdc7.170a0220.1807fc.febf@mx.google.com> ldrumm wrote: @AaronBallman you said this has happened before, but I don't see this in history. Can you link to the commit to which you're referring? I only see one other commit (9783f28cb) that touches the root `.gitattributes` https://github.com/llvm/llvm-project/pull/86318 From openmp-commits at lists.llvm.org Wed Oct 23 08:54:20 2024 From: openmp-commits at lists.llvm.org (Gheorghe-Teodor Bercea via Openmp-commits) Date: Wed, 23 Oct 2024 08:54:20 -0700 (PDT) Subject: [Openmp-commits] [clang] [llvm] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587) In-Reply-To: Message-ID: <67191c2c.170a0220.4e2e6.701f@mx.google.com> ================ @@ -0,0 +1,77 @@ +// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ +// RUN: -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: --check-prefix="CLANG-PGO" +// RUN: %libomptarget-compile-generic -fprofile-generate \ +// RUN: -Xclang "-fprofile-instrument=llvm" +// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: --check-prefix="LLVM-PGO" + +// UNSUPPORTED: x86_64-pc-linux-gnu +// UNSUPPORTED: x86_64-pc-linux-gnu-LTO +// UNSUPPORTED: aarch64-unknown-linux-gnu +// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: pgo + +#ifdef _OPENMP +#include +#endif + +int test1(int a) { return a / 2; } +int 
test2(int a) { return a * 2; }
+
+int main() {
+  int m = 2;
+#pragma omp target
+  for (int i = 0; i < 10; i++) {
+    m = test1(m);
+    for (int j = 0; j < 2; j++) {
+      m = test2(m);
+    }
+  }
+}
+
+// CLANG-PGO: ======== Counters =========
+// CLANG-PGO-NEXT: [ 0 11 20 ]
+// CLANG-PGO-NEXT: [ 10 ]
+// CLANG-PGO-NEXT: [ 20 ]
+// CLANG-PGO-NEXT: ========== Data ===========
+// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CLANG-PGO-NEXT: ======== Functions ========
+// CLANG-PGO-NEXT: pgo1.c:
+// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
+// CLANG-PGO-NEXT: test1
+// CLANG-PGO-NEXT: test2
+
+// LLVM-PGO: ======== Counters =========
+// LLVM-PGO-NEXT: [ 20 ]
+// LLVM-PGO-NEXT: [ 10 ]
+// LLVM-PGO-NEXT: [ 20 10 1 1 ]
----------------
doru1004 wrote:

What do these numbers represent and why have they changed in subsequent commits to `20 10 2 1`?

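
A note for readers of the archive, not an answer from the PR author: the loop nest in the quoted test calls `test1` once per outer iteration and `test2` twice per outer iteration, so entry counts of 10 and 20 would be expected, which seems consistent with the `[ 10 ]` and `[ 20 ]` rows. The sketch below is a host-only C++ replica that tallies those calls by hand. `CallCounts` and `count_calls` are hypothetical names, no OpenMP offloading or PGO runtime is involved, and it does not explain the remaining entries in the longer rows.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for per-function entry counters; this is not the
// layout used by the profiling runtime.
struct CallCounts {
  std::uint64_t test1 = 0;
  std::uint64_t test2 = 0;
};

static CallCounts g_counts;

int test1(int a) { ++g_counts.test1; return a / 2; }
int test2(int a) { ++g_counts.test2; return a * 2; }

// Host-only replica of the loop nest in the quoted test
// (the `#pragma omp target` is deliberately left out).
CallCounts count_calls() {
  g_counts = CallCounts{};
  int m = 2;
  for (int i = 0; i < 10; i++) {
    m = test1(m);
    for (int j = 0; j < 2; j++) {
      m = test2(m);
    }
  }
  // Each outer iteration halves m once and doubles it twice.
  assert(m == 2048);
  return g_counts;
}
```

Calling `count_calls()` yields 10 entries for `test1` and 20 for `test2`.
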
https://github.com/llvm/llvm-project/pull/76587 From openmp-commits at lists.llvm.org Wed Oct 23 09:24:50 2024 From: openmp-commits at lists.llvm.org (Gheorghe-Teodor Bercea via Openmp-commits) Date: Wed, 23 Oct 2024 09:24:50 -0700 (PDT) Subject: [Openmp-commits] [clang] [llvm] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587) In-Reply-To: Message-ID: <67192352.170a0220.1bc3c6.81e3@mx.google.com> ================ @@ -0,0 +1,77 @@ +// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ +// RUN: -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: --check-prefix="CLANG-PGO" +// RUN: %libomptarget-compile-generic -fprofile-generate \ +// RUN: -Xclang "-fprofile-instrument=llvm" +// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: --check-prefix="LLVM-PGO" + +// UNSUPPORTED: x86_64-pc-linux-gnu +// UNSUPPORTED: x86_64-pc-linux-gnu-LTO +// UNSUPPORTED: aarch64-unknown-linux-gnu +// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: pgo + +#ifdef _OPENMP +#include +#endif + +int test1(int a) { return a / 2; } +int test2(int a) { return a * 2; } + +int main() { + int m = 2; +#pragma omp target + for (int i = 0; i < 10; i++) { + m = test1(m); + for (int j = 0; j < 2; j++) { + m = test2(m); + } + } +} + +// CLANG-PGO: ======== Counters ========= +// CLANG-PGO-NEXT: [ 0 11 20 ] +// CLANG-PGO-NEXT: [ 10 ] +// CLANG-PGO-NEXT: [ 20 ] +// CLANG-PGO-NEXT: ========== Data =========== +// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CLANG-PGO-SAME: 
{{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CLANG-PGO-NEXT: ======== Functions ======== +// CLANG-PGO-NEXT: pgo1.c: +// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} +// CLANG-PGO-NEXT: test1 +// CLANG-PGO-NEXT: test2 + +// LLVM-PGO: ======== Counters ========= +// LLVM-PGO-NEXT: [ 20 ] +// LLVM-PGO-NEXT: [ 10 ] +// LLVM-PGO-NEXT: [ 20 10 1 1 ] ---------------- doru1004 wrote: @EthanLuisMcDonough https://github.com/llvm/llvm-project/pull/76587 From openmp-commits at lists.llvm.org Wed Oct 23 21:36:18 2024 From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits) Date: Wed, 23 Oct 2024 21:36:18 -0700 (PDT) Subject: [Openmp-commits] [openmp] [OpenMP] Fix the test issue when `libomp` is built as a static library (PR #113522) Message-ID: https://github.com/shiltian created https://github.com/llvm/llvm-project/pull/113522 Fixes #113436. >From 27b6f28268d1a16163c358062ebc0261619e9fbe Mon Sep 17 00:00:00 2001 From: Shilei Tian Date: Thu, 24 Oct 2024 00:35:53 -0400 Subject: [PATCH] [OpenMP] Fix the test issue when `libomp` is built as a static library Fixes #113436. 
--- openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp b/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp index bc02caccb69ed9..9a07564406f7f9 100644 --- a/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp +++ b/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp @@ -43,7 +43,7 @@ struct anon { }; } -kmp_int32 __kmp_hidden_helper_threads_num; +static kmp_int32 __kmp_hidden_helper_threads_num; kmp_int32 omp_task_entry(kmp_int32 gtid, kmp_task_t_with_privates *task) { auto shareds = reinterpret_cast(task->task.shareds); From openmp-commits at lists.llvm.org Wed Oct 23 21:36:31 2024 From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits) Date: Wed, 23 Oct 2024 21:36:31 -0700 (PDT) Subject: [Openmp-commits] [openmp] [OpenMP] Fix the test issue when `libomp` is built as a static library (PR #113522) In-Reply-To: Message-ID: <6719cecf.050a0220.bf865.bf9a@mx.google.com> shiltian wrote: * **#113522** Graphite 👈 * `main` This stack of pull requests is managed by Graphite. Learn more about stacking.

https://github.com/llvm/llvm-project/pull/113522

From openmp-commits at lists.llvm.org Thu Oct 24 01:20:53 2024
From: openmp-commits at lists.llvm.org (Paul Osmialowski via Openmp-commits)
Date: Thu, 24 Oct 2024 01:20:53 -0700 (PDT)
Subject: [Openmp-commits] [openmp] [OpenMP] Fix the test issue when `libomp` is built as a static library (PR #113522)
In-Reply-To:
Message-ID: <671a0365.050a0220.238fed.caae@mx.google.com>

https://github.com/pawosm-arm approved this pull request.

https://github.com/llvm/llvm-project/pull/113522

From openmp-commits at lists.llvm.org Thu Oct 24 09:52:21 2024
From: openmp-commits at lists.llvm.org (via Openmp-commits)
Date: Thu, 24 Oct 2024 09:52:21 -0700 (PDT)
Subject: [Openmp-commits] [openmp] 5d07162 - [OpenMP] Fix the test issue when `libomp` is built as a static library (#113522)
Message-ID: <671a7b45.170a0220.3501ee.ff7a@mx.google.com>

Author: Shilei Tian
Date: 2024-10-24T12:52:17-04:00
New Revision: 5d07162bba0648f5a5733039a7795eb7e9913863

URL: https://github.com/llvm/llvm-project/commit/5d07162bba0648f5a5733039a7795eb7e9913863
DIFF: https://github.com/llvm/llvm-project/commit/5d07162bba0648f5a5733039a7795eb7e9913863.diff

LOG: [OpenMP] Fix the test issue when `libomp` is built as a static library (#113522)

Added:

Modified:
    openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp

Removed:

################################################################################
diff --git a/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp b/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
index bc02caccb69ed9..9a07564406f7f9 100644
--- a/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
+++ b/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
@@ -43,7 +43,7 @@ struct anon {
 };
 }
 
-kmp_int32 __kmp_hidden_helper_threads_num;
+static kmp_int32 __kmp_hidden_helper_threads_num;
 
 kmp_int32 omp_task_entry(kmp_int32 gtid, kmp_task_t_with_privates *task) {
   auto
shareds = reinterpret_cast(task->task.shareds);

From openmp-commits at lists.llvm.org Thu Oct 24 09:52:23 2024
From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits)
Date: Thu, 24 Oct 2024 09:52:23 -0700 (PDT)
Subject: [Openmp-commits] [openmp] [OpenMP] Fix the test issue when `libomp` is built as a static library (PR #113522)
In-Reply-To:
Message-ID: <671a7b47.620a0220.14b2e2.2785@mx.google.com>

https://github.com/shiltian closed https://github.com/llvm/llvm-project/pull/113522

From openmp-commits at lists.llvm.org Fri Oct 25 07:49:56 2024
From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits)
Date: Fri, 25 Oct 2024 07:49:56 -0700 (PDT)
Subject: [Openmp-commits] [openmp] [OpenMP] Create versioned libgomp softlinks (PR #112973)
In-Reply-To:
Message-ID: <671bb014.170a0220.3d606b.3f65@mx.google.com>

shiltian wrote:

If I understand your comment correctly, `libgomp.so --> libomp.so`, `libomp.so.1 --> libomp.so`. Eventually you still get the same `libomp`, no?

https://github.com/llvm/llvm-project/pull/112973

From openmp-commits at lists.llvm.org Fri Oct 25 08:05:20 2024
From: openmp-commits at lists.llvm.org (Ye Luo via Openmp-commits)
Date: Fri, 25 Oct 2024 08:05:20 -0700 (PDT)
Subject: [Openmp-commits] [openmp] [OpenMP] Create versioned libgomp softlinks (PR #112973)
In-Reply-To:
Message-ID: <671bb3b0.170a0220.3a1553.409c@mx.google.com>

ye-luo wrote:

> If I understand your comment correctly, `libgomp.so --> libomp.so`, `libomp.so.1 --> libomp.so`. Eventually you still get the same `libomp`, no?

Not sure if you were trying to say that there is `libgomp.so.1 --> libgomp.so (GCC)` such that an app eventually picks up `libgomp.so`. Sorry for the confusion. I corrected above. `libgomp.so --> libomp.so`, `libgomp.so.1 --> libomp.so`. The intention is to only pick up libomp at runtime.

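
The corrected alias layout (`libgomp.so --> libomp.so`, `libgomp.so.1 --> libomp.so`) can be tried out in a scratch directory. The following is a minimal C++17 sketch using `std::filesystem`, assuming a POSIX-like system where unprivileged symlink creation works. `resolve_alias`, the scratch directory name, and the empty placeholder file are illustrative only and not part of the patch.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Builds a scratch directory holding a placeholder `libomp.so` plus the two
// aliases, and returns the file name that `alias` resolves to.
std::string resolve_alias(const std::string &alias) {
  const fs::path dir = fs::temp_directory_path() / "libgomp_alias_sketch";
  fs::remove_all(dir);
  fs::create_directories(dir);

  {
    // Empty placeholder file; it stands in for the real shared library.
    std::ofstream placeholder(dir / "libomp.so");
  }

  // Relative link targets, so both aliases point at the sibling file.
  fs::create_symlink("libomp.so", dir / "libgomp.so");
  fs::create_symlink("libomp.so", dir / "libgomp.so.1");

  // canonical() follows the symlink to the file it finally names.
  const std::string resolved = fs::canonical(dir / alias).filename().string();
  fs::remove_all(dir);
  return resolved;
}
```

Both `resolve_alias("libgomp.so")` and `resolve_alias("libgomp.so.1")` end up at `libomp.so` in this layout, which matches the stated intention of only picking up libomp at runtime.
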
https://github.com/llvm/llvm-project/pull/112973

From openmp-commits at lists.llvm.org Fri Oct 25 11:12:31 2024
From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits)
Date: Fri, 25 Oct 2024 11:12:31 -0700 (PDT)
Subject: [Openmp-commits] [openmp] [OpenMP] Create versioned libgomp softlinks (PR #112973)
In-Reply-To:
Message-ID: <671bdf8f.a70a0220.2bd03f.76f8@mx.google.com>

https://github.com/shiltian approved this pull request.

https://github.com/llvm/llvm-project/pull/112973

From openmp-commits at lists.llvm.org Fri Oct 25 11:20:02 2024
From: openmp-commits at lists.llvm.org (via Openmp-commits)
Date: Fri, 25 Oct 2024 11:20:02 -0700 (PDT)
Subject: [Openmp-commits] [openmp] eccdb24 - [OpenMP] Create versioned libgomp softlinks (#112973)
Message-ID: <671be152.630a0220.322603.6cbb@mx.google.com>

Author: Ye Luo
Date: 2024-10-25T13:19:58-05:00
New Revision: eccdb2489483ca58d2cb35bc38967a8e33117575

URL: https://github.com/llvm/llvm-project/commit/eccdb2489483ca58d2cb35bc38967a8e33117575
DIFF: https://github.com/llvm/llvm-project/commit/eccdb2489483ca58d2cb35bc38967a8e33117575.diff

LOG: [OpenMP] Create versioned libgomp softlinks (#112973)

Add libgomp.1.dylib for MacOS and libgomp.so.1 for Linux

Linkers on Mac and Linux pick up versioned libgomp dynamic library files.
The existing softlinks (libgomp.dylib for MacOS and libgomp.so for Linux) are insufficient.
This helps alleviate the issue of mixing libgomp and libomp at runtime.

Added: Modified: openmp/runtime/src/CMakeLists.txt Removed: ################################################################################ diff --git a/openmp/runtime/src/CMakeLists.txt b/openmp/runtime/src/CMakeLists.txt index 439cc20963a129..61c0bacc9f2062 100644 --- a/openmp/runtime/src/CMakeLists.txt +++ b/openmp/runtime/src/CMakeLists.txt @@ -253,6 +253,17 @@ if(NOT WIN32) libiomp5${LIBOMP_LIBRARY_SUFFIX} WORKING_DIRECTORY ${LIBOMP_LIBRARY_DIR} ) + if(LIBOMP_ENABLE_SHARED) + if(APPLE) + set(VERSIONED_LIBGOMP_NAME libgomp.1${LIBOMP_LIBRARY_SUFFIX}) + else() + set(VERSIONED_LIBGOMP_NAME libgomp${LIBOMP_LIBRARY_SUFFIX}.1) + endif() + add_custom_command(TARGET omp POST_BUILD + COMMAND ${CMAKE_COMMAND} -E create_symlink ${LIBOMP_LIB_FILE} ${VERSIONED_LIBGOMP_NAME} + WORKING_DIRECTORY ${LIBOMP_LIBRARY_DIR} + ) + endif() endif() # Definitions for testing, for reuse when testing libomptarget-nvptx. @@ -439,13 +450,18 @@ else() if(${LIBOMP_INSTALL_ALIASES}) # Create aliases (symlinks) of the library for backwards compatibility + extend_path(outdir "${CMAKE_INSTALL_PREFIX}" "${OPENMP_INSTALL_LIBDIR}") set(LIBOMP_ALIASES "libgomp;libiomp5") foreach(alias IN LISTS LIBOMP_ALIASES) - extend_path(outdir "${CMAKE_INSTALL_PREFIX}" "${OPENMP_INSTALL_LIBDIR}") install(CODE "execute_process(COMMAND \"\${CMAKE_COMMAND}\" -E create_symlink \"${LIBOMP_LIB_FILE}\" \"${alias}${LIBOMP_LIBRARY_SUFFIX}\" WORKING_DIRECTORY \"\$ENV{DESTDIR}${outdir}\")") endforeach() + if(LIBOMP_ENABLE_SHARED) + install(CODE "execute_process(COMMAND \"\${CMAKE_COMMAND}\" -E create_symlink \"${LIBOMP_LIB_FILE}\" + \"${VERSIONED_LIBGOMP_NAME}\" WORKING_DIRECTORY + \"\$ENV{DESTDIR}${outdir}\")") + endif() endif() endif() install( From openmp-commits at lists.llvm.org Fri Oct 25 11:20:05 2024 From: openmp-commits at lists.llvm.org (Ye Luo via Openmp-commits) Date: Fri, 25 Oct 2024 11:20:05 -0700 (PDT) Subject: [Openmp-commits] [openmp] [OpenMP] Create versioned libgomp softlinks (PR #112973) In-Reply-To: 
Message-ID: <671be155.050a0220.21c2a7.72bb@mx.google.com> https://github.com/ye-luo closed https://github.com/llvm/llvm-project/pull/112973 From openmp-commits at lists.llvm.org Fri Oct 25 11:43:00 2024 From: openmp-commits at lists.llvm.org (Ethan Luis McDonough via Openmp-commits) Date: Fri, 25 Oct 2024 11:43:00 -0700 (PDT) Subject: [Openmp-commits] [clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (PR #93365) In-Reply-To: Message-ID: <671be6b4.050a0220.2d6ff5.736f@mx.google.com> https://github.com/EthanLuisMcDonough updated https://github.com/llvm/llvm-project/pull/93365 >From 530eb982b9770190377bb0bd09c5cb715f34d484 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 15 Dec 2023 20:38:38 -0600 Subject: [PATCH 01/38] Add profiling functions to libomptarget --- .../include/llvm/Frontend/OpenMP/OMPKinds.def | 3 +++ openmp/libomptarget/DeviceRTL/CMakeLists.txt | 2 ++ .../DeviceRTL/include/Profiling.h | 21 +++++++++++++++++++ .../libomptarget/DeviceRTL/src/Profiling.cpp | 19 +++++++++++++++++ 4 files changed, 45 insertions(+) create mode 100644 openmp/libomptarget/DeviceRTL/include/Profiling.h create mode 100644 openmp/libomptarget/DeviceRTL/src/Profiling.cpp diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def index d22d2a8e948b00..1d887d5cb58127 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def +++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def @@ -503,6 +503,9 @@ __OMP_RTL(__kmpc_barrier_simple_generic, false, Void, IdentPtr, Int32) __OMP_RTL(__kmpc_warp_active_thread_mask, false, Int64,) __OMP_RTL(__kmpc_syncwarp, false, Void, Int64) +__OMP_RTL(__llvm_profile_register_function, false, Void, VoidPtr) +__OMP_RTL(__llvm_profile_register_names_function, false, Void, VoidPtr, Int64) + __OMP_RTL(__last, false, Void, ) #undef __OMP_RTL diff --git a/openmp/libomptarget/DeviceRTL/CMakeLists.txt 
b/openmp/libomptarget/DeviceRTL/CMakeLists.txt index 1ce3e1e40a80ab..55ee15d068c67b 100644 --- a/openmp/libomptarget/DeviceRTL/CMakeLists.txt +++ b/openmp/libomptarget/DeviceRTL/CMakeLists.txt @@ -89,6 +89,7 @@ set(include_files ${include_directory}/Interface.h ${include_directory}/LibC.h ${include_directory}/Mapping.h + ${include_directory}/Profiling.h ${include_directory}/State.h ${include_directory}/Synchronization.h ${include_directory}/Types.h @@ -104,6 +105,7 @@ set(src_files ${source_directory}/Mapping.cpp ${source_directory}/Misc.cpp ${source_directory}/Parallelism.cpp + ${source_directory}/Profiling.cpp ${source_directory}/Reduction.cpp ${source_directory}/State.cpp ${source_directory}/Synchronization.cpp diff --git a/openmp/libomptarget/DeviceRTL/include/Profiling.h b/openmp/libomptarget/DeviceRTL/include/Profiling.h new file mode 100644 index 00000000000000..68c7744cd60752 --- /dev/null +++ b/openmp/libomptarget/DeviceRTL/include/Profiling.h @@ -0,0 +1,21 @@ +//===-------- Profiling.h - OpenMP interface ---------------------- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// +//===----------------------------------------------------------------------===// + +#ifndef OMPTARGET_DEVICERTL_PROFILING_H +#define OMPTARGET_DEVICERTL_PROFILING_H + +extern "C" { + +void __llvm_profile_register_function(void *ptr); +void __llvm_profile_register_names_function(void *ptr, long int i); +} + +#endif diff --git a/openmp/libomptarget/DeviceRTL/src/Profiling.cpp b/openmp/libomptarget/DeviceRTL/src/Profiling.cpp new file mode 100644 index 00000000000000..799477f5e47d27 --- /dev/null +++ b/openmp/libomptarget/DeviceRTL/src/Profiling.cpp @@ -0,0 +1,19 @@ +//===------- Profiling.cpp ---------------------------------------- C++ ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "Profiling.h" + +#pragma omp begin declare target device_type(nohost) + +extern "C" { + +void __llvm_profile_register_function(void *ptr) {} +void __llvm_profile_register_names_function(void *ptr, long int i) {} +} + +#pragma omp end declare target >From fb067d4ffe604fd68cf90b705db1942bce49dbb1 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Sat, 16 Dec 2023 01:18:41 -0600 Subject: [PATCH 02/38] Fix PGO instrumentation for GPU targets --- clang/lib/CodeGen/CodeGenPGO.cpp | 10 ++++++++-- .../lib/Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++--- 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index 81bf8ea696b164..edae6885b528ac 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -959,8 +959,14 @@ void 
CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, unsigned Counter = (*RegionCounterMap)[S]; - llvm::Value *Args[] = {FuncNameVar, - Builder.getInt64(FunctionHash), + // Make sure that pointer to global is passed in with zero addrspace + // This is relevant during GPU profiling + auto *I8Ty = llvm::Type::getInt8Ty(CGM.getLLVMContext()); + auto *I8PtrTy = llvm::PointerType::getUnqual(I8Ty); + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, I8PtrTy); + + llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), Builder.getInt32(NumRegionCounters), Builder.getInt32(Counter), StepV}; if (!StepV) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index fe5a0578bd9721..d2cb8155c17967 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -1658,10 +1658,13 @@ void InstrLowerer::emitRegistration() { IRBuilder<> IRB(BasicBlock::Create(M.getContext(), "", RegisterF)); for (Value *Data : CompilerUsedVars) if (!isa(Data)) - IRB.CreateCall(RuntimeRegisterF, Data); + // Check for addrspace cast when profiling GPU + IRB.CreateCall(RuntimeRegisterF, + IRB.CreatePointerBitCastOrAddrSpaceCast(Data, VoidPtrTy)); for (Value *Data : UsedVars) if (Data != NamesVar && !isa(Data)) - IRB.CreateCall(RuntimeRegisterF, Data); + IRB.CreateCall(RuntimeRegisterF, + IRB.CreatePointerBitCastOrAddrSpaceCast(Data, VoidPtrTy)); if (NamesVar) { Type *ParamTypes[] = {VoidPtrTy, Int64Ty}; @@ -1670,7 +1673,9 @@ void InstrLowerer::emitRegistration() { auto *NamesRegisterF = Function::Create(NamesRegisterTy, GlobalVariable::ExternalLinkage, getInstrProfNamesRegFuncName(), M); - IRB.CreateCall(NamesRegisterF, {NamesVar, IRB.getInt64(NamesSize)}); + IRB.CreateCall(NamesRegisterF, {IRB.CreatePointerBitCastOrAddrSpaceCast( + NamesVar, VoidPtrTy), + IRB.getInt64(NamesSize)}); } 
IRB.CreateRetVoid(); >From 7a0e0efa178cc4de6a22a8f5cc3f53cd1c81ea3a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 21 Dec 2023 00:25:46 -0600 Subject: [PATCH 03/38] Change global visibility on GPU targets --- llvm/include/llvm/ProfileData/InstrProf.h | 4 ++++ llvm/lib/ProfileData/InstrProf.cpp | 17 +++++++++++++++-- .../Instrumentation/InstrProfiling.cpp | 15 +++++++++++---- 3 files changed, 30 insertions(+), 6 deletions(-) diff --git a/llvm/include/llvm/ProfileData/InstrProf.h b/llvm/include/llvm/ProfileData/InstrProf.h index 288dc71d756aee..bf9899d867e3dd 100644 --- a/llvm/include/llvm/ProfileData/InstrProf.h +++ b/llvm/include/llvm/ProfileData/InstrProf.h @@ -171,6 +171,10 @@ inline StringRef getInstrProfCounterBiasVarName() { /// Return the marker used to separate PGO names during serialization. inline StringRef getInstrProfNameSeparator() { return "\01"; } +/// Determines whether module targets a GPU eligable for PGO +/// instrumentation +bool isGPUProfTarget(const Module &M); + /// Return the modified name for function \c F suitable to be /// used the key for profile lookup. Variable \c InLTO indicates if this /// is called in LTO optimization passes. 
diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 649d814cfd9de0..0d6717aeb0142c 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -410,13 +410,22 @@ std::string getPGOFuncNameVarName(StringRef FuncName, return VarName; } +bool isGPUProfTarget(const Module &M) { + const auto &triple = M.getTargetTriple(); + return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 || + triple.rfind("r600", 0) == 0; +} + GlobalVariable *createPGOFuncNameVar(Module &M, GlobalValue::LinkageTypes Linkage, StringRef PGOFuncName) { + // Ensure profiling variables on GPU are visible to be read from host + if (isGPUProfTarget(M)) + Linkage = GlobalValue::ExternalLinkage; // We generally want to match the function's linkage, but available_externally // and extern_weak both have the wrong semantics, and anything that doesn't // need to link across compilation units doesn't need to be visible at all. - if (Linkage == GlobalValue::ExternalWeakLinkage) + else if (Linkage == GlobalValue::ExternalWeakLinkage) Linkage = GlobalValue::LinkOnceAnyLinkage; else if (Linkage == GlobalValue::AvailableExternallyLinkage) Linkage = GlobalValue::LinkOnceODRLinkage; @@ -430,8 +439,12 @@ GlobalVariable *createPGOFuncNameVar(Module &M, new GlobalVariable(M, Value->getType(), true, Linkage, Value, getPGOFuncNameVarName(PGOFuncName, Linkage)); + // If the target is a GPU, make the symbol protected so it can + // be read from the host device + if (isGPUProfTarget(M)) + FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility); // Hide the symbol so that we correctly get a copy for each executable. 
- if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) + else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) FuncNameVar->setVisibility(GlobalValue::HiddenVisibility); return FuncNameVar; diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index d2cb8155c17967..3b582b65190808 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -1481,6 +1481,10 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind) Int16ArrayVals[Kind] = ConstantInt::get(Int16Ty, PD.NumValueSites[Kind]); + if (isGPUProfTarget(M)) { + Linkage = GlobalValue::ExternalLinkage; + Visibility = GlobalValue::ProtectedVisibility; + } // If the data variable is not referenced by code (if we don't emit // @llvm.instrprof.value.profile, NS will be 0), and the counter keeps the // data variable live under linker GC, the data variable can be private. This @@ -1492,9 +1496,9 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { // If profd is in a deduplicate comdat, NS==0 with a hash suffix guarantees // that other copies must have the same CFG and cannot have value profiling. // If no hash suffix, other profd copies may be referenced by code. 
-  if (NS == 0 && !(DataReferencedByCode && NeedComdat && !Renamed) &&
-      (TT.isOSBinFormatELF() ||
-       (!DataReferencedByCode && TT.isOSBinFormatCOFF()))) {
+  else if (NS == 0 && !(DataReferencedByCode && NeedComdat && !Renamed) &&
+           (TT.isOSBinFormatELF() ||
+            (!DataReferencedByCode && TT.isOSBinFormatCOFF()))) {
     Linkage = GlobalValue::PrivateLinkage;
     Visibility = GlobalValue::DefaultVisibility;
   }
@@ -1696,7 +1700,10 @@ bool InstrLowerer::emitRuntimeHook() {
   auto *Var =
       new GlobalVariable(M, Int32Ty, false, GlobalValue::ExternalLinkage,
                          nullptr, getInstrProfRuntimeHookVarName());
-  Var->setVisibility(GlobalValue::HiddenVisibility);
+  if (isGPUProfTarget(M))
+    Var->setVisibility(GlobalValue::ProtectedVisibility);
+  else
+    Var->setVisibility(GlobalValue::HiddenVisibility);

   if (TT.isOSBinFormatELF() && !TT.isPS()) {
     // Mark the user variable as used so that it isn't stripped out.

>From fddc07908ed9aa698fe3250ddbfc5621ab4d049d Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 22 Dec 2023 23:43:29 -0600
Subject: [PATCH 04/38] Make names global public on GPU

---
 llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
index 3b582b65190808..61fba7be3ee0ee 100644
--- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
@@ -1621,6 +1621,13 @@ void InstrLowerer::emitNameData() {
   NamesVar = new GlobalVariable(M, NamesVal->getType(), true,
                                 GlobalValue::PrivateLinkage, NamesVal,
                                 getInstrProfNamesVarName());
+
+  // Make names variable public if current target is a GPU
+  if (isGPUProfTarget(M)) {
+    NamesVar->setLinkage(GlobalValue::ExternalLinkage);
+    NamesVar->setVisibility(GlobalValue::VisibilityTypes::ProtectedVisibility);
+  }
+
   NamesSize = CompressedNameStr.size();
   setGlobalVariableLargeSection(TT, *NamesVar);
   NamesVar->setSection(

>From e9db03c70bf79f4f4ddad4b48a5aa63a37e0d4f6 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 29 Dec 2023 12:54:50 -0600
Subject: [PATCH 05/38] Read and print GPU device PGO globals

---
 .../common/include/GlobalHandler.h  | 27 ++++++
 .../common/src/GlobalHandler.cpp    | 82 +++++++++++++++++++
 .../common/src/PluginInterface.cpp  | 14 ++++
 3 files changed, 123 insertions(+)

diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
index fa079ac9660ee0..a82cd536487653 100644
--- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
+++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
@@ -14,9 +14,11 @@
 #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H

 #include
+#include

 #include "llvm/ADT/DenseMap.h"
 #include "llvm/Object/ELFObjectFile.h"
+#include "llvm/ProfileData/InstrProf.h"

 #include "Shared/Debug.h"
 #include "Shared/Utils.h"
@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };

+typedef void *IntPtrT;
+struct __llvm_profile_data {
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#include "llvm/ProfileData/InstrProfData.inc"
+};
+
+/// PGO profiling data extracted from a GPU device
+struct GPUProfGlobals {
+  std::string names;
+  std::vector<std::vector<int64_t>> counts;
+  std::vector<__llvm_profile_data> data;
+  Triple targetTriple;
+
+  void dump() const;
+};
+
 /// Subclass of GlobalTy that holds the memory for a global of \p Ty.
 template <typename Ty> class StaticGlobalTy : public GlobalTy {
   Ty Data;
@@ -172,6 +190,15 @@ class GenericGlobalHandlerTy {
     return moveGlobalBetweenDeviceAndHost(Device, Image, HostGlobal,
                                           /* D2H */ false);
   }
+
+  /// Checks whether a given image contains profiling globals.
+  bool hasProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image);
+
+  /// Reads profiling data from a GPU image to supplied profdata struct.
+  /// Iterates through the image symbol table and stores global values
+  /// with profiling prefixes.
+  Expected<GPUProfGlobals> readProfilingGlobals(GenericDeviceTy &Device,
+                                                DeviceImageTy &Image);
 };

 } // namespace plugin
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index 3a272e228c7dfe..5dd5daec468ca5 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -176,3 +176,85 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,

   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  const auto *elf = getOrCreateELFObjectFile(Device, Image);
+  profdata.targetTriple = elf->makeTriple();
+  // Iterate through
+  for (auto &sym : elf->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector<char> chars(sym.getSize() / sizeof(char), ' ');
+        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());
+        if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
+          return Err;
+        std::string names(chars.begin(), chars.end());
+        profdata.names = std::move(names);
+      } else if (name->starts_with(getInstrProfCountersVarPrefix())) {
+        // Read global variable profiling counts
+        std::vector<int64_t> counts(sym.getSize() / sizeof(int64_t), 0);
+        GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data());
+        if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
+          return Err;
+        profdata.counts.push_back(std::move(counts));
+      } else if (name->starts_with(getInstrProfDataVarPrefix())) {
+        // Read profiling data for this global variable
+        __llvm_profile_data data{};
+        GlobalTy DataGlobal(name->str(), sym.getSize(), &data);
+        if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
+          return Err;
+        profdata.data.push_back(std::move(data));
+      }
+    }
+  }
+  return profdata;
+}
+
+void GPUProfGlobals::dump() const {
+  llvm::outs() << "======= GPU Profile =======\nTarget: " << targetTriple.str()
+               << "\n";
+
+  llvm::outs() << "======== Counters =========\n";
+  for (const auto &count : counts) {
+    llvm::outs() << "[";
+    for (size_t i = 0; i < count.size(); i++) {
+      if (i == 0)
+        llvm::outs() << " ";
+      llvm::outs() << count[i] << " ";
+    }
+    llvm::outs() << "]\n";
+  }
+
+  llvm::outs() << "========== Data ===========\n";
+  for (const auto &d : data) {
+    llvm::outs() << "{ ";
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
+  llvm::outs() << d.Name << " ";
+#include "llvm/ProfileData/InstrProfData.inc"
+    llvm::outs() << " }\n";
+  }
+
+  llvm::outs() << "======== Functions ========\n";
+  InstrProfSymtab symtab;
+  if (Error Err = symtab.create(StringRef(names))) {
+    consumeError(std::move(Err));
+  }
+  symtab.dumpNames(llvm::outs());
+  llvm::outs() << "===========================\n";
+}
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
index 3c7d1ca8998787..84ed90f03f84f1 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
@@ -811,6 +811,20 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) {
                   DeviceMemoryPoolTracking.AllocationMax);
   }

+  for (auto *Image : LoadedImages) {
+    GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+    if (!Handler.hasProfilingGlobals(*this, *Image))
+      continue;
+
+    GPUProfGlobals profdata;
+    auto ProfOrErr = Handler.readProfilingGlobals(*this, *Image);
+    if (!ProfOrErr)
+      return ProfOrErr.takeError();
+
+    // TODO: write data to profiling file
+    ProfOrErr->dump();
+  }
+
   // Delete the memory manager before deinitializing the device. Otherwise,
   // we may delete device allocations after the device is deinitialized.
   if (MemoryManager)

>From e4687605d1a6ca932312025826db09dba84845a3 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 3 Jan 2024 17:06:15 -0600
Subject: [PATCH 06/38] Fix rebase bug

---
 .../plugins-nextgen/common/src/GlobalHandler.cpp | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index cb71b61f4a9c4f..86742d0f77a2fe 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -178,10 +178,12 @@ Expected<GPUProfGlobals>
 GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
                                              DeviceImageTy &Image) {
   GPUProfGlobals profdata;
-  const auto *elf = getOrCreateELFObjectFile(Device, Image);
-  profdata.targetTriple = elf->makeTriple();
-  // Iterate through
-  for (auto &sym : elf->symbols()) {
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
     if (auto name = sym.getName()) {
       // Check if given current global is a profiling global based
       // on name

>From ec18ce94c227e1d43927955fa1c67360ecfcfca6 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 3 Jan 2024 17:10:19 -0600
Subject: [PATCH 07/38] Refactor portions to be more idiomatic

---
 clang/lib/CodeGen/CodeGenPGO.cpp   | 4 +---
 llvm/lib/ProfileData/InstrProf.cpp | 5 ++---
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp
index edae6885b528ac..7bfcec43ee4c98 100644
--- a/clang/lib/CodeGen/CodeGenPGO.cpp
+++ b/clang/lib/CodeGen/CodeGenPGO.cpp
@@ -961,10 +961,8 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S,

   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
-  auto *I8Ty = llvm::Type::getInt8Ty(CGM.getLLVMContext());
-  auto *I8PtrTy = llvm::PointerType::getUnqual(I8Ty);
   auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      FuncNameVar, I8PtrTy);
+      FuncNameVar, llvm::PointerType::getUnqual(CGM.getLLVMContext()));

   llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash),
                          Builder.getInt32(NumRegionCounters),
diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp
index cdcd6840bb5108..1d88da16a5ff9c 100644
--- a/llvm/lib/ProfileData/InstrProf.cpp
+++ b/llvm/lib/ProfileData/InstrProf.cpp
@@ -429,9 +429,8 @@ std::string getPGOFuncNameVarName(StringRef FuncName,
 }

 bool isGPUProfTarget(const Module &M) {
-  const auto &triple = M.getTargetTriple();
-  return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 ||
-         triple.rfind("r600", 0) == 0;
+  const auto &Triple = llvm::Triple(M.getTargetTriple());
+  return Triple.isAMDGPU() || Triple.isNVPTX();
 }

 GlobalVariable *createPGOFuncNameVar(Module &M,

>From 0872556f597056361b0a2c23cdd0be3d9745aef3 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 3 Jan 2024 17:18:47 -0600
Subject: [PATCH 08/38] Reformat DeviceRTL prof functions

---
 openmp/libomptarget/DeviceRTL/include/Profiling.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/openmp/libomptarget/DeviceRTL/include/Profiling.h b/openmp/libomptarget/DeviceRTL/include/Profiling.h
index 68c7744cd60752..9efc1554c176bc 100644
--- a/openmp/libomptarget/DeviceRTL/include/Profiling.h
+++ b/openmp/libomptarget/DeviceRTL/include/Profiling.h
@@ -13,9 +13,8 @@
 #define OMPTARGET_DEVICERTL_PROFILING_H

 extern "C" {
-
-void __llvm_profile_register_function(void *ptr);
-void __llvm_profile_register_names_function(void *ptr, long int i);
+void __llvm_profile_register_function(void *Ptr);
+void __llvm_profile_register_names_function(void *Ptr, long int I);
 }

 #endif

>From 62f31d1c71b5d100f38d6dc584cc138b3904581b Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Tue, 9 Jan 2024 11:52:29 -0600
Subject: [PATCH 09/38] Style changes + catch name error

---
 .../common/include/GlobalHandler.h | 16 ++--
 .../common/src/GlobalHandler.cpp   | 87 ++++++++++---------
 2 files changed, 56 insertions(+), 47 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
index a803b3f76d8b25..755bb23a414e37 100644
--- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
+++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
@@ -13,8 +13,7 @@
 #ifndef LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H
 #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H

-#include
-#include
+#include

 #include "llvm/ADT/DenseMap.h"
 #include "llvm/Object/ELFObjectFile.h"
@@ -60,18 +59,19 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };

-typedef void *IntPtrT;
+using IntPtrT = void *;
 struct __llvm_profile_data {
-#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
+  std::remove_const<Type>::type Name;
 #include "llvm/ProfileData/InstrProfData.inc"
 };

 /// PGO profiling data extracted from a GPU device
 struct GPUProfGlobals {
-  std::string names;
-  std::vector<std::vector<int64_t>> counts;
-  std::vector<__llvm_profile_data> data;
-  Triple targetTriple;
+  SmallVector<uint8_t> NamesData;
+  SmallVector<SmallVector<int64_t>> Counts;
+  SmallVector<__llvm_profile_data> Data;
+  Triple TargetTriple;

   void dump() const;
 };
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index 86742d0f77a2fe..7cb672e7b26839 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -19,6 +19,7 @@
 #include "llvm/Support/Error.h"

 #include
+#include

 using namespace llvm;
 using namespace omp;
@@ -177,73 +178,81 @@ bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
 Expected<GPUProfGlobals>
 GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
                                              DeviceImageTy &Image) {
-  GPUProfGlobals profdata;
+  GPUProfGlobals DeviceProfileData;
   auto ELFObj = getELFObjectFile(Image);
   if (!ELFObj)
     return ELFObj.takeError();
-  profdata.targetTriple = ELFObj->makeTriple();
+  DeviceProfileData.TargetTriple = ELFObj->makeTriple();
+
   // Iterate through elf symbols
-  for (auto &sym : ELFObj->symbols()) {
-    if (auto name = sym.getName()) {
-      // Check if given current global is a profiling global based
-      // on name
-      if (name->equals(getInstrProfNamesVarName())) {
-        // Read in profiled function names
-        std::vector<char> chars(sym.getSize() / sizeof(char), ' ');
-        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());
-        if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
-          return Err;
-        std::string names(chars.begin(), chars.end());
-        profdata.names = std::move(names);
-      } else if (name->starts_with(getInstrProfCountersVarPrefix())) {
-        // Read global variable profiling counts
-        std::vector<int64_t> counts(sym.getSize() / sizeof(int64_t), 0);
-        GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data());
-        if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
-          return Err;
-        profdata.counts.push_back(std::move(counts));
-      } else if (name->starts_with(getInstrProfDataVarPrefix())) {
-        // Read profiling data for this global variable
-        __llvm_profile_data data{};
-        GlobalTy DataGlobal(name->str(), sym.getSize(), &data);
-        if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
-          return Err;
-        profdata.data.push_back(std::move(data));
-      }
+  for (auto &Sym : ELFObj->symbols()) {
+    auto NameOrErr = Sym.getName();
+    if (!NameOrErr)
+      return ELFObj.takeError();
+
+    // Check if given current global is a profiling global based
+    // on name
+    if (NameOrErr->equals(getInstrProfNamesVarName())) {
+      // Read in profiled function names
+      DeviceProfileData.NamesData = SmallVector<uint8_t>(Sym.getSize(), 0);
+      GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(),
+                           DeviceProfileData.NamesData.data());
+      if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
+        return Err;
+    } else if (NameOrErr->starts_with(getInstrProfCountersVarPrefix())) {
+      // Read global variable profiling counts
+      SmallVector<int64_t> Counts(Sym.getSize() / sizeof(int64_t), 0);
+      GlobalTy CountGlobal(NameOrErr->str(), Sym.getSize(), Counts.data());
+      if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
+        return Err;
+      DeviceProfileData.Counts.push_back(std::move(Counts));
+    } else if (NameOrErr->starts_with(getInstrProfDataVarPrefix())) {
+      // Read profiling data for this global variable
+      __llvm_profile_data Data{};
+      GlobalTy DataGlobal(NameOrErr->str(), Sym.getSize(), &Data);
+      if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
+        return Err;
+      DeviceProfileData.Data.push_back(std::move(Data));
     }
   }
-  return profdata;
+  return DeviceProfileData;
 }

 void GPUProfGlobals::dump() const {
-  llvm::outs() << "======= GPU Profile =======\nTarget: " << targetTriple.str()
+  llvm::outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str()
                << "\n";

   llvm::outs() << "======== Counters =========\n";
-  for (const auto &count : counts) {
+  for (const auto &Count : Counts) {
     llvm::outs() << "[";
-    for (size_t i = 0; i < count.size(); i++) {
+    for (size_t i = 0; i < Count.size(); i++) {
       if (i == 0)
         llvm::outs() << " ";
-      llvm::outs() << count[i] << " ";
+      llvm::outs() << Count[i] << " ";
     }
     llvm::outs() << "]\n";
   }

   llvm::outs() << "========== Data ===========\n";
-  for (const auto &d : data) {
+  for (const auto &ProfData : Data) {
     llvm::outs() << "{ ";
 #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
-  llvm::outs() << d.Name << " ";
+  llvm::outs() << ProfData.Name << " ";
 #include "llvm/ProfileData/InstrProfData.inc"
     llvm::outs() << " }\n";
   }

   llvm::outs() << "======== Functions ========\n";
-  InstrProfSymtab symtab;
-  if (Error Err = symtab.create(StringRef(names))) {
+  std::string s;
+  s.reserve(NamesData.size());
+  for (uint8_t Name : NamesData) {
+    s.push_back((char)Name);
+  }
+
+  InstrProfSymtab Symtab;
+  if (Error Err = Symtab.create(StringRef(s))) {
     consumeError(std::move(Err));
   }
-  symtab.dumpNames(llvm::outs());
+  Symtab.dumpNames(llvm::outs());
   llvm::outs() << "===========================\n";
 }

>From 0c4bbeb54d189c1461affd37853aa86c3e3ca7d8 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 17 Jan 2024 19:59:06 -0600
Subject: [PATCH 10/38] Add GPU PGO test

---
 .../common/src/GlobalHandler.cpp           |  2 +-
 openmp/libomptarget/test/CMakeLists.txt    |  6 +++
 openmp/libomptarget/test/lit.cfg           |  3 ++
 openmp/libomptarget/test/lit.site.cfg.in   |  2 +-
 openmp/libomptarget/test/offloading/pgo1.c | 39 +++++++++++++++++++
 5 files changed, 50 insertions(+), 2 deletions(-)
 create mode 100644 openmp/libomptarget/test/offloading/pgo1.c

diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index 7cb672e7b26839..e5eb653d022287 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -239,7 +239,7 @@ void GPUProfGlobals::dump() const {
 #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
   llvm::outs() << ProfData.Name << " ";
 #include "llvm/ProfileData/InstrProfData.inc"
-    llvm::outs() << " }\n";
+    llvm::outs() << "}\n";
   }

   llvm::outs() << "======== Functions ========\n";
diff --git a/openmp/libomptarget/test/CMakeLists.txt b/openmp/libomptarget/test/CMakeLists.txt
index a0ba233eaa5726..21233f3e252eb5 100644
--- a/openmp/libomptarget/test/CMakeLists.txt
+++ b/openmp/libomptarget/test/CMakeLists.txt
@@ -12,6 +12,12 @@ else()
   set(LIBOMPTARGET_DEBUG False)
 endif()

+if (OPENMP_STANDALONE_BUILD)
+  set(LIBOMPTARGET_TEST_GPU_PGO False)
+else()
+  set(LIBOMPTARGET_TEST_GPU_PGO True)
+endif()
+
 # Replace the space from user's input with ";" in case that CMake add escape
 # char into the lit command.
 string(REPLACE " " ";" LIBOMPTARGET_LIT_ARG_LIST "${LIBOMPTARGET_LIT_ARGS}")
diff --git a/openmp/libomptarget/test/lit.cfg b/openmp/libomptarget/test/lit.cfg
index 19c5e5c4572227..49743f9fed7f29 100644
--- a/openmp/libomptarget/test/lit.cfg
+++ b/openmp/libomptarget/test/lit.cfg
@@ -104,6 +104,9 @@ config.available_features.add(config.libomptarget_current_target)
 if config.libomptarget_has_libc:
     config.available_features.add('libc')

+if config.libomptarget_test_pgo:
+    config.available_features.add('pgo')
+
 # Determine whether the test system supports unified memory.
 # For CUDA, this is the case with compute capability 70 (Volta) or higher.
 # For all other targets, we currently assume it is.
diff --git a/openmp/libomptarget/test/lit.site.cfg.in b/openmp/libomptarget/test/lit.site.cfg.in
index 2d638118838727..494d1636af304a 100644
--- a/openmp/libomptarget/test/lit.site.cfg.in
+++ b/openmp/libomptarget/test/lit.site.cfg.in
@@ -25,6 +25,6 @@ config.libomptarget_not = "@OPENMP_NOT_EXECUTABLE@"
 config.libomptarget_debug = @LIBOMPTARGET_DEBUG@
 config.has_libomptarget_ompt = @LIBOMPTARGET_OMPT_SUPPORT@
 config.libomptarget_has_libc = @LIBOMPTARGET_GPU_LIBC_SUPPORT@
-
+config.libomptarget_test_pgo = @LIBOMPTARGET_TEST_GPU_PGO@
 # Let the main config do the real work.
 lit_config.load_config(config, "@CMAKE_CURRENT_SOURCE_DIR@/lit.cfg")
diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c
new file mode 100644
index 00000000000000..ca8a6f502a06aa
--- /dev/null
+++ b/openmp/libomptarget/test/offloading/pgo1.c
@@ -0,0 +1,39 @@
+// RUN: %libomptarget-compile-generic -fprofile-instr-generate -Xclang "-fprofile-instrument=clang"
+// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic
+
+// UNSUPPORTED: x86_64-pc-linux-gnu
+// UNSUPPORTED: x86_64-pc-linux-gnu-LTO
+// UNSUPPORTED: aarch64-unknown-linux-gnu
+// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
+// REQUIRES: pgo
+
+#ifdef _OPENMP
+#include <omp.h>
+#endif
+
+int test1(int a) { return a / 2; }
+int test2(int a) { return a * 2; }
+
+int main() {
+  int m = 2;
+#pragma omp target
+  for (int i = 0; i < 10; i++) {
+    m = test1(m);
+    for (int j = 0; j < 2; j++) {
+      m = test2(m);
+    }
+  }
+}
+
+// CHECK: ======== Counters =========
+// CHECK-NEXT: [ 0 11 20 ]
+// CHECK-NEXT: [ 10 ]
+// CHECK-NEXT: [ 20 ]
+// CHECK-NEXT: ========== Data ===========
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: ======== Functions ========
+// CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
+// CHECK-NEXT: test1
+// CHECK-NEXT: test2

>From c7ae2a74daa93b05058fcc9bba64e0734359362c Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 17 Jan 2024 23:12:27 -0600
Subject: [PATCH 11/38] Fix PGO test formatting

---
 openmp/libomptarget/test/offloading/pgo1.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c
index ca8a6f502a06aa..389be19b670d76 100644
--- a/openmp/libomptarget/test/offloading/pgo1.c
+++ b/openmp/libomptarget/test/offloading/pgo1.c
@@ -1,4 +1,5 @@
-// RUN: %libomptarget-compile-generic -fprofile-instr-generate -Xclang "-fprofile-instrument=clang"
+// RUN: %libomptarget-compile-generic -fprofile-instr-generate \
+// RUN: -Xclang "-fprofile-instrument=clang"
 // RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic

 // UNSUPPORTED: x86_64-pc-linux-gnu
@@ -30,9 +31,18 @@ int main() {
 // CHECK-NEXT: [ 10 ]
 // CHECK-NEXT: [ 20 ]
 // CHECK-NEXT: ========== Data ===========
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
 // CHECK-NEXT: ======== Functions ========
 // CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
 // CHECK-NEXT: test1

>From 8bb22072914bbb830e2788d117aedd0e0bab66ff Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Thu, 18 Jan 2024 23:15:55 -0600
Subject: [PATCH 12/38] Refactor visibility logic

---
 llvm/lib/ProfileData/InstrProf.cpp | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp
index 511571a3eed9b0..708ea63fd95e04 100644
--- a/llvm/lib/ProfileData/InstrProf.cpp
+++ b/llvm/lib/ProfileData/InstrProf.cpp
@@ -422,6 +422,16 @@ bool isGPUProfTarget(const Module &M) {
   return Triple.isAMDGPU() || Triple.isNVPTX();
 }

+void setPGOFuncVisibility(Module &M, GlobalVariable *FuncNameVar) {
+  // If the target is a GPU, make the symbol protected so it can
+  // be read from the host device
+  if (isGPUProfTarget(M))
+    FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility);
+  // Hide the symbol so that we correctly get a copy for each executable.
+  else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage()))
+    FuncNameVar->setVisibility(GlobalValue::HiddenVisibility);
+}
+
 GlobalVariable *createPGOFuncNameVar(Module &M,
                                      GlobalValue::LinkageTypes Linkage,
                                      StringRef PGOFuncName) {
@@ -445,14 +455,7 @@ GlobalVariable *createPGOFuncNameVar(Module &M,
       new GlobalVariable(M, Value->getType(), true, Linkage, Value,
                          getPGOFuncNameVarName(PGOFuncName, Linkage));

-  // If the target is a GPU, make the symbol protected so it can
-  // be read from the host device
-  if (isGPUProfTarget(M))
-    FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility);
-  // Hide the symbol so that we correctly get a copy for each executable.
-  else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage()))
-    FuncNameVar->setVisibility(GlobalValue::HiddenVisibility);
-
+  setPGOFuncVisibility(M, FuncNameVar);
   return FuncNameVar;
 }


>From 9f13943f64cb16162e44902d54de53a9b1229179 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Tue, 23 Jan 2024 18:33:58 -0600
Subject: [PATCH 13/38] Add LLVM instrumentation support

This PR formerly only supported -fprofile-instrument=clang.
This commit adds support for -fprofile-instrument=llvm
---
 .../Instrumentation/PGOInstrumentation.cpp | 12 +++-
 openmp/libomptarget/test/offloading/pgo1.c | 72 +++++++++++++------
 2 files changed, 59 insertions(+), 25 deletions(-)

diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index c20fc942eaf0d5..bbc8da78fd7baf 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -862,6 +862,10 @@ static void instrumentOneFunc(
   auto Name = FuncInfo.FuncNameVar;
   auto CFGHash = ConstantInt::get(Type::getInt64Ty(M->getContext()),
                                   FuncInfo.FunctionHash);
+  // Make sure that pointer to global is passed in with zero addrspace
+  // This is relevant during GPU profiling
+  auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
+      Name, llvm::PointerType::getUnqual(M->getContext()));
   if (PGOFunctionEntryCoverage) {
     auto &EntryBB = F.getEntryBlock();
     IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt());
@@ -869,7 +873,7 @@ static void instrumentOneFunc(
     // i32 )
     Builder.CreateCall(
         Intrinsic::getDeclaration(M, Intrinsic::instrprof_cover),
-        {Name, CFGHash, Builder.getInt32(1), Builder.getInt32(0)});
+        {NormalizedPtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)});
     return;
   }

@@ -887,7 +891,8 @@ static void instrumentOneFunc(
     // i32 )
     Builder.CreateCall(
         Intrinsic::getDeclaration(M, Intrinsic::instrprof_timestamp),
-        {Name, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I)});
+        {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters),
+         Builder.getInt32(I)});
     I += PGOBlockCoverage ? 8 : 1;
   }

@@ -901,7 +906,8 @@ static void instrumentOneFunc(
         Intrinsic::getDeclaration(M, PGOBlockCoverage
                                          ? Intrinsic::instrprof_cover
                                          : Intrinsic::instrprof_increment),
-        {Name, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I++)});
+        {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters),
+         Builder.getInt32(I++)});
   }

   // Now instrument select instructions:
diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c
index 389be19b670d76..d95793b508dcfc 100644
--- a/openmp/libomptarget/test/offloading/pgo1.c
+++ b/openmp/libomptarget/test/offloading/pgo1.c
@@ -1,6 +1,11 @@
 // RUN: %libomptarget-compile-generic -fprofile-instr-generate \
 // RUN: -Xclang "-fprofile-instrument=clang"
-// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic
+// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
+// RUN: --check-prefix="CLANG-PGO"
+// RUN: %libomptarget-compile-generic -fprofile-generate \
+// RUN: -Xclang "-fprofile-instrument=llvm"
+// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
+// RUN: --check-prefix="LLVM-PGO"

 // UNSUPPORTED: x86_64-pc-linux-gnu
 // UNSUPPORTED: x86_64-pc-linux-gnu-LTO
@@ -26,24 +31,47 @@ int main() {
   }
 }

-// CHECK: ======== Counters =========
-// CHECK-NEXT: [ 0 11 20 ]
-// CHECK-NEXT: [ 10 ]
-// CHECK-NEXT: [ 20 ]
-// CHECK-NEXT: ========== Data ===========
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: ======== Functions ========
-// CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
-// CHECK-NEXT: test1
-// CHECK-NEXT: test2
+// CLANG-PGO: ======== Counters =========
+// CLANG-PGO-NEXT: [ 0 11 20 ]
+// CLANG-PGO-NEXT: [ 10 ]
+// CLANG-PGO-NEXT: [ 20 ]
+// CLANG-PGO-NEXT: ========== Data ===========
+// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CLANG-PGO-NEXT: ======== Functions ========
+// CLANG-PGO-NEXT: pgo1.c:
+// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
+// CLANG-PGO-NEXT: test1
+// CLANG-PGO-NEXT: test2
+
+// LLVM-PGO: ======== Counters =========
+// LLVM-PGO-NEXT: [ 20 ]
+// LLVM-PGO-NEXT: [ 10 ]
+// LLVM-PGO-NEXT: [ 20 10 1 1 ]
+// LLVM-PGO-NEXT: ========== Data ===========
+// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// LLVM-PGO-NEXT: ======== Functions ========
+// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
+// LLVM-PGO-NEXT: test1
+// LLVM-PGO-NEXT: test2

>From 0606f0dd1b32ef9ebe138bbc964b3921e22d95d1 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 14 Feb 2024 01:46:55 -0600
Subject: [PATCH 14/38] Use explicit addrspace instead of unqual

---
 clang/lib/CodeGen/CodeGenPGO.cpp                           | 2 +-
 llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp
index e084dda879cbc0..4c75a01222d304 100644
--- a/clang/lib/CodeGen/CodeGenPGO.cpp
+++ b/clang/lib/CodeGen/CodeGenPGO.cpp
@@ -1103,7 +1103,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S,
   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
   auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      FuncNameVar, llvm::PointerType::getUnqual(CGM.getLLVMContext()));
+      FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0));

   llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash),
                          Builder.getInt32(NumRegionCounters),
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index bbc8da78fd7baf..c63b3e4ecf786a 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -865,7 +865,7 @@ static void instrumentOneFunc(
   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
   auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      Name, llvm::PointerType::getUnqual(M->getContext()));
+      Name, llvm::PointerType::get(M->getContext(), 0));
   if (PGOFunctionEntryCoverage) {
     auto &EntryBB = F.getEntryBlock();
     IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt());

>From c1f9be321678766525141214aaab74636cafbc2c Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Thu, 15 Feb 2024 19:10:09 -0600
Subject: [PATCH 15/38] Remove redundant namespaces

---
 .../Instrumentation/PGOInstrumentation.cpp |  4 +--
 .../common/src/GlobalHandler.cpp           | 26 +++++++++----------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index c63b3e4ecf786a..3058e577738fda 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -864,8 +864,8 @@ static void instrumentOneFunc(
                                   FuncInfo.FunctionHash);
   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
-  auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      Name, llvm::PointerType::get(M->getContext(), 0));
+  auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast(
+      Name, PointerType::get(M->getContext(), 0));
   if (PGOFunctionEntryCoverage) {
     auto &EntryBB = F.getEntryBlock();
     IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt());
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index e5eb653d022287..ae270c60804d26 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -219,30 +219,30 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
 }

 void GPUProfGlobals::dump() const {
-  llvm::outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str()
+  outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str()
                << "\n";

-  llvm::outs() << "======== Counters =========\n";
+  outs() << "======== Counters =========\n";
   for (const auto &Count : Counts) {
-    llvm::outs() << "[";
+    outs() << "[";
     for (size_t i = 0; i < Count.size(); i++) {
       if (i == 0)
-        llvm::outs() << " ";
-      llvm::outs() << Count[i] << " ";
+        outs() << " ";
+      outs() << Count[i] << " ";
     }
-    llvm::outs() << "]\n";
+    outs() << "]\n";
   }

-  llvm::outs() << "========== Data ===========\n";
+  outs() << "========== Data ===========\n";
   for (const auto &ProfData : Data) {
-    llvm::outs() << "{ ";
+    outs() << "{ ";
 #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
-  llvm::outs() << ProfData.Name << " ";
+  outs() << ProfData.Name << " ";
 #include "llvm/ProfileData/InstrProfData.inc"
-    llvm::outs() << "}\n";
+    outs() << "}\n";
   }

-  llvm::outs() << "======== Functions ========\n";
+  outs() << "======== Functions ========\n";
   std::string s;
   s.reserve(NamesData.size());
   for (uint8_t Name : NamesData) {
@@ -253,6 +253,6 @@ void GPUProfGlobals::dump() const {
   if (Error Err = Symtab.create(StringRef(s))) {
     consumeError(std::move(Err));
   }
-  Symtab.dumpNames(llvm::outs());
-  llvm::outs() << "===========================\n";
+  Symtab.dumpNames(outs());
+  outs() << "===========================\n";
 }

>From 6a3ae407e69e7524f0f808329c534f8352ee1779 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Thu, 15 Feb 2024 19:15:15 -0600
Subject: [PATCH 16/38] Clang format

---
 .../libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index ae270c60804d26..1fce2448922624 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -220,7 +220,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,

 void GPUProfGlobals::dump() const {
   outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str()
-               << "\n";
+         << "\n";

   outs() << "======== Counters =========\n";
   for (const auto &Count : Counts) {

>From 6866862d459e3c3fa65fae8ae639ddc3ff735252 Mon Sep 17
00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 16 Feb 2024 13:13:39 -0600 Subject: [PATCH 17/38] Use getAddrSpaceCast Replace getPointerBitCastOrAddrSpaceCast with getAddrSpaceCast and allow no-op getAddrSpaceCast calls when types are identical --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ++++ llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index 8f52018445d2b0..baceeba8380ddb 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index a38b912164b130..2d89c5bbd4a4c2 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,6 +2067,10 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { + // Skip cast if types are identical + if (C->getType() == DstTy) + return C; + assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index 3058e577738fda..c0be71aa4cc004 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ 
b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 62a5ee1c75545571f81d9edd22e19e9ef7cff69f Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 27 Feb 2024 14:53:51 -0600 Subject: [PATCH 18/38] Revert "Use getAddrSpaceCast" This reverts commit 6866862d459e3c3fa65fae8ae639ddc3ff735252. --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ---- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index baceeba8380ddb..8f52018445d2b0 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 2d89c5bbd4a4c2..a38b912164b130 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,10 +2067,6 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { - // Skip cast if types are 
identical - if (C->getType() == DstTy) - return C; - assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c0be71aa4cc004..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 052394fa28c923d130bf73a07b965a9751467302 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 27 Feb 2024 15:34:34 -0600 Subject: [PATCH 19/38] Revert "Use getAddrSpaceCast" This reverts commit 6866862d459e3c3fa65fae8ae639ddc3ff735252. 
--- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ---- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index baceeba8380ddb..8f52018445d2b0 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 2d89c5bbd4a4c2..a38b912164b130 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,10 +2067,6 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { - // Skip cast if types are identical - if (C->getType() == DstTy) - return C; - assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c0be71aa4cc004..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto 
*NormalizedPtr = ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 612d5a5f6966a77e82e5591f5aea475fbf886e55 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 1 Mar 2024 02:04:00 -0600 Subject: [PATCH 20/38] Write PGO TODO: Fix tests --- compiler-rt/lib/profile/InstrProfiling.h | 11 ++ compiler-rt/lib/profile/InstrProfilingFile.c | 148 +++++++++++++++--- .../common/include/GlobalHandler.h | 14 +- .../common/src/GlobalHandler.cpp | 57 +++++-- .../common/src/PluginInterface.cpp | 6 +- 5 files changed, 200 insertions(+), 36 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h index 01239083369187..937acbd417de46 100644 --- a/compiler-rt/lib/profile/InstrProfiling.h +++ b/compiler-rt/lib/profile/InstrProfiling.h @@ -275,6 +275,17 @@ void __llvm_profile_get_padding_sizes_for_counters( */ void __llvm_profile_set_dumped(); +/*! + * \brief Write custom target-specific profiling data to a seperate file. + * Used by libomptarget for GPU PGO. + */ +int __llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, + const char *NamesEnd); + /*! * This variable is defined in InstrProfilingRuntime.cpp as a hidden * symbol. Its main purpose is to enable profile runtime user to diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index f3b457d786e6bd..4fc401bb9bebf5 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -502,27 +502,15 @@ static FILE *getFileObject(const char *OutputName) { return fopen(OutputName, "ab"); } -/* Write profile data to file \c OutputName. 
*/ -static int writeFile(const char *OutputName) { - int RetVal; - FILE *OutputFile; - - int MergeDone = 0; +/* Get file object and merge if applicable */ +static FILE *getMergeFileObject(const char *OutputName, int *MergeDone) { VPMergeHook = &lprofMergeValueProfData; if (doMerging()) - OutputFile = openFileForMerging(OutputName, &MergeDone); - else - OutputFile = getFileObject(OutputName); - - if (!OutputFile) - return -1; - - FreeHook = &free; - setupIOBuffer(); - ProfDataWriter fileWriter; - initFileWriter(&fileWriter, OutputFile); - RetVal = lprofWriteData(&fileWriter, lprofGetVPDataReader(), MergeDone); + return openFileForMerging(OutputName, MergeDone); + return getFileObject(OutputName); +} +static void closeFileObject(FILE *OutputFile) { if (OutputFile == getProfileFile()) { fflush(OutputFile); if (doMerging() && !__llvm_profile_is_continuous_mode_enabled()) { @@ -531,7 +519,23 @@ static int writeFile(const char *OutputName) { } else { fclose(OutputFile); } +} + +/* Write profile data to file \c OutputName. */ +static int writeFile(const char *OutputName) { + int RetVal, MergeDone = 0; + FILE *OutputFile = getMergeFileObject(OutputName, &MergeDone); + + if (!OutputFile) + return -1; + + FreeHook = &free; + setupIOBuffer(); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + RetVal = lprofWriteData(&fileWriter, lprofGetVPDataReader(), MergeDone); + closeFileObject(OutputFile); return RetVal; } @@ -558,10 +562,16 @@ static int writeOrderFile(const char *OutputName) { #define LPROF_INIT_ONCE_ENV "__LLVM_PROFILE_RT_INIT_ONCE" +static void forceTruncateFile(const char *Filename) { + FILE *File = fopen(Filename, "w"); + if (!File) + return; + fclose(File); +} + static void truncateCurrentFile(void) { const char *Filename; char *FilenameBuf; - FILE *File; int Length; Length = getCurFilenameLength(); @@ -591,10 +601,7 @@ static void truncateCurrentFile(void) { return; /* Truncate the file. Later we'll reopen and append. 
*/ - File = fopen(Filename, "w"); - if (!File) - return; - fclose(File); + forceTruncateFile(Filename); } /* Write a partial profile to \p Filename, which is required to be backed by @@ -1271,4 +1278,99 @@ COMPILER_RT_VISIBILITY int __llvm_profile_set_file_object(FILE *File, return 0; } +int __llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, + const char *NamesEnd) { + int ReturnValue = 0, FilenameLength, TargetLength, MergeDone; + char *FilenameBuf, *TargetFilename; + const char *Filename; + + /* Save old profile data */ + FILE *oldFile = getProfileFile(); + + // Temporarily suspend getting SIGKILL when the parent exits. + int PDeathSig = lprofSuspendSigKill(); + + if (lprofProfileDumped() || __llvm_profile_is_continuous_mode_enabled()) { + PROF_NOTE("Profile data not written to file: %s.\n", "already written"); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return 0; + } + + /* Get current filename */ + FilenameLength = getCurFilenameLength(); + FilenameBuf = (char *)COMPILER_RT_ALLOCA(FilenameLength + 1); + Filename = getCurFilename(FilenameBuf, 0); + + /* Check the filename. */ + if (!Filename) { + PROF_ERR("Failed to write file : %s\n", "Filename not set"); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + /* Allocate new space for our target-specific PGO filename */ + TargetLength = strlen(Target); + TargetFilename = + (char *)COMPILER_RT_ALLOCA(FilenameLength + TargetLength + 2); + + /* Prepend "TARGET." to current filename */ + memcpy(TargetFilename, Target, TargetLength); + TargetFilename[TargetLength] = '.'; + memcpy(TargetFilename, Target, TargetLength); + memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); + TargetFilename[FilenameLength + 1 + TargetLength] = 0; + + /* Check if there is llvm/runtime version mismatch. 
*/ + if (GET_VERSION(__llvm_profile_get_version()) != INSTR_PROF_RAW_VERSION) { + PROF_ERR("Runtime and instrumentation version mismatch : " + "expected %d, but get %d\n", + INSTR_PROF_RAW_VERSION, + (int)GET_VERSION(__llvm_profile_get_version())); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + /* Clean old target file */ + forceTruncateFile(TargetFilename); + + /* Open target-specific PGO file */ + MergeDone = 0; + FILE *OutputFile = getMergeFileObject(TargetFilename, &MergeDone); + + if (!OutputFile) { + PROF_ERR("Failed to open file : %s\n", TargetFilename); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + FreeHook = &free; + setupIOBuffer(); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + + /* Write custom data to the file */ + ReturnValue = lprofWriteDataImpl( + &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone); + + closeFileObject(OutputFile); + + // Restore SIGKILL. 
+ if (PDeathSig == 1) + lprofRestoreSigKill(); + + /* Restore old profiling file */ + setProfileFile(oldFile); + + return ReturnValue; +} + #endif diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h index f5a15ca11bfcda..af0cd4dcdf5dcf 100644 --- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h +++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h @@ -63,14 +63,24 @@ struct __llvm_profile_data { #include "llvm/ProfileData/InstrProfData.inc" }; +extern "C" { +extern int __attribute__((weak)) +__llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, const char *CountersEnd, + const char *NamesBegin, const char *NamesEnd); +} + /// PGO profiling data extracted from a GPU device struct GPUProfGlobals { - SmallVector NamesData; - SmallVector> Counts; + SmallVector Counts; SmallVector<__llvm_profile_data> Data; + SmallVector NamesData; Triple TargetTriple; void dump() const; + Error write() const; }; /// Subclass of GlobalTy that holds the memory for a global of \p Ty. 
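
Note on the filename handling in __llvm_write_custom_profile above: the patch builds the output name by prepending "TARGET." to the current profile filename (it issues the first memcpy of Target twice, which looks redundant but harmless). A minimal standalone sketch of that step follows. The helper name make_target_filename is hypothetical and not part of compiler-rt, and it uses malloc rather than COMPILER_RT_ALLOCA so it can be compiled on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not a compiler-rt API).
 * Returns a heap-allocated "<target>.<filename>" string, or NULL if the
 * allocation fails. The caller owns the result and must free() it. */
static char *make_target_filename(const char *target, const char *filename) {
  size_t target_len = strlen(target);
  size_t filename_len = strlen(filename);

  /* +1 for the '.' separator, +1 for the terminating NUL. */
  char *out = malloc(target_len + filename_len + 2);
  if (!out)
    return NULL;

  memcpy(out, target, target_len);                      /* "TARGET"    */
  out[target_len] = '.';                                /* "TARGET."   */
  memcpy(out + target_len + 1, filename, filename_len); /* + filename  */
  out[target_len + 1 + filename_len] = '\0';
  return out;
}
```

With a target of "amdgcn-amd-amdhsa" and a filename of "llvm.profraw" this yields "amdgcn-amd-amdhsa.llvm.profraw", which is the shape the pgo1.c test in PATCH 25 relies on when it reads %target_triple.llvm.profraw. Since the prefix is applied to the whole string, a filename that carries a directory component would presumably end up with the target name in front of the path rather than in front of the basename.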
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 1fce2448922624..2f16b6e3c139e9 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -205,7 +205,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, GlobalTy CountGlobal(NameOrErr->str(), Sym.getSize(), Counts.data()); if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal)) return Err; - DeviceProfileData.Counts.push_back(std::move(Counts)); + DeviceProfileData.Counts.append(std::move(Counts)); } else if (NameOrErr->starts_with(getInstrProfDataVarPrefix())) { // Read profiling data for this global variable __llvm_profile_data Data{}; @@ -223,15 +223,14 @@ void GPUProfGlobals::dump() const { << "\n"; outs() << "======== Counters =========\n"; - for (const auto &Count : Counts) { - outs() << "["; - for (size_t i = 0; i < Count.size(); i++) { - if (i == 0) - outs() << " "; - outs() << Count[i] << " "; - } - outs() << "]\n"; + for (size_t i = 0; i < Counts.size(); i++) { + if (i > 0 && i % 10 == 0) + outs() << "\n"; + else if (i != 0) + outs() << " "; + outs() << Counts[i]; } + outs() << "\n"; outs() << "========== Data ===========\n"; for (const auto &ProfData : Data) { @@ -256,3 +255,43 @@ void GPUProfGlobals::dump() const { Symtab.dumpNames(outs()); outs() << "===========================\n"; } + +Error GPUProfGlobals::write() const { + if (!__llvm_write_custom_profile) + return Plugin::error("Could not find symbol __llvm_write_custom_profile. " + "The compiler-rt profiling library must be linked for " + "GPU PGO to work."); + + size_t DataSize = Data.size() * sizeof(__llvm_profile_data), + CountsSize = Counts.size() * sizeof(int64_t); + __llvm_profile_data *DataBegin, *DataEnd; + char *CountersBegin, *CountersEnd, *NamesBegin, *NamesEnd; + + // Initialize array of contiguous data. 
We need to make sure each section is + // contiguous so that the PGO library can compute deltas properly + SmallVector ContiguousData(NamesData.size() + DataSize + CountsSize); + + // Compute region pointers + DataBegin = (__llvm_profile_data *)(ContiguousData.data() + CountsSize); + DataEnd = + (__llvm_profile_data *)(ContiguousData.data() + CountsSize + DataSize); + CountersBegin = (char *)ContiguousData.data(); + CountersEnd = (char *)(ContiguousData.data() + CountsSize); + NamesBegin = (char *)(ContiguousData.data() + CountsSize + DataSize); + NamesEnd = (char *)(ContiguousData.data() + CountsSize + DataSize + + NamesData.size()); + + // Copy data to contiguous buffer + memcpy(DataBegin, Data.data(), DataSize); + memcpy(CountersBegin, Counts.data(), CountsSize); + memcpy(NamesBegin, NamesData.data(), NamesData.size()); + + // Invoke compiler-rt entrypoint + int result = __llvm_write_custom_profile(TargetTriple.str().c_str(), + DataBegin, DataEnd, CountersBegin, + CountersEnd, NamesBegin, NamesEnd); + if (result != 0) + return Plugin::error("Error writing GPU PGO data to file"); + + return Plugin::success(); +} diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp index 1ea93795ce8ce4..d5e6b6128152dc 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp @@ -837,8 +837,10 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); - // TODO: write data to profiling file - ProfOrErr->dump(); + // Write data to profiling file + if (auto Err = ProfOrErr->write()) { + consumeError(std::move(Err)); + } } // Delete the memory manager before deinitializing the device. 
Otherwise, >From b8c916305acf08c0bd2d51b81875be5e8fc59ff3 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 13 Mar 2024 20:05:32 -0500 Subject: [PATCH 21/38] Fix tests --- .../plugins-nextgen/common/src/PluginInterface.cpp | 3 +++ openmp/libomptarget/test/offloading/pgo1.c | 8 ++------ 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp index d5e6b6128152dc..2359ad28a25b04 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp @@ -837,6 +837,9 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); + // Dump out profdata + ProfOrErr->dump(); + // Write data to profiling file if (auto Err = ProfOrErr->write()) { consumeError(std::move(Err)); diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c index d95793b508dcfc..79e93d0f10827f 100644 --- a/openmp/libomptarget/test/offloading/pgo1.c +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -32,9 +32,7 @@ int main() { } // CLANG-PGO: ======== Counters ========= -// CLANG-PGO-NEXT: [ 0 11 20 ] -// CLANG-PGO-NEXT: [ 10 ] -// CLANG-PGO-NEXT: [ 20 ] +// CLANG-PGO-NEXT: 0 11 20 10 20 // CLANG-PGO-NEXT: ========== Data =========== // CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} // CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} @@ -55,9 +53,7 @@ int main() { // CLANG-PGO-NEXT: test2 // LLVM-PGO: ======== Counters ========= -// LLVM-PGO-NEXT: [ 20 ] -// LLVM-PGO-NEXT: [ 10 ] -// LLVM-PGO-NEXT: [ 20 10 1 1 ] +// LLVM-PGO-NEXT: 20 10 20 10 1 1 // LLVM-PGO-NEXT: ========== Data =========== // LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} // LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} >From 7770b37a5a4c40bd45887f762bd7f1e652bc0ed2 Mon Sep 17 00:00:00 2001 From: Ethan Luis 
McDonough Date: Tue, 7 May 2024 16:31:48 -0500 Subject: [PATCH 22/38] Fix params --- compiler-rt/lib/profile/InstrProfilingFile.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 466bfe480543bc..bc1d40a37a5ad6 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1360,9 +1360,10 @@ int __llvm_write_custom_profile(const char *Target, initFileWriter(&fileWriter, OutputFile); /* Write custom data to the file */ - ReturnValue = lprofWriteDataImpl( - &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, - lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone); + ReturnValue = + lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin, + CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL, + NULL, NULL, NULL, NamesBegin, NamesEnd, MergeDone); closeFileObject(OutputFile); >From aa895a1788969a0d27692057a1457074e9772c78 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 18 Mar 2024 21:31:32 -0500 Subject: [PATCH 23/38] Fix elf obj file --- offload/plugins-nextgen/common/src/GlobalHandler.cpp | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index 80cdcaff75528e..7717e19a5b6779 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -177,16 +177,19 @@ Expected GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image) { GPUProfGlobals DeviceProfileData; - auto ELFObj = getELFObjectFile(Image); - if (!ELFObj) - return ELFObj.takeError(); + auto ObjFile = getELFObjectFile(Image); + if (!ObjFile) + return ObjFile.takeError(); + + std::unique_ptr ELFObj( + static_cast(ObjFile->release())); DeviceProfileData.TargetTriple = 
ELFObj->makeTriple(); // Iterate through elf symbols for (auto &Sym : ELFObj->symbols()) { auto NameOrErr = Sym.getName(); if (!NameOrErr) - return ELFObj.takeError(); + return NameOrErr.takeError(); // Check if given current global is a profiling global based // on name >From 2031e49c2b26864f2dab72e629eb6cbe34928a7a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 6 May 2024 23:13:58 -0500 Subject: [PATCH 24/38] Add more addrspace casts for GPU targets --- .../Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++--- .../Instrumentation/PGOInstrumentation.cpp | 13 +++++++++---- 2 files changed, 17 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index a6b1e0d488120a..dd8c027c4bbf62 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -869,6 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { llvm::InstrProfValueKind::IPVK_MemOPSize); CallInst *Call = nullptr; auto *TLI = &GetTLI(*Ind->getFunction()); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + DataVar, PointerType::getUnqual(M.getContext())); // To support value profiling calls within Windows exception handlers, funclet // information contained within operand bundles needs to be copied over to @@ -877,11 +879,13 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { SmallVector OpBundles; Ind->getOperandBundlesAsDefs(OpBundles); if (!IsMemOpSize) { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args, OpBundles); } else { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), 
NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall( getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp), Args, OpBundles); @@ -1575,7 +1579,8 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { getInstrProfSectionName(IPSK_vals, TT.getObjectFormat())); ValuesVar->setAlignment(Align(8)); maybeSetComdat(ValuesVar, Fn, CntsVarName); - ValuesPtrExpr = ValuesVar; + ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + ValuesVar, PointerType::getUnqual(Fn->getContext())); } uint64_t NumCounters = Inc->getNumCounters()->getZExtValue(); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index 4b51396a8baa35..ee1657ba8400ee 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -1007,12 +1007,15 @@ static void instrumentOneFunc( ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty()); assert(ToProfile && "value profiling Value is of unexpected type"); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + Name, PointerType::get(M->getContext(), 0)); + SmallVector OpBundles; populateEHOperandBundle(Cand, BlockColors, OpBundles); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile), - {FuncInfo.FuncNameVar, Builder.getInt64(FuncInfo.FunctionHash), - ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, + {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile, + Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, OpBundles); } } // IPVK_First <= Kind <= IPVK_Last @@ -1685,10 +1688,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) { IRBuilder<> Builder(&SI); Type *Int64Ty = Builder.getInt64Ty(); auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, 
PointerType::get(M->getContext(), 0)); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step), - {FuncNameVar, Builder.getInt64(FuncHash), Builder.getInt32(TotalNumCtrs), - Builder.getInt32(*CurCtrIdx), Step}); + {NormalizedPtr, Builder.getInt64(FuncHash), + Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step}); ++(*CurCtrIdx); } >From be6524bb4f77de0add1e698f68115fd336f32238 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 13 May 2024 17:41:00 -0500 Subject: [PATCH 25/38] Have test read from profraw instead of dump --- offload/test/lit.cfg | 2 + offload/test/offloading/pgo1.c | 94 ++++++++++++++++------------------ 2 files changed, 46 insertions(+), 50 deletions(-) diff --git a/offload/test/lit.cfg b/offload/test/lit.cfg index 069110dc69a6e4..38e6a33b01fafc 100644 --- a/offload/test/lit.cfg +++ b/offload/test/lit.cfg @@ -391,6 +391,8 @@ if config.test_fortran_compiler: config.available_features.add('flang') config.substitutions.append(("%flang", config.test_fortran_compiler)) +config.substitutions.append(("%target_triple", config.libomptarget_current_target)) + config.substitutions.append(("%openmp_flags", config.test_openmp_flags)) if config.libomptarget_current_target.startswith('nvptx') and config.cuda_path: config.substitutions.append(("%cuda_flags", "--cuda-path=" + config.cuda_path)) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 79e93d0f10827f..d22d5340f5b3ec 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,22 +1,21 @@ -// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ -// RUN: -Xclang "-fprofile-instrument=clang" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ -// RUN: --check-prefix="CLANG-PGO" -// RUN: %libomptarget-compile-generic -fprofile-generate \ -// RUN: -Xclang "-fprofile-instrument=llvm" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: %libomptarget-compile-generic 
-Xclang "-fprofile-instrument=llvm" +// RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" +// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=clang" +// RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.clang.profraw | %fcheck-generic \ +// RUN: --check-prefix="CLANG-PGO" + // UNSUPPORTED: x86_64-pc-linux-gnu // UNSUPPORTED: x86_64-pc-linux-gnu-LTO // UNSUPPORTED: aarch64-unknown-linux-gnu // UNSUPPORTED: aarch64-unknown-linux-gnu-LTO // REQUIRES: pgo -#ifdef _OPENMP -#include -#endif - int test1(int a) { return a / 2; } int test2(int a) { return a * 2; } @@ -31,43 +30,38 @@ int main() { } } -// CLANG-PGO: ======== Counters ========= -// CLANG-PGO-NEXT: 0 11 20 10 20 -// CLANG-PGO-NEXT: ========== Data =========== -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: ======== Functions ======== -// CLANG-PGO-NEXT: pgo1.c: -// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// CLANG-PGO-NEXT: test1 -// CLANG-PGO-NEXT: test2 +// LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: 
Counters: 4 +// LLVM-PGO: Function count: 20 +// LLVM-PGO: Block counts: [10, 20, 10] + +// LLVM-PGO-LABEL: test1: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: Counters: 1 +// LLVM-PGO: Function count: 1 +// LLVM-PGO: Block counts: [] + +// LLVM-PGO-LABEL: test2: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: Counters: 1 +// LLVM-PGO: Function count: 1 +// LLVM-PGO: Block counts: [] + +// CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 3 +// CLANG-PGO: Function count: 0 +// CLANG-PGO: Block counts: [11, 20] + +// CLANG-PGO-LABEL: test1: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 1 +// CLANG-PGO: Function count: 10 +// CLANG-PGO: Block counts: [] -// LLVM-PGO: ======== Counters ========= -// LLVM-PGO-NEXT: 20 10 20 10 1 1 -// LLVM-PGO-NEXT: ========== Data =========== -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: ======== Functions ======== -// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// LLVM-PGO-NEXT: test1 -// LLVM-PGO-NEXT: test2 +// CLANG-PGO-LABEL: test2: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 1 +// CLANG-PGO: Function count: 20 +// CLANG-PGO: Block counts: [] >From 2b8eb2935ec21bf0acc5c56f45837b5976560963 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 
24 May 2024 19:59:33 -0500 Subject: [PATCH 26/38] Fix PGO test format --- offload/test/offloading/pgo1.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index d22d5340f5b3ec..0e75c684ed9263 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -33,20 +33,17 @@ int main() { // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 4 -// LLVM-PGO: Function count: 20 -// LLVM-PGO: Block counts: [10, 20, 10] +// LLVM-PGO: Block counts: [20, 10, 20, 10] // LLVM-PGO-LABEL: test1: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Function count: 1 -// LLVM-PGO: Block counts: [] +// LLVM-PGO: Block counts: [1] // LLVM-PGO-LABEL: test2: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Function count: 1 -// LLVM-PGO: Block counts: [] +// LLVM-PGO: Block counts: [1] // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} >From 67f3009173d815295f36e2b37e85add1347e3bf9 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 24 May 2024 20:45:04 -0500 Subject: [PATCH 27/38] Refactor profile writer --- compiler-rt/lib/profile/InstrProfilingFile.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index bc1d40a37a5ad6..76238214c13aa3 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1344,8 +1344,7 @@ int __llvm_write_custom_profile(const char *Target, forceTruncateFile(TargetFilename); /* Open target-specific PGO file */ - MergeDone = 0; - FILE *OutputFile = getMergeFileObject(TargetFilename, &MergeDone); + FILE *OutputFile = getFileObject(TargetFilename); if 
(!OutputFile) { PROF_ERR("Failed to open file : %s\n", TargetFilename); @@ -1356,15 +1355,11 @@ int __llvm_write_custom_profile(const char *Target, FreeHook = &free; setupIOBuffer(); - ProfDataWriter fileWriter; - initFileWriter(&fileWriter, OutputFile); - - /* Write custom data to the file */ - ReturnValue = - lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin, - CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL, - NULL, NULL, NULL, NamesBegin, NamesEnd, MergeDone); + /* Write custom data */ + ReturnValue = __llvm_profile_write_buffer_internal( + OutputFile, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + NamesBegin, NamesEnd); closeFileObject(OutputFile); // Restore SIGKILL. >From e8ad1322c557f7b48e2b28fe3a34a696a1103bba Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 27 May 2024 18:29:18 -0500 Subject: [PATCH 28/38] Fix refactor bug --- compiler-rt/lib/profile/InstrProfilingFile.c | 52 ++++++++++---------- offload/test/offloading/pgo1.c | 6 ++- 2 files changed, 29 insertions(+), 29 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 76238214c13aa3..784cb9af6169d8 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -505,14 +505,6 @@ static FILE *getFileObject(const char *OutputName) { return fopen(OutputName, "ab"); } -/* Get file object and merge if applicable */ -static FILE *getMergeFileObject(const char *OutputName, int *MergeDone) { - VPMergeHook = &lprofMergeValueProfData; - if (doMerging()) - return openFileForMerging(OutputName, MergeDone); - return getFileObject(OutputName); -} - static void closeFileObject(FILE *OutputFile) { if (OutputFile == getProfileFile()) { fflush(OutputFile); @@ -526,8 +518,15 @@ static void closeFileObject(FILE *OutputFile) { /* Write profile data to file \c OutputName. 
*/ static int writeFile(const char *OutputName) { - int RetVal, MergeDone = 0; - FILE *OutputFile = getMergeFileObject(OutputName, &MergeDone); + int RetVal; + FILE *OutputFile; + + int MergeDone = 0; + VPMergeHook = &lprofMergeValueProfData; + if (doMerging()) + OutputFile = openFileForMerging(OutputName, &MergeDone); + else + OutputFile = getFileObject(OutputName); if (!OutputFile) return -1; @@ -565,16 +564,10 @@ static int writeOrderFile(const char *OutputName) { #define LPROF_INIT_ONCE_ENV "__LLVM_PROFILE_RT_INIT_ONCE" -static void forceTruncateFile(const char *Filename) { - FILE *File = fopen(Filename, "w"); - if (!File) - return; - fclose(File); -} - static void truncateCurrentFile(void) { const char *Filename; char *FilenameBuf; + FILE *File; int Length; Length = getCurFilenameLength(); @@ -604,7 +597,10 @@ static void truncateCurrentFile(void) { return; /* Truncate the file. Later we'll reopen and append. */ - forceTruncateFile(Filename); + File = fopen(Filename, "w"); + if (!File) + return; + fclose(File); } /* Write a partial profile to \p Filename, which is required to be backed by @@ -1287,7 +1283,7 @@ int __llvm_write_custom_profile(const char *Target, const char *CountersBegin, const char *CountersEnd, const char *NamesBegin, const char *NamesEnd) { - int ReturnValue = 0, FilenameLength, TargetLength, MergeDone; + int ReturnValue = 0, FilenameLength, TargetLength; char *FilenameBuf, *TargetFilename; const char *Filename; @@ -1340,11 +1336,9 @@ int __llvm_write_custom_profile(const char *Target, return -1; } - /* Clean old target file */ - forceTruncateFile(TargetFilename); - - /* Open target-specific PGO file */ - FILE *OutputFile = getFileObject(TargetFilename); + /* Open and truncate target-specific PGO file */ + FILE *OutputFile = fopen(TargetFilename, "w"); + setProfileFile(OutputFile); if (!OutputFile) { PROF_ERR("Failed to open file : %s\n", TargetFilename); @@ -1357,9 +1351,13 @@ int __llvm_write_custom_profile(const char *Target, 
setupIOBuffer(); /* Write custom data */ - ReturnValue = __llvm_profile_write_buffer_internal( - OutputFile, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, - NamesBegin, NamesEnd); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + + /* Write custom data to the file */ + ReturnValue = lprofWriteDataImpl( + &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NULL, NULL, NULL, NULL, NamesBegin, NamesEnd, 0); closeFileObject(OutputFile); // Restore SIGKILL. diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 0e75c684ed9263..d6747113265803 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,10 +1,12 @@ -// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=llvm" +// RUN: %libomptarget-compile-generic -fprofile-generate \ +// RUN: -Xclang "-fprofile-instrument=llvm" // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" -// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ +// RUN: -Xclang "-fprofile-instrument=clang" // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.clang.profraw | %fcheck-generic \ >From 4c9f814ce14aeb6766a93f5c1d15b847b98dc29f Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 28 May 2024 12:58:43 -0500 Subject: [PATCH 29/38] Make requested clang-format change --- offload/plugins-nextgen/common/include/GlobalHandler.h | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/offload/plugins-nextgen/common/include/GlobalHandler.h b/offload/plugins-nextgen/common/include/GlobalHandler.h index 
017d7e994f07a8..1d7b9f80f9dfd3 100644
--- a/offload/plugins-nextgen/common/include/GlobalHandler.h
+++ b/offload/plugins-nextgen/common/include/GlobalHandler.h
@@ -64,12 +64,10 @@ struct __llvm_profile_data {
 };

 extern "C" {
-extern int __attribute__((weak))
-__llvm_write_custom_profile(const char *Target,
-                            const __llvm_profile_data *DataBegin,
-                            const __llvm_profile_data *DataEnd,
-                            const char *CountersBegin, const char *CountersEnd,
-                            const char *NamesBegin, const char *NamesEnd);
+extern int __attribute__((weak)) __llvm_write_custom_profile(
+    const char *Target, const __llvm_profile_data *DataBegin,
+    const __llvm_profile_data *DataEnd, const char *CountersBegin,
+    const char *CountersEnd, const char *NamesBegin, const char *NamesEnd);
 }

 /// PGO profiling data extracted from a GPU device

>From 344e357de657f54c068be969dcfc3ea33f2f026e Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 31 May 2024 20:29:20 -0500
Subject: [PATCH 30/38] Tighten PGO test requirements

Require compiler-rt to be an enabled runtime
---
 offload/test/CMakeLists.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt
index 32df1e47afaeb2..41ab339147791c 100644
--- a/offload/test/CMakeLists.txt
+++ b/offload/test/CMakeLists.txt
@@ -12,10 +12,10 @@ else()
   set(LIBOMPTARGET_DEBUG False)
 endif()

-if (OPENMP_STANDALONE_BUILD)
-  set(LIBOMPTARGET_TEST_GPU_PGO False)
-else()
+if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES)
   set(LIBOMPTARGET_TEST_GPU_PGO True)
+else()
+  set(LIBOMPTARGET_TEST_GPU_PGO False)
 endif()

 # Replace the space from user's input with ";" in case that CMake add escape

>From 2f751420b9ad2ffc7c9fac4a645724b45cdae59a Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 31 May 2024 20:29:20 -0500
Subject: [PATCH 31/38] Tighten PGO test requirements

Require compiler-rt to be an enabled runtime
---
 offload/test/CMakeLists.txt | 6 +++---
 1 file changed, 3
insertions(+), 3 deletions(-) diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt index 32df1e47afaeb2..41ab339147791c 100644 --- a/offload/test/CMakeLists.txt +++ b/offload/test/CMakeLists.txt @@ -12,10 +12,10 @@ else() set(LIBOMPTARGET_DEBUG False) endif() -if (OPENMP_STANDALONE_BUILD) - set(LIBOMPTARGET_TEST_GPU_PGO False) -else() +if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES) set(LIBOMPTARGET_TEST_GPU_PGO True) +else() + set(LIBOMPTARGET_TEST_GPU_PGO False) endif() # Replace the space from user's input with ";" in case that CMake add escape >From 488cb4a349fdfbd73d0a78ddb2c17522c46145ba Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 26 Jun 2024 18:18:31 -0500 Subject: [PATCH 32/38] Apply requested formatting changes --- clang/lib/CodeGen/CodeGenPGO.cpp | 11 +++++----- llvm/lib/ProfileData/InstrProf.cpp | 4 ++-- .../Instrumentation/InstrProfiling.cpp | 10 ++++----- .../Instrumentation/PGOInstrumentation.cpp | 21 ++++++++++--------- offload/DeviceRTL/src/Profiling.cpp | 6 ++++-- 5 files changed, 28 insertions(+), 24 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index a7ce0b8f6a35f3..3edfbdd679c61d 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1199,12 +1199,13 @@ void CodeGenPGO::emitCounterSetOrIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); + auto *NormalizedFuncNameVarPtr = + llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); - llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), - Builder.getInt32(NumRegionCounters), - Builder.getInt32(Counter), StepV}; + llvm::Value 
*Args[] = { + NormalizedFuncNameVarPtr, Builder.getInt64(FunctionHash), + Builder.getInt32(NumRegionCounters), Builder.getInt32(Counter), StepV}; if (llvm::EnableSingleByteCoverage) Builder.CreateCall(CGM.getIntrinsic(llvm::Intrinsic::instrprof_cover), diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 1284efd4b5f4da..6742435c9d065e 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -433,8 +433,8 @@ std::string getPGOFuncNameVarName(StringRef FuncName, } bool isGPUProfTarget(const Module &M) { - const auto &Triple = llvm::Triple(M.getTargetTriple()); - return Triple.isAMDGPU() || Triple.isNVPTX(); + const auto &T = Triple(M.getTargetTriple()); + return T.isAMDGPU() || T.isNVPTX(); } void setPGOFuncVisibility(Module &M, GlobalVariable *FuncNameVar) { diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index dd8c027c4bbf62..05cef1236f0879 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -869,8 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { llvm::InstrProfValueKind::IPVK_MemOPSize); CallInst *Call = nullptr; auto *TLI = &GetTLI(*Ind->getFunction()); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - DataVar, PointerType::getUnqual(M.getContext())); + auto *NormalizedDataVarPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + DataVar, PointerType::get(M.getContext(), 0)); // To support value profiling calls within Windows exception handlers, funclet // information contained within operand bundles needs to be copied over to @@ -879,12 +879,12 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { SmallVector OpBundles; Ind->getOperandBundlesAsDefs(OpBundles); if (!IsMemOpSize) { - Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Value *Args[3] 
= {Ind->getTargetValue(), NormalizedDataVarPtr, Builder.getInt32(Index)}; Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args, OpBundles); } else { - Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Value *Args[3] = {Ind->getTargetValue(), NormalizedDataVarPtr, Builder.getInt32(Index)}; Call = Builder.CreateCall( getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp), @@ -1580,7 +1580,7 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { ValuesVar->setAlignment(Align(8)); maybeSetComdat(ValuesVar, Fn, CntsVarName); ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - ValuesVar, PointerType::getUnqual(Fn->getContext())); + ValuesVar, PointerType::get(Fn->getContext(), 0)); } uint64_t NumCounters = Inc->getNumCounters()->getZExtValue(); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index ee1657ba8400ee..f8f34ea25597f3 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -884,7 +884,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedNamePtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); @@ -893,7 +893,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_cover), - {NormalizedPtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); + {NormalizedNamePtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); return; } @@ -948,7 +948,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, 
Intrinsic::instrprof_timestamp), - {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + {NormalizedNamePtr, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I)}); I += PGOBlockCoverage ? 8 : 1; } @@ -963,7 +963,7 @@ static void instrumentOneFunc( Intrinsic::getDeclaration(M, PGOBlockCoverage ? Intrinsic::instrprof_cover : Intrinsic::instrprof_increment), - {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + {NormalizedNamePtr, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I++)}); } @@ -1007,15 +1007,15 @@ static void instrumentOneFunc( ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty()); assert(ToProfile && "value profiling Value is of unexpected type"); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedNamePtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); SmallVector OpBundles; populateEHOperandBundle(Cand, BlockColors, OpBundles); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile), - {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile, - Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, + {NormalizedNamePtr, Builder.getInt64(FuncInfo.FunctionHash), + ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, OpBundles); } } // IPVK_First <= Kind <= IPVK_Last @@ -1688,11 +1688,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) { IRBuilder<> Builder(&SI); Type *Int64Ty = Builder.getInt64Ty(); auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, PointerType::get(M->getContext(), 0)); + auto *NormalizedFuncNameVarPtr = + ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, PointerType::get(M->getContext(), 0)); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step), - {NormalizedPtr, Builder.getInt64(FuncHash), + 
{NormalizedFuncNameVarPtr, Builder.getInt64(FuncHash), Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step}); ++(*CurCtrIdx); } diff --git a/offload/DeviceRTL/src/Profiling.cpp b/offload/DeviceRTL/src/Profiling.cpp index 799477f5e47d27..639c62ceff7a69 100644 --- a/offload/DeviceRTL/src/Profiling.cpp +++ b/offload/DeviceRTL/src/Profiling.cpp @@ -12,8 +12,10 @@ extern "C" { -void __llvm_profile_register_function(void *ptr) {} -void __llvm_profile_register_names_function(void *ptr, long int i) {} +// Provides empty implementations for certain functions in compiler-rt +// that are emitted by the PGO instrumentation. +void __llvm_profile_register_function(void *Ptr) {} +void __llvm_profile_register_names_function(void *Ptr, long int I) {} } #pragma omp end declare target >From b90c01583f1893802aba0180b07a448584585365 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 26 Jun 2024 18:29:59 -0500 Subject: [PATCH 33/38] Add memop function shim to DeviceRTL This comes up sometimes when using LLVM IR level instrumentation. --- offload/DeviceRTL/include/Profiling.h | 1 + offload/DeviceRTL/src/Profiling.cpp | 1 + 2 files changed, 2 insertions(+) diff --git a/offload/DeviceRTL/include/Profiling.h b/offload/DeviceRTL/include/Profiling.h index 9efc1554c176bc..d9947522541219 100644 --- a/offload/DeviceRTL/include/Profiling.h +++ b/offload/DeviceRTL/include/Profiling.h @@ -15,6 +15,7 @@ extern "C" { void __llvm_profile_register_function(void *Ptr); void __llvm_profile_register_names_function(void *Ptr, long int I); +void __llvm_profile_instrument_memop(long int I, void *Ptr, int I2); } #endif diff --git a/offload/DeviceRTL/src/Profiling.cpp b/offload/DeviceRTL/src/Profiling.cpp index 639c62ceff7a69..bb3caaadcc03dd 100644 --- a/offload/DeviceRTL/src/Profiling.cpp +++ b/offload/DeviceRTL/src/Profiling.cpp @@ -16,6 +16,7 @@ extern "C" { // that are emitted by the PGO instrumentation. 
 void __llvm_profile_register_function(void *Ptr) {}
 void __llvm_profile_register_names_function(void *Ptr, long int I) {}
+void __llvm_profile_instrument_memop(long int I, void *Ptr, int I2) {}
 }

 #pragma omp end declare target

>From c68c6e2fa98a1fe608b88ed38f7db68eae804c5b Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Thu, 27 Jun 2024 02:04:27 -0500
Subject: [PATCH 34/38] Make requested changes

---
 compiler-rt/lib/profile/InstrProfiling.h               | 2 +-
 compiler-rt/lib/profile/InstrProfilingFile.c           | 1 -
 offload/plugins-nextgen/common/src/PluginInterface.cpp | 5 ++---
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h
index ef1292a45bf01d..eda3e9a673c1af 100644
--- a/compiler-rt/lib/profile/InstrProfiling.h
+++ b/compiler-rt/lib/profile/InstrProfiling.h
@@ -298,7 +298,7 @@ void __llvm_profile_set_dumped();

 /*!
  * \brief Write custom target-specific profiling data to a seperate file.
- * Used by libomptarget for GPU PGO.
+ * Used by offload PGO.
  */
 int __llvm_write_custom_profile(const char *Target,
                                 const __llvm_profile_data *DataBegin,
diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c
index 784cb9af6169d8..93436ecbabb40d 100644
--- a/compiler-rt/lib/profile/InstrProfilingFile.c
+++ b/compiler-rt/lib/profile/InstrProfilingFile.c
@@ -1321,7 +1321,6 @@ int __llvm_write_custom_profile(const char *Target,
   /* Prepend "TARGET." to current filename */
   memcpy(TargetFilename, Target, TargetLength);
   TargetFilename[TargetLength] = '.';
-  memcpy(TargetFilename, Target, TargetLength);
   memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength);
   TargetFilename[FilenameLength + 1 + TargetLength] = 0;

diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index c4e1e63777de8a..445f4ad942bd4d 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -843,9 +843,8 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) {
     ProfOrErr->dump();

     // Write data to profiling file
-    if (auto Err = ProfOrErr->write()) {
-      consumeError(std::move(Err));
-    }
+    if (auto Err = ProfOrErr->write())
+      return Err;
   }

   // Delete the memory manager before deinitializing the device. Otherwise,

>From ca52c58c7fde412897cf6b10b9bbb321812f193d Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Thu, 27 Jun 2024 02:26:20 -0500
Subject: [PATCH 35/38] Only dump counters if PGODump flag is set

---
 offload/include/Shared/Environment.h                   | 1 +
 offload/plugins-nextgen/common/src/PluginInterface.cpp | 4 +++-
 openmp/docs/design/Runtimes.rst                        | 1 +
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/offload/include/Shared/Environment.h b/offload/include/Shared/Environment.h
index d141146b6bd5a1..86f6d1c6ea2d36 100644
--- a/offload/include/Shared/Environment.h
+++ b/offload/include/Shared/Environment.h
@@ -30,6 +30,7 @@ enum class DeviceDebugKind : uint32_t {
   FunctionTracing = 1U << 1,
   CommonIssues = 1U << 2,
   AllocationTracker = 1U << 3,
+  PGODump = 1U << 4,
 };

 struct DeviceEnvironmentTy {
diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index 445f4ad942bd4d..35fb04863d8741 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -840,7 +840,9 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { return ProfOrErr.takeError(); // Dump out profdata - ProfOrErr->dump(); + if ((OMPX_DebugKind.get() & uint32_t(DeviceDebugKind::PGODump)) == + uint32_t(DeviceDebugKind::PGODump)) + ProfOrErr->dump(); // Write data to profiling file if (auto Err = ProfOrErr->write()) diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst index f8a8cb87e83e66..7fc697a838e229 100644 --- a/openmp/docs/design/Runtimes.rst +++ b/openmp/docs/design/Runtimes.rst @@ -1493,3 +1493,4 @@ debugging features are supported. * Enable debugging assertions in the device. ``0x01`` * Enable diagnosing common problems during offloading . ``0x4`` * Enable device malloc statistics (amdgpu only). ``0x8`` + * Dump device PGO counters (only if PGO on GPU is enabled). ``0x10`` >From ee4431a1b57469c7679f54f124ca5f3dd7f0433b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 9 Aug 2024 20:21:38 -0500 Subject: [PATCH 36/38] Update requirements --- offload/test/offloading/pgo1.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index d6747113265803..fbf6337374a997 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -12,10 +12,7 @@ // RUN: %target_triple.clang.profraw | %fcheck-generic \ // RUN: --check-prefix="CLANG-PGO" -// UNSUPPORTED: x86_64-pc-linux-gnu -// UNSUPPORTED: x86_64-pc-linux-gnu-LTO -// UNSUPPORTED: aarch64-unknown-linux-gnu -// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: gpu // REQUIRES: pgo int test1(int a) { return a / 2; } >From fb699b6bca72d42359a304bcbba88f3564ae9ac9 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Sat, 10 Aug 2024 00:54:36 -0500 Subject: [PATCH 37/38] Merge changes --- offload/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +- offload/test/offloading/pgo1.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git 
a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index bca66cff6558a2..d7bfbba01c8efc 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -193,7 +193,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, // Check if given current global is a profiling global based // on name - if (NameOrErr->equals(getInstrProfNamesVarName())) { + if (*NameOrErr == getInstrProfNamesVarName()) { // Read in profiled function names DeviceProfileData.NamesData = SmallVector(Sym.getSize(), 0); GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(), diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index fbf6337374a997..3270ce8f15e7dc 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -32,17 +32,17 @@ int main() { // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 4 -// LLVM-PGO: Block counts: [20, 10, 20, 10] +// LLVM-PGO: Block counts: [20, 10, 2, 1] // LLVM-PGO-LABEL: test1: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [10] // LLVM-PGO-LABEL: test2: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [20] // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} >From 1d0a961aabe488e6d09b96a80329498b8f586923 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 25 Oct 2024 13:42:19 -0500 Subject: [PATCH 38/38] Add llvm-profdata substitution to offload tests --- offload/test/lit.cfg | 2 ++ offload/test/lit.site.cfg.in | 2 +- offload/test/offloading/pgo1.c | 4 ++-- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/offload/test/lit.cfg 
b/offload/test/lit.cfg index 7994a08ba063fb..cfd1ad6c3c1eb5 100644 --- a/offload/test/lit.cfg +++ b/offload/test/lit.cfg @@ -112,8 +112,10 @@ config.available_features.add(config.libomptarget_current_target) if config.libomptarget_has_libc: config.available_features.add('libc') +profdata_path = os.path.join(config.bin_llvm_tools_dir, "llvm-profdata") if config.libomptarget_test_pgo: config.available_features.add('pgo') + config.substitutions.append(("%profdata", profdata_path)) # Determine whether the test system supports unified memory. # For CUDA, this is the case with compute capability 70 (Volta) or higher. diff --git a/offload/test/lit.site.cfg.in b/offload/test/lit.site.cfg.in index a1cb5acc38a405..d998fb0c839700 100644 --- a/offload/test/lit.site.cfg.in +++ b/offload/test/lit.site.cfg.in @@ -1,6 +1,6 @@ @AUTO_GEN_COMMENT@ -config.bin_llvm_tools_dir = "@CMAKE_BINARY_DIR@/bin" +config.bin_llvm_tools_dir = "@LLVM_RUNTIME_OUTPUT_INTDIR@" config.test_c_compiler = "@OPENMP_TEST_C_COMPILER@" config.test_cxx_compiler = "@OPENMP_TEST_CXX_COMPILER@" config.test_fortran_compiler="@OPENMP_TEST_Fortran_COMPILER@" diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 1ef540e430a27a..51671afa62b0db 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,14 +1,14 @@ // RUN: %libomptarget-compile-generic -fprofile-generate \ // RUN: -Xclang "-fprofile-instrument=llvm" // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 -// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %profdata show --all-functions --counts \ // RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" // RUN: %libomptarget-compile-generic -fprofile-instr-generate \ // RUN: -Xclang "-fprofile-instrument=clang" // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 -// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %profdata show --all-functions --counts 
\
 // RUN: %target_triple.clang.profraw | %fcheck-generic \
 // RUN: --check-prefix="CLANG-PGO"

From openmp-commits at lists.llvm.org Fri Oct 25 11:47:34 2024
From: openmp-commits at lists.llvm.org (Ethan Luis McDonough via Openmp-commits)
Date: Fri, 25 Oct 2024 11:47:34 -0700 (PDT)
Subject: [Openmp-commits] [clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (PR #93365)
In-Reply-To:
Message-ID: <671be7c6.170a0220.2cafb9.96f5@mx.google.com>

https://github.com/EthanLuisMcDonough updated https://github.com/llvm/llvm-project/pull/93365

>From 530eb982b9770190377bb0bd09c5cb715f34d484 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 15 Dec 2023 20:38:38 -0600
Subject: [PATCH 01/38] Add profiling functions to libomptarget

---
 .../include/llvm/Frontend/OpenMP/OMPKinds.def | 3 +++
 openmp/libomptarget/DeviceRTL/CMakeLists.txt | 2 ++
 .../DeviceRTL/include/Profiling.h | 21 +++++++++++++++++++
 .../libomptarget/DeviceRTL/src/Profiling.cpp | 19 +++++++++++++++++
 4 files changed, 45 insertions(+)
 create mode 100644 openmp/libomptarget/DeviceRTL/include/Profiling.h
 create mode 100644 openmp/libomptarget/DeviceRTL/src/Profiling.cpp

diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
index d22d2a8e948b00..1d887d5cb58127 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -503,6 +503,9 @@ __OMP_RTL(__kmpc_barrier_simple_generic, false, Void, IdentPtr, Int32)
 __OMP_RTL(__kmpc_warp_active_thread_mask, false, Int64,)
 __OMP_RTL(__kmpc_syncwarp, false, Void, Int64)

+__OMP_RTL(__llvm_profile_register_function, false, Void, VoidPtr)
+__OMP_RTL(__llvm_profile_register_names_function, false, Void, VoidPtr, Int64)
+
 __OMP_RTL(__last, false, Void, )

 #undef __OMP_RTL
diff --git a/openmp/libomptarget/DeviceRTL/CMakeLists.txt b/openmp/libomptarget/DeviceRTL/CMakeLists.txt
index 1ce3e1e40a80ab..55ee15d068c67b
100644
--- a/openmp/libomptarget/DeviceRTL/CMakeLists.txt
+++ b/openmp/libomptarget/DeviceRTL/CMakeLists.txt
@@ -89,6 +89,7 @@ set(include_files
   ${include_directory}/Interface.h
   ${include_directory}/LibC.h
   ${include_directory}/Mapping.h
+  ${include_directory}/Profiling.h
   ${include_directory}/State.h
   ${include_directory}/Synchronization.h
   ${include_directory}/Types.h
@@ -104,6 +105,7 @@ set(src_files
   ${source_directory}/Mapping.cpp
   ${source_directory}/Misc.cpp
   ${source_directory}/Parallelism.cpp
+  ${source_directory}/Profiling.cpp
   ${source_directory}/Reduction.cpp
   ${source_directory}/State.cpp
   ${source_directory}/Synchronization.cpp
diff --git a/openmp/libomptarget/DeviceRTL/include/Profiling.h b/openmp/libomptarget/DeviceRTL/include/Profiling.h
new file mode 100644
index 00000000000000..68c7744cd60752
--- /dev/null
+++ b/openmp/libomptarget/DeviceRTL/include/Profiling.h
@@ -0,0 +1,21 @@
+//===-------- Profiling.h - OpenMP interface ---------------------- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef OMPTARGET_DEVICERTL_PROFILING_H
+#define OMPTARGET_DEVICERTL_PROFILING_H
+
+extern "C" {
+
+void __llvm_profile_register_function(void *ptr);
+void __llvm_profile_register_names_function(void *ptr, long int i);
+}
+
+#endif
diff --git a/openmp/libomptarget/DeviceRTL/src/Profiling.cpp b/openmp/libomptarget/DeviceRTL/src/Profiling.cpp
new file mode 100644
index 00000000000000..799477f5e47d27
--- /dev/null
+++ b/openmp/libomptarget/DeviceRTL/src/Profiling.cpp
@@ -0,0 +1,19 @@
+//===------- Profiling.cpp ---------------------------------------- C++ ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "Profiling.h"
+
+#pragma omp begin declare target device_type(nohost)
+
+extern "C" {
+
+void __llvm_profile_register_function(void *ptr) {}
+void __llvm_profile_register_names_function(void *ptr, long int i) {}
+}
+
+#pragma omp end declare target

>From fb067d4ffe604fd68cf90b705db1942bce49dbb1 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Sat, 16 Dec 2023 01:18:41 -0600
Subject: [PATCH 02/38] Fix PGO instrumentation for GPU targets

---
 clang/lib/CodeGen/CodeGenPGO.cpp | 10 ++++++++--
 .../lib/Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++---
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp
index 81bf8ea696b164..edae6885b528ac 100644
--- a/clang/lib/CodeGen/CodeGenPGO.cpp
+++ b/clang/lib/CodeGen/CodeGenPGO.cpp
@@ -959,8 +959,14 @@ void
CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S,

   unsigned Counter = (*RegionCounterMap)[S];

-  llvm::Value *Args[] = {FuncNameVar,
-                         Builder.getInt64(FunctionHash),
+  // Make sure that pointer to global is passed in with zero addrspace
+  // This is relevant during GPU profiling
+  auto *I8Ty = llvm::Type::getInt8Ty(CGM.getLLVMContext());
+  auto *I8PtrTy = llvm::PointerType::getUnqual(I8Ty);
+  auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
+      FuncNameVar, I8PtrTy);
+
+  llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash),
                          Builder.getInt32(NumRegionCounters),
                          Builder.getInt32(Counter), StepV};
   if (!StepV)
diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
index fe5a0578bd9721..d2cb8155c17967 100644
--- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
@@ -1658,10 +1658,13 @@ void InstrLowerer::emitRegistration() {
   IRBuilder<> IRB(BasicBlock::Create(M.getContext(), "", RegisterF));
   for (Value *Data : CompilerUsedVars)
     if (!isa(Data))
-      IRB.CreateCall(RuntimeRegisterF, Data);
+      // Check for addrspace cast when profiling GPU
+      IRB.CreateCall(RuntimeRegisterF,
+                     IRB.CreatePointerBitCastOrAddrSpaceCast(Data, VoidPtrTy));
   for (Value *Data : UsedVars)
     if (Data != NamesVar && !isa(Data))
-      IRB.CreateCall(RuntimeRegisterF, Data);
+      IRB.CreateCall(RuntimeRegisterF,
+                     IRB.CreatePointerBitCastOrAddrSpaceCast(Data, VoidPtrTy));

   if (NamesVar) {
     Type *ParamTypes[] = {VoidPtrTy, Int64Ty};
@@ -1670,7 +1673,9 @@ void InstrLowerer::emitRegistration() {
     auto *NamesRegisterF =
         Function::Create(NamesRegisterTy, GlobalVariable::ExternalLinkage,
                          getInstrProfNamesRegFuncName(), M);
-    IRB.CreateCall(NamesRegisterF, {NamesVar, IRB.getInt64(NamesSize)});
+    IRB.CreateCall(NamesRegisterF, {IRB.CreatePointerBitCastOrAddrSpaceCast(
+                                        NamesVar, VoidPtrTy),
+                                    IRB.getInt64(NamesSize)});
   }
   IRB.CreateRetVoid();

>From 7a0e0efa178cc4de6a22a8f5cc3f53cd1c81ea3a Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Thu, 21 Dec 2023 00:25:46 -0600
Subject: [PATCH 03/38] Change global visibility on GPU targets

---
 llvm/include/llvm/ProfileData/InstrProf.h | 4 ++++
 llvm/lib/ProfileData/InstrProf.cpp | 17 +++++++++++++++--
 .../Instrumentation/InstrProfiling.cpp | 15 +++++++++++----
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/llvm/include/llvm/ProfileData/InstrProf.h b/llvm/include/llvm/ProfileData/InstrProf.h
index 288dc71d756aee..bf9899d867e3dd 100644
--- a/llvm/include/llvm/ProfileData/InstrProf.h
+++ b/llvm/include/llvm/ProfileData/InstrProf.h
@@ -171,6 +171,10 @@ inline StringRef getInstrProfCounterBiasVarName() {
 /// Return the marker used to separate PGO names during serialization.
 inline StringRef getInstrProfNameSeparator() { return "\01"; }

+/// Determines whether module targets a GPU eligable for PGO
+/// instrumentation
+bool isGPUProfTarget(const Module &M);
+
 /// Return the modified name for function \c F suitable to be
 /// used the key for profile lookup. Variable \c InLTO indicates if this
 /// is called in LTO optimization passes.
diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp
index 649d814cfd9de0..0d6717aeb0142c 100644
--- a/llvm/lib/ProfileData/InstrProf.cpp
+++ b/llvm/lib/ProfileData/InstrProf.cpp
@@ -410,13 +410,22 @@ std::string getPGOFuncNameVarName(StringRef FuncName,
   return VarName;
 }

+bool isGPUProfTarget(const Module &M) {
+  const auto &triple = M.getTargetTriple();
+  return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 ||
+         triple.rfind("r600", 0) == 0;
+}
+
 GlobalVariable *createPGOFuncNameVar(Module &M,
                                      GlobalValue::LinkageTypes Linkage,
                                      StringRef PGOFuncName) {
+  // Ensure profiling variables on GPU are visible to be read from host
+  if (isGPUProfTarget(M))
+    Linkage = GlobalValue::ExternalLinkage;
   // We generally want to match the function's linkage, but available_externally
   // and extern_weak both have the wrong semantics, and anything that doesn't
   // need to link across compilation units doesn't need to be visible at all.
-  if (Linkage == GlobalValue::ExternalWeakLinkage)
+  else if (Linkage == GlobalValue::ExternalWeakLinkage)
     Linkage = GlobalValue::LinkOnceAnyLinkage;
   else if (Linkage == GlobalValue::AvailableExternallyLinkage)
     Linkage = GlobalValue::LinkOnceODRLinkage;
@@ -430,8 +439,12 @@ GlobalVariable *createPGOFuncNameVar(Module &M,
       new GlobalVariable(M, Value->getType(), true, Linkage, Value,
                          getPGOFuncNameVarName(PGOFuncName, Linkage));

+  // If the target is a GPU, make the symbol protected so it can
+  // be read from the host device
+  if (isGPUProfTarget(M))
+    FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility);
   // Hide the symbol so that we correctly get a copy for each executable.
-  if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage()))
+  else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage()))
     FuncNameVar->setVisibility(GlobalValue::HiddenVisibility);

   return FuncNameVar;
diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
index d2cb8155c17967..3b582b65190808 100644
--- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
@@ -1481,6 +1481,10 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) {
   for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind)
     Int16ArrayVals[Kind] = ConstantInt::get(Int16Ty, PD.NumValueSites[Kind]);

+  if (isGPUProfTarget(M)) {
+    Linkage = GlobalValue::ExternalLinkage;
+    Visibility = GlobalValue::ProtectedVisibility;
+  }
   // If the data variable is not referenced by code (if we don't emit
   // @llvm.instrprof.value.profile, NS will be 0), and the counter keeps the
   // data variable live under linker GC, the data variable can be private. This
@@ -1492,9 +1496,9 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) {
   // If profd is in a deduplicate comdat, NS==0 with a hash suffix guarantees
   // that other copies must have the same CFG and cannot have value profiling.
   // If no hash suffix, other profd copies may be referenced by code.
-  if (NS == 0 && !(DataReferencedByCode && NeedComdat && !Renamed) &&
-      (TT.isOSBinFormatELF() ||
-       (!DataReferencedByCode && TT.isOSBinFormatCOFF()))) {
+  else if (NS == 0 && !(DataReferencedByCode && NeedComdat && !Renamed) &&
+           (TT.isOSBinFormatELF() ||
+            (!DataReferencedByCode && TT.isOSBinFormatCOFF()))) {
     Linkage = GlobalValue::PrivateLinkage;
     Visibility = GlobalValue::DefaultVisibility;
   }
@@ -1696,7 +1700,10 @@ bool InstrLowerer::emitRuntimeHook() {
   auto *Var =
       new GlobalVariable(M, Int32Ty, false, GlobalValue::ExternalLinkage,
                          nullptr, getInstrProfRuntimeHookVarName());
-  Var->setVisibility(GlobalValue::HiddenVisibility);
+  if (isGPUProfTarget(M))
+    Var->setVisibility(GlobalValue::ProtectedVisibility);
+  else
+    Var->setVisibility(GlobalValue::HiddenVisibility);

   if (TT.isOSBinFormatELF() && !TT.isPS()) {
     // Mark the user variable as used so that it isn't stripped out.

>From fddc07908ed9aa698fe3250ddbfc5621ab4d049d Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 22 Dec 2023 23:43:29 -0600
Subject: [PATCH 04/38] Make names global public on GPU

---
 llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
index 3b582b65190808..61fba7be3ee0ee 100644
--- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
@@ -1621,6 +1621,13 @@ void InstrLowerer::emitNameData() {
   NamesVar = new GlobalVariable(M, NamesVal->getType(), true,
                                 GlobalValue::PrivateLinkage, NamesVal,
                                 getInstrProfNamesVarName());
+
+  // Make names variable public if current target is a GPU
+  if (isGPUProfTarget(M)) {
+    NamesVar->setLinkage(GlobalValue::ExternalLinkage);
+    NamesVar->setVisibility(GlobalValue::VisibilityTypes::ProtectedVisibility);
+  }
+
   NamesSize = CompressedNameStr.size();
   setGlobalVariableLargeSection(TT, *NamesVar);
   NamesVar->setSection(

>From
e9db03c70bf79f4f4ddad4b48a5aa63a37e0d4f6 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 29 Dec 2023 12:54:50 -0600
Subject: [PATCH 05/38] Read and print GPU device PGO globals

---
 .../common/include/GlobalHandler.h | 27 ++++++
 .../common/src/GlobalHandler.cpp | 82 +++++++++++++++++++
 .../common/src/PluginInterface.cpp | 14 ++++
 3 files changed, 123 insertions(+)

diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
index fa079ac9660ee0..a82cd536487653 100644
--- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
+++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
@@ -14,9 +14,11 @@
 #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H

 #include
+#include

 #include "llvm/ADT/DenseMap.h"
 #include "llvm/Object/ELFObjectFile.h"
+#include "llvm/ProfileData/InstrProf.h"

 #include "Shared/Debug.h"
 #include "Shared/Utils.h"
@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };

+typedef void *IntPtrT;
+struct __llvm_profile_data {
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#include "llvm/ProfileData/InstrProfData.inc"
+};
+
+/// PGO profiling data extracted from a GPU device
+struct GPUProfGlobals {
+  std::string names;
+  std::vector> counts;
+  std::vector<__llvm_profile_data> data;
+  Triple targetTriple;
+
+  void dump() const;
+};
+
 /// Subclass of GlobalTy that holds the memory for a global of \p Ty.
 template class StaticGlobalTy : public GlobalTy {
   Ty Data;
@@ -172,6 +190,15 @@ class GenericGlobalHandlerTy {
     return moveGlobalBetweenDeviceAndHost(Device, Image, HostGlobal,
                                           /* D2H */ false);
   }
+
+  /// Checks whether a given image contains profiling globals.
+  bool hasProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image);
+
+  /// Reads profiling data from a GPU image to supplied profdata struct.
+  /// Iterates through the image symbol table and stores global values
+  /// with profiling prefixes.
+  Expected readProfilingGlobals(GenericDeviceTy &Device,
+                                DeviceImageTy &Image);
 };

 } // namespace plugin
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index 3a272e228c7dfe..5dd5daec468ca5 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -176,3 +176,85 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,

   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  const auto *elf = getOrCreateELFObjectFile(Device, Image);
+  profdata.targetTriple = elf->makeTriple();
+  // Iterate through
+  for (auto &sym : elf->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector chars(sym.getSize() / sizeof(char), ' ');
+        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());
+        if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
+          return Err;
+        std::string names(chars.begin(), chars.end());
+        profdata.names = std::move(names);
+      } else if (name->starts_with(getInstrProfCountersVarPrefix())) {
+        // Read global variable profiling counts
+        std::vector counts(sym.getSize() / sizeof(int64_t), 0);
+        GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data());
+        if (auto Err
= readGlobalFromDevice(Device, Image, CountGlobal))
+          return Err;
+        profdata.counts.push_back(std::move(counts));
+      } else if (name->starts_with(getInstrProfDataVarPrefix())) {
+        // Read profiling data for this global variable
+        __llvm_profile_data data{};
+        GlobalTy DataGlobal(name->str(), sym.getSize(), &data);
+        if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
+          return Err;
+        profdata.data.push_back(std::move(data));
+      }
+    }
+  }
+  return profdata;
+}
+
+void GPUProfGlobals::dump() const {
+  llvm::outs() << "======= GPU Profile =======\nTarget: " << targetTriple.str()
+               << "\n";
+
+  llvm::outs() << "======== Counters =========\n";
+  for (const auto &count : counts) {
+    llvm::outs() << "[";
+    for (size_t i = 0; i < count.size(); i++) {
+      if (i == 0)
+        llvm::outs() << " ";
+      llvm::outs() << count[i] << " ";
+    }
+    llvm::outs() << "]\n";
+  }
+
+  llvm::outs() << "========== Data ===========\n";
+  for (const auto &d : data) {
+    llvm::outs() << "{ ";
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
+  llvm::outs() << d.Name << " ";
+#include "llvm/ProfileData/InstrProfData.inc"
+    llvm::outs() << " }\n";
+  }
+
+  llvm::outs() << "======== Functions ========\n";
+  InstrProfSymtab symtab;
+  if (Error Err = symtab.create(StringRef(names))) {
+    consumeError(std::move(Err));
+  }
+  symtab.dumpNames(llvm::outs());
+  llvm::outs() << "===========================\n";
+}
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
index 3c7d1ca8998787..84ed90f03f84f1 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
@@ -811,6 +811,20 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) {
                        DeviceMemoryPoolTracking.AllocationMax);
   }

+  for (auto *Image : LoadedImages) {
+    GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+    if
(!Handler.hasProfilingGlobals(*this, *Image))
+      continue;
+
+    GPUProfGlobals profdata;
+    auto ProfOrErr = Handler.readProfilingGlobals(*this, *Image);
+    if (!ProfOrErr)
+      return ProfOrErr.takeError();
+
+    // TODO: write data to profiling file
+    ProfOrErr->dump();
+  }
+
   // Delete the memory manager before deinitializing the device. Otherwise,
   // we may delete device allocations after the device is deinitialized.
   if (MemoryManager)

>From e4687605d1a6ca932312025826db09dba84845a3 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 3 Jan 2024 17:06:15 -0600
Subject: [PATCH 06/38] Fix rebase bug

---
 .../plugins-nextgen/common/src/GlobalHandler.cpp | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index cb71b61f4a9c4f..86742d0f77a2fe 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -178,10 +178,12 @@ Expected
 GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
                                              DeviceImageTy &Image) {
   GPUProfGlobals profdata;
-  const auto *elf = getOrCreateELFObjectFile(Device, Image);
-  profdata.targetTriple = elf->makeTriple();
-  // Iterate through
-  for (auto &sym : elf->symbols()) {
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
     if (auto name = sym.getName()) {
       // Check if given current global is a profiling global based
       // on name

>From ec18ce94c227e1d43927955fa1c67360ecfcfca6 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 3 Jan 2024 17:10:19 -0600
Subject: [PATCH 07/38] Refactor portions to be more idiomatic

---
 clang/lib/CodeGen/CodeGenPGO.cpp | 4 +---
 llvm/lib/ProfileData/InstrProf.cpp | 5 ++---
 2 files changed, 3
insertions(+), 6 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp
index edae6885b528ac..7bfcec43ee4c98 100644
--- a/clang/lib/CodeGen/CodeGenPGO.cpp
+++ b/clang/lib/CodeGen/CodeGenPGO.cpp
@@ -961,10 +961,8 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S,

   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
-  auto *I8Ty = llvm::Type::getInt8Ty(CGM.getLLVMContext());
-  auto *I8PtrTy = llvm::PointerType::getUnqual(I8Ty);
   auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      FuncNameVar, I8PtrTy);
+      FuncNameVar, llvm::PointerType::getUnqual(CGM.getLLVMContext()));

   llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash),
                          Builder.getInt32(NumRegionCounters),
diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp
index cdcd6840bb5108..1d88da16a5ff9c 100644
--- a/llvm/lib/ProfileData/InstrProf.cpp
+++ b/llvm/lib/ProfileData/InstrProf.cpp
@@ -429,9 +429,8 @@ std::string getPGOFuncNameVarName(StringRef FuncName,
 }

 bool isGPUProfTarget(const Module &M) {
-  const auto &triple = M.getTargetTriple();
-  return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 ||
-         triple.rfind("r600", 0) == 0;
+  const auto &Triple = llvm::Triple(M.getTargetTriple());
+  return Triple.isAMDGPU() || Triple.isNVPTX();
 }

 GlobalVariable *createPGOFuncNameVar(Module &M,

>From 0872556f597056361b0a2c23cdd0be3d9745aef3 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 3 Jan 2024 17:18:47 -0600
Subject: [PATCH 08/38] Reformat DeviceRTL prof functions

---
 openmp/libomptarget/DeviceRTL/include/Profiling.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/openmp/libomptarget/DeviceRTL/include/Profiling.h b/openmp/libomptarget/DeviceRTL/include/Profiling.h
index 68c7744cd60752..9efc1554c176bc 100644
--- a/openmp/libomptarget/DeviceRTL/include/Profiling.h
+++
b/openmp/libomptarget/DeviceRTL/include/Profiling.h
@@ -13,9 +13,8 @@
 #define OMPTARGET_DEVICERTL_PROFILING_H

 extern "C" {
-
-void __llvm_profile_register_function(void *ptr);
-void __llvm_profile_register_names_function(void *ptr, long int i);
+void __llvm_profile_register_function(void *Ptr);
+void __llvm_profile_register_names_function(void *Ptr, long int I);
 }

 #endif

>From 62f31d1c71b5d100f38d6dc584cc138b3904581b Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Tue, 9 Jan 2024 11:52:29 -0600
Subject: [PATCH 09/38] Style changes + catch name error

---
 .../common/include/GlobalHandler.h | 16 ++--
 .../common/src/GlobalHandler.cpp | 87 ++++++++++---------
 2 files changed, 56 insertions(+), 47 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
index a803b3f76d8b25..755bb23a414e37 100644
--- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
+++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
@@ -13,8 +13,7 @@
 #ifndef LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H
 #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H

-#include
-#include
+#include

 #include "llvm/ADT/DenseMap.h"
 #include "llvm/Object/ELFObjectFile.h"
@@ -60,18 +59,19 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };

-typedef void *IntPtrT;
+using IntPtrT = void *;
 struct __llvm_profile_data {
-#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
+  std::remove_const::type Name;
 #include "llvm/ProfileData/InstrProfData.inc"
 };

 /// PGO profiling data extracted from a GPU device
 struct GPUProfGlobals {
-  std::string names;
-  std::vector> counts;
-  std::vector<__llvm_profile_data> data;
-  Triple targetTriple;
+  SmallVector NamesData;
+  SmallVector> Counts;
+  SmallVector<__llvm_profile_data> Data;
+  Triple TargetTriple;

   void dump()
const;
 };
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index 86742d0f77a2fe..7cb672e7b26839 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -19,6 +19,7 @@
 #include "llvm/Support/Error.h"

 #include
+#include

 using namespace llvm;
 using namespace omp;
@@ -177,73 +178,81 @@ bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
 Expected
 GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
                                              DeviceImageTy &Image) {
-  GPUProfGlobals profdata;
+  GPUProfGlobals DeviceProfileData;
   auto ELFObj = getELFObjectFile(Image);
   if (!ELFObj)
     return ELFObj.takeError();
-  profdata.targetTriple = ELFObj->makeTriple();
+  DeviceProfileData.TargetTriple = ELFObj->makeTriple();
+
   // Iterate through elf symbols
-  for (auto &sym : ELFObj->symbols()) {
-    if (auto name = sym.getName()) {
-      // Check if given current global is a profiling global based
-      // on name
-      if (name->equals(getInstrProfNamesVarName())) {
-        // Read in profiled function names
-        std::vector chars(sym.getSize() / sizeof(char), ' ');
-        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());
-        if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
-          return Err;
-        std::string names(chars.begin(), chars.end());
-        profdata.names = std::move(names);
-      } else if (name->starts_with(getInstrProfCountersVarPrefix())) {
-        // Read global variable profiling counts
-        std::vector counts(sym.getSize() / sizeof(int64_t), 0);
-        GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data());
-        if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
-          return Err;
-        profdata.counts.push_back(std::move(counts));
-      } else if (name->starts_with(getInstrProfDataVarPrefix())) {
-        // Read profiling data for this global variable
-        __llvm_profile_data data{};
-        GlobalTy
DataGlobal(name->str(), sym.getSize(), &data);
-        if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
-          return Err;
-        profdata.data.push_back(std::move(data));
-      }
+  for (auto &Sym : ELFObj->symbols()) {
+    auto NameOrErr = Sym.getName();
+    if (!NameOrErr)
+      return ELFObj.takeError();
+
+    // Check if given current global is a profiling global based
+    // on name
+    if (NameOrErr->equals(getInstrProfNamesVarName())) {
+      // Read in profiled function names
+      DeviceProfileData.NamesData = SmallVector(Sym.getSize(), 0);
+      GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(),
+                           DeviceProfileData.NamesData.data());
+      if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
+        return Err;
+    } else if (NameOrErr->starts_with(getInstrProfCountersVarPrefix())) {
+      // Read global variable profiling counts
+      SmallVector Counts(Sym.getSize() / sizeof(int64_t), 0);
+      GlobalTy CountGlobal(NameOrErr->str(), Sym.getSize(), Counts.data());
+      if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
+        return Err;
+      DeviceProfileData.Counts.push_back(std::move(Counts));
+    } else if (NameOrErr->starts_with(getInstrProfDataVarPrefix())) {
+      // Read profiling data for this global variable
+      __llvm_profile_data Data{};
+      GlobalTy DataGlobal(NameOrErr->str(), Sym.getSize(), &Data);
+      if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
+        return Err;
+      DeviceProfileData.Data.push_back(std::move(Data));
     }
   }
-  return profdata;
+  return DeviceProfileData;
 }

 void GPUProfGlobals::dump() const {
-  llvm::outs() << "======= GPU Profile =======\nTarget: " << targetTriple.str()
+  llvm::outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str()
                << "\n";

   llvm::outs() << "======== Counters =========\n";
-  for (const auto &count : counts) {
+  for (const auto &Count : Counts) {
     llvm::outs() << "[";
-    for (size_t i = 0; i < count.size(); i++) {
+    for (size_t i = 0; i < Count.size(); i++) {
       if (i == 0)
         llvm::outs() << " ";
-      llvm::outs() << count[i] << " ";
+
      llvm::outs() << Count[i] << " ";
     }
     llvm::outs() << "]\n";
   }

   llvm::outs() << "========== Data ===========\n";
-  for (const auto &d : data) {
+  for (const auto &ProfData : Data) {
     llvm::outs() << "{ ";
 #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
-  llvm::outs() << d.Name << " ";
+  llvm::outs() << ProfData.Name << " ";
 #include "llvm/ProfileData/InstrProfData.inc"
     llvm::outs() << " }\n";
   }

   llvm::outs() << "======== Functions ========\n";
-  InstrProfSymtab symtab;
-  if (Error Err = symtab.create(StringRef(names))) {
+  std::string s;
+  s.reserve(NamesData.size());
+  for (uint8_t Name : NamesData) {
+    s.push_back((char)Name);
+  }
+
+  InstrProfSymtab Symtab;
+  if (Error Err = Symtab.create(StringRef(s))) {
     consumeError(std::move(Err));
   }
-  symtab.dumpNames(llvm::outs());
+  Symtab.dumpNames(llvm::outs());
   llvm::outs() << "===========================\n";
 }

>From 0c4bbeb54d189c1461affd37853aa86c3e3ca7d8 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 17 Jan 2024 19:59:06 -0600
Subject: [PATCH 10/38] Add GPU PGO test

---
 .../common/src/GlobalHandler.cpp | 2 +-
 openmp/libomptarget/test/CMakeLists.txt | 6 +++
 openmp/libomptarget/test/lit.cfg | 3 ++
 openmp/libomptarget/test/lit.site.cfg.in | 2 +-
 openmp/libomptarget/test/offloading/pgo1.c | 39 +++++++++++++++++++
 5 files changed, 50 insertions(+), 2 deletions(-)
 create mode 100644 openmp/libomptarget/test/offloading/pgo1.c

diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index 7cb672e7b26839..e5eb653d022287 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -239,7 +239,7 @@ void GPUProfGlobals::dump() const {
 #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
   llvm::outs() << ProfData.Name << " ";
 #include "llvm/ProfileData/InstrProfData.inc"
-    llvm::outs() << " }\n";
+    llvm::outs() << "}\n";
   }
   llvm::outs() << "======== Functions ========\n";
diff --git a/openmp/libomptarget/test/CMakeLists.txt b/openmp/libomptarget/test/CMakeLists.txt
index a0ba233eaa5726..21233f3e252eb5 100644
--- a/openmp/libomptarget/test/CMakeLists.txt
+++ b/openmp/libomptarget/test/CMakeLists.txt
@@ -12,6 +12,12 @@ else()
   set(LIBOMPTARGET_DEBUG False)
 endif()

+if (OPENMP_STANDALONE_BUILD)
+  set(LIBOMPTARGET_TEST_GPU_PGO False)
+else()
+  set(LIBOMPTARGET_TEST_GPU_PGO True)
+endif()
+
 # Replace the space from user's input with ";" in case that CMake add escape
 # char into the lit command.
 string(REPLACE " " ";" LIBOMPTARGET_LIT_ARG_LIST "${LIBOMPTARGET_LIT_ARGS}")
diff --git a/openmp/libomptarget/test/lit.cfg b/openmp/libomptarget/test/lit.cfg
index 19c5e5c4572227..49743f9fed7f29 100644
--- a/openmp/libomptarget/test/lit.cfg
+++ b/openmp/libomptarget/test/lit.cfg
@@ -104,6 +104,9 @@ config.available_features.add(config.libomptarget_current_target)
 if config.libomptarget_has_libc:
     config.available_features.add('libc')

+if config.libomptarget_test_pgo:
+    config.available_features.add('pgo')
+
 # Determine whether the test system supports unified memory.
 # For CUDA, this is the case with compute capability 70 (Volta) or higher.
 # For all other targets, we currently assume it is.
diff --git a/openmp/libomptarget/test/lit.site.cfg.in b/openmp/libomptarget/test/lit.site.cfg.in
index 2d638118838727..494d1636af304a 100644
--- a/openmp/libomptarget/test/lit.site.cfg.in
+++ b/openmp/libomptarget/test/lit.site.cfg.in
@@ -25,6 +25,6 @@ config.libomptarget_not = "@OPENMP_NOT_EXECUTABLE@"
 config.libomptarget_debug = @LIBOMPTARGET_DEBUG@
 config.has_libomptarget_ompt = @LIBOMPTARGET_OMPT_SUPPORT@
 config.libomptarget_has_libc = @LIBOMPTARGET_GPU_LIBC_SUPPORT@
-
+config.libomptarget_test_pgo = @LIBOMPTARGET_TEST_GPU_PGO@
 # Let the main config do the real work.
lit_config.load_config(config, "@CMAKE_CURRENT_SOURCE_DIR@/lit.cfg") diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c new file mode 100644 index 00000000000000..ca8a6f502a06aa --- /dev/null +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -0,0 +1,39 @@ +// RUN: %libomptarget-compile-generic -fprofile-instr-generate -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic + +// UNSUPPORTED: x86_64-pc-linux-gnu +// UNSUPPORTED: x86_64-pc-linux-gnu-LTO +// UNSUPPORTED: aarch64-unknown-linux-gnu +// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: pgo + +#ifdef _OPENMP +#include <omp.h> +#endif + +int test1(int a) { return a / 2; } +int test2(int a) { return a * 2; } + +int main() { + int m = 2; +#pragma omp target + for (int i = 0; i < 10; i++) { + m = test1(m); + for (int j = 0; j < 2; j++) { + m = test2(m); + } + } +} + +// CHECK: ======== Counters ========= +// CHECK-NEXT: [ 0 11 20 ] +// CHECK-NEXT: [ 10 ] +// CHECK-NEXT: [ 20 ] +// CHECK-NEXT: ========== Data =========== +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: ======== Functions ======== +// CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} +// CHECK-NEXT: test1 +// CHECK-NEXT: test2 >From c7ae2a74daa93b05058fcc9bba64e0734359362c Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 17 Jan 2024 23:12:27 -0600 Subject: [PATCH 11/38] Fix PGO test formatting --- openmp/libomptarget/test/offloading/pgo1.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4
deletions(-) diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c index ca8a6f502a06aa..389be19b670d76 100644 --- a/openmp/libomptarget/test/offloading/pgo1.c +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -1,4 +1,5 @@ -// RUN: %libomptarget-compile-generic -fprofile-instr-generate -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ +// RUN: -Xclang "-fprofile-instrument=clang" // RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic // UNSUPPORTED: x86_64-pc-linux-gnu @@ -30,9 +31,18 @@ int main() { // CHECK-NEXT: [ 10 ] // CHECK-NEXT: [ 20 ] // CHECK-NEXT: ========== Data =========== -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } // CHECK-NEXT: ======== Functions ======== // CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} // CHECK-NEXT: test1 >From 8bb22072914bbb830e2788d117aedd0e0bab66ff Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: 
Thu, 18 Jan 2024 23:15:55 -0600 Subject: [PATCH 12/38] Refactor visibility logic --- llvm/lib/ProfileData/InstrProf.cpp | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 511571a3eed9b0..708ea63fd95e04 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -422,6 +422,16 @@ bool isGPUProfTarget(const Module &M) { return Triple.isAMDGPU() || Triple.isNVPTX(); } +void setPGOFuncVisibility(Module &M, GlobalVariable *FuncNameVar) { + // If the target is a GPU, make the symbol protected so it can + // be read from the host device + if (isGPUProfTarget(M)) + FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility); + // Hide the symbol so that we correctly get a copy for each executable. + else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) + FuncNameVar->setVisibility(GlobalValue::HiddenVisibility); +} + GlobalVariable *createPGOFuncNameVar(Module &M, GlobalValue::LinkageTypes Linkage, StringRef PGOFuncName) { @@ -445,14 +455,7 @@ GlobalVariable *createPGOFuncNameVar(Module &M, new GlobalVariable(M, Value->getType(), true, Linkage, Value, getPGOFuncNameVarName(PGOFuncName, Linkage)); - // If the target is a GPU, make the symbol protected so it can - // be read from the host device - if (isGPUProfTarget(M)) - FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility); - // Hide the symbol so that we correctly get a copy for each executable. - else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) - FuncNameVar->setVisibility(GlobalValue::HiddenVisibility); - + setPGOFuncVisibility(M, FuncNameVar); return FuncNameVar; } >From 9f13943f64cb16162e44902d54de53a9b1229179 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 23 Jan 2024 18:33:58 -0600 Subject: [PATCH 13/38] Add LLVM instrumentation support This PR formerly only supported -fprofile-instrument=clang. 
This commit adds support for -fprofile-instrument=llvm --- .../Instrumentation/PGOInstrumentation.cpp | 12 +++- openmp/libomptarget/test/offloading/pgo1.c | 72 +++++++++++++------ 2 files changed, 59 insertions(+), 25 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c20fc942eaf0d5..bbc8da78fd7baf 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -862,6 +862,10 @@ static void instrumentOneFunc( auto Name = FuncInfo.FuncNameVar; auto CFGHash = ConstantInt::get(Type::getInt64Ty(M->getContext()), FuncInfo.FunctionHash); + // Make sure that pointer to global is passed in with zero addrspace + // This is relevant during GPU profiling + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + Name, llvm::PointerType::getUnqual(M->getContext())); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt()); @@ -869,7 +873,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_cover), - {Name, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); + {NormalizedPtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); return; } @@ -887,7 +891,8 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_timestamp), - {Name, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I)}); + {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + Builder.getInt32(I)}); I += PGOBlockCoverage ? 8 : 1; } @@ -901,7 +906,8 @@ static void instrumentOneFunc( Intrinsic::getDeclaration(M, PGOBlockCoverage ? 
Intrinsic::instrprof_cover : Intrinsic::instrprof_increment), - {Name, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I++)}); + {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + Builder.getInt32(I++)}); } // Now instrument select instructions: diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c index 389be19b670d76..d95793b508dcfc 100644 --- a/openmp/libomptarget/test/offloading/pgo1.c +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -1,6 +1,11 @@ // RUN: %libomptarget-compile-generic -fprofile-instr-generate \ // RUN: -Xclang "-fprofile-instrument=clang" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic +// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: --check-prefix="CLANG-PGO" +// RUN: %libomptarget-compile-generic -fprofile-generate \ +// RUN: -Xclang "-fprofile-instrument=llvm" +// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: --check-prefix="LLVM-PGO" // UNSUPPORTED: x86_64-pc-linux-gnu // UNSUPPORTED: x86_64-pc-linux-gnu-LTO @@ -26,24 +31,47 @@ int main() { } } -// CHECK: ======== Counters ========= -// CHECK-NEXT: [ 0 11 20 ] -// CHECK-NEXT: [ 10 ] -// CHECK-NEXT: [ 20 ] -// CHECK-NEXT: ========== Data =========== -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: ======== Functions ======== -// CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// CHECK-NEXT: test1 
-// CHECK-NEXT: test2 +// CLANG-PGO: ======== Counters ========= +// CLANG-PGO-NEXT: [ 0 11 20 ] +// CLANG-PGO-NEXT: [ 10 ] +// CLANG-PGO-NEXT: [ 20 ] +// CLANG-PGO-NEXT: ========== Data =========== +// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CLANG-PGO-NEXT: ======== Functions ======== +// CLANG-PGO-NEXT: pgo1.c: +// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} +// CLANG-PGO-NEXT: test1 +// CLANG-PGO-NEXT: test2 + +// LLVM-PGO: ======== Counters ========= +// LLVM-PGO-NEXT: [ 20 ] +// LLVM-PGO-NEXT: [ 10 ] +// LLVM-PGO-NEXT: [ 20 10 1 1 ] +// LLVM-PGO-NEXT: ========== Data =========== +// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// LLVM-PGO-NEXT: ======== Functions ======== +// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} +// 
LLVM-PGO-NEXT: test1 +// LLVM-PGO-NEXT: test2 >From 0606f0dd1b32ef9ebe138bbc964b3921e22d95d1 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 14 Feb 2024 01:46:55 -0600 Subject: [PATCH 14/38] Use explicit addrspace instead of unqual --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index e084dda879cbc0..4c75a01222d304 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1103,7 +1103,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, llvm::PointerType::getUnqual(CGM.getLLVMContext())); + FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), Builder.getInt32(NumRegionCounters), diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index bbc8da78fd7baf..c63b3e4ecf786a 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -865,7 +865,7 @@ static void instrumentOneFunc( // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - Name, llvm::PointerType::getUnqual(M->getContext())); + Name, llvm::PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt()); >From c1f9be321678766525141214aaab74636cafbc2c Mon Sep 17 00:00:00 2001 From: Ethan 
Luis McDonough Date: Thu, 15 Feb 2024 19:10:09 -0600 Subject: [PATCH 15/38] Remove redundant namespaces --- .../Instrumentation/PGOInstrumentation.cpp | 4 +-- .../common/src/GlobalHandler.cpp | 26 +++++++++---------- 2 files changed, 15 insertions(+), 15 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c63b3e4ecf786a..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,8 +864,8 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - Name, llvm::PointerType::get(M->getContext(), 0)); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt()); diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index e5eb653d022287..ae270c60804d26 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -219,30 +219,30 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, } void GPUProfGlobals::dump() const { - llvm::outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str() + outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str() << "\n"; - llvm::outs() << "======== Counters =========\n"; + outs() << "======== Counters =========\n"; for (const auto &Count : Counts) { - llvm::outs() << "["; + outs() << "["; for (size_t i = 0; i < Count.size(); i++) { if (i == 0) - 
llvm::outs() << " "; - llvm::outs() << Count[i] << " "; + outs() << " "; + outs() << Count[i] << " "; } - llvm::outs() << "]\n"; + outs() << "]\n"; } - llvm::outs() << "========== Data ===========\n"; + outs() << "========== Data ===========\n"; for (const auto &ProfData : Data) { - llvm::outs() << "{ "; + outs() << "{ "; #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \ - llvm::outs() << ProfData.Name << " "; + outs() << ProfData.Name << " "; #include "llvm/ProfileData/InstrProfData.inc" - llvm::outs() << "}\n"; + outs() << "}\n"; } - llvm::outs() << "======== Functions ========\n"; + outs() << "======== Functions ========\n"; std::string s; s.reserve(NamesData.size()); for (uint8_t Name : NamesData) { @@ -253,6 +253,6 @@ void GPUProfGlobals::dump() const { if (Error Err = Symtab.create(StringRef(s))) { consumeError(std::move(Err)); } - Symtab.dumpNames(llvm::outs()); - llvm::outs() << "===========================\n"; + Symtab.dumpNames(outs()); + outs() << "===========================\n"; } >From 6a3ae407e69e7524f0f808329c534f8352ee1779 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 15 Feb 2024 19:15:15 -0600 Subject: [PATCH 16/38] Clang format --- .../libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index ae270c60804d26..1fce2448922624 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -220,7 +220,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, void GPUProfGlobals::dump() const { outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str() - << "\n"; + << "\n"; outs() << "======== Counters =========\n"; for (const auto &Count : Counts) { >From 6866862d459e3c3fa65fae8ae639ddc3ff735252 Mon Sep 17 
00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 16 Feb 2024 13:13:39 -0600 Subject: [PATCH 17/38] Use getAddrSpaceCast Replace getPointerBitCastOrAddrSpaceCast with getAddrSpaceCast and allow no-op getAddrSpaceCast calls when types are identical --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ++++ llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index 8f52018445d2b0..baceeba8380ddb 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index a38b912164b130..2d89c5bbd4a4c2 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,6 +2067,10 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { + // Skip cast if types are identical + if (C->getType() == DstTy) + return C; + assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index 3058e577738fda..c0be71aa4cc004 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ 
b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 62a5ee1c75545571f81d9edd22e19e9ef7cff69f Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 27 Feb 2024 14:53:51 -0600 Subject: [PATCH 18/38] Revert "Use getAddrSpaceCast" This reverts commit 6866862d459e3c3fa65fae8ae639ddc3ff735252. --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ---- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index baceeba8380ddb..8f52018445d2b0 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 2d89c5bbd4a4c2..a38b912164b130 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,10 +2067,6 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { - // Skip cast if types are 
identical - if (C->getType() == DstTy) - return C; - assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c0be71aa4cc004..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 052394fa28c923d130bf73a07b965a9751467302 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 27 Feb 2024 15:34:34 -0600 Subject: [PATCH 19/38] Revert "Use getAddrSpaceCast" This reverts commit 6866862d459e3c3fa65fae8ae639ddc3ff735252. 
--- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ---- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index baceeba8380ddb..8f52018445d2b0 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 2d89c5bbd4a4c2..a38b912164b130 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,10 +2067,6 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { - // Skip cast if types are identical - if (C->getType() == DstTy) - return C; - assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c0be71aa4cc004..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto 
*NormalizedPtr = ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 612d5a5f6966a77e82e5591f5aea475fbf886e55 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 1 Mar 2024 02:04:00 -0600 Subject: [PATCH 20/38] Write PGO TODO: Fix tests --- compiler-rt/lib/profile/InstrProfiling.h | 11 ++ compiler-rt/lib/profile/InstrProfilingFile.c | 148 +++++++++++++++--- .../common/include/GlobalHandler.h | 14 +- .../common/src/GlobalHandler.cpp | 57 +++++-- .../common/src/PluginInterface.cpp | 6 +- 5 files changed, 200 insertions(+), 36 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h index 01239083369187..937acbd417de46 100644 --- a/compiler-rt/lib/profile/InstrProfiling.h +++ b/compiler-rt/lib/profile/InstrProfiling.h @@ -275,6 +275,17 @@ void __llvm_profile_get_padding_sizes_for_counters( */ void __llvm_profile_set_dumped(); +/*! + * \brief Write custom target-specific profiling data to a seperate file. + * Used by libomptarget for GPU PGO. + */ +int __llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, + const char *NamesEnd); + /*! * This variable is defined in InstrProfilingRuntime.cpp as a hidden * symbol. Its main purpose is to enable profile runtime user to diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index f3b457d786e6bd..4fc401bb9bebf5 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -502,27 +502,15 @@ static FILE *getFileObject(const char *OutputName) { return fopen(OutputName, "ab"); } -/* Write profile data to file \c OutputName. 
*/ -static int writeFile(const char *OutputName) { - int RetVal; - FILE *OutputFile; - - int MergeDone = 0; +/* Get file object and merge if applicable */ +static FILE *getMergeFileObject(const char *OutputName, int *MergeDone) { VPMergeHook = &lprofMergeValueProfData; if (doMerging()) - OutputFile = openFileForMerging(OutputName, &MergeDone); - else - OutputFile = getFileObject(OutputName); - - if (!OutputFile) - return -1; - - FreeHook = &free; - setupIOBuffer(); - ProfDataWriter fileWriter; - initFileWriter(&fileWriter, OutputFile); - RetVal = lprofWriteData(&fileWriter, lprofGetVPDataReader(), MergeDone); + return openFileForMerging(OutputName, MergeDone); + return getFileObject(OutputName); +} +static void closeFileObject(FILE *OutputFile) { if (OutputFile == getProfileFile()) { fflush(OutputFile); if (doMerging() && !__llvm_profile_is_continuous_mode_enabled()) { @@ -531,7 +519,23 @@ static int writeFile(const char *OutputName) { } else { fclose(OutputFile); } +} + +/* Write profile data to file \c OutputName. */ +static int writeFile(const char *OutputName) { + int RetVal, MergeDone = 0; + FILE *OutputFile = getMergeFileObject(OutputName, &MergeDone); + + if (!OutputFile) + return -1; + + FreeHook = &free; + setupIOBuffer(); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + RetVal = lprofWriteData(&fileWriter, lprofGetVPDataReader(), MergeDone); + closeFileObject(OutputFile); return RetVal; } @@ -558,10 +562,16 @@ static int writeOrderFile(const char *OutputName) { #define LPROF_INIT_ONCE_ENV "__LLVM_PROFILE_RT_INIT_ONCE" +static void forceTruncateFile(const char *Filename) { + FILE *File = fopen(Filename, "w"); + if (!File) + return; + fclose(File); +} + static void truncateCurrentFile(void) { const char *Filename; char *FilenameBuf; - FILE *File; int Length; Length = getCurFilenameLength(); @@ -591,10 +601,7 @@ static void truncateCurrentFile(void) { return; /* Truncate the file. Later we'll reopen and append. 
*/ - File = fopen(Filename, "w"); - if (!File) - return; - fclose(File); + forceTruncateFile(Filename); } /* Write a partial profile to \p Filename, which is required to be backed by @@ -1271,4 +1278,99 @@ COMPILER_RT_VISIBILITY int __llvm_profile_set_file_object(FILE *File, return 0; } +int __llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, + const char *NamesEnd) { + int ReturnValue = 0, FilenameLength, TargetLength, MergeDone; + char *FilenameBuf, *TargetFilename; + const char *Filename; + + /* Save old profile data */ + FILE *oldFile = getProfileFile(); + + // Temporarily suspend getting SIGKILL when the parent exits. + int PDeathSig = lprofSuspendSigKill(); + + if (lprofProfileDumped() || __llvm_profile_is_continuous_mode_enabled()) { + PROF_NOTE("Profile data not written to file: %s.\n", "already written"); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return 0; + } + + /* Get current filename */ + FilenameLength = getCurFilenameLength(); + FilenameBuf = (char *)COMPILER_RT_ALLOCA(FilenameLength + 1); + Filename = getCurFilename(FilenameBuf, 0); + + /* Check the filename. */ + if (!Filename) { + PROF_ERR("Failed to write file : %s\n", "Filename not set"); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + /* Allocate new space for our target-specific PGO filename */ + TargetLength = strlen(Target); + TargetFilename = + (char *)COMPILER_RT_ALLOCA(FilenameLength + TargetLength + 2); + + /* Prepend "TARGET." to current filename */ + memcpy(TargetFilename, Target, TargetLength); + TargetFilename[TargetLength] = '.'; + memcpy(TargetFilename, Target, TargetLength); + memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); + TargetFilename[FilenameLength + 1 + TargetLength] = 0; + + /* Check if there is llvm/runtime version mismatch. 
*/ + if (GET_VERSION(__llvm_profile_get_version()) != INSTR_PROF_RAW_VERSION) { + PROF_ERR("Runtime and instrumentation version mismatch : " + "expected %d, but get %d\n", + INSTR_PROF_RAW_VERSION, + (int)GET_VERSION(__llvm_profile_get_version())); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + /* Clean old target file */ + forceTruncateFile(TargetFilename); + + /* Open target-specific PGO file */ + MergeDone = 0; + FILE *OutputFile = getMergeFileObject(TargetFilename, &MergeDone); + + if (!OutputFile) { + PROF_ERR("Failed to open file : %s\n", TargetFilename); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + FreeHook = &free; + setupIOBuffer(); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + + /* Write custom data to the file */ + ReturnValue = lprofWriteDataImpl( + &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone); + + closeFileObject(OutputFile); + + // Restore SIGKILL. 
+ if (PDeathSig == 1) + lprofRestoreSigKill(); + + /* Restore old profiling file */ + setProfileFile(oldFile); + + return ReturnValue; +} + #endif diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h index f5a15ca11bfcda..af0cd4dcdf5dcf 100644 --- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h +++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h @@ -63,14 +63,24 @@ struct __llvm_profile_data { #include "llvm/ProfileData/InstrProfData.inc" }; +extern "C" { +extern int __attribute__((weak)) +__llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, const char *CountersEnd, + const char *NamesBegin, const char *NamesEnd); +} + /// PGO profiling data extracted from a GPU device struct GPUProfGlobals { - SmallVector<uint8_t> NamesData; - SmallVector<SmallVector<int64_t>> Counts; + SmallVector<int64_t> Counts; SmallVector<__llvm_profile_data> Data; + SmallVector<uint8_t> NamesData; Triple TargetTriple; void dump() const; + Error write() const; }; /// Subclass of GlobalTy that holds the memory for a global of \p Ty.
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 1fce2448922624..2f16b6e3c139e9 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -205,7 +205,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, GlobalTy CountGlobal(NameOrErr->str(), Sym.getSize(), Counts.data()); if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal)) return Err; - DeviceProfileData.Counts.push_back(std::move(Counts)); + DeviceProfileData.Counts.append(std::move(Counts)); } else if (NameOrErr->starts_with(getInstrProfDataVarPrefix())) { // Read profiling data for this global variable __llvm_profile_data Data{}; @@ -223,15 +223,14 @@ void GPUProfGlobals::dump() const { << "\n"; outs() << "======== Counters =========\n"; - for (const auto &Count : Counts) { - outs() << "["; - for (size_t i = 0; i < Count.size(); i++) { - if (i == 0) - outs() << " "; - outs() << Count[i] << " "; - } - outs() << "]\n"; + for (size_t i = 0; i < Counts.size(); i++) { + if (i > 0 && i % 10 == 0) + outs() << "\n"; + else if (i != 0) + outs() << " "; + outs() << Counts[i]; } + outs() << "\n"; outs() << "========== Data ===========\n"; for (const auto &ProfData : Data) { @@ -256,3 +255,43 @@ void GPUProfGlobals::dump() const { Symtab.dumpNames(outs()); outs() << "===========================\n"; } + +Error GPUProfGlobals::write() const { + if (!__llvm_write_custom_profile) + return Plugin::error("Could not find symbol __llvm_write_custom_profile. " + "The compiler-rt profiling library must be linked for " + "GPU PGO to work."); + + size_t DataSize = Data.size() * sizeof(__llvm_profile_data), + CountsSize = Counts.size() * sizeof(int64_t); + __llvm_profile_data *DataBegin, *DataEnd; + char *CountersBegin, *CountersEnd, *NamesBegin, *NamesEnd; + + // Initialize array of contiguous data. 
We need to make sure each section is + // contiguous so that the PGO library can compute deltas properly + SmallVector ContiguousData(NamesData.size() + DataSize + CountsSize); + + // Compute region pointers + DataBegin = (__llvm_profile_data *)(ContiguousData.data() + CountsSize); + DataEnd = + (__llvm_profile_data *)(ContiguousData.data() + CountsSize + DataSize); + CountersBegin = (char *)ContiguousData.data(); + CountersEnd = (char *)(ContiguousData.data() + CountsSize); + NamesBegin = (char *)(ContiguousData.data() + CountsSize + DataSize); + NamesEnd = (char *)(ContiguousData.data() + CountsSize + DataSize + + NamesData.size()); + + // Copy data to contiguous buffer + memcpy(DataBegin, Data.data(), DataSize); + memcpy(CountersBegin, Counts.data(), CountsSize); + memcpy(NamesBegin, NamesData.data(), NamesData.size()); + + // Invoke compiler-rt entrypoint + int result = __llvm_write_custom_profile(TargetTriple.str().c_str(), + DataBegin, DataEnd, CountersBegin, + CountersEnd, NamesBegin, NamesEnd); + if (result != 0) + return Plugin::error("Error writing GPU PGO data to file"); + + return Plugin::success(); +} diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp index 1ea93795ce8ce4..d5e6b6128152dc 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp @@ -837,8 +837,10 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); - // TODO: write data to profiling file - ProfOrErr->dump(); + // Write data to profiling file + if (auto Err = ProfOrErr->write()) { + consumeError(std::move(Err)); + } } // Delete the memory manager before deinitializing the device. 
Otherwise,
From b8c916305acf08c0bd2d51b81875be5e8fc59ff3 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 13 Mar 2024 20:05:32 -0500
Subject: [PATCH 21/38] Fix tests

---
 .../plugins-nextgen/common/src/PluginInterface.cpp | 3 +++
 openmp/libomptarget/test/offloading/pgo1.c         | 8 ++------
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
index d5e6b6128152dc..2359ad28a25b04 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
@@ -837,6 +837,9 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) {
     if (!ProfOrErr)
       return ProfOrErr.takeError();
 
+    // Dump out profdata
+    ProfOrErr->dump();
+
     // Write data to profiling file
     if (auto Err = ProfOrErr->write()) {
       consumeError(std::move(Err));
diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c
index d95793b508dcfc..79e93d0f10827f 100644
--- a/openmp/libomptarget/test/offloading/pgo1.c
+++ b/openmp/libomptarget/test/offloading/pgo1.c
@@ -32,9 +32,7 @@ int main() {
 }
 
 // CLANG-PGO: ======== Counters =========
-// CLANG-PGO-NEXT: [ 0 11 20 ]
-// CLANG-PGO-NEXT: [ 10 ]
-// CLANG-PGO-NEXT: [ 20 ]
+// CLANG-PGO-NEXT: 0 11 20 10 20
 // CLANG-PGO-NEXT: ========== Data ===========
 // CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
 // CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
@@ -55,9 +53,7 @@ int main() {
 // CLANG-PGO-NEXT: test2
 
 // LLVM-PGO: ======== Counters =========
-// LLVM-PGO-NEXT: [ 20 ]
-// LLVM-PGO-NEXT: [ 10 ]
-// LLVM-PGO-NEXT: [ 20 10 1 1 ]
+// LLVM-PGO-NEXT: 20 10 20 10 1 1
 // LLVM-PGO-NEXT: ========== Data ===========
 // LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
 // LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}

From 7770b37a5a4c40bd45887f762bd7f1e652bc0ed2 Mon Sep 17 00:00:00 2001
From: Ethan Luis
McDonough
Date: Tue, 7 May 2024 16:31:48 -0500
Subject: [PATCH 22/38] Fix params

---
 compiler-rt/lib/profile/InstrProfilingFile.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c
index 466bfe480543bc..bc1d40a37a5ad6 100644
--- a/compiler-rt/lib/profile/InstrProfilingFile.c
+++ b/compiler-rt/lib/profile/InstrProfilingFile.c
@@ -1360,9 +1360,10 @@ int __llvm_write_custom_profile(const char *Target,
   initFileWriter(&fileWriter, OutputFile);
 
   /* Write custom data to the file */
-  ReturnValue = lprofWriteDataImpl(
-      &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL,
-      lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone);
+  ReturnValue =
+      lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin,
+                         CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL,
+                         NULL, NULL, NULL, NamesBegin, NamesEnd, MergeDone);
 
   closeFileObject(OutputFile);
 

From aa895a1788969a0d27692057a1457074e9772c78 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Mon, 18 Mar 2024 21:31:32 -0500
Subject: [PATCH 23/38] Fix elf obj file

---
 offload/plugins-nextgen/common/src/GlobalHandler.cpp | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp
index 80cdcaff75528e..7717e19a5b6779 100644
--- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -177,16 +177,19 @@ Expected
 GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
                                              DeviceImageTy &Image) {
   GPUProfGlobals DeviceProfileData;
-  auto ELFObj = getELFObjectFile(Image);
-  if (!ELFObj)
-    return ELFObj.takeError();
+  auto ObjFile = getELFObjectFile(Image);
+  if (!ObjFile)
+    return ObjFile.takeError();
+
+  std::unique_ptr ELFObj(
+      static_cast(ObjFile->release()));
   DeviceProfileData.TargetTriple =
ELFObj->makeTriple(); // Iterate through elf symbols for (auto &Sym : ELFObj->symbols()) { auto NameOrErr = Sym.getName(); if (!NameOrErr) - return ELFObj.takeError(); + return NameOrErr.takeError(); // Check if given current global is a profiling global based // on name >From 2031e49c2b26864f2dab72e629eb6cbe34928a7a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 6 May 2024 23:13:58 -0500 Subject: [PATCH 24/38] Add more addrspace casts for GPU targets --- .../Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++--- .../Instrumentation/PGOInstrumentation.cpp | 13 +++++++++---- 2 files changed, 17 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index a6b1e0d488120a..dd8c027c4bbf62 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -869,6 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { llvm::InstrProfValueKind::IPVK_MemOPSize); CallInst *Call = nullptr; auto *TLI = &GetTLI(*Ind->getFunction()); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + DataVar, PointerType::getUnqual(M.getContext())); // To support value profiling calls within Windows exception handlers, funclet // information contained within operand bundles needs to be copied over to @@ -877,11 +879,13 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { SmallVector OpBundles; Ind->getOperandBundlesAsDefs(OpBundles); if (!IsMemOpSize) { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args, OpBundles); } else { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), 
NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall( getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp), Args, OpBundles); @@ -1575,7 +1579,8 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { getInstrProfSectionName(IPSK_vals, TT.getObjectFormat())); ValuesVar->setAlignment(Align(8)); maybeSetComdat(ValuesVar, Fn, CntsVarName); - ValuesPtrExpr = ValuesVar; + ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + ValuesVar, PointerType::getUnqual(Fn->getContext())); } uint64_t NumCounters = Inc->getNumCounters()->getZExtValue(); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index 4b51396a8baa35..ee1657ba8400ee 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -1007,12 +1007,15 @@ static void instrumentOneFunc( ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty()); assert(ToProfile && "value profiling Value is of unexpected type"); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + Name, PointerType::get(M->getContext(), 0)); + SmallVector OpBundles; populateEHOperandBundle(Cand, BlockColors, OpBundles); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile), - {FuncInfo.FuncNameVar, Builder.getInt64(FuncInfo.FunctionHash), - ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, + {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile, + Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, OpBundles); } } // IPVK_First <= Kind <= IPVK_Last @@ -1685,10 +1688,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) { IRBuilder<> Builder(&SI); Type *Int64Ty = Builder.getInt64Ty(); auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, 
PointerType::get(M->getContext(), 0)); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step), - {FuncNameVar, Builder.getInt64(FuncHash), Builder.getInt32(TotalNumCtrs), - Builder.getInt32(*CurCtrIdx), Step}); + {NormalizedPtr, Builder.getInt64(FuncHash), + Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step}); ++(*CurCtrIdx); } >From be6524bb4f77de0add1e698f68115fd336f32238 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 13 May 2024 17:41:00 -0500 Subject: [PATCH 25/38] Have test read from profraw instead of dump --- offload/test/lit.cfg | 2 + offload/test/offloading/pgo1.c | 94 ++++++++++++++++------------------ 2 files changed, 46 insertions(+), 50 deletions(-) diff --git a/offload/test/lit.cfg b/offload/test/lit.cfg index 069110dc69a6e4..38e6a33b01fafc 100644 --- a/offload/test/lit.cfg +++ b/offload/test/lit.cfg @@ -391,6 +391,8 @@ if config.test_fortran_compiler: config.available_features.add('flang') config.substitutions.append(("%flang", config.test_fortran_compiler)) +config.substitutions.append(("%target_triple", config.libomptarget_current_target)) + config.substitutions.append(("%openmp_flags", config.test_openmp_flags)) if config.libomptarget_current_target.startswith('nvptx') and config.cuda_path: config.substitutions.append(("%cuda_flags", "--cuda-path=" + config.cuda_path)) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 79e93d0f10827f..d22d5340f5b3ec 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,22 +1,21 @@ -// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ -// RUN: -Xclang "-fprofile-instrument=clang" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ -// RUN: --check-prefix="CLANG-PGO" -// RUN: %libomptarget-compile-generic -fprofile-generate \ -// RUN: -Xclang "-fprofile-instrument=llvm" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: %libomptarget-compile-generic 
-Xclang "-fprofile-instrument=llvm" +// RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" +// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=clang" +// RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.clang.profraw | %fcheck-generic \ +// RUN: --check-prefix="CLANG-PGO" + // UNSUPPORTED: x86_64-pc-linux-gnu // UNSUPPORTED: x86_64-pc-linux-gnu-LTO // UNSUPPORTED: aarch64-unknown-linux-gnu // UNSUPPORTED: aarch64-unknown-linux-gnu-LTO // REQUIRES: pgo -#ifdef _OPENMP -#include -#endif - int test1(int a) { return a / 2; } int test2(int a) { return a * 2; } @@ -31,43 +30,38 @@ int main() { } } -// CLANG-PGO: ======== Counters ========= -// CLANG-PGO-NEXT: 0 11 20 10 20 -// CLANG-PGO-NEXT: ========== Data =========== -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: ======== Functions ======== -// CLANG-PGO-NEXT: pgo1.c: -// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// CLANG-PGO-NEXT: test1 -// CLANG-PGO-NEXT: test2 +// LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: 
Counters: 4 +// LLVM-PGO: Function count: 20 +// LLVM-PGO: Block counts: [10, 20, 10] + +// LLVM-PGO-LABEL: test1: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: Counters: 1 +// LLVM-PGO: Function count: 1 +// LLVM-PGO: Block counts: [] + +// LLVM-PGO-LABEL: test2: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: Counters: 1 +// LLVM-PGO: Function count: 1 +// LLVM-PGO: Block counts: [] + +// CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 3 +// CLANG-PGO: Function count: 0 +// CLANG-PGO: Block counts: [11, 20] + +// CLANG-PGO-LABEL: test1: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 1 +// CLANG-PGO: Function count: 10 +// CLANG-PGO: Block counts: [] -// LLVM-PGO: ======== Counters ========= -// LLVM-PGO-NEXT: 20 10 20 10 1 1 -// LLVM-PGO-NEXT: ========== Data =========== -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: ======== Functions ======== -// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// LLVM-PGO-NEXT: test1 -// LLVM-PGO-NEXT: test2 +// CLANG-PGO-LABEL: test2: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 1 +// CLANG-PGO: Function count: 20 +// CLANG-PGO: Block counts: [] >From 2b8eb2935ec21bf0acc5c56f45837b5976560963 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 
24 May 2024 19:59:33 -0500
Subject: [PATCH 26/38] Fix PGO test format

---
 offload/test/offloading/pgo1.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c
index d22d5340f5b3ec..0e75c684ed9263 100644
--- a/offload/test/offloading/pgo1.c
+++ b/offload/test/offloading/pgo1.c
@@ -33,20 +33,17 @@ int main() {
 // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}:
 // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
 // LLVM-PGO: Counters: 4
-// LLVM-PGO: Function count: 20
-// LLVM-PGO: Block counts: [10, 20, 10]
+// LLVM-PGO: Block counts: [20, 10, 20, 10]
 
 // LLVM-PGO-LABEL: test1:
 // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
 // LLVM-PGO: Counters: 1
-// LLVM-PGO: Function count: 1
-// LLVM-PGO: Block counts: []
+// LLVM-PGO: Block counts: [1]
 
 // LLVM-PGO-LABEL: test2:
 // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
 // LLVM-PGO: Counters: 1
-// LLVM-PGO: Function count: 1
-// LLVM-PGO: Block counts: []
+// LLVM-PGO: Block counts: [1]
 
 // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}:
 // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}}

From 67f3009173d815295f36e2b37e85add1347e3bf9 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 24 May 2024 20:45:04 -0500
Subject: [PATCH 27/38] Refactor profile writer

---
 compiler-rt/lib/profile/InstrProfilingFile.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c
index bc1d40a37a5ad6..76238214c13aa3 100644
--- a/compiler-rt/lib/profile/InstrProfilingFile.c
+++ b/compiler-rt/lib/profile/InstrProfilingFile.c
@@ -1344,8 +1344,7 @@ int __llvm_write_custom_profile(const char *Target,
   forceTruncateFile(TargetFilename);
 
   /* Open target-specific PGO file */
-  MergeDone = 0;
-  FILE *OutputFile = getMergeFileObject(TargetFilename, &MergeDone);
+  FILE *OutputFile = getFileObject(TargetFilename);
 
   if
(!OutputFile) { PROF_ERR("Failed to open file : %s\n", TargetFilename); @@ -1356,15 +1355,11 @@ int __llvm_write_custom_profile(const char *Target, FreeHook = &free; setupIOBuffer(); - ProfDataWriter fileWriter; - initFileWriter(&fileWriter, OutputFile); - - /* Write custom data to the file */ - ReturnValue = - lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin, - CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL, - NULL, NULL, NULL, NamesBegin, NamesEnd, MergeDone); + /* Write custom data */ + ReturnValue = __llvm_profile_write_buffer_internal( + OutputFile, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + NamesBegin, NamesEnd); closeFileObject(OutputFile); // Restore SIGKILL. >From e8ad1322c557f7b48e2b28fe3a34a696a1103bba Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 27 May 2024 18:29:18 -0500 Subject: [PATCH 28/38] Fix refactor bug --- compiler-rt/lib/profile/InstrProfilingFile.c | 52 ++++++++++---------- offload/test/offloading/pgo1.c | 6 ++- 2 files changed, 29 insertions(+), 29 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 76238214c13aa3..784cb9af6169d8 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -505,14 +505,6 @@ static FILE *getFileObject(const char *OutputName) { return fopen(OutputName, "ab"); } -/* Get file object and merge if applicable */ -static FILE *getMergeFileObject(const char *OutputName, int *MergeDone) { - VPMergeHook = &lprofMergeValueProfData; - if (doMerging()) - return openFileForMerging(OutputName, MergeDone); - return getFileObject(OutputName); -} - static void closeFileObject(FILE *OutputFile) { if (OutputFile == getProfileFile()) { fflush(OutputFile); @@ -526,8 +518,15 @@ static void closeFileObject(FILE *OutputFile) { /* Write profile data to file \c OutputName. 
*/ static int writeFile(const char *OutputName) { - int RetVal, MergeDone = 0; - FILE *OutputFile = getMergeFileObject(OutputName, &MergeDone); + int RetVal; + FILE *OutputFile; + + int MergeDone = 0; + VPMergeHook = &lprofMergeValueProfData; + if (doMerging()) + OutputFile = openFileForMerging(OutputName, &MergeDone); + else + OutputFile = getFileObject(OutputName); if (!OutputFile) return -1; @@ -565,16 +564,10 @@ static int writeOrderFile(const char *OutputName) { #define LPROF_INIT_ONCE_ENV "__LLVM_PROFILE_RT_INIT_ONCE" -static void forceTruncateFile(const char *Filename) { - FILE *File = fopen(Filename, "w"); - if (!File) - return; - fclose(File); -} - static void truncateCurrentFile(void) { const char *Filename; char *FilenameBuf; + FILE *File; int Length; Length = getCurFilenameLength(); @@ -604,7 +597,10 @@ static void truncateCurrentFile(void) { return; /* Truncate the file. Later we'll reopen and append. */ - forceTruncateFile(Filename); + File = fopen(Filename, "w"); + if (!File) + return; + fclose(File); } /* Write a partial profile to \p Filename, which is required to be backed by @@ -1287,7 +1283,7 @@ int __llvm_write_custom_profile(const char *Target, const char *CountersBegin, const char *CountersEnd, const char *NamesBegin, const char *NamesEnd) { - int ReturnValue = 0, FilenameLength, TargetLength, MergeDone; + int ReturnValue = 0, FilenameLength, TargetLength; char *FilenameBuf, *TargetFilename; const char *Filename; @@ -1340,11 +1336,9 @@ int __llvm_write_custom_profile(const char *Target, return -1; } - /* Clean old target file */ - forceTruncateFile(TargetFilename); - - /* Open target-specific PGO file */ - FILE *OutputFile = getFileObject(TargetFilename); + /* Open and truncate target-specific PGO file */ + FILE *OutputFile = fopen(TargetFilename, "w"); + setProfileFile(OutputFile); if (!OutputFile) { PROF_ERR("Failed to open file : %s\n", TargetFilename); @@ -1357,9 +1351,13 @@ int __llvm_write_custom_profile(const char *Target, 
setupIOBuffer(); /* Write custom data */ - ReturnValue = __llvm_profile_write_buffer_internal( - OutputFile, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, - NamesBegin, NamesEnd); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + + /* Write custom data to the file */ + ReturnValue = lprofWriteDataImpl( + &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NULL, NULL, NULL, NULL, NamesBegin, NamesEnd, 0); closeFileObject(OutputFile); // Restore SIGKILL. diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 0e75c684ed9263..d6747113265803 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,10 +1,12 @@ -// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=llvm" +// RUN: %libomptarget-compile-generic -fprofile-generate \ +// RUN: -Xclang "-fprofile-instrument=llvm" // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" -// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ +// RUN: -Xclang "-fprofile-instrument=clang" // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.clang.profraw | %fcheck-generic \ >From 4c9f814ce14aeb6766a93f5c1d15b847b98dc29f Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 28 May 2024 12:58:43 -0500 Subject: [PATCH 29/38] Make requested clang-format change --- offload/plugins-nextgen/common/include/GlobalHandler.h | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/offload/plugins-nextgen/common/include/GlobalHandler.h b/offload/plugins-nextgen/common/include/GlobalHandler.h index 
017d7e994f07a8..1d7b9f80f9dfd3 100644
--- a/offload/plugins-nextgen/common/include/GlobalHandler.h
+++ b/offload/plugins-nextgen/common/include/GlobalHandler.h
@@ -64,12 +64,10 @@ struct __llvm_profile_data {
 };
 
 extern "C" {
-extern int __attribute__((weak))
-__llvm_write_custom_profile(const char *Target,
-                            const __llvm_profile_data *DataBegin,
-                            const __llvm_profile_data *DataEnd,
-                            const char *CountersBegin, const char *CountersEnd,
-                            const char *NamesBegin, const char *NamesEnd);
+extern int __attribute__((weak)) __llvm_write_custom_profile(
+    const char *Target, const __llvm_profile_data *DataBegin,
+    const __llvm_profile_data *DataEnd, const char *CountersBegin,
+    const char *CountersEnd, const char *NamesBegin, const char *NamesEnd);
 }
 
 /// PGO profiling data extracted from a GPU device

From 344e357de657f54c068be969dcfc3ea33f2f026e Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 31 May 2024 20:29:20 -0500
Subject: [PATCH 30/38] Tighten PGO test requirements

Require compiler-rt to be an enabled runtime
---
 offload/test/CMakeLists.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt
index 32df1e47afaeb2..41ab339147791c 100644
--- a/offload/test/CMakeLists.txt
+++ b/offload/test/CMakeLists.txt
@@ -12,10 +12,10 @@ else()
   set(LIBOMPTARGET_DEBUG False)
 endif()
 
-if (OPENMP_STANDALONE_BUILD)
-  set(LIBOMPTARGET_TEST_GPU_PGO False)
-else()
+if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES)
   set(LIBOMPTARGET_TEST_GPU_PGO True)
+else()
+  set(LIBOMPTARGET_TEST_GPU_PGO False)
 endif()
 
 # Replace the space from user's input with ";" in case that CMake add escape

From 2f751420b9ad2ffc7c9fac4a645724b45cdae59a Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 31 May 2024 20:29:20 -0500
Subject: [PATCH 31/38] Tighten PGO test requirements

Require compiler-rt to be an enabled runtime
---
 offload/test/CMakeLists.txt | 6 +++---
 1 file changed, 3
insertions(+), 3 deletions(-)

diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt
index 32df1e47afaeb2..41ab339147791c 100644
--- a/offload/test/CMakeLists.txt
+++ b/offload/test/CMakeLists.txt
@@ -12,10 +12,10 @@ else()
   set(LIBOMPTARGET_DEBUG False)
 endif()
 
-if (OPENMP_STANDALONE_BUILD)
-  set(LIBOMPTARGET_TEST_GPU_PGO False)
-else()
+if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES)
   set(LIBOMPTARGET_TEST_GPU_PGO True)
+else()
+  set(LIBOMPTARGET_TEST_GPU_PGO False)
 endif()
 
 # Replace the space from user's input with ";" in case that CMake add escape

From 488cb4a349fdfbd73d0a78ddb2c17522c46145ba Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 26 Jun 2024 18:18:31 -0500
Subject: [PATCH 32/38] Apply requested formatting changes

---
 clang/lib/CodeGen/CodeGenPGO.cpp                | 11 +++++-----
 llvm/lib/ProfileData/InstrProf.cpp              |  4 ++--
 .../Instrumentation/InstrProfiling.cpp          | 10 ++++-----
 .../Instrumentation/PGOInstrumentation.cpp      | 21 ++++++++++---------
 offload/DeviceRTL/src/Profiling.cpp             |  6 ++++--
 5 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp
index a7ce0b8f6a35f3..3edfbdd679c61d 100644
--- a/clang/lib/CodeGen/CodeGenPGO.cpp
+++ b/clang/lib/CodeGen/CodeGenPGO.cpp
@@ -1199,12 +1199,13 @@ void CodeGenPGO::emitCounterSetOrIncrement(CGBuilderTy &Builder, const Stmt *S,
 
   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
-  auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0));
+  auto *NormalizedFuncNameVarPtr =
+      llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
+          FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0));
 
-  llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash),
-                         Builder.getInt32(NumRegionCounters),
-                         Builder.getInt32(Counter), StepV};
+  llvm::Value
*Args[] = { + NormalizedFuncNameVarPtr, Builder.getInt64(FunctionHash), + Builder.getInt32(NumRegionCounters), Builder.getInt32(Counter), StepV}; if (llvm::EnableSingleByteCoverage) Builder.CreateCall(CGM.getIntrinsic(llvm::Intrinsic::instrprof_cover), diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 1284efd4b5f4da..6742435c9d065e 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -433,8 +433,8 @@ std::string getPGOFuncNameVarName(StringRef FuncName, } bool isGPUProfTarget(const Module &M) { - const auto &Triple = llvm::Triple(M.getTargetTriple()); - return Triple.isAMDGPU() || Triple.isNVPTX(); + const auto &T = Triple(M.getTargetTriple()); + return T.isAMDGPU() || T.isNVPTX(); } void setPGOFuncVisibility(Module &M, GlobalVariable *FuncNameVar) { diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index dd8c027c4bbf62..05cef1236f0879 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -869,8 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { llvm::InstrProfValueKind::IPVK_MemOPSize); CallInst *Call = nullptr; auto *TLI = &GetTLI(*Ind->getFunction()); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - DataVar, PointerType::getUnqual(M.getContext())); + auto *NormalizedDataVarPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + DataVar, PointerType::get(M.getContext(), 0)); // To support value profiling calls within Windows exception handlers, funclet // information contained within operand bundles needs to be copied over to @@ -879,12 +879,12 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { SmallVector OpBundles; Ind->getOperandBundlesAsDefs(OpBundles); if (!IsMemOpSize) { - Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Value *Args[3] 
= {Ind->getTargetValue(), NormalizedDataVarPtr, Builder.getInt32(Index)}; Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args, OpBundles); } else { - Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Value *Args[3] = {Ind->getTargetValue(), NormalizedDataVarPtr, Builder.getInt32(Index)}; Call = Builder.CreateCall( getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp), @@ -1580,7 +1580,7 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { ValuesVar->setAlignment(Align(8)); maybeSetComdat(ValuesVar, Fn, CntsVarName); ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - ValuesVar, PointerType::getUnqual(Fn->getContext())); + ValuesVar, PointerType::get(Fn->getContext(), 0)); } uint64_t NumCounters = Inc->getNumCounters()->getZExtValue(); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index ee1657ba8400ee..f8f34ea25597f3 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -884,7 +884,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedNamePtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); @@ -893,7 +893,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_cover), - {NormalizedPtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); + {NormalizedNamePtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); return; } @@ -948,7 +948,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, 
Intrinsic::instrprof_timestamp), - {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + {NormalizedNamePtr, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I)}); I += PGOBlockCoverage ? 8 : 1; } @@ -963,7 +963,7 @@ static void instrumentOneFunc( Intrinsic::getDeclaration(M, PGOBlockCoverage ? Intrinsic::instrprof_cover : Intrinsic::instrprof_increment), - {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + {NormalizedNamePtr, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I++)}); } @@ -1007,15 +1007,15 @@ static void instrumentOneFunc( ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty()); assert(ToProfile && "value profiling Value is of unexpected type"); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedNamePtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); SmallVector OpBundles; populateEHOperandBundle(Cand, BlockColors, OpBundles); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile), - {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile, - Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, + {NormalizedNamePtr, Builder.getInt64(FuncInfo.FunctionHash), + ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, OpBundles); } } // IPVK_First <= Kind <= IPVK_Last @@ -1688,11 +1688,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) { IRBuilder<> Builder(&SI); Type *Int64Ty = Builder.getInt64Ty(); auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, PointerType::get(M->getContext(), 0)); + auto *NormalizedFuncNameVarPtr = + ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, PointerType::get(M->getContext(), 0)); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step), - {NormalizedPtr, Builder.getInt64(FuncHash), + 
{NormalizedFuncNameVarPtr, Builder.getInt64(FuncHash), Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step}); ++(*CurCtrIdx); } diff --git a/offload/DeviceRTL/src/Profiling.cpp b/offload/DeviceRTL/src/Profiling.cpp index 799477f5e47d27..639c62ceff7a69 100644 --- a/offload/DeviceRTL/src/Profiling.cpp +++ b/offload/DeviceRTL/src/Profiling.cpp @@ -12,8 +12,10 @@ extern "C" { -void __llvm_profile_register_function(void *ptr) {} -void __llvm_profile_register_names_function(void *ptr, long int i) {} +// Provides empty implementations for certain functions in compiler-rt +// that are emitted by the PGO instrumentation. +void __llvm_profile_register_function(void *Ptr) {} +void __llvm_profile_register_names_function(void *Ptr, long int I) {} } #pragma omp end declare target >From b90c01583f1893802aba0180b07a448584585365 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 26 Jun 2024 18:29:59 -0500 Subject: [PATCH 33/38] Add memop function shim to DeviceRTL This comes up sometimes when using LLVM IR level instrumentation. --- offload/DeviceRTL/include/Profiling.h | 1 + offload/DeviceRTL/src/Profiling.cpp | 1 + 2 files changed, 2 insertions(+) diff --git a/offload/DeviceRTL/include/Profiling.h b/offload/DeviceRTL/include/Profiling.h index 9efc1554c176bc..d9947522541219 100644 --- a/offload/DeviceRTL/include/Profiling.h +++ b/offload/DeviceRTL/include/Profiling.h @@ -15,6 +15,7 @@ extern "C" { void __llvm_profile_register_function(void *Ptr); void __llvm_profile_register_names_function(void *Ptr, long int I); +void __llvm_profile_instrument_memop(long int I, void *Ptr, int I2); } #endif diff --git a/offload/DeviceRTL/src/Profiling.cpp b/offload/DeviceRTL/src/Profiling.cpp index 639c62ceff7a69..bb3caaadcc03dd 100644 --- a/offload/DeviceRTL/src/Profiling.cpp +++ b/offload/DeviceRTL/src/Profiling.cpp @@ -16,6 +16,7 @@ extern "C" { // that are emitted by the PGO instrumentation. 
void __llvm_profile_register_function(void *Ptr) {} void __llvm_profile_register_names_function(void *Ptr, long int I) {} +void __llvm_profile_instrument_memop(long int I, void *Ptr, int I2) {} } #pragma omp end declare target >From c68c6e2fa98a1fe608b88ed38f7db68eae804c5b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 27 Jun 2024 02:04:27 -0500 Subject: [PATCH 34/38] Make requested changes --- compiler-rt/lib/profile/InstrProfiling.h | 2 +- compiler-rt/lib/profile/InstrProfilingFile.c | 1 - offload/plugins-nextgen/common/src/PluginInterface.cpp | 5 ++--- 3 files changed, 3 insertions(+), 5 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h index ef1292a45bf01d..eda3e9a673c1af 100644 --- a/compiler-rt/lib/profile/InstrProfiling.h +++ b/compiler-rt/lib/profile/InstrProfiling.h @@ -298,7 +298,7 @@ void __llvm_profile_set_dumped(); /*! * \brief Write custom target-specific profiling data to a seperate file. - * Used by libomptarget for GPU PGO. + * Used by offload PGO. */ int __llvm_write_custom_profile(const char *Target, const __llvm_profile_data *DataBegin, diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 784cb9af6169d8..93436ecbabb40d 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1321,7 +1321,6 @@ int __llvm_write_custom_profile(const char *Target, /* Prepend "TARGET." 
to current filename */ memcpy(TargetFilename, Target, TargetLength); TargetFilename[TargetLength] = '.'; - memcpy(TargetFilename, Target, TargetLength); memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); TargetFilename[FilenameLength + 1 + TargetLength] = 0; diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index c4e1e63777de8a..445f4ad942bd4d 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp @@ -843,9 +843,8 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { ProfOrErr->dump(); // Write data to profiling file - if (auto Err = ProfOrErr->write()) { - consumeError(std::move(Err)); - } + if (auto Err = ProfOrErr->write()) + return Err; } // Delete the memory manager before deinitializing the device. Otherwise, >From ca52c58c7fde412897cf6b10b9bbb321812f193d Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 27 Jun 2024 02:26:20 -0500 Subject: [PATCH 35/38] Only dump counters if PGODump flag is set --- offload/include/Shared/Environment.h | 1 + offload/plugins-nextgen/common/src/PluginInterface.cpp | 4 +++- openmp/docs/design/Runtimes.rst | 1 + 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/offload/include/Shared/Environment.h b/offload/include/Shared/Environment.h index d141146b6bd5a1..86f6d1c6ea2d36 100644 --- a/offload/include/Shared/Environment.h +++ b/offload/include/Shared/Environment.h @@ -30,6 +30,7 @@ enum class DeviceDebugKind : uint32_t { FunctionTracing = 1U << 1, CommonIssues = 1U << 2, AllocationTracker = 1U << 3, + PGODump = 1U << 4, }; struct DeviceEnvironmentTy { diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index 445f4ad942bd4d..35fb04863d8741 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp 
@@ -840,7 +840,9 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { return ProfOrErr.takeError(); // Dump out profdata - ProfOrErr->dump(); + if ((OMPX_DebugKind.get() & uint32_t(DeviceDebugKind::PGODump)) == + uint32_t(DeviceDebugKind::PGODump)) + ProfOrErr->dump(); // Write data to profiling file if (auto Err = ProfOrErr->write()) diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst index f8a8cb87e83e66..7fc697a838e229 100644 --- a/openmp/docs/design/Runtimes.rst +++ b/openmp/docs/design/Runtimes.rst @@ -1493,3 +1493,4 @@ debugging features are supported. * Enable debugging assertions in the device. ``0x01`` * Enable diagnosing common problems during offloading . ``0x4`` * Enable device malloc statistics (amdgpu only). ``0x8`` + * Dump device PGO counters (only if PGO on GPU is enabled). ``0x10`` >From ee4431a1b57469c7679f54f124ca5f3dd7f0433b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 9 Aug 2024 20:21:38 -0500 Subject: [PATCH 36/38] Update requirements --- offload/test/offloading/pgo1.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index d6747113265803..fbf6337374a997 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -12,10 +12,7 @@ // RUN: %target_triple.clang.profraw | %fcheck-generic \ // RUN: --check-prefix="CLANG-PGO" -// UNSUPPORTED: x86_64-pc-linux-gnu -// UNSUPPORTED: x86_64-pc-linux-gnu-LTO -// UNSUPPORTED: aarch64-unknown-linux-gnu -// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: gpu // REQUIRES: pgo int test1(int a) { return a / 2; } >From fb699b6bca72d42359a304bcbba88f3564ae9ac9 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Sat, 10 Aug 2024 00:54:36 -0500 Subject: [PATCH 37/38] Merge changes --- offload/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +- offload/test/offloading/pgo1.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git 
a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index bca66cff6558a2..d7bfbba01c8efc 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -193,7 +193,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, // Check if given current global is a profiling global based // on name - if (NameOrErr->equals(getInstrProfNamesVarName())) { + if (*NameOrErr == getInstrProfNamesVarName()) { // Read in profiled function names DeviceProfileData.NamesData = SmallVector(Sym.getSize(), 0); GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(), diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index fbf6337374a997..3270ce8f15e7dc 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -32,17 +32,17 @@ int main() { // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 4 -// LLVM-PGO: Block counts: [20, 10, 20, 10] +// LLVM-PGO: Block counts: [20, 10, 2, 1] // LLVM-PGO-LABEL: test1: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [10] // LLVM-PGO-LABEL: test2: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [20] // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} >From 1d0a961aabe488e6d09b96a80329498b8f586923 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 25 Oct 2024 13:42:19 -0500 Subject: [PATCH 38/38] Add llvm-profdata substitution to offload tests --- offload/test/lit.cfg | 2 ++ offload/test/lit.site.cfg.in | 2 +- offload/test/offloading/pgo1.c | 4 ++-- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/offload/test/lit.cfg 
b/offload/test/lit.cfg index 7994a08ba063fb..cfd1ad6c3c1eb5 100644 --- a/offload/test/lit.cfg +++ b/offload/test/lit.cfg @@ -112,8 +112,10 @@ config.available_features.add(config.libomptarget_current_target) if config.libomptarget_has_libc: config.available_features.add('libc') +profdata_path = os.path.join(config.bin_llvm_tools_dir, "llvm-profdata") if config.libomptarget_test_pgo: config.available_features.add('pgo') + config.substitutions.append(("%profdata", profdata_path)) # Determine whether the test system supports unified memory. # For CUDA, this is the case with compute capability 70 (Volta) or higher. diff --git a/offload/test/lit.site.cfg.in b/offload/test/lit.site.cfg.in index a1cb5acc38a405..d998fb0c839700 100644 --- a/offload/test/lit.site.cfg.in +++ b/offload/test/lit.site.cfg.in @@ -1,6 +1,6 @@ @AUTO_GEN_COMMENT@ -config.bin_llvm_tools_dir = "@CMAKE_BINARY_DIR@/bin" +config.bin_llvm_tools_dir = "@LLVM_RUNTIME_OUTPUT_INTDIR@" config.test_c_compiler = "@OPENMP_TEST_C_COMPILER@" config.test_cxx_compiler = "@OPENMP_TEST_CXX_COMPILER@" config.test_fortran_compiler="@OPENMP_TEST_Fortran_COMPILER@" diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 1ef540e430a27a..51671afa62b0db 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,14 +1,14 @@ // RUN: %libomptarget-compile-generic -fprofile-generate \ // RUN: -Xclang "-fprofile-instrument=llvm" // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 -// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %profdata show --all-functions --counts \ // RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" // RUN: %libomptarget-compile-generic -fprofile-instr-generate \ // RUN: -Xclang "-fprofile-instrument=clang" // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 -// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %profdata show --all-functions --counts 
\
 // RUN: %target_triple.clang.profraw | %fcheck-generic \
 // RUN: --check-prefix="CLANG-PGO"

From openmp-commits at lists.llvm.org Fri Oct 25 12:22:42 2024
From: openmp-commits at lists.llvm.org (LLVM Continuous Integration via Openmp-commits)
Date: Fri, 25 Oct 2024 12:22:42 -0700 (PDT)
Subject: [Openmp-commits] [openmp] [OpenMP] Create versioned libgomp softlinks (PR #112973)
In-Reply-To:
Message-ID: <671bf002.170a0220.39bf75.6eb0@mx.google.com>

llvm-ci wrote:

LLVM Buildbot has detected a new failure on builder `openmp-offload-libc-amdgpu-runtime` running on `omp-vega20-1` while building `openmp` at step 11 "Add check check-libc-amdgcn-amd-amdhsa".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/7574

Here is the relevant piece of the build log for the reference
```
Step 11 (Add check check-libc-amdgcn-amd-amdhsa) failure: 1200 seconds without output running [b'ninja', b'-j 32', b'check-libc-amdgcn-amd-amdhsa'], attempting to kill
...
[2559/2681] Linking CXX executable libc/test/src/locale/libc.test.src.locale.locale_test.__hermetic__.__build__
[2560/2681] Linking CXX executable libc/test/src/time/libc.test.src.time.clock_test.__hermetic__.__build__
[2561/2681] Linking CXX executable libc/test/src/string/libc.test.src.string.strxfrm_test.__hermetic__.__build__
[2562/2681] Linking CXX executable libc/test/src/string/libc.test.src.string.strncat_test.__hermetic__.__build__
[2563/2681] Linking CXX executable libc/test/src/inttypes/libc.test.src.inttypes.strtoimax_test.__hermetic__.__build__
[2564/2681] Linking CXX executable libc/test/src/inttypes/libc.test.src.inttypes.strtoumax_test.__hermetic__.__build__
[2565/2681] Linking CXX executable libc/test/src/string/libc.test.src.string.memmove_test.__hermetic__.__build__
[2566/2681] Linking CXX executable libc/test/src/time/libc.test.src.time.clock_gettime_test.__hermetic__.__build__
[2567/2681] Linking CXX executable libc/test/src/stdio/libc.test.src.stdio.asprintf_test.__hermetic__.__build__
[2568/2681] Linking CXX executable libc/test/src/stdio/libc.test.src.stdio.vasprintf_test.__hermetic__.__build__
command timed out: 1200 seconds without output running [b'ninja', b'-j 32', b'check-libc-amdgcn-amd-amdhsa'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1469.198629
```

https://github.com/llvm/llvm-project/pull/112973

From openmp-commits at lists.llvm.org Mon Oct 28 16:46:18 2024
From: openmp-commits at lists.llvm.org (Ethan Luis McDonough via Openmp-commits)
Date: Mon, 28 Oct 2024 16:46:18 -0700 (PDT)
Subject: [Openmp-commits] [clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (PR #93365)
In-Reply-To:
Message-ID: <6720224a.170a0220.28b94a.875f@mx.google.com>

https://github.com/EthanLuisMcDonough updated https://github.com/llvm/llvm-project/pull/93365

>From 530eb982b9770190377bb0bd09c5cb715f34d484 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 15 Dec 2023 20:38:38 -0600
Subject: [PATCH 01/39] Add profiling functions to libomptarget

---
 .../include/llvm/Frontend/OpenMP/OMPKinds.def | 3 +++
 openmp/libomptarget/DeviceRTL/CMakeLists.txt | 2 ++
 .../DeviceRTL/include/Profiling.h | 21 +++++++++++++++++++
 .../libomptarget/DeviceRTL/src/Profiling.cpp | 19 +++++++++++++++++
 4 files changed, 45 insertions(+)
 create mode 100644 openmp/libomptarget/DeviceRTL/include/Profiling.h
 create mode 100644 openmp/libomptarget/DeviceRTL/src/Profiling.cpp

diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
index d22d2a8e948b00..1d887d5cb58127 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -503,6 +503,9 @@ __OMP_RTL(__kmpc_barrier_simple_generic, false, Void, IdentPtr, Int32)
 __OMP_RTL(__kmpc_warp_active_thread_mask, false, Int64,)
 __OMP_RTL(__kmpc_syncwarp, false, Void, Int64)
+__OMP_RTL(__llvm_profile_register_function, false, Void, VoidPtr)
+__OMP_RTL(__llvm_profile_register_names_function, false, Void, VoidPtr, Int64)
+
 __OMP_RTL(__last, false, Void, )
 #undef __OMP_RTL
diff --git a/openmp/libomptarget/DeviceRTL/CMakeLists.txt b/openmp/libomptarget/DeviceRTL/CMakeLists.txt
index 1ce3e1e40a80ab..55ee15d068c67b 100644
---
a/openmp/libomptarget/DeviceRTL/CMakeLists.txt +++ b/openmp/libomptarget/DeviceRTL/CMakeLists.txt @@ -89,6 +89,7 @@ set(include_files ${include_directory}/Interface.h ${include_directory}/LibC.h ${include_directory}/Mapping.h + ${include_directory}/Profiling.h ${include_directory}/State.h ${include_directory}/Synchronization.h ${include_directory}/Types.h @@ -104,6 +105,7 @@ set(src_files ${source_directory}/Mapping.cpp ${source_directory}/Misc.cpp ${source_directory}/Parallelism.cpp + ${source_directory}/Profiling.cpp ${source_directory}/Reduction.cpp ${source_directory}/State.cpp ${source_directory}/Synchronization.cpp diff --git a/openmp/libomptarget/DeviceRTL/include/Profiling.h b/openmp/libomptarget/DeviceRTL/include/Profiling.h new file mode 100644 index 00000000000000..68c7744cd60752 --- /dev/null +++ b/openmp/libomptarget/DeviceRTL/include/Profiling.h @@ -0,0 +1,21 @@ +//===-------- Profiling.h - OpenMP interface ---------------------- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// +//===----------------------------------------------------------------------===// + +#ifndef OMPTARGET_DEVICERTL_PROFILING_H +#define OMPTARGET_DEVICERTL_PROFILING_H + +extern "C" { + +void __llvm_profile_register_function(void *ptr); +void __llvm_profile_register_names_function(void *ptr, long int i); +} + +#endif diff --git a/openmp/libomptarget/DeviceRTL/src/Profiling.cpp b/openmp/libomptarget/DeviceRTL/src/Profiling.cpp new file mode 100644 index 00000000000000..799477f5e47d27 --- /dev/null +++ b/openmp/libomptarget/DeviceRTL/src/Profiling.cpp @@ -0,0 +1,19 @@ +//===------- Profiling.cpp ---------------------------------------- C++ ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "Profiling.h" + +#pragma omp begin declare target device_type(nohost) + +extern "C" { + +void __llvm_profile_register_function(void *ptr) {} +void __llvm_profile_register_names_function(void *ptr, long int i) {} +} + +#pragma omp end declare target >From fb067d4ffe604fd68cf90b705db1942bce49dbb1 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Sat, 16 Dec 2023 01:18:41 -0600 Subject: [PATCH 02/39] Fix PGO instrumentation for GPU targets --- clang/lib/CodeGen/CodeGenPGO.cpp | 10 ++++++++-- .../lib/Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++--- 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index 81bf8ea696b164..edae6885b528ac 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -959,8 +959,14 @@ void 
CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, unsigned Counter = (*RegionCounterMap)[S]; - llvm::Value *Args[] = {FuncNameVar, - Builder.getInt64(FunctionHash), + // Make sure that pointer to global is passed in with zero addrspace + // This is relevant during GPU profiling + auto *I8Ty = llvm::Type::getInt8Ty(CGM.getLLVMContext()); + auto *I8PtrTy = llvm::PointerType::getUnqual(I8Ty); + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, I8PtrTy); + + llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), Builder.getInt32(NumRegionCounters), Builder.getInt32(Counter), StepV}; if (!StepV) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index fe5a0578bd9721..d2cb8155c17967 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -1658,10 +1658,13 @@ void InstrLowerer::emitRegistration() { IRBuilder<> IRB(BasicBlock::Create(M.getContext(), "", RegisterF)); for (Value *Data : CompilerUsedVars) if (!isa(Data)) - IRB.CreateCall(RuntimeRegisterF, Data); + // Check for addrspace cast when profiling GPU + IRB.CreateCall(RuntimeRegisterF, + IRB.CreatePointerBitCastOrAddrSpaceCast(Data, VoidPtrTy)); for (Value *Data : UsedVars) if (Data != NamesVar && !isa(Data)) - IRB.CreateCall(RuntimeRegisterF, Data); + IRB.CreateCall(RuntimeRegisterF, + IRB.CreatePointerBitCastOrAddrSpaceCast(Data, VoidPtrTy)); if (NamesVar) { Type *ParamTypes[] = {VoidPtrTy, Int64Ty}; @@ -1670,7 +1673,9 @@ void InstrLowerer::emitRegistration() { auto *NamesRegisterF = Function::Create(NamesRegisterTy, GlobalVariable::ExternalLinkage, getInstrProfNamesRegFuncName(), M); - IRB.CreateCall(NamesRegisterF, {NamesVar, IRB.getInt64(NamesSize)}); + IRB.CreateCall(NamesRegisterF, {IRB.CreatePointerBitCastOrAddrSpaceCast( + NamesVar, VoidPtrTy), + IRB.getInt64(NamesSize)}); } 
IRB.CreateRetVoid(); >From 7a0e0efa178cc4de6a22a8f5cc3f53cd1c81ea3a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 21 Dec 2023 00:25:46 -0600 Subject: [PATCH 03/39] Change global visibility on GPU targets --- llvm/include/llvm/ProfileData/InstrProf.h | 4 ++++ llvm/lib/ProfileData/InstrProf.cpp | 17 +++++++++++++++-- .../Instrumentation/InstrProfiling.cpp | 15 +++++++++++---- 3 files changed, 30 insertions(+), 6 deletions(-) diff --git a/llvm/include/llvm/ProfileData/InstrProf.h b/llvm/include/llvm/ProfileData/InstrProf.h index 288dc71d756aee..bf9899d867e3dd 100644 --- a/llvm/include/llvm/ProfileData/InstrProf.h +++ b/llvm/include/llvm/ProfileData/InstrProf.h @@ -171,6 +171,10 @@ inline StringRef getInstrProfCounterBiasVarName() { /// Return the marker used to separate PGO names during serialization. inline StringRef getInstrProfNameSeparator() { return "\01"; } +/// Determines whether module targets a GPU eligable for PGO +/// instrumentation +bool isGPUProfTarget(const Module &M); + /// Return the modified name for function \c F suitable to be /// used the key for profile lookup. Variable \c InLTO indicates if this /// is called in LTO optimization passes. 
diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 649d814cfd9de0..0d6717aeb0142c 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -410,13 +410,22 @@ std::string getPGOFuncNameVarName(StringRef FuncName, return VarName; } +bool isGPUProfTarget(const Module &M) { + const auto &triple = M.getTargetTriple(); + return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 || + triple.rfind("r600", 0) == 0; +} + GlobalVariable *createPGOFuncNameVar(Module &M, GlobalValue::LinkageTypes Linkage, StringRef PGOFuncName) { + // Ensure profiling variables on GPU are visible to be read from host + if (isGPUProfTarget(M)) + Linkage = GlobalValue::ExternalLinkage; // We generally want to match the function's linkage, but available_externally // and extern_weak both have the wrong semantics, and anything that doesn't // need to link across compilation units doesn't need to be visible at all. - if (Linkage == GlobalValue::ExternalWeakLinkage) + else if (Linkage == GlobalValue::ExternalWeakLinkage) Linkage = GlobalValue::LinkOnceAnyLinkage; else if (Linkage == GlobalValue::AvailableExternallyLinkage) Linkage = GlobalValue::LinkOnceODRLinkage; @@ -430,8 +439,12 @@ GlobalVariable *createPGOFuncNameVar(Module &M, new GlobalVariable(M, Value->getType(), true, Linkage, Value, getPGOFuncNameVarName(PGOFuncName, Linkage)); + // If the target is a GPU, make the symbol protected so it can + // be read from the host device + if (isGPUProfTarget(M)) + FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility); // Hide the symbol so that we correctly get a copy for each executable. 
- if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) + else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) FuncNameVar->setVisibility(GlobalValue::HiddenVisibility); return FuncNameVar; diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index d2cb8155c17967..3b582b65190808 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -1481,6 +1481,10 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind) Int16ArrayVals[Kind] = ConstantInt::get(Int16Ty, PD.NumValueSites[Kind]); + if (isGPUProfTarget(M)) { + Linkage = GlobalValue::ExternalLinkage; + Visibility = GlobalValue::ProtectedVisibility; + } // If the data variable is not referenced by code (if we don't emit // @llvm.instrprof.value.profile, NS will be 0), and the counter keeps the // data variable live under linker GC, the data variable can be private. This @@ -1492,9 +1496,9 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { // If profd is in a deduplicate comdat, NS==0 with a hash suffix guarantees // that other copies must have the same CFG and cannot have value profiling. // If no hash suffix, other profd copies may be referenced by code. 
- if (NS == 0 && !(DataReferencedByCode && NeedComdat && !Renamed) && - (TT.isOSBinFormatELF() || - (!DataReferencedByCode && TT.isOSBinFormatCOFF()))) { + else if (NS == 0 && !(DataReferencedByCode && NeedComdat && !Renamed) && + (TT.isOSBinFormatELF() || + (!DataReferencedByCode && TT.isOSBinFormatCOFF()))) { Linkage = GlobalValue::PrivateLinkage; Visibility = GlobalValue::DefaultVisibility; } @@ -1696,7 +1700,10 @@ bool InstrLowerer::emitRuntimeHook() { auto *Var = new GlobalVariable(M, Int32Ty, false, GlobalValue::ExternalLinkage, nullptr, getInstrProfRuntimeHookVarName()); - Var->setVisibility(GlobalValue::HiddenVisibility); + if (isGPUProfTarget(M)) + Var->setVisibility(GlobalValue::ProtectedVisibility); + else + Var->setVisibility(GlobalValue::HiddenVisibility); if (TT.isOSBinFormatELF() && !TT.isPS()) { // Mark the user variable as used so that it isn't stripped out. >From fddc07908ed9aa698fe3250ddbfc5621ab4d049d Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 22 Dec 2023 23:43:29 -0600 Subject: [PATCH 04/39] Make names global public on GPU --- llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index 3b582b65190808..61fba7be3ee0ee 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -1621,6 +1621,13 @@ void InstrLowerer::emitNameData() { NamesVar = new GlobalVariable(M, NamesVal->getType(), true, GlobalValue::PrivateLinkage, NamesVal, getInstrProfNamesVarName()); + + // Make names variable public if current target is a GPU + if (isGPUProfTarget(M)) { + NamesVar->setLinkage(GlobalValue::ExternalLinkage); + NamesVar->setVisibility(GlobalValue::VisibilityTypes::ProtectedVisibility); + } + NamesSize = CompressedNameStr.size(); setGlobalVariableLargeSection(TT, *NamesVar); NamesVar->setSection( >From 
e9db03c70bf79f4f4ddad4b48a5aa63a37e0d4f6 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 29 Dec 2023 12:54:50 -0600 Subject: [PATCH 05/39] Read and print GPU device PGO globals --- .../common/include/GlobalHandler.h | 27 ++++++ .../common/src/GlobalHandler.cpp | 82 +++++++++++++++++++ .../common/src/PluginInterface.cpp | 14 ++++ 3 files changed, 123 insertions(+) diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h index fa079ac9660ee0..a82cd536487653 100644 --- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h +++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h @@ -14,9 +14,11 @@ #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H #include +#include #include "llvm/ADT/DenseMap.h" #include "llvm/Object/ELFObjectFile.h" +#include "llvm/ProfileData/InstrProf.h" #include "Shared/Debug.h" #include "Shared/Utils.h" @@ -58,6 +60,22 @@ class GlobalTy { void setPtr(void *P) { Ptr = P; } }; +typedef void *IntPtrT; +struct __llvm_profile_data { +#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name; +#include "llvm/ProfileData/InstrProfData.inc" +}; + +/// PGO profiling data extracted from a GPU device +struct GPUProfGlobals { + std::string names; + std::vector> counts; + std::vector<__llvm_profile_data> data; + Triple targetTriple; + + void dump() const; +}; + /// Subclass of GlobalTy that holds the memory for a global of \p Ty. template class StaticGlobalTy : public GlobalTy { Ty Data; @@ -172,6 +190,15 @@ class GenericGlobalHandlerTy { return moveGlobalBetweenDeviceAndHost(Device, Image, HostGlobal, /* D2H */ false); } + + /// Checks whether a given image contains profiling globals. + bool hasProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image); + + /// Reads profiling data from a GPU image to supplied profdata struct. 
+ /// Iterates through the image symbol table and stores global values + /// with profiling prefixes. + Expected readProfilingGlobals(GenericDeviceTy &Device, + DeviceImageTy &Image); }; } // namespace plugin diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 3a272e228c7dfe..5dd5daec468ca5 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -176,3 +176,85 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device, return Plugin::success(); } + +bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device, + DeviceImageTy &Image) { + GlobalTy global(getInstrProfNamesVarName().str(), 0); + if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) { + consumeError(std::move(Err)); + return false; + } + return true; +} + +Expected +GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, + DeviceImageTy &Image) { + GPUProfGlobals profdata; + const auto *elf = getOrCreateELFObjectFile(Device, Image); + profdata.targetTriple = elf->makeTriple(); + // Iterate through + for (auto &sym : elf->symbols()) { + if (auto name = sym.getName()) { + // Check if given current global is a profiling global based + // on name + if (name->equals(getInstrProfNamesVarName())) { + // Read in profiled function names + std::vector chars(sym.getSize() / sizeof(char), ' '); + GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data()); + if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal)) + return Err; + std::string names(chars.begin(), chars.end()); + profdata.names = std::move(names); + } else if (name->starts_with(getInstrProfCountersVarPrefix())) { + // Read global variable profiling counts + std::vector counts(sym.getSize() / sizeof(int64_t), 0); + GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data()); + if (auto Err 
= readGlobalFromDevice(Device, Image, CountGlobal)) + return Err; + profdata.counts.push_back(std::move(counts)); + } else if (name->starts_with(getInstrProfDataVarPrefix())) { + // Read profiling data for this global variable + __llvm_profile_data data{}; + GlobalTy DataGlobal(name->str(), sym.getSize(), &data); + if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal)) + return Err; + profdata.data.push_back(std::move(data)); + } + } + } + return profdata; +} + +void GPUProfGlobals::dump() const { + llvm::outs() << "======= GPU Profile =======\nTarget: " << targetTriple.str() + << "\n"; + + llvm::outs() << "======== Counters =========\n"; + for (const auto &count : counts) { + llvm::outs() << "["; + for (size_t i = 0; i < count.size(); i++) { + if (i == 0) + llvm::outs() << " "; + llvm::outs() << count[i] << " "; + } + llvm::outs() << "]\n"; + } + + llvm::outs() << "========== Data ===========\n"; + for (const auto &d : data) { + llvm::outs() << "{ "; +#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \ + llvm::outs() << d.Name << " "; +#include "llvm/ProfileData/InstrProfData.inc" + llvm::outs() << " }\n"; + } + + llvm::outs() << "======== Functions ========\n"; + InstrProfSymtab symtab; + if (Error Err = symtab.create(StringRef(names))) { + consumeError(std::move(Err)); + } + symtab.dumpNames(llvm::outs()); + llvm::outs() << "===========================\n"; +} diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp index 3c7d1ca8998787..84ed90f03f84f1 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp @@ -811,6 +811,20 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { DeviceMemoryPoolTracking.AllocationMax); } + for (auto *Image : LoadedImages) { + GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); + if 
(!Handler.hasProfilingGlobals(*this, *Image)) + continue; + + GPUProfGlobals profdata; + auto ProfOrErr = Handler.readProfilingGlobals(*this, *Image); + if (!ProfOrErr) + return ProfOrErr.takeError(); + + // TODO: write data to profiling file + ProfOrErr->dump(); + } + // Delete the memory manager before deinitializing the device. Otherwise, // we may delete device allocations after the device is deinitialized. if (MemoryManager) >From e4687605d1a6ca932312025826db09dba84845a3 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 3 Jan 2024 17:06:15 -0600 Subject: [PATCH 06/39] Fix rebase bug --- .../plugins-nextgen/common/src/GlobalHandler.cpp | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index cb71b61f4a9c4f..86742d0f77a2fe 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -178,10 +178,12 @@ Expected GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image) { GPUProfGlobals profdata; - const auto *elf = getOrCreateELFObjectFile(Device, Image); - profdata.targetTriple = elf->makeTriple(); - // Iterate through - for (auto &sym : elf->symbols()) { + auto ELFObj = getELFObjectFile(Image); + if (!ELFObj) + return ELFObj.takeError(); + profdata.targetTriple = ELFObj->makeTriple(); + // Iterate through elf symbols + for (auto &sym : ELFObj->symbols()) { if (auto name = sym.getName()) { // Check if given current global is a profiling global based // on name >From ec18ce94c227e1d43927955fa1c67360ecfcfca6 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 3 Jan 2024 17:10:19 -0600 Subject: [PATCH 07/39] Refactor portions to be more idiomatic --- clang/lib/CodeGen/CodeGenPGO.cpp | 4 +--- llvm/lib/ProfileData/InstrProf.cpp | 5 ++--- 2 files changed, 3 
insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index edae6885b528ac..7bfcec43ee4c98 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -961,10 +961,8 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *I8Ty = llvm::Type::getInt8Ty(CGM.getLLVMContext()); - auto *I8PtrTy = llvm::PointerType::getUnqual(I8Ty); auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, I8PtrTy); + FuncNameVar, llvm::PointerType::getUnqual(CGM.getLLVMContext())); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), Builder.getInt32(NumRegionCounters), diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index cdcd6840bb5108..1d88da16a5ff9c 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -429,9 +429,8 @@ std::string getPGOFuncNameVarName(StringRef FuncName, } bool isGPUProfTarget(const Module &M) { - const auto &triple = M.getTargetTriple(); - return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 || - triple.rfind("r600", 0) == 0; + const auto &Triple = llvm::Triple(M.getTargetTriple()); + return Triple.isAMDGPU() || Triple.isNVPTX(); } GlobalVariable *createPGOFuncNameVar(Module &M, >From 0872556f597056361b0a2c23cdd0be3d9745aef3 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 3 Jan 2024 17:18:47 -0600 Subject: [PATCH 08/39] Reformat DeviceRTL prof functions --- openmp/libomptarget/DeviceRTL/include/Profiling.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/openmp/libomptarget/DeviceRTL/include/Profiling.h b/openmp/libomptarget/DeviceRTL/include/Profiling.h index 68c7744cd60752..9efc1554c176bc 100644 --- a/openmp/libomptarget/DeviceRTL/include/Profiling.h +++ 
b/openmp/libomptarget/DeviceRTL/include/Profiling.h @@ -13,9 +13,8 @@ #define OMPTARGET_DEVICERTL_PROFILING_H extern "C" { - -void __llvm_profile_register_function(void *ptr); -void __llvm_profile_register_names_function(void *ptr, long int i); +void __llvm_profile_register_function(void *Ptr); +void __llvm_profile_register_names_function(void *Ptr, long int I); } #endif >From 62f31d1c71b5d100f38d6dc584cc138b3904581b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 9 Jan 2024 11:52:29 -0600 Subject: [PATCH 09/39] Style changes + catch name error --- .../common/include/GlobalHandler.h | 16 ++-- .../common/src/GlobalHandler.cpp | 87 ++++++++++--------- 2 files changed, 56 insertions(+), 47 deletions(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h index a803b3f76d8b25..755bb23a414e37 100644 --- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h +++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h @@ -13,8 +13,7 @@ #ifndef LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H -#include -#include +#include #include "llvm/ADT/DenseMap.h" #include "llvm/Object/ELFObjectFile.h" @@ -60,18 +59,19 @@ class GlobalTy { void setPtr(void *P) { Ptr = P; } }; -typedef void *IntPtrT; +using IntPtrT = void *; struct __llvm_profile_data { -#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name; +#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \ + std::remove_const<Type>::type Name; #include "llvm/ProfileData/InstrProfData.inc" }; /// PGO profiling data extracted from a GPU device struct GPUProfGlobals { - std::string names; - std::vector<std::vector<int64_t>> counts; - std::vector<__llvm_profile_data> data; - Triple targetTriple; + SmallVector<uint8_t> NamesData; + SmallVector<SmallVector<int64_t>> Counts; + SmallVector<__llvm_profile_data> Data; + Triple TargetTriple; void dump()
const; }; diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 86742d0f77a2fe..7cb672e7b26839 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -19,6 +19,7 @@ #include "llvm/Support/Error.h" #include +#include using namespace llvm; using namespace omp; @@ -177,73 +178,81 @@ bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device, Expected GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image) { - GPUProfGlobals profdata; + GPUProfGlobals DeviceProfileData; auto ELFObj = getELFObjectFile(Image); if (!ELFObj) return ELFObj.takeError(); - profdata.targetTriple = ELFObj->makeTriple(); + DeviceProfileData.TargetTriple = ELFObj->makeTriple(); + // Iterate through elf symbols - for (auto &sym : ELFObj->symbols()) { - if (auto name = sym.getName()) { - // Check if given current global is a profiling global based - // on name - if (name->equals(getInstrProfNamesVarName())) { - // Read in profiled function names - std::vector chars(sym.getSize() / sizeof(char), ' '); - GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data()); - if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal)) - return Err; - std::string names(chars.begin(), chars.end()); - profdata.names = std::move(names); - } else if (name->starts_with(getInstrProfCountersVarPrefix())) { - // Read global variable profiling counts - std::vector counts(sym.getSize() / sizeof(int64_t), 0); - GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data()); - if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal)) - return Err; - profdata.counts.push_back(std::move(counts)); - } else if (name->starts_with(getInstrProfDataVarPrefix())) { - // Read profiling data for this global variable - __llvm_profile_data data{}; - GlobalTy 
DataGlobal(name->str(), sym.getSize(), &data); - if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal)) - return Err; - profdata.data.push_back(std::move(data)); - } + for (auto &Sym : ELFObj->symbols()) { + auto NameOrErr = Sym.getName(); + if (!NameOrErr) + return ELFObj.takeError(); + + // Check if given current global is a profiling global based + // on name + if (NameOrErr->equals(getInstrProfNamesVarName())) { + // Read in profiled function names + DeviceProfileData.NamesData = SmallVector<uint8_t>(Sym.getSize(), 0); + GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(), + DeviceProfileData.NamesData.data()); + if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal)) + return Err; + } else if (NameOrErr->starts_with(getInstrProfCountersVarPrefix())) { + // Read global variable profiling counts + SmallVector<int64_t> Counts(Sym.getSize() / sizeof(int64_t), 0); + GlobalTy CountGlobal(NameOrErr->str(), Sym.getSize(), Counts.data()); + if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal)) + return Err; + DeviceProfileData.Counts.push_back(std::move(Counts)); + } else if (NameOrErr->starts_with(getInstrProfDataVarPrefix())) { + // Read profiling data for this global variable + __llvm_profile_data Data{}; + GlobalTy DataGlobal(NameOrErr->str(), Sym.getSize(), &Data); + if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal)) + return Err; + DeviceProfileData.Data.push_back(std::move(Data)); } } - return profdata; + return DeviceProfileData; } void GPUProfGlobals::dump() const { - llvm::outs() << "======= GPU Profile =======\nTarget: " << targetTriple.str() + llvm::outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str() << "\n"; llvm::outs() << "======== Counters =========\n"; - for (const auto &count : counts) { + for (const auto &Count : Counts) { llvm::outs() << "["; - for (size_t i = 0; i < count.size(); i++) { + for (size_t i = 0; i < Count.size(); i++) { if (i == 0) llvm::outs() << " "; - llvm::outs() << count[i] << " "; +
llvm::outs() << Count[i] << " "; } llvm::outs() << "]\n"; } llvm::outs() << "========== Data ===========\n"; - for (const auto &d : data) { + for (const auto &ProfData : Data) { llvm::outs() << "{ "; #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \ - llvm::outs() << d.Name << " "; + llvm::outs() << ProfData.Name << " "; #include "llvm/ProfileData/InstrProfData.inc" llvm::outs() << " }\n"; } llvm::outs() << "======== Functions ========\n"; - InstrProfSymtab symtab; - if (Error Err = symtab.create(StringRef(names))) { + std::string s; + s.reserve(NamesData.size()); + for (uint8_t Name : NamesData) { + s.push_back((char)Name); + } + + InstrProfSymtab Symtab; + if (Error Err = Symtab.create(StringRef(s))) { consumeError(std::move(Err)); } - symtab.dumpNames(llvm::outs()); + Symtab.dumpNames(llvm::outs()); llvm::outs() << "===========================\n"; } >From 0c4bbeb54d189c1461affd37853aa86c3e3ca7d8 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 17 Jan 2024 19:59:06 -0600 Subject: [PATCH 10/39] Add GPU PGO test --- .../common/src/GlobalHandler.cpp | 2 +- openmp/libomptarget/test/CMakeLists.txt | 6 +++ openmp/libomptarget/test/lit.cfg | 3 ++ openmp/libomptarget/test/lit.site.cfg.in | 2 +- openmp/libomptarget/test/offloading/pgo1.c | 39 +++++++++++++++++++ 5 files changed, 50 insertions(+), 2 deletions(-) create mode 100644 openmp/libomptarget/test/offloading/pgo1.c diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 7cb672e7b26839..e5eb653d022287 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -239,7 +239,7 @@ void GPUProfGlobals::dump() const { #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \ llvm::outs() << ProfData.Name << " "; #include "llvm/ProfileData/InstrProfData.inc" - llvm::outs() << " }\n"; + llvm::outs() << "}\n"; } 
llvm::outs() << "======== Functions ========\n"; diff --git a/openmp/libomptarget/test/CMakeLists.txt b/openmp/libomptarget/test/CMakeLists.txt index a0ba233eaa5726..21233f3e252eb5 100644 --- a/openmp/libomptarget/test/CMakeLists.txt +++ b/openmp/libomptarget/test/CMakeLists.txt @@ -12,6 +12,12 @@ else() set(LIBOMPTARGET_DEBUG False) endif() +if (OPENMP_STANDALONE_BUILD) + set(LIBOMPTARGET_TEST_GPU_PGO False) +else() + set(LIBOMPTARGET_TEST_GPU_PGO True) +endif() + # Replace the space from user's input with ";" in case that CMake add escape # char into the lit command. string(REPLACE " " ";" LIBOMPTARGET_LIT_ARG_LIST "${LIBOMPTARGET_LIT_ARGS}") diff --git a/openmp/libomptarget/test/lit.cfg b/openmp/libomptarget/test/lit.cfg index 19c5e5c4572227..49743f9fed7f29 100644 --- a/openmp/libomptarget/test/lit.cfg +++ b/openmp/libomptarget/test/lit.cfg @@ -104,6 +104,9 @@ config.available_features.add(config.libomptarget_current_target) if config.libomptarget_has_libc: config.available_features.add('libc') +if config.libomptarget_test_pgo: + config.available_features.add('pgo') + # Determine whether the test system supports unified memory. # For CUDA, this is the case with compute capability 70 (Volta) or higher. # For all other targets, we currently assume it is. diff --git a/openmp/libomptarget/test/lit.site.cfg.in b/openmp/libomptarget/test/lit.site.cfg.in index 2d638118838727..494d1636af304a 100644 --- a/openmp/libomptarget/test/lit.site.cfg.in +++ b/openmp/libomptarget/test/lit.site.cfg.in @@ -25,6 +25,6 @@ config.libomptarget_not = "@OPENMP_NOT_EXECUTABLE@" config.libomptarget_debug = @LIBOMPTARGET_DEBUG@ config.has_libomptarget_ompt = @LIBOMPTARGET_OMPT_SUPPORT@ config.libomptarget_has_libc = @LIBOMPTARGET_GPU_LIBC_SUPPORT@ - +config.libomptarget_test_pgo = @LIBOMPTARGET_TEST_GPU_PGO@ # Let the main config do the real work. 
lit_config.load_config(config, "@CMAKE_CURRENT_SOURCE_DIR@/lit.cfg") diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c new file mode 100644 index 00000000000000..ca8a6f502a06aa --- /dev/null +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -0,0 +1,39 @@ +// RUN: %libomptarget-compile-generic -fprofile-instr-generate -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic + +// UNSUPPORTED: x86_64-pc-linux-gnu +// UNSUPPORTED: x86_64-pc-linux-gnu-LTO +// UNSUPPORTED: aarch64-unknown-linux-gnu +// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: pgo + +#ifdef _OPENMP +#include +#endif + +int test1(int a) { return a / 2; } +int test2(int a) { return a * 2; } + +int main() { + int m = 2; +#pragma omp target + for (int i = 0; i < 10; i++) { + m = test1(m); + for (int j = 0; j < 2; j++) { + m = test2(m); + } + } +} + +// CHECK: ======== Counters ========= +// CHECK-NEXT: [ 0 11 20 ] +// CHECK-NEXT: [ 10 ] +// CHECK-NEXT: [ 20 ] +// CHECK-NEXT: ========== Data =========== +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: ======== Functions ======== +// CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} +// CHECK-NEXT: test1 +// CHECK-NEXT: test2 >From c7ae2a74daa93b05058fcc9bba64e0734359362c Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 17 Jan 2024 23:12:27 -0600 Subject: [PATCH 11/39] Fix PGO test formatting --- openmp/libomptarget/test/offloading/pgo1.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 
deletions(-) diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c index ca8a6f502a06aa..389be19b670d76 100644 --- a/openmp/libomptarget/test/offloading/pgo1.c +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -1,4 +1,5 @@ -// RUN: %libomptarget-compile-generic -fprofile-instr-generate -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ +// RUN: -Xclang "-fprofile-instrument=clang" // RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic // UNSUPPORTED: x86_64-pc-linux-gnu @@ -30,9 +31,18 @@ int main() { // CHECK-NEXT: [ 10 ] // CHECK-NEXT: [ 20 ] // CHECK-NEXT: ========== Data =========== -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } // CHECK-NEXT: ======== Functions ======== // CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} // CHECK-NEXT: test1 >From 8bb22072914bbb830e2788d117aedd0e0bab66ff Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: 
Thu, 18 Jan 2024 23:15:55 -0600 Subject: [PATCH 12/39] Refactor visibility logic --- llvm/lib/ProfileData/InstrProf.cpp | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 511571a3eed9b0..708ea63fd95e04 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -422,6 +422,16 @@ bool isGPUProfTarget(const Module &M) { return Triple.isAMDGPU() || Triple.isNVPTX(); } +void setPGOFuncVisibility(Module &M, GlobalVariable *FuncNameVar) { + // If the target is a GPU, make the symbol protected so it can + // be read from the host device + if (isGPUProfTarget(M)) + FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility); + // Hide the symbol so that we correctly get a copy for each executable. + else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) + FuncNameVar->setVisibility(GlobalValue::HiddenVisibility); +} + GlobalVariable *createPGOFuncNameVar(Module &M, GlobalValue::LinkageTypes Linkage, StringRef PGOFuncName) { @@ -445,14 +455,7 @@ GlobalVariable *createPGOFuncNameVar(Module &M, new GlobalVariable(M, Value->getType(), true, Linkage, Value, getPGOFuncNameVarName(PGOFuncName, Linkage)); - // If the target is a GPU, make the symbol protected so it can - // be read from the host device - if (isGPUProfTarget(M)) - FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility); - // Hide the symbol so that we correctly get a copy for each executable. - else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) - FuncNameVar->setVisibility(GlobalValue::HiddenVisibility); - + setPGOFuncVisibility(M, FuncNameVar); return FuncNameVar; } >From 9f13943f64cb16162e44902d54de53a9b1229179 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 23 Jan 2024 18:33:58 -0600 Subject: [PATCH 13/39] Add LLVM instrumentation support This PR formerly only supported -fprofile-instrument=clang. 
This commit adds support for -fprofile-instrument=llvm --- .../Instrumentation/PGOInstrumentation.cpp | 12 +++- openmp/libomptarget/test/offloading/pgo1.c | 72 +++++++++++++------ 2 files changed, 59 insertions(+), 25 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c20fc942eaf0d5..bbc8da78fd7baf 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -862,6 +862,10 @@ static void instrumentOneFunc( auto Name = FuncInfo.FuncNameVar; auto CFGHash = ConstantInt::get(Type::getInt64Ty(M->getContext()), FuncInfo.FunctionHash); + // Make sure that pointer to global is passed in with zero addrspace + // This is relevant during GPU profiling + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + Name, llvm::PointerType::getUnqual(M->getContext())); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt()); @@ -869,7 +873,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_cover), - {Name, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); + {NormalizedPtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); return; } @@ -887,7 +891,8 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_timestamp), - {Name, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I)}); + {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + Builder.getInt32(I)}); I += PGOBlockCoverage ? 8 : 1; } @@ -901,7 +906,8 @@ static void instrumentOneFunc( Intrinsic::getDeclaration(M, PGOBlockCoverage ? 
Intrinsic::instrprof_cover : Intrinsic::instrprof_increment), - {Name, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I++)}); + {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + Builder.getInt32(I++)}); } // Now instrument select instructions: diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c index 389be19b670d76..d95793b508dcfc 100644 --- a/openmp/libomptarget/test/offloading/pgo1.c +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -1,6 +1,11 @@ // RUN: %libomptarget-compile-generic -fprofile-instr-generate \ // RUN: -Xclang "-fprofile-instrument=clang" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic +// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: --check-prefix="CLANG-PGO" +// RUN: %libomptarget-compile-generic -fprofile-generate \ +// RUN: -Xclang "-fprofile-instrument=llvm" +// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: --check-prefix="LLVM-PGO" // UNSUPPORTED: x86_64-pc-linux-gnu // UNSUPPORTED: x86_64-pc-linux-gnu-LTO @@ -26,24 +31,47 @@ int main() { } } -// CHECK: ======== Counters ========= -// CHECK-NEXT: [ 0 11 20 ] -// CHECK-NEXT: [ 10 ] -// CHECK-NEXT: [ 20 ] -// CHECK-NEXT: ========== Data =========== -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: ======== Functions ======== -// CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// CHECK-NEXT: test1 
-// CHECK-NEXT: test2 +// CLANG-PGO: ======== Counters ========= +// CLANG-PGO-NEXT: [ 0 11 20 ] +// CLANG-PGO-NEXT: [ 10 ] +// CLANG-PGO-NEXT: [ 20 ] +// CLANG-PGO-NEXT: ========== Data =========== +// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CLANG-PGO-NEXT: ======== Functions ======== +// CLANG-PGO-NEXT: pgo1.c: +// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} +// CLANG-PGO-NEXT: test1 +// CLANG-PGO-NEXT: test2 + +// LLVM-PGO: ======== Counters ========= +// LLVM-PGO-NEXT: [ 20 ] +// LLVM-PGO-NEXT: [ 10 ] +// LLVM-PGO-NEXT: [ 20 10 1 1 ] +// LLVM-PGO-NEXT: ========== Data =========== +// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// LLVM-PGO-NEXT: ======== Functions ======== +// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} +// 
LLVM-PGO-NEXT: test1 +// LLVM-PGO-NEXT: test2 >From 0606f0dd1b32ef9ebe138bbc964b3921e22d95d1 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 14 Feb 2024 01:46:55 -0600 Subject: [PATCH 14/39] Use explicit addrspace instead of unqual --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index e084dda879cbc0..4c75a01222d304 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1103,7 +1103,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, llvm::PointerType::getUnqual(CGM.getLLVMContext())); + FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), Builder.getInt32(NumRegionCounters), diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index bbc8da78fd7baf..c63b3e4ecf786a 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -865,7 +865,7 @@ static void instrumentOneFunc( // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - Name, llvm::PointerType::getUnqual(M->getContext())); + Name, llvm::PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt()); >From c1f9be321678766525141214aaab74636cafbc2c Mon Sep 17 00:00:00 2001 From: Ethan 
Luis McDonough Date: Thu, 15 Feb 2024 19:10:09 -0600 Subject: [PATCH 15/39] Remove redundant namespaces --- .../Instrumentation/PGOInstrumentation.cpp | 4 +-- .../common/src/GlobalHandler.cpp | 26 +++++++++---------- 2 files changed, 15 insertions(+), 15 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c63b3e4ecf786a..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,8 +864,8 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - Name, llvm::PointerType::get(M->getContext(), 0)); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt()); diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index e5eb653d022287..ae270c60804d26 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -219,30 +219,30 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, } void GPUProfGlobals::dump() const { - llvm::outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str() + outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str() << "\n"; - llvm::outs() << "======== Counters =========\n"; + outs() << "======== Counters =========\n"; for (const auto &Count : Counts) { - llvm::outs() << "["; + outs() << "["; for (size_t i = 0; i < Count.size(); i++) { if (i == 0) - 
llvm::outs() << " "; - llvm::outs() << Count[i] << " "; + outs() << " "; + outs() << Count[i] << " "; } - llvm::outs() << "]\n"; + outs() << "]\n"; } - llvm::outs() << "========== Data ===========\n"; + outs() << "========== Data ===========\n"; for (const auto &ProfData : Data) { - llvm::outs() << "{ "; + outs() << "{ "; #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \ - llvm::outs() << ProfData.Name << " "; + outs() << ProfData.Name << " "; #include "llvm/ProfileData/InstrProfData.inc" - llvm::outs() << "}\n"; + outs() << "}\n"; } - llvm::outs() << "======== Functions ========\n"; + outs() << "======== Functions ========\n"; std::string s; s.reserve(NamesData.size()); for (uint8_t Name : NamesData) { @@ -253,6 +253,6 @@ void GPUProfGlobals::dump() const { if (Error Err = Symtab.create(StringRef(s))) { consumeError(std::move(Err)); } - Symtab.dumpNames(llvm::outs()); - llvm::outs() << "===========================\n"; + Symtab.dumpNames(outs()); + outs() << "===========================\n"; } >From 6a3ae407e69e7524f0f808329c534f8352ee1779 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 15 Feb 2024 19:15:15 -0600 Subject: [PATCH 16/39] Clang format --- .../libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index ae270c60804d26..1fce2448922624 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -220,7 +220,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, void GPUProfGlobals::dump() const { outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str() - << "\n"; + << "\n"; outs() << "======== Counters =========\n"; for (const auto &Count : Counts) { >From 6866862d459e3c3fa65fae8ae639ddc3ff735252 Mon Sep 17 
00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 16 Feb 2024 13:13:39 -0600 Subject: [PATCH 17/39] Use getAddrSpaceCast Replace getPointerBitCastOrAddrSpaceCast with getAddrSpaceCast and allow no-op getAddrSpaceCast calls when types are identical --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ++++ llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index 8f52018445d2b0..baceeba8380ddb 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index a38b912164b130..2d89c5bbd4a4c2 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,6 +2067,10 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { + // Skip cast if types are identical + if (C->getType() == DstTy) + return C; + assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index 3058e577738fda..c0be71aa4cc004 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ 
b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 62a5ee1c75545571f81d9edd22e19e9ef7cff69f Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 27 Feb 2024 14:53:51 -0600 Subject: [PATCH 18/39] Revert "Use getAddrSpaceCast" This reverts commit 6866862d459e3c3fa65fae8ae639ddc3ff735252. --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ---- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index baceeba8380ddb..8f52018445d2b0 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 2d89c5bbd4a4c2..a38b912164b130 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,10 +2067,6 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { - // Skip cast if types are 
identical - if (C->getType() == DstTy) - return C; - assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c0be71aa4cc004..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 052394fa28c923d130bf73a07b965a9751467302 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 27 Feb 2024 15:34:34 -0600 Subject: [PATCH 19/39] Revert "Use getAddrSpaceCast" This reverts commit 6866862d459e3c3fa65fae8ae639ddc3ff735252. 
--- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ---- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index baceeba8380ddb..8f52018445d2b0 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 2d89c5bbd4a4c2..a38b912164b130 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,10 +2067,6 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { - // Skip cast if types are identical - if (C->getType() == DstTy) - return C; - assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c0be71aa4cc004..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto 
*NormalizedPtr = ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 612d5a5f6966a77e82e5591f5aea475fbf886e55 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 1 Mar 2024 02:04:00 -0600 Subject: [PATCH 20/39] Write PGO TODO: Fix tests --- compiler-rt/lib/profile/InstrProfiling.h | 11 ++ compiler-rt/lib/profile/InstrProfilingFile.c | 148 +++++++++++++++--- .../common/include/GlobalHandler.h | 14 +- .../common/src/GlobalHandler.cpp | 57 +++++-- .../common/src/PluginInterface.cpp | 6 +- 5 files changed, 200 insertions(+), 36 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h index 01239083369187..937acbd417de46 100644 --- a/compiler-rt/lib/profile/InstrProfiling.h +++ b/compiler-rt/lib/profile/InstrProfiling.h @@ -275,6 +275,17 @@ void __llvm_profile_get_padding_sizes_for_counters( */ void __llvm_profile_set_dumped(); +/*! + * \brief Write custom target-specific profiling data to a seperate file. + * Used by libomptarget for GPU PGO. + */ +int __llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, + const char *NamesEnd); + /*! * This variable is defined in InstrProfilingRuntime.cpp as a hidden * symbol. Its main purpose is to enable profile runtime user to diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index f3b457d786e6bd..4fc401bb9bebf5 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -502,27 +502,15 @@ static FILE *getFileObject(const char *OutputName) { return fopen(OutputName, "ab"); } -/* Write profile data to file \c OutputName. 
*/ -static int writeFile(const char *OutputName) { - int RetVal; - FILE *OutputFile; - - int MergeDone = 0; +/* Get file object and merge if applicable */ +static FILE *getMergeFileObject(const char *OutputName, int *MergeDone) { VPMergeHook = &lprofMergeValueProfData; if (doMerging()) - OutputFile = openFileForMerging(OutputName, &MergeDone); - else - OutputFile = getFileObject(OutputName); - - if (!OutputFile) - return -1; - - FreeHook = &free; - setupIOBuffer(); - ProfDataWriter fileWriter; - initFileWriter(&fileWriter, OutputFile); - RetVal = lprofWriteData(&fileWriter, lprofGetVPDataReader(), MergeDone); + return openFileForMerging(OutputName, MergeDone); + return getFileObject(OutputName); +} +static void closeFileObject(FILE *OutputFile) { if (OutputFile == getProfileFile()) { fflush(OutputFile); if (doMerging() && !__llvm_profile_is_continuous_mode_enabled()) { @@ -531,7 +519,23 @@ static int writeFile(const char *OutputName) { } else { fclose(OutputFile); } +} + +/* Write profile data to file \c OutputName. */ +static int writeFile(const char *OutputName) { + int RetVal, MergeDone = 0; + FILE *OutputFile = getMergeFileObject(OutputName, &MergeDone); + + if (!OutputFile) + return -1; + + FreeHook = &free; + setupIOBuffer(); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + RetVal = lprofWriteData(&fileWriter, lprofGetVPDataReader(), MergeDone); + closeFileObject(OutputFile); return RetVal; } @@ -558,10 +562,16 @@ static int writeOrderFile(const char *OutputName) { #define LPROF_INIT_ONCE_ENV "__LLVM_PROFILE_RT_INIT_ONCE" +static void forceTruncateFile(const char *Filename) { + FILE *File = fopen(Filename, "w"); + if (!File) + return; + fclose(File); +} + static void truncateCurrentFile(void) { const char *Filename; char *FilenameBuf; - FILE *File; int Length; Length = getCurFilenameLength(); @@ -591,10 +601,7 @@ static void truncateCurrentFile(void) { return; /* Truncate the file. Later we'll reopen and append. 
*/ - File = fopen(Filename, "w"); - if (!File) - return; - fclose(File); + forceTruncateFile(Filename); } /* Write a partial profile to \p Filename, which is required to be backed by @@ -1271,4 +1278,99 @@ COMPILER_RT_VISIBILITY int __llvm_profile_set_file_object(FILE *File, return 0; } +int __llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, + const char *NamesEnd) { + int ReturnValue = 0, FilenameLength, TargetLength, MergeDone; + char *FilenameBuf, *TargetFilename; + const char *Filename; + + /* Save old profile data */ + FILE *oldFile = getProfileFile(); + + // Temporarily suspend getting SIGKILL when the parent exits. + int PDeathSig = lprofSuspendSigKill(); + + if (lprofProfileDumped() || __llvm_profile_is_continuous_mode_enabled()) { + PROF_NOTE("Profile data not written to file: %s.\n", "already written"); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return 0; + } + + /* Get current filename */ + FilenameLength = getCurFilenameLength(); + FilenameBuf = (char *)COMPILER_RT_ALLOCA(FilenameLength + 1); + Filename = getCurFilename(FilenameBuf, 0); + + /* Check the filename. */ + if (!Filename) { + PROF_ERR("Failed to write file : %s\n", "Filename not set"); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + /* Allocate new space for our target-specific PGO filename */ + TargetLength = strlen(Target); + TargetFilename = + (char *)COMPILER_RT_ALLOCA(FilenameLength + TargetLength + 2); + + /* Prepend "TARGET." to current filename */ + memcpy(TargetFilename, Target, TargetLength); + TargetFilename[TargetLength] = '.'; + memcpy(TargetFilename, Target, TargetLength); + memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); + TargetFilename[FilenameLength + 1 + TargetLength] = 0; + + /* Check if there is llvm/runtime version mismatch. 
*/ + if (GET_VERSION(__llvm_profile_get_version()) != INSTR_PROF_RAW_VERSION) { + PROF_ERR("Runtime and instrumentation version mismatch : " + "expected %d, but get %d\n", + INSTR_PROF_RAW_VERSION, + (int)GET_VERSION(__llvm_profile_get_version())); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + /* Clean old target file */ + forceTruncateFile(TargetFilename); + + /* Open target-specific PGO file */ + MergeDone = 0; + FILE *OutputFile = getMergeFileObject(TargetFilename, &MergeDone); + + if (!OutputFile) { + PROF_ERR("Failed to open file : %s\n", TargetFilename); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + FreeHook = &free; + setupIOBuffer(); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + + /* Write custom data to the file */ + ReturnValue = lprofWriteDataImpl( + &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone); + + closeFileObject(OutputFile); + + // Restore SIGKILL. 
+ if (PDeathSig == 1) + lprofRestoreSigKill(); + + /* Restore old profiling file */ + setProfileFile(oldFile); + + return ReturnValue; +} + #endif diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h index f5a15ca11bfcda..af0cd4dcdf5dcf 100644 --- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h +++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h @@ -63,14 +63,24 @@ struct __llvm_profile_data { #include "llvm/ProfileData/InstrProfData.inc" }; +extern "C" { +extern int __attribute__((weak)) +__llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, const char *CountersEnd, + const char *NamesBegin, const char *NamesEnd); +} + /// PGO profiling data extracted from a GPU device struct GPUProfGlobals { - SmallVector NamesData; - SmallVector> Counts; + SmallVector Counts; SmallVector<__llvm_profile_data> Data; + SmallVector NamesData; Triple TargetTriple; void dump() const; + Error write() const; }; /// Subclass of GlobalTy that holds the memory for a global of \p Ty. 
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 1fce2448922624..2f16b6e3c139e9 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -205,7 +205,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, GlobalTy CountGlobal(NameOrErr->str(), Sym.getSize(), Counts.data()); if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal)) return Err; - DeviceProfileData.Counts.push_back(std::move(Counts)); + DeviceProfileData.Counts.append(std::move(Counts)); } else if (NameOrErr->starts_with(getInstrProfDataVarPrefix())) { // Read profiling data for this global variable __llvm_profile_data Data{}; @@ -223,15 +223,14 @@ void GPUProfGlobals::dump() const { << "\n"; outs() << "======== Counters =========\n"; - for (const auto &Count : Counts) { - outs() << "["; - for (size_t i = 0; i < Count.size(); i++) { - if (i == 0) - outs() << " "; - outs() << Count[i] << " "; - } - outs() << "]\n"; + for (size_t i = 0; i < Counts.size(); i++) { + if (i > 0 && i % 10 == 0) + outs() << "\n"; + else if (i != 0) + outs() << " "; + outs() << Counts[i]; } + outs() << "\n"; outs() << "========== Data ===========\n"; for (const auto &ProfData : Data) { @@ -256,3 +255,43 @@ void GPUProfGlobals::dump() const { Symtab.dumpNames(outs()); outs() << "===========================\n"; } + +Error GPUProfGlobals::write() const { + if (!__llvm_write_custom_profile) + return Plugin::error("Could not find symbol __llvm_write_custom_profile. " + "The compiler-rt profiling library must be linked for " + "GPU PGO to work."); + + size_t DataSize = Data.size() * sizeof(__llvm_profile_data), + CountsSize = Counts.size() * sizeof(int64_t); + __llvm_profile_data *DataBegin, *DataEnd; + char *CountersBegin, *CountersEnd, *NamesBegin, *NamesEnd; + + // Initialize array of contiguous data. 
We need to make sure each section is + // contiguous so that the PGO library can compute deltas properly + SmallVector ContiguousData(NamesData.size() + DataSize + CountsSize); + + // Compute region pointers + DataBegin = (__llvm_profile_data *)(ContiguousData.data() + CountsSize); + DataEnd = + (__llvm_profile_data *)(ContiguousData.data() + CountsSize + DataSize); + CountersBegin = (char *)ContiguousData.data(); + CountersEnd = (char *)(ContiguousData.data() + CountsSize); + NamesBegin = (char *)(ContiguousData.data() + CountsSize + DataSize); + NamesEnd = (char *)(ContiguousData.data() + CountsSize + DataSize + + NamesData.size()); + + // Copy data to contiguous buffer + memcpy(DataBegin, Data.data(), DataSize); + memcpy(CountersBegin, Counts.data(), CountsSize); + memcpy(NamesBegin, NamesData.data(), NamesData.size()); + + // Invoke compiler-rt entrypoint + int result = __llvm_write_custom_profile(TargetTriple.str().c_str(), + DataBegin, DataEnd, CountersBegin, + CountersEnd, NamesBegin, NamesEnd); + if (result != 0) + return Plugin::error("Error writing GPU PGO data to file"); + + return Plugin::success(); +} diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp index 1ea93795ce8ce4..d5e6b6128152dc 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp @@ -837,8 +837,10 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); - // TODO: write data to profiling file - ProfOrErr->dump(); + // Write data to profiling file + if (auto Err = ProfOrErr->write()) { + consumeError(std::move(Err)); + } } // Delete the memory manager before deinitializing the device. 
Otherwise, >From b8c916305acf08c0bd2d51b81875be5e8fc59ff3 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 13 Mar 2024 20:05:32 -0500 Subject: [PATCH 21/39] Fix tests --- .../plugins-nextgen/common/src/PluginInterface.cpp | 3 +++ openmp/libomptarget/test/offloading/pgo1.c | 8 ++------ 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp index d5e6b6128152dc..2359ad28a25b04 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp @@ -837,6 +837,9 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); + // Dump out profdata + ProfOrErr->dump(); + // Write data to profiling file if (auto Err = ProfOrErr->write()) { consumeError(std::move(Err)); diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c index d95793b508dcfc..79e93d0f10827f 100644 --- a/openmp/libomptarget/test/offloading/pgo1.c +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -32,9 +32,7 @@ int main() { } // CLANG-PGO: ======== Counters ========= -// CLANG-PGO-NEXT: [ 0 11 20 ] -// CLANG-PGO-NEXT: [ 10 ] -// CLANG-PGO-NEXT: [ 20 ] +// CLANG-PGO-NEXT: 0 11 20 10 20 // CLANG-PGO-NEXT: ========== Data =========== // CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} // CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} @@ -55,9 +53,7 @@ int main() { // CLANG-PGO-NEXT: test2 // LLVM-PGO: ======== Counters ========= -// LLVM-PGO-NEXT: [ 20 ] -// LLVM-PGO-NEXT: [ 10 ] -// LLVM-PGO-NEXT: [ 20 10 1 1 ] +// LLVM-PGO-NEXT: 20 10 20 10 1 1 // LLVM-PGO-NEXT: ========== Data =========== // LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} // LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} >From 7770b37a5a4c40bd45887f762bd7f1e652bc0ed2 Mon Sep 17 00:00:00 2001 From: Ethan Luis 
McDonough Date: Tue, 7 May 2024 16:31:48 -0500 Subject: [PATCH 22/39] Fix params --- compiler-rt/lib/profile/InstrProfilingFile.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 466bfe480543bc..bc1d40a37a5ad6 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1360,9 +1360,10 @@ int __llvm_write_custom_profile(const char *Target, initFileWriter(&fileWriter, OutputFile); /* Write custom data to the file */ - ReturnValue = lprofWriteDataImpl( - &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, - lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone); + ReturnValue = + lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin, + CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL, + NULL, NULL, NULL, NamesBegin, NamesEnd, MergeDone); closeFileObject(OutputFile); >From aa895a1788969a0d27692057a1457074e9772c78 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 18 Mar 2024 21:31:32 -0500 Subject: [PATCH 23/39] Fix elf obj file --- offload/plugins-nextgen/common/src/GlobalHandler.cpp | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index 80cdcaff75528e..7717e19a5b6779 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -177,16 +177,19 @@ Expected GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image) { GPUProfGlobals DeviceProfileData; - auto ELFObj = getELFObjectFile(Image); - if (!ELFObj) - return ELFObj.takeError(); + auto ObjFile = getELFObjectFile(Image); + if (!ObjFile) + return ObjFile.takeError(); + + std::unique_ptr ELFObj( + static_cast(ObjFile->release())); DeviceProfileData.TargetTriple = 
ELFObj->makeTriple(); // Iterate through elf symbols for (auto &Sym : ELFObj->symbols()) { auto NameOrErr = Sym.getName(); if (!NameOrErr) - return ELFObj.takeError(); + return NameOrErr.takeError(); // Check if given current global is a profiling global based // on name >From 2031e49c2b26864f2dab72e629eb6cbe34928a7a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 6 May 2024 23:13:58 -0500 Subject: [PATCH 24/39] Add more addrspace casts for GPU targets --- .../Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++--- .../Instrumentation/PGOInstrumentation.cpp | 13 +++++++++---- 2 files changed, 17 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index a6b1e0d488120a..dd8c027c4bbf62 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -869,6 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { llvm::InstrProfValueKind::IPVK_MemOPSize); CallInst *Call = nullptr; auto *TLI = &GetTLI(*Ind->getFunction()); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + DataVar, PointerType::getUnqual(M.getContext())); // To support value profiling calls within Windows exception handlers, funclet // information contained within operand bundles needs to be copied over to @@ -877,11 +879,13 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { SmallVector OpBundles; Ind->getOperandBundlesAsDefs(OpBundles); if (!IsMemOpSize) { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args, OpBundles); } else { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), 
NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall( getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp), Args, OpBundles); @@ -1575,7 +1579,8 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { getInstrProfSectionName(IPSK_vals, TT.getObjectFormat())); ValuesVar->setAlignment(Align(8)); maybeSetComdat(ValuesVar, Fn, CntsVarName); - ValuesPtrExpr = ValuesVar; + ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + ValuesVar, PointerType::getUnqual(Fn->getContext())); } uint64_t NumCounters = Inc->getNumCounters()->getZExtValue(); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index 4b51396a8baa35..ee1657ba8400ee 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -1007,12 +1007,15 @@ static void instrumentOneFunc( ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty()); assert(ToProfile && "value profiling Value is of unexpected type"); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + Name, PointerType::get(M->getContext(), 0)); + SmallVector OpBundles; populateEHOperandBundle(Cand, BlockColors, OpBundles); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile), - {FuncInfo.FuncNameVar, Builder.getInt64(FuncInfo.FunctionHash), - ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, + {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile, + Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, OpBundles); } } // IPVK_First <= Kind <= IPVK_Last @@ -1685,10 +1688,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) { IRBuilder<> Builder(&SI); Type *Int64Ty = Builder.getInt64Ty(); auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, 
PointerType::get(M->getContext(), 0)); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step), - {FuncNameVar, Builder.getInt64(FuncHash), Builder.getInt32(TotalNumCtrs), - Builder.getInt32(*CurCtrIdx), Step}); + {NormalizedPtr, Builder.getInt64(FuncHash), + Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step}); ++(*CurCtrIdx); } >From be6524bb4f77de0add1e698f68115fd336f32238 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 13 May 2024 17:41:00 -0500 Subject: [PATCH 25/39] Have test read from profraw instead of dump --- offload/test/lit.cfg | 2 + offload/test/offloading/pgo1.c | 94 ++++++++++++++++------------------ 2 files changed, 46 insertions(+), 50 deletions(-) diff --git a/offload/test/lit.cfg b/offload/test/lit.cfg index 069110dc69a6e4..38e6a33b01fafc 100644 --- a/offload/test/lit.cfg +++ b/offload/test/lit.cfg @@ -391,6 +391,8 @@ if config.test_fortran_compiler: config.available_features.add('flang') config.substitutions.append(("%flang", config.test_fortran_compiler)) +config.substitutions.append(("%target_triple", config.libomptarget_current_target)) + config.substitutions.append(("%openmp_flags", config.test_openmp_flags)) if config.libomptarget_current_target.startswith('nvptx') and config.cuda_path: config.substitutions.append(("%cuda_flags", "--cuda-path=" + config.cuda_path)) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 79e93d0f10827f..d22d5340f5b3ec 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,22 +1,21 @@ -// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ -// RUN: -Xclang "-fprofile-instrument=clang" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ -// RUN: --check-prefix="CLANG-PGO" -// RUN: %libomptarget-compile-generic -fprofile-generate \ -// RUN: -Xclang "-fprofile-instrument=llvm" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: %libomptarget-compile-generic 
-Xclang "-fprofile-instrument=llvm" +// RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" +// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=clang" +// RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.clang.profraw | %fcheck-generic \ +// RUN: --check-prefix="CLANG-PGO" + // UNSUPPORTED: x86_64-pc-linux-gnu // UNSUPPORTED: x86_64-pc-linux-gnu-LTO // UNSUPPORTED: aarch64-unknown-linux-gnu // UNSUPPORTED: aarch64-unknown-linux-gnu-LTO // REQUIRES: pgo -#ifdef _OPENMP -#include -#endif - int test1(int a) { return a / 2; } int test2(int a) { return a * 2; } @@ -31,43 +30,38 @@ int main() { } } -// CLANG-PGO: ======== Counters ========= -// CLANG-PGO-NEXT: 0 11 20 10 20 -// CLANG-PGO-NEXT: ========== Data =========== -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: ======== Functions ======== -// CLANG-PGO-NEXT: pgo1.c: -// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// CLANG-PGO-NEXT: test1 -// CLANG-PGO-NEXT: test2 +// LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: 
Counters: 4 +// LLVM-PGO: Function count: 20 +// LLVM-PGO: Block counts: [10, 20, 10] + +// LLVM-PGO-LABEL: test1: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: Counters: 1 +// LLVM-PGO: Function count: 1 +// LLVM-PGO: Block counts: [] + +// LLVM-PGO-LABEL: test2: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: Counters: 1 +// LLVM-PGO: Function count: 1 +// LLVM-PGO: Block counts: [] + +// CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 3 +// CLANG-PGO: Function count: 0 +// CLANG-PGO: Block counts: [11, 20] + +// CLANG-PGO-LABEL: test1: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 1 +// CLANG-PGO: Function count: 10 +// CLANG-PGO: Block counts: [] -// LLVM-PGO: ======== Counters ========= -// LLVM-PGO-NEXT: 20 10 20 10 1 1 -// LLVM-PGO-NEXT: ========== Data =========== -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: ======== Functions ======== -// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// LLVM-PGO-NEXT: test1 -// LLVM-PGO-NEXT: test2 +// CLANG-PGO-LABEL: test2: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 1 +// CLANG-PGO: Function count: 20 +// CLANG-PGO: Block counts: [] >From 2b8eb2935ec21bf0acc5c56f45837b5976560963 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 
24 May 2024 19:59:33 -0500 Subject: [PATCH 26/39] Fix PGO test format --- offload/test/offloading/pgo1.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index d22d5340f5b3ec..0e75c684ed9263 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -33,20 +33,17 @@ int main() { // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 4 -// LLVM-PGO: Function count: 20 -// LLVM-PGO: Block counts: [10, 20, 10] +// LLVM-PGO: Block counts: [20, 10, 20, 10] // LLVM-PGO-LABEL: test1: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Function count: 1 -// LLVM-PGO: Block counts: [] +// LLVM-PGO: Block counts: [1] // LLVM-PGO-LABEL: test2: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Function count: 1 -// LLVM-PGO: Block counts: [] +// LLVM-PGO: Block counts: [1] // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} >From 67f3009173d815295f36e2b37e85add1347e3bf9 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 24 May 2024 20:45:04 -0500 Subject: [PATCH 27/39] Refactor profile writer --- compiler-rt/lib/profile/InstrProfilingFile.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index bc1d40a37a5ad6..76238214c13aa3 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1344,8 +1344,7 @@ int __llvm_write_custom_profile(const char *Target, forceTruncateFile(TargetFilename); /* Open target-specific PGO file */ - MergeDone = 0; - FILE *OutputFile = getMergeFileObject(TargetFilename, &MergeDone); + FILE *OutputFile = getFileObject(TargetFilename); if 
(!OutputFile) { PROF_ERR("Failed to open file : %s\n", TargetFilename); @@ -1356,15 +1355,11 @@ int __llvm_write_custom_profile(const char *Target, FreeHook = &free; setupIOBuffer(); - ProfDataWriter fileWriter; - initFileWriter(&fileWriter, OutputFile); - - /* Write custom data to the file */ - ReturnValue = - lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin, - CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL, - NULL, NULL, NULL, NamesBegin, NamesEnd, MergeDone); + /* Write custom data */ + ReturnValue = __llvm_profile_write_buffer_internal( + OutputFile, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + NamesBegin, NamesEnd); closeFileObject(OutputFile); // Restore SIGKILL. >From e8ad1322c557f7b48e2b28fe3a34a696a1103bba Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 27 May 2024 18:29:18 -0500 Subject: [PATCH 28/39] Fix refactor bug --- compiler-rt/lib/profile/InstrProfilingFile.c | 52 ++++++++++---------- offload/test/offloading/pgo1.c | 6 ++- 2 files changed, 29 insertions(+), 29 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 76238214c13aa3..784cb9af6169d8 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -505,14 +505,6 @@ static FILE *getFileObject(const char *OutputName) { return fopen(OutputName, "ab"); } -/* Get file object and merge if applicable */ -static FILE *getMergeFileObject(const char *OutputName, int *MergeDone) { - VPMergeHook = &lprofMergeValueProfData; - if (doMerging()) - return openFileForMerging(OutputName, MergeDone); - return getFileObject(OutputName); -} - static void closeFileObject(FILE *OutputFile) { if (OutputFile == getProfileFile()) { fflush(OutputFile); @@ -526,8 +518,15 @@ static void closeFileObject(FILE *OutputFile) { /* Write profile data to file \c OutputName. 
*/ static int writeFile(const char *OutputName) { - int RetVal, MergeDone = 0; - FILE *OutputFile = getMergeFileObject(OutputName, &MergeDone); + int RetVal; + FILE *OutputFile; + + int MergeDone = 0; + VPMergeHook = &lprofMergeValueProfData; + if (doMerging()) + OutputFile = openFileForMerging(OutputName, &MergeDone); + else + OutputFile = getFileObject(OutputName); if (!OutputFile) return -1; @@ -565,16 +564,10 @@ static int writeOrderFile(const char *OutputName) { #define LPROF_INIT_ONCE_ENV "__LLVM_PROFILE_RT_INIT_ONCE" -static void forceTruncateFile(const char *Filename) { - FILE *File = fopen(Filename, "w"); - if (!File) - return; - fclose(File); -} - static void truncateCurrentFile(void) { const char *Filename; char *FilenameBuf; + FILE *File; int Length; Length = getCurFilenameLength(); @@ -604,7 +597,10 @@ static void truncateCurrentFile(void) { return; /* Truncate the file. Later we'll reopen and append. */ - forceTruncateFile(Filename); + File = fopen(Filename, "w"); + if (!File) + return; + fclose(File); } /* Write a partial profile to \p Filename, which is required to be backed by @@ -1287,7 +1283,7 @@ int __llvm_write_custom_profile(const char *Target, const char *CountersBegin, const char *CountersEnd, const char *NamesBegin, const char *NamesEnd) { - int ReturnValue = 0, FilenameLength, TargetLength, MergeDone; + int ReturnValue = 0, FilenameLength, TargetLength; char *FilenameBuf, *TargetFilename; const char *Filename; @@ -1340,11 +1336,9 @@ int __llvm_write_custom_profile(const char *Target, return -1; } - /* Clean old target file */ - forceTruncateFile(TargetFilename); - - /* Open target-specific PGO file */ - FILE *OutputFile = getFileObject(TargetFilename); + /* Open and truncate target-specific PGO file */ + FILE *OutputFile = fopen(TargetFilename, "w"); + setProfileFile(OutputFile); if (!OutputFile) { PROF_ERR("Failed to open file : %s\n", TargetFilename); @@ -1357,9 +1351,13 @@ int __llvm_write_custom_profile(const char *Target, 
setupIOBuffer(); /* Write custom data */ - ReturnValue = __llvm_profile_write_buffer_internal( - OutputFile, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, - NamesBegin, NamesEnd); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + + /* Write custom data to the file */ + ReturnValue = lprofWriteDataImpl( + &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NULL, NULL, NULL, NULL, NamesBegin, NamesEnd, 0); closeFileObject(OutputFile); // Restore SIGKILL. diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 0e75c684ed9263..d6747113265803 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,10 +1,12 @@ -// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=llvm" +// RUN: %libomptarget-compile-generic -fprofile-generate \ +// RUN: -Xclang "-fprofile-instrument=llvm" // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" -// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ +// RUN: -Xclang "-fprofile-instrument=clang" // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.clang.profraw | %fcheck-generic \ >From 4c9f814ce14aeb6766a93f5c1d15b847b98dc29f Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 28 May 2024 12:58:43 -0500 Subject: [PATCH 29/39] Make requested clang-format change --- offload/plugins-nextgen/common/include/GlobalHandler.h | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/offload/plugins-nextgen/common/include/GlobalHandler.h b/offload/plugins-nextgen/common/include/GlobalHandler.h index 
017d7e994f07a8..1d7b9f80f9dfd3 100644 --- a/offload/plugins-nextgen/common/include/GlobalHandler.h +++ b/offload/plugins-nextgen/common/include/GlobalHandler.h @@ -64,12 +64,10 @@ struct __llvm_profile_data { }; extern "C" { -extern int __attribute__((weak)) -__llvm_write_custom_profile(const char *Target, - const __llvm_profile_data *DataBegin, - const __llvm_profile_data *DataEnd, - const char *CountersBegin, const char *CountersEnd, - const char *NamesBegin, const char *NamesEnd); +extern int __attribute__((weak)) __llvm_write_custom_profile( + const char *Target, const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, const char *NamesEnd); } /// PGO profiling data extracted from a GPU device >From 344e357de657f54c068be969dcfc3ea33f2f026e Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 20:29:20 -0500 Subject: [PATCH 30/39] Tighten PGO test requirements Require compiler-rt to be an enabled runtime --- offload/test/CMakeLists.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt index 32df1e47afaeb2..41ab339147791c 100644 --- a/offload/test/CMakeLists.txt +++ b/offload/test/CMakeLists.txt @@ -12,10 +12,10 @@ else() set(LIBOMPTARGET_DEBUG False) endif() -if (OPENMP_STANDALONE_BUILD) - set(LIBOMPTARGET_TEST_GPU_PGO False) -else() +if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES) set(LIBOMPTARGET_TEST_GPU_PGO True) +else() + set(LIBOMPTARGET_TEST_GPU_PGO False) endif() # Replace the space from user's input with ";" in case that CMake add escape >From 2f751420b9ad2ffc7c9fac4a645724b45cdae59a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 20:29:20 -0500 Subject: [PATCH 31/39] Tighten PGO test requirements Require compiler-rt to be an enabled runtime --- offload/test/CMakeLists.txt | 6 +++--- 1 file changed, 3 
insertions(+), 3 deletions(-) diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt index 32df1e47afaeb2..41ab339147791c 100644 --- a/offload/test/CMakeLists.txt +++ b/offload/test/CMakeLists.txt @@ -12,10 +12,10 @@ else() set(LIBOMPTARGET_DEBUG False) endif() -if (OPENMP_STANDALONE_BUILD) - set(LIBOMPTARGET_TEST_GPU_PGO False) -else() +if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES) set(LIBOMPTARGET_TEST_GPU_PGO True) +else() + set(LIBOMPTARGET_TEST_GPU_PGO False) endif() # Replace the space from user's input with ";" in case that CMake add escape >From 488cb4a349fdfbd73d0a78ddb2c17522c46145ba Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 26 Jun 2024 18:18:31 -0500 Subject: [PATCH 32/39] Apply requested formatting changes --- clang/lib/CodeGen/CodeGenPGO.cpp | 11 +++++----- llvm/lib/ProfileData/InstrProf.cpp | 4 ++-- .../Instrumentation/InstrProfiling.cpp | 10 ++++----- .../Instrumentation/PGOInstrumentation.cpp | 21 ++++++++++--------- offload/DeviceRTL/src/Profiling.cpp | 6 ++++-- 5 files changed, 28 insertions(+), 24 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index a7ce0b8f6a35f3..3edfbdd679c61d 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1199,12 +1199,13 @@ void CodeGenPGO::emitCounterSetOrIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); + auto *NormalizedFuncNameVarPtr = + llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); - llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), - Builder.getInt32(NumRegionCounters), - Builder.getInt32(Counter), StepV}; + llvm::Value 
*Args[] = { + NormalizedFuncNameVarPtr, Builder.getInt64(FunctionHash), + Builder.getInt32(NumRegionCounters), Builder.getInt32(Counter), StepV}; if (llvm::EnableSingleByteCoverage) Builder.CreateCall(CGM.getIntrinsic(llvm::Intrinsic::instrprof_cover), diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 1284efd4b5f4da..6742435c9d065e 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -433,8 +433,8 @@ std::string getPGOFuncNameVarName(StringRef FuncName, } bool isGPUProfTarget(const Module &M) { - const auto &Triple = llvm::Triple(M.getTargetTriple()); - return Triple.isAMDGPU() || Triple.isNVPTX(); + const auto &T = Triple(M.getTargetTriple()); + return T.isAMDGPU() || T.isNVPTX(); } void setPGOFuncVisibility(Module &M, GlobalVariable *FuncNameVar) { diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index dd8c027c4bbf62..05cef1236f0879 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -869,8 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { llvm::InstrProfValueKind::IPVK_MemOPSize); CallInst *Call = nullptr; auto *TLI = &GetTLI(*Ind->getFunction()); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - DataVar, PointerType::getUnqual(M.getContext())); + auto *NormalizedDataVarPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + DataVar, PointerType::get(M.getContext(), 0)); // To support value profiling calls within Windows exception handlers, funclet // information contained within operand bundles needs to be copied over to @@ -879,12 +879,12 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { SmallVector OpBundles; Ind->getOperandBundlesAsDefs(OpBundles); if (!IsMemOpSize) { - Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Value *Args[3] 
= {Ind->getTargetValue(), NormalizedDataVarPtr, Builder.getInt32(Index)}; Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args, OpBundles); } else { - Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Value *Args[3] = {Ind->getTargetValue(), NormalizedDataVarPtr, Builder.getInt32(Index)}; Call = Builder.CreateCall( getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp), @@ -1580,7 +1580,7 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { ValuesVar->setAlignment(Align(8)); maybeSetComdat(ValuesVar, Fn, CntsVarName); ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - ValuesVar, PointerType::getUnqual(Fn->getContext())); + ValuesVar, PointerType::get(Fn->getContext(), 0)); } uint64_t NumCounters = Inc->getNumCounters()->getZExtValue(); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index ee1657ba8400ee..f8f34ea25597f3 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -884,7 +884,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedNamePtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); @@ -893,7 +893,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_cover), - {NormalizedPtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); + {NormalizedNamePtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); return; } @@ -948,7 +948,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, 
Intrinsic::instrprof_timestamp), - {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + {NormalizedNamePtr, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I)}); I += PGOBlockCoverage ? 8 : 1; } @@ -963,7 +963,7 @@ static void instrumentOneFunc( Intrinsic::getDeclaration(M, PGOBlockCoverage ? Intrinsic::instrprof_cover : Intrinsic::instrprof_increment), - {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + {NormalizedNamePtr, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I++)}); } @@ -1007,15 +1007,15 @@ static void instrumentOneFunc( ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty()); assert(ToProfile && "value profiling Value is of unexpected type"); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedNamePtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); SmallVector OpBundles; populateEHOperandBundle(Cand, BlockColors, OpBundles); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile), - {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile, - Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, + {NormalizedNamePtr, Builder.getInt64(FuncInfo.FunctionHash), + ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, OpBundles); } } // IPVK_First <= Kind <= IPVK_Last @@ -1688,11 +1688,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) { IRBuilder<> Builder(&SI); Type *Int64Ty = Builder.getInt64Ty(); auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, PointerType::get(M->getContext(), 0)); + auto *NormalizedFuncNameVarPtr = + ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, PointerType::get(M->getContext(), 0)); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step), - {NormalizedPtr, Builder.getInt64(FuncHash), + 
{NormalizedFuncNameVarPtr, Builder.getInt64(FuncHash), Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step}); ++(*CurCtrIdx); } diff --git a/offload/DeviceRTL/src/Profiling.cpp b/offload/DeviceRTL/src/Profiling.cpp index 799477f5e47d27..639c62ceff7a69 100644 --- a/offload/DeviceRTL/src/Profiling.cpp +++ b/offload/DeviceRTL/src/Profiling.cpp @@ -12,8 +12,10 @@ extern "C" { -void __llvm_profile_register_function(void *ptr) {} -void __llvm_profile_register_names_function(void *ptr, long int i) {} +// Provides empty implementations for certain functions in compiler-rt +// that are emitted by the PGO instrumentation. +void __llvm_profile_register_function(void *Ptr) {} +void __llvm_profile_register_names_function(void *Ptr, long int I) {} } #pragma omp end declare target >From b90c01583f1893802aba0180b07a448584585365 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 26 Jun 2024 18:29:59 -0500 Subject: [PATCH 33/39] Add memop function shim to DeviceRTL This comes up sometimes when using LLVM IR level instrumentation. --- offload/DeviceRTL/include/Profiling.h | 1 + offload/DeviceRTL/src/Profiling.cpp | 1 + 2 files changed, 2 insertions(+) diff --git a/offload/DeviceRTL/include/Profiling.h b/offload/DeviceRTL/include/Profiling.h index 9efc1554c176bc..d9947522541219 100644 --- a/offload/DeviceRTL/include/Profiling.h +++ b/offload/DeviceRTL/include/Profiling.h @@ -15,6 +15,7 @@ extern "C" { void __llvm_profile_register_function(void *Ptr); void __llvm_profile_register_names_function(void *Ptr, long int I); +void __llvm_profile_instrument_memop(long int I, void *Ptr, int I2); } #endif diff --git a/offload/DeviceRTL/src/Profiling.cpp b/offload/DeviceRTL/src/Profiling.cpp index 639c62ceff7a69..bb3caaadcc03dd 100644 --- a/offload/DeviceRTL/src/Profiling.cpp +++ b/offload/DeviceRTL/src/Profiling.cpp @@ -16,6 +16,7 @@ extern "C" { // that are emitted by the PGO instrumentation. 
void __llvm_profile_register_function(void *Ptr) {} void __llvm_profile_register_names_function(void *Ptr, long int I) {} +void __llvm_profile_instrument_memop(long int I, void *Ptr, int I2) {} } #pragma omp end declare target >From c68c6e2fa98a1fe608b88ed38f7db68eae804c5b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 27 Jun 2024 02:04:27 -0500 Subject: [PATCH 34/39] Make requested changes --- compiler-rt/lib/profile/InstrProfiling.h | 2 +- compiler-rt/lib/profile/InstrProfilingFile.c | 1 - offload/plugins-nextgen/common/src/PluginInterface.cpp | 5 ++--- 3 files changed, 3 insertions(+), 5 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h index ef1292a45bf01d..eda3e9a673c1af 100644 --- a/compiler-rt/lib/profile/InstrProfiling.h +++ b/compiler-rt/lib/profile/InstrProfiling.h @@ -298,7 +298,7 @@ void __llvm_profile_set_dumped(); /*! * \brief Write custom target-specific profiling data to a seperate file. - * Used by libomptarget for GPU PGO. + * Used by offload PGO. */ int __llvm_write_custom_profile(const char *Target, const __llvm_profile_data *DataBegin, diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 784cb9af6169d8..93436ecbabb40d 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1321,7 +1321,6 @@ int __llvm_write_custom_profile(const char *Target, /* Prepend "TARGET." 
to current filename */ memcpy(TargetFilename, Target, TargetLength); TargetFilename[TargetLength] = '.'; - memcpy(TargetFilename, Target, TargetLength); memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); TargetFilename[FilenameLength + 1 + TargetLength] = 0; diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index c4e1e63777de8a..445f4ad942bd4d 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp @@ -843,9 +843,8 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { ProfOrErr->dump(); // Write data to profiling file - if (auto Err = ProfOrErr->write()) { - consumeError(std::move(Err)); - } + if (auto Err = ProfOrErr->write()) + return Err; } // Delete the memory manager before deinitializing the device. Otherwise, >From ca52c58c7fde412897cf6b10b9bbb321812f193d Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 27 Jun 2024 02:26:20 -0500 Subject: [PATCH 35/39] Only dump counters if PGODump flag is set --- offload/include/Shared/Environment.h | 1 + offload/plugins-nextgen/common/src/PluginInterface.cpp | 4 +++- openmp/docs/design/Runtimes.rst | 1 + 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/offload/include/Shared/Environment.h b/offload/include/Shared/Environment.h index d141146b6bd5a1..86f6d1c6ea2d36 100644 --- a/offload/include/Shared/Environment.h +++ b/offload/include/Shared/Environment.h @@ -30,6 +30,7 @@ enum class DeviceDebugKind : uint32_t { FunctionTracing = 1U << 1, CommonIssues = 1U << 2, AllocationTracker = 1U << 3, + PGODump = 1U << 4, }; struct DeviceEnvironmentTy { diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index 445f4ad942bd4d..35fb04863d8741 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp 
@@ -840,7 +840,9 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { return ProfOrErr.takeError(); // Dump out profdata - ProfOrErr->dump(); + if ((OMPX_DebugKind.get() & uint32_t(DeviceDebugKind::PGODump)) == + uint32_t(DeviceDebugKind::PGODump)) + ProfOrErr->dump(); // Write data to profiling file if (auto Err = ProfOrErr->write()) diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst index f8a8cb87e83e66..7fc697a838e229 100644 --- a/openmp/docs/design/Runtimes.rst +++ b/openmp/docs/design/Runtimes.rst @@ -1493,3 +1493,4 @@ debugging features are supported. * Enable debugging assertions in the device. ``0x01`` * Enable diagnosing common problems during offloading . ``0x4`` * Enable device malloc statistics (amdgpu only). ``0x8`` + * Dump device PGO counters (only if PGO on GPU is enabled). ``0x10`` >From ee4431a1b57469c7679f54f124ca5f3dd7f0433b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 9 Aug 2024 20:21:38 -0500 Subject: [PATCH 36/39] Update requirements --- offload/test/offloading/pgo1.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index d6747113265803..fbf6337374a997 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -12,10 +12,7 @@ // RUN: %target_triple.clang.profraw | %fcheck-generic \ // RUN: --check-prefix="CLANG-PGO" -// UNSUPPORTED: x86_64-pc-linux-gnu -// UNSUPPORTED: x86_64-pc-linux-gnu-LTO -// UNSUPPORTED: aarch64-unknown-linux-gnu -// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: gpu // REQUIRES: pgo int test1(int a) { return a / 2; } >From fb699b6bca72d42359a304bcbba88f3564ae9ac9 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Sat, 10 Aug 2024 00:54:36 -0500 Subject: [PATCH 37/39] Merge changes --- offload/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +- offload/test/offloading/pgo1.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git 
a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index bca66cff6558a2..d7bfbba01c8efc 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -193,7 +193,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, // Check if given current global is a profiling global based // on name - if (NameOrErr->equals(getInstrProfNamesVarName())) { + if (*NameOrErr == getInstrProfNamesVarName()) { // Read in profiled function names DeviceProfileData.NamesData = SmallVector(Sym.getSize(), 0); GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(), diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index fbf6337374a997..3270ce8f15e7dc 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -32,17 +32,17 @@ int main() { // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 4 -// LLVM-PGO: Block counts: [20, 10, 20, 10] +// LLVM-PGO: Block counts: [20, 10, 2, 1] // LLVM-PGO-LABEL: test1: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [10] // LLVM-PGO-LABEL: test2: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [20] // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} >From 1d0a961aabe488e6d09b96a80329498b8f586923 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 25 Oct 2024 13:42:19 -0500 Subject: [PATCH 38/39] Add llvm-profdata substitution to offload tests --- offload/test/lit.cfg | 2 ++ offload/test/lit.site.cfg.in | 2 +- offload/test/offloading/pgo1.c | 4 ++-- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/offload/test/lit.cfg 
b/offload/test/lit.cfg index 7994a08ba063fb..cfd1ad6c3c1eb5 100644 --- a/offload/test/lit.cfg +++ b/offload/test/lit.cfg @@ -112,8 +112,10 @@ config.available_features.add(config.libomptarget_current_target) if config.libomptarget_has_libc: config.available_features.add('libc') +profdata_path = os.path.join(config.bin_llvm_tools_dir, "llvm-profdata") if config.libomptarget_test_pgo: config.available_features.add('pgo') + config.substitutions.append(("%profdata", profdata_path)) # Determine whether the test system supports unified memory. # For CUDA, this is the case with compute capability 70 (Volta) or higher. diff --git a/offload/test/lit.site.cfg.in b/offload/test/lit.site.cfg.in index a1cb5acc38a405..d998fb0c839700 100644 --- a/offload/test/lit.site.cfg.in +++ b/offload/test/lit.site.cfg.in @@ -1,6 +1,6 @@ @AUTO_GEN_COMMENT@ -config.bin_llvm_tools_dir = "@CMAKE_BINARY_DIR@/bin" +config.bin_llvm_tools_dir = "@LLVM_RUNTIME_OUTPUT_INTDIR@" config.test_c_compiler = "@OPENMP_TEST_C_COMPILER@" config.test_cxx_compiler = "@OPENMP_TEST_CXX_COMPILER@" config.test_fortran_compiler="@OPENMP_TEST_Fortran_COMPILER@" diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 1ef540e430a27a..51671afa62b0db 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,14 +1,14 @@ // RUN: %libomptarget-compile-generic -fprofile-generate \ // RUN: -Xclang "-fprofile-instrument=llvm" // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 -// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %profdata show --all-functions --counts \ // RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" // RUN: %libomptarget-compile-generic -fprofile-instr-generate \ // RUN: -Xclang "-fprofile-instrument=clang" // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 -// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %profdata show --all-functions --counts 
\ // RUN: %target_triple.clang.profraw | %fcheck-generic \ // RUN: --check-prefix="CLANG-PGO" >From c6b34ad7a676a462955b2e7b534b12264363b430 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 28 Oct 2024 18:45:37 -0500 Subject: [PATCH 39/39] Prepend target prefix to basename --- compiler-rt/lib/profile/InstrProfilingFile.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index decafbcb1a5352..6b6f47e239714c 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1355,10 +1355,21 @@ int __llvm_write_custom_profile(const char *Target, TargetFilename = (char *)COMPILER_RT_ALLOCA(FilenameLength + TargetLength + 2); + /* Find file basename and path sizes */ + int32_t DirEnd = FilenameLength - 1; + while (DirEnd >= 0 && !IS_DIR_SEPARATOR(Filename[DirEnd])) { + DirEnd--; + } + uint32_t DirSize = DirEnd + 1, BaseSize = FilenameLength - DirSize; + /* Prepend "TARGET." to current filename */ - memcpy(TargetFilename, Target, TargetLength); - TargetFilename[TargetLength] = '.'; - memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); + if (DirSize > 0) { + memcpy(TargetFilename, Filename, DirSize); + } + memcpy(TargetFilename + DirSize, Target, TargetLength); + TargetFilename[TargetLength + DirSize] = '.'; + memcpy(TargetFilename + DirSize + 1 + TargetLength, Filename + DirSize, + BaseSize); TargetFilename[FilenameLength + 1 + TargetLength] = 0; /* Check if there is llvm/runtime version mismatch. 
*/ From openmp-commits at lists.llvm.org Tue Oct 29 12:21:59 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Tue, 29 Oct 2024 12:21:59 -0700 (PDT) Subject: [Openmp-commits] [openmp] [OpenMP][OMPT][OMPD] Fix frame flags for OpenMP tool APIs (PR #114118) Message-ID: https://github.com/jprotze created https://github.com/llvm/llvm-project/pull/114118 In several cases the flags entries in ompt_frame_t are not initialized. According to @jdelsign the address provided as reenter and exit address is the canonical frame address (cfa) rather than a "framepointer". This patch makes sure that the flags entry is always initialized and changes the value from ompt_frame_framepointer to ompt_frame_cfa. The assertion in the tests makes sure that the flags are always set when a tool (callback.h in this case) looks at the value. Fixes #89058 >From 22d02b63ac3bc6113e532ec6069afa3dd70a417d Mon Sep 17 00:00:00 2001 From: Joachim Jenke Date: Tue, 29 Oct 2024 20:13:00 +0100 Subject: [PATCH] [OpenMP][OMPT][OMPD] Fix frame flags for OpenMP tool APIs In several cases the flags entries in ompt_frame_t are not initialized. According to @jdelsign the address provided as reenter and exit address is the canonical frame address (cfa) rather than a "framepointer". This patch makes sure that the flags entry is always initialized and changes the value from ompt_frame_framepointer to ompt_frame_cfa. The assertion in the tests makes sure that the flags are always set when a tool (callback.h in this case) looks at the value.
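As a rough illustration of the description above, the following standalone C sketch shows how combined flag values of this kind could be defined and checked. The enum values are those listed for ompt_frame_flag_t in the OpenMP tools interface. The macro names OMPT_FRAME_FLAGS_APP and OMPT_FRAME_FLAGS_RUNTIME appear in the diff, but their definitions here are an assumption, since the ompt-specific.h hunk is not shown in this message. sketch_frame_t, sketch_frame_init, sketch_flags_are_set and sketch_check_frame are hypothetical names used only for this example and are not part of the runtime or of callback.h.

```c
#include <assert.h>

/* Values as listed for ompt_frame_flag_t in the OpenMP tools interface,
   repeated here so that the sketch compiles on its own. */
typedef enum {
  ompt_frame_runtime = 0x00,
  ompt_frame_application = 0x01,
  ompt_frame_cfa = 0x10,
  ompt_frame_framepointer = 0x20,
  ompt_frame_stackaddress = 0x30
} ompt_frame_flag_t;

/* Assumed definitions: the macro names come from the diff, the bodies are a
   guess based on the PR description (cfa instead of framepointer). */
#define OMPT_FRAME_FLAGS_APP (ompt_frame_application | ompt_frame_cfa)
#define OMPT_FRAME_FLAGS_RUNTIME (ompt_frame_runtime | ompt_frame_cfa)

/* Hypothetical stand-in for the two flag fields of ompt_frame_t. */
typedef struct {
  int exit_frame_flags;
  int enter_frame_flags;
} sketch_frame_t;

/* Initialize both flag fields so that a tool never reads an unset value. */
static void sketch_frame_init(sketch_frame_t *frame) {
  frame->exit_frame_flags = OMPT_FRAME_FLAGS_RUNTIME;
  frame->enter_frame_flags = OMPT_FRAME_FLAGS_RUNTIME;
}

/* Hypothetical tool-side check: the address-kind bits (0xf0) are nonzero
   once a kind such as ompt_frame_cfa has been stored. */
static int sketch_flags_are_set(int flags) { return (flags & 0xf0) != 0; }

/* Hypothetical assertion in the spirit of the test change described above. */
static void sketch_check_frame(const sketch_frame_t *frame) {
  assert(sketch_flags_are_set(frame->exit_frame_flags));
  assert(sketch_flags_are_set(frame->enter_frame_flags));
}
```

Under these assumptions, a zero-initialized frame fails the check, while a frame passed through sketch_frame_init passes it, because ompt_frame_runtime | ompt_frame_cfa evaluates to 0x10. This is one plausible reason why always storing an address kind makes an uninitialized flags field detectable from the tool side.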

Fixes #89058
---
 openmp/runtime/src/kmp_tasking.cpp   | 22 +++-------------------
 openmp/runtime/src/ompt-general.cpp  |  1 +
 openmp/runtime/src/ompt-internal.h   |  2 ++
 openmp/runtime/src/ompt-specific.cpp |  2 ++
 openmp/runtime/src/ompt-specific.h   | 15 +++++++++++++++
 openmp/runtime/test/ompt/callback.h  | 22 +++++++++++++++++++++-
 6 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index 932799e133b45b..3e229b517cfcd6 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -714,22 +714,6 @@ static void __kmp_task_start(kmp_int32 gtid, kmp_task_t *task,
 
 #if OMPT_SUPPORT
 //------------------------------------------------------------------------------
-// __ompt_task_init:
-// Initialize OMPT fields maintained by a task. This will only be called after
-// ompt_start_tool, so we already know whether ompt is enabled or not.
-
-static inline void __ompt_task_init(kmp_taskdata_t *task, int tid) {
-  // The calls to __ompt_task_init already have the ompt_enabled condition.
-  task->ompt_task_info.task_data.value = 0;
-  task->ompt_task_info.frame.exit_frame = ompt_data_none;
-  task->ompt_task_info.frame.enter_frame = ompt_data_none;
-  task->ompt_task_info.frame.exit_frame_flags =
-      ompt_frame_runtime | ompt_frame_framepointer;
-  task->ompt_task_info.frame.enter_frame_flags =
-      ompt_frame_runtime | ompt_frame_framepointer;
-  task->ompt_task_info.dispatch_chunk.start = 0;
-  task->ompt_task_info.dispatch_chunk.iterations = 0;
-}
 
 // __ompt_task_start:
 // Build and trigger task-begin event
@@ -804,7 +788,7 @@ static void __kmpc_omp_task_begin_if0_template(ident_t *loc_ref, kmp_int32 gtid,
       taskdata->ompt_task_info.frame.exit_frame.ptr = frame_address;
       current_task->ompt_task_info.frame.enter_frame_flags =
           taskdata->ompt_task_info.frame.exit_frame_flags =
-              ompt_frame_application | ompt_frame_framepointer;
+              OMPT_FRAME_FLAGS_APP;
     }
     if (ompt_enabled.ompt_callback_task_create) {
       ompt_task_info_t *parent_info = &(current_task->ompt_task_info);
@@ -1268,8 +1252,7 @@ static void __kmpc_omp_task_complete_if0_template(ident_t *loc_ref,
     ompt_frame_t *ompt_frame;
     __ompt_get_task_info_internal(0, NULL, NULL, &ompt_frame, NULL, NULL);
     ompt_frame->enter_frame = ompt_data_none;
-    ompt_frame->enter_frame_flags =
-        ompt_frame_runtime | ompt_frame_framepointer;
+    ompt_frame->enter_frame_flags = OMPT_FRAME_FLAGS_RUNTIME;
   }
 #endif
 
@@ -2010,6 +1993,7 @@ kmp_int32 __kmpc_omp_task_parts(ident_t *loc_ref, kmp_int32 gtid,
 #if OMPT_SUPPORT
   if (UNLIKELY(ompt_enabled.enabled)) {
     parent->ompt_task_info.frame.enter_frame = ompt_data_none;
+    parent->ompt_task_info.frame.enter_frame_flags = OMPT_FRAME_FLAGS_RUNTIME;
   }
 #endif
   return TASK_CURRENT_NOT_QUEUED;
diff --git a/openmp/runtime/src/ompt-general.cpp b/openmp/runtime/src/ompt-general.cpp
index 923eea2a563a91..64f83a5cb19cec 100644
--- a/openmp/runtime/src/ompt-general.cpp
+++ b/openmp/runtime/src/ompt-general.cpp
@@ -497,6 +497,7 @@ void ompt_post_init() {
     kmp_info_t *root_thread = ompt_get_thread();
     ompt_set_thread_state(root_thread, ompt_state_overhead);
+    __ompt_task_init(root_thread->th.th_current_task, 0);
 
     if (ompt_enabled.ompt_callback_thread_begin) {
       ompt_callbacks.ompt_callback(ompt_callback_thread_begin)(
diff --git a/openmp/runtime/src/ompt-internal.h b/openmp/runtime/src/ompt-internal.h
index 580a7c2ac79168..0cfab8bfaa1906 100644
--- a/openmp/runtime/src/ompt-internal.h
+++ b/openmp/runtime/src/ompt-internal.h
@@ -111,6 +111,8 @@ void ompt_fini(void);
 
 #define OMPT_GET_RETURN_ADDRESS(level) __builtin_return_address(level)
 #define OMPT_GET_FRAME_ADDRESS(level) __builtin_frame_address(level)
+#define OMPT_FRAME_FLAGS_APP (ompt_frame_application | ompt_frame_cfa)
+#define OMPT_FRAME_FLAGS_RUNTIME (ompt_frame_runtime | ompt_frame_cfa)
 
 int __kmp_control_tool(uint64_t command, uint64_t modifier, void *arg);
 
diff --git a/openmp/runtime/src/ompt-specific.cpp b/openmp/runtime/src/ompt-specific.cpp
index 0737c0cdfb1602..94ae2e52938751 100644
--- a/openmp/runtime/src/ompt-specific.cpp
+++ b/openmp/runtime/src/ompt-specific.cpp
@@ -266,6 +266,8 @@ void __ompt_lw_taskteam_init(ompt_lw_taskteam_t *lwt, kmp_info_t *thr, int gtid,
   lwt->ompt_task_info.task_data.value = 0;
   lwt->ompt_task_info.frame.enter_frame = ompt_data_none;
   lwt->ompt_task_info.frame.exit_frame = ompt_data_none;
+  lwt->ompt_task_info.frame.enter_frame_flags = OMPT_FRAME_FLAGS_RUNTIME;
+  lwt->ompt_task_info.frame.exit_frame_flags = OMPT_FRAME_FLAGS_RUNTIME;
   lwt->ompt_task_info.scheduling_parent = NULL;
   lwt->heap = 0;
   lwt->parent = 0;
diff --git a/openmp/runtime/src/ompt-specific.h b/openmp/runtime/src/ompt-specific.h
index 7864ed6126c701..e9e40d43429eaf 100644
--- a/openmp/runtime/src/ompt-specific.h
+++ b/openmp/runtime/src/ompt-specific.h
@@ -54,6 +54,21 @@ int __ompt_get_task_info_internal(int ancestor_level, int *type,
 
 ompt_data_t *__ompt_get_thread_data_internal();
 
+// __ompt_task_init:
+// Initialize OMPT fields maintained by a task.
This will only be called after
+// ompt_start_tool, so we already know whether ompt is enabled or not.
+
+static inline void __ompt_task_init(kmp_taskdata_t *task, int tid) {
+  // The calls to __ompt_task_init already have the ompt_enabled condition.
+  task->ompt_task_info.task_data.value = 0;
+  task->ompt_task_info.frame.exit_frame = ompt_data_none;
+  task->ompt_task_info.frame.enter_frame = ompt_data_none;
+  task->ompt_task_info.frame.exit_frame_flags =
+      task->ompt_task_info.frame.enter_frame_flags = OMPT_FRAME_FLAGS_RUNTIME;
+  task->ompt_task_info.dispatch_chunk.start = 0;
+  task->ompt_task_info.dispatch_chunk.iterations = 0;
+}
+
 /*
  * Unused currently
 static uint64_t __ompt_get_get_unique_id_internal();
diff --git a/openmp/runtime/test/ompt/callback.h b/openmp/runtime/test/ompt/callback.h
index 07d38cf836dff0..4dd1db4c4225b3 100644
--- a/openmp/runtime/test/ompt/callback.h
+++ b/openmp/runtime/test/ompt/callback.h
@@ -12,6 +12,8 @@
 #include
 #include
 #include "ompt-signal.h"
+#include
+#include
 
 // Used to detect architecture
 #include "../../src/kmp_platform.h"
@@ -147,6 +149,22 @@ static ompt_get_proc_id_t ompt_get_proc_id;
 static ompt_enumerate_states_t ompt_enumerate_states;
 static ompt_enumerate_mutex_impls_t ompt_enumerate_mutex_impls;
+void assert_frame_flags(int enterf, int exitf) {
+  if (!(enterf == (ompt_frame_application | ompt_frame_cfa) ||
+        enterf == (ompt_frame_runtime | ompt_frame_cfa))) {
+    printf("enter_frame_flags (%i) is invalid\n", enterf);
+    fflush(NULL);
+  }
+  if (!(exitf == (ompt_frame_application | ompt_frame_cfa) ||
+        exitf == (ompt_frame_runtime | ompt_frame_cfa))) {
+    printf("exit_frame_flags (%i) is invalid\n", exitf);
+    fflush(NULL);
+  }
+  assert(enterf == (ompt_frame_application | ompt_frame_cfa) ||
+         enterf == (ompt_frame_runtime | ompt_frame_cfa));
+  assert(exitf == (ompt_frame_application | ompt_frame_cfa) ||
+         exitf == (ompt_frame_runtime | ompt_frame_cfa));
+}
 
 static void print_ids(int level)
 {
   int task_type, thread_num;
@@
-157,7 +175,7 @@ static void print_ids(int level)
       &task_parallel_data, &thread_num);
   char buffer[2048];
   format_task_type(task_type, buffer);
-  if (frame)
+  if (frame) {
     printf("%" PRIu64 ": task level %d: parallel_id=%" PRIu64
            ", task_id=%" PRIu64 ", exit_frame=%p, reenter_frame=%p, "
            "task_type=%s=%d, thread_num=%d\n",
@@ -165,6 +183,8 @@ static void print_ids(int level)
            exists_task ? task_parallel_data->value : 0,
            exists_task ? task_data->value : 0, frame->exit_frame.ptr,
            frame->enter_frame.ptr, buffer, task_type, thread_num);
+    assert_frame_flags(frame->enter_frame_flags, frame->exit_frame_flags);
+  }
 }
 
 #define get_frame_address(level) __builtin_frame_address(level)

From openmp-commits at lists.llvm.org Wed Oct 30 05:57:25 2024
From: openmp-commits at lists.llvm.org (Jan André Reuter via Openmp-commits)
Date: Wed, 30 Oct 2024 05:57:25 -0700 (PDT)
Subject: [Openmp-commits] [openmp] [OMPT] Fix issue #84180 (PR #84215)
In-Reply-To:
Message-ID: <67222d35.170a0220.31d2fb.08a8@mx.google.com>

Thyre wrote:

Is there any update on this? We recently ran into the same issue while investigating overhead with EPCC benchmarks.

https://github.com/llvm/llvm-project/pull/84215

From openmp-commits at lists.llvm.org Wed Oct 30 08:50:41 2024
From: openmp-commits at lists.llvm.org (Hansang Bae via Openmp-commits)
Date: Wed, 30 Oct 2024 08:50:41 -0700 (PDT)
Subject: [Openmp-commits] [openmp] [OpenMP][OMPT][OMPD] Fix frame flags for OpenMP tool APIs (PR #114118)
In-Reply-To:
Message-ID: <672255d1.a70a0220.18e75e.92c4@mx.google.com>

https://github.com/hansangbae approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/114118

From openmp-commits at lists.llvm.org Thu Oct 31 02:48:02 2024
From: openmp-commits at lists.llvm.org (Sylvestre Ledru via Openmp-commits)
Date: Thu, 31 Oct 2024 02:48:02 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <67235252.170a0220.1be5e0.2096@mx.google.com>

sylvestre wrote:

@everythingfunctional with this change and install, we are installing flang with the -20 in the install dir:

`-- Installing: /build/source/debian/tmp/usr/lib/llvm-20/bin/flang-20`

it should not be the case, it should be named `/usr/lib/llvm-20/bin/flang` to match what is done elsewhere

https://github.com/llvm/llvm-project/pull/110023

From openmp-commits at lists.llvm.org Thu Oct 31 02:51:41 2024
From: openmp-commits at lists.llvm.org (via Openmp-commits)
Date: Thu, 31 Oct 2024 02:51:41 -0700 (PDT)
Subject: [Openmp-commits] [openmp] [OpenMP][OMPT][OMPD] Fix frame flags for OpenMP tool APIs (PR #114118)
In-Reply-To:
Message-ID: <6723532d.170a0220.224316.1a99@mx.google.com>

jprotze wrote:

Thanks for the review. I'll give @jdelsign some time to check that this patch indeed fixes his issues reported in #89058.

https://github.com/llvm/llvm-project/pull/114118

From openmp-commits at lists.llvm.org Thu Oct 31 12:22:24 2024
From: openmp-commits at lists.llvm.org (Brad Richardson via Openmp-commits)
Date: Thu, 31 Oct 2024 12:22:24 -0700 (PDT)
Subject: [Openmp-commits] [clang] [flang] [llvm] [openmp] [flang][driver] rename flang-new to flang (PR #110023)
In-Reply-To:
Message-ID: <6723d8f0.050a0220.3275ab.7f94@mx.google.com>

everythingfunctional wrote:

> @everythingfunctional with this change and install, we are installing flang with the -20 in the install dir: `-- Installing: /build/source/debian/tmp/usr/lib/llvm-20/bin/flang-20`
>
> it should not be the case, it should be named `/usr/lib/llvm-20/bin/flang` to match what is done elsewhere

Forgive me if I misunderstood or implemented this incorrectly, but I was under the impression (based on [an earlier comment](https://github.com/llvm/llvm-project/pull/110023#issuecomment-2378556647)) that this was the intended behavior.

https://github.com/llvm/llvm-project/pull/110023
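
Editorial sketch for the frame-flags patch in PR #114118 above: the check that the patch adds to callback.h can be read in isolation as "each flags word must be an owner (runtime or application) combined with the cfa kind". The following is a minimal standalone version of that idea, not the code from the patch. The `sketch_`/`SKETCH_` names are hypothetical, and the numeric flag values are assumptions chosen so the example compiles without `omp-tools.h`; a real tool would use the `ompt_frame_*` constants from that header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the ompt_frame_flag_t constants. A real tool
 * takes these from omp-tools.h; the values here are assumptions made only
 * so that this sketch is self-contained. */
enum sketch_frame_flag {
  SKETCH_FRAME_RUNTIME = 0x00,
  SKETCH_FRAME_APPLICATION = 0x01,
  SKETCH_FRAME_CFA = 0x10,
  SKETCH_FRAME_FRAMEPOINTER = 0x20
};

/* Returns true only for the two combinations the patched runtime is meant
 * to report: (application | cfa) or (runtime | cfa). */
bool sketch_frame_flags_valid(int flags) {
  return flags == (SKETCH_FRAME_APPLICATION | SKETCH_FRAME_CFA) ||
         flags == (SKETCH_FRAME_RUNTIME | SKETCH_FRAME_CFA);
}

/* Prints the offending value before asserting, so the number is visible in
 * the test log even though assert() aborts the process. */
void sketch_assert_frame_flags(int enterf, int exitf) {
  if (!sketch_frame_flags_valid(enterf)) {
    printf("enter_frame_flags (%i) is invalid\n", enterf);
    fflush(NULL);
  }
  if (!sketch_frame_flags_valid(exitf)) {
    printf("exit_frame_flags (%i) is invalid\n", exitf);
    fflush(NULL);
  }
  assert(sketch_frame_flags_valid(enterf));
  assert(sketch_frame_flags_valid(exitf));
}
```

Under these assumed values, a flags word left at zero and the pre-patch combination of runtime with framepointer are both rejected, which is the behavior the PR description asks the test to enforce.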