From openmp-commits at lists.llvm.org Mon Dec 2 07:25:32 2024 From: openmp-commits at lists.llvm.org (David Spickett via Openmp-commits) Date: Mon, 02 Dec 2024 07:25:32 -0800 (PST) Subject: [Openmp-commits] [libclc] [libcxx] [llvm] [openmp] [polly] [llvm] Move sub-project lead maintainers into their own Maintainers.md files (PR #118309) Message-ID: https://github.com/DavidSpickett created https://github.com/llvm/llvm-project/pull/118309 This adds Maintainers.md files to the sub-projects mentioned here, so that someone going directly to that sub-project will find them more easily. Recently I needed to find a libclc maintainer and I had no idea there was one listed here instead of in libclc/. >From 9d279bcb1174e80e24ddac8755962cb65e20be91 Mon Sep 17 00:00:00 2001 From: David Spickett Date: Mon, 2 Dec 2024 15:22:50 +0000 Subject: [PATCH] [llvm] Move sub-project lead maintainers into their own files This adds Maintainers.md files to the sub-projects mentioned here, so that someone going directly to that sub-project will find them more easily. Recently I needed to find a libclc maintainer and I had no idea there was one listed here instead of in libclc/. --- libclc/Maintainers.md | 10 ++++++++++ libcxx/Maintainers.md | 10 ++++++++++ llvm/Maintainers.md | 25 ++++++------------------- openmp/Maintainers.md | 10 ++++++++++ polly/Maintainers.md | 10 ++++++++++ 5 files changed, 46 insertions(+), 19 deletions(-) create mode 100644 libclc/Maintainers.md create mode 100644 libcxx/Maintainers.md create mode 100644 openmp/Maintainers.md create mode 100644 polly/Maintainers.md diff --git a/libclc/Maintainers.md b/libclc/Maintainers.md new file mode 100644 index 00000000000000..314b139b81aaff --- /dev/null +++ b/libclc/Maintainers.md @@ -0,0 +1,10 @@ +# libclc Maintainers + +This file is a list of the +[maintainers](https://llvm.org/docs/DeveloperPolicy.html#maintainers) for +libclc. 
+ +# Lead maintainer + +Tom Stellard \ +tstellar at redhat.com (email), [tstellar](https://github.com/tstellar) (GitHub) \ No newline at end of file diff --git a/libcxx/Maintainers.md b/libcxx/Maintainers.md new file mode 100644 index 00000000000000..2630ed9001119b --- /dev/null +++ b/libcxx/Maintainers.md @@ -0,0 +1,10 @@ +# libc++ Maintainers + +This file is a list of the +[maintainers](https://llvm.org/docs/DeveloperPolicy.html#maintainers) for +libc++. + +# Lead maintainer + +Louis Dionne \ +ldionne.2 at gmail.com (email), [ldionne](https://github.com/ldionne) (GitHub) \ No newline at end of file diff --git a/llvm/Maintainers.md b/llvm/Maintainers.md index c5f98f76bda312..bb05552461b21c 100644 --- a/llvm/Maintainers.md +++ b/llvm/Maintainers.md @@ -422,7 +422,6 @@ gkistanova at gmail.com (email), [gkistanova](https://github.com/gkistanova) (GitHu ### Other subprojects Some subprojects maintain their own list of per-component maintainers. -Others only have a lead maintainer listed here. [Bolt maintainers](https://github.com/llvm/llvm-project/blob/main/bolt/Maintainers.txt) @@ -434,29 +433,17 @@ Others only have a lead maintainer listed here. 
[Flang maintainers](https://github.com/llvm/llvm-project/blob/main/flang/Maintainers.txt) -[LLD maintainers](https://github.com/llvm/llvm-project/blob/main/lld/CODE_OWNERS.TXT) - -[LLDB maintainers](https://github.com/llvm/llvm-project/blob/main/lldb/Maintainers.rst) - -#### libc++ - -Louis Dionne \ -ldionne.2 at gmail.com (email), [ldionne](https://github.com/ldionne) (GitHub) +[libc++ maintainers](https://github.com/llvm/llvm-project/blob/main/libcxx/Maintainers.rst) -#### libclc +[libclc maintainers](https://github.com/llvm/llvm-project/blob/main/libclc/Maintainers.rst) -Tom Stellard \ -tstellar at redhat.com (email), [tstellar](https://github.com/tstellar) (GitHub) - -#### OpenMP (runtime library) +[LLD maintainers](https://github.com/llvm/llvm-project/blob/main/lld/CODE_OWNERS.TXT) -Andrey Churbanov \ -andrey.churbanov at intel.com (email), [AndreyChurbanov](https://github.com/AndreyChurbanov) (GitHub) +[LLDB maintainers](https://github.com/llvm/llvm-project/blob/main/lldb/Maintainers.rst) -#### Polly +[LLVM OpenMP Library maintainers](https://github.com/llvm/llvm-project/blob/main/openmp/Maintainers.rst) -Tobias Grosser \ -tobias at grosser.es (email), [tobiasgrosser](https://github.com/tobiasgrosser) (GitHub) +[Polly maintainers](https://github.com/llvm/llvm-project/blob/main/polly/Maintainers.rst) ## Inactive Maintainers diff --git a/openmp/Maintainers.md b/openmp/Maintainers.md new file mode 100644 index 00000000000000..fc49c9e35928b9 --- /dev/null +++ b/openmp/Maintainers.md @@ -0,0 +1,10 @@ +# LLVM OpenMP Library Maintainers + +This file is a list of the +[maintainers](https://llvm.org/docs/DeveloperPolicy.html#maintainers) for +the LLVM OpenMP library. 
+ +# Lead maintainer + +Andrey Churbanov \ +andrey.churbanov at intel.com (email), [AndreyChurbanov](https://github.com/AndreyChurbanov) (GitHub) \ No newline at end of file diff --git a/polly/Maintainers.md b/polly/Maintainers.md new file mode 100644 index 00000000000000..6c0ecef0f281a9 --- /dev/null +++ b/polly/Maintainers.md @@ -0,0 +1,10 @@ +# Polly Maintainers + +This file is a list of the +[maintainers](https://llvm.org/docs/DeveloperPolicy.html#maintainers) for +Polly. + +# Lead maintainer + +Tobias Grosser \ +tobias at grosser.es (email), [tobiasgrosser](https://github.com/tobiasgrosser) (GitHub) \ No newline at end of file From openmp-commits at lists.llvm.org Mon Dec 2 07:26:02 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Mon, 02 Dec 2024 07:26:02 -0800 (PST) Subject: [Openmp-commits] [libclc] [libcxx] [llvm] [openmp] [polly] [llvm] Move sub-project lead maintainers into their own Maintainers.md files (PR #118309) In-Reply-To: Message-ID: <674dd18a.170a0220.1c7b68.ba22@mx.google.com> llvmbot wrote: @llvm/pr-subscribers-libcxx Author: David Spickett (DavidSpickett)
Changes This adds Maintainers.md files to the sub-projects mentioned here, so that someone going directly to that sub-project will find them more easily. Recently I needed to find a libclc maintainer and I had no idea there was one listed here instead of in libclc/. --- Full diff: https://github.com/llvm/llvm-project/pull/118309.diff 5 Files Affected: - (added) libclc/Maintainers.md (+10) - (added) libcxx/Maintainers.md (+10) - (modified) llvm/Maintainers.md (+6-19) - (added) openmp/Maintainers.md (+10) - (added) polly/Maintainers.md (+10)
https://github.com/llvm/llvm-project/pull/118309 From openmp-commits at lists.llvm.org Mon Dec 2 07:27:48 2024 From: openmp-commits at lists.llvm.org (David Spickett via Openmp-commits) Date: Mon, 02 Dec 2024 07:27:48 -0800 (PST) Subject: [Openmp-commits] [libclc] [libcxx] [llvm] [openmp] [polly] [llvm] Move sub-project lead maintainers into their own Maintainers.md files (PR #118309) In-Reply-To: Message-ID: <674dd1f4.170a0220.2fd46a.0990@mx.google.com> DavidSpickett wrote: @tobiasgrosser (because for some reason I can't add you as a reviewer directly) https://github.com/llvm/llvm-project/pull/118309 From openmp-commits at lists.llvm.org Mon Dec 2 07:36:51 2024 From: openmp-commits at lists.llvm.org (Nikita Popov via Openmp-commits) Date: Mon, 02 Dec 2024 07:36:51 -0800 (PST) Subject: [Openmp-commits] [libclc] [libcxx] [llvm] [openmp] [polly] [llvm] Move sub-project lead maintainers into their own Maintainers.md files (PR #118309) In-Reply-To: Message-ID: <674dd413.170a0220.8a8f7.a84e@mx.google.com> https://github.com/nikic commented: Looks reasonable to me. We might want to use this chance to also get the information up to date... I think for polly this should probably be @tobiasgrosser -> @Meinersbur. For libclc maybe @tstellar -> @frasercrmck (or maybe both). 
https://github.com/llvm/llvm-project/pull/118309 From openmp-commits at lists.llvm.org Mon Dec 2 07:36:51 2024 From: openmp-commits at lists.llvm.org (Nikita Popov via Openmp-commits) Date: Mon, 02 Dec 2024 07:36:51 -0800 (PST) Subject: [Openmp-commits] [libclc] [libcxx] [llvm] [openmp] [polly] [llvm] Move sub-project lead maintainers into their own Maintainers.md files (PR #118309) In-Reply-To: Message-ID: <674dd413.170a0220.55d2c.bb4c@mx.google.com> https://github.com/nikic edited https://github.com/llvm/llvm-project/pull/118309 From openmp-commits at lists.llvm.org Mon Dec 2 07:36:53 2024 From: openmp-commits at lists.llvm.org (Nikita Popov via Openmp-commits) Date: Mon, 02 Dec 2024 07:36:53 -0800 (PST) Subject: [Openmp-commits] [libclc] [libcxx] [llvm] [openmp] [polly] [llvm] Move sub-project lead maintainers into their own Maintainers.md files (PR #118309) In-Reply-To: Message-ID: <674dd415.170a0220.bfa8a.d72c@mx.google.com> ================ @@ -434,29 +433,17 @@ Others only have a lead maintainer listed here. 
[Flang maintainers](https://github.com/llvm/llvm-project/blob/main/flang/Maintainers.txt) -[LLD maintainers](https://github.com/llvm/llvm-project/blob/main/lld/CODE_OWNERS.TXT) - -[LLDB maintainers](https://github.com/llvm/llvm-project/blob/main/lldb/Maintainers.rst) - -#### libc++ - -Louis Dionne \ -ldionne.2 at gmail.com (email), [ldionne](https://github.com/ldionne) (GitHub) +[libc++ maintainers](https://github.com/llvm/llvm-project/blob/main/libcxx/Maintainers.rst) -#### libclc +[libclc maintainers](https://github.com/llvm/llvm-project/blob/main/libclc/Maintainers.rst) ---------------- nikic wrote: ```suggestion [libclc maintainers](https://github.com/llvm/llvm-project/blob/main/libclc/Maintainers.md) ``` etc https://github.com/llvm/llvm-project/pull/118309 From openmp-commits at lists.llvm.org Mon Dec 2 07:41:50 2024 From: openmp-commits at lists.llvm.org (Nikita Popov via Openmp-commits) Date: Mon, 02 Dec 2024 07:41:50 -0800 (PST) Subject: [Openmp-commits] [libclc] [libcxx] [llvm] [openmp] [polly] [llvm] Move sub-project lead maintainers into their own Maintainers.md files (PR #118309) In-Reply-To: Message-ID: <674dd53e.170a0220.1abae2.0870@mx.google.com> nikic wrote: For openmp I'm not really sure, maybe @jdoerfert or @jpeyton52 can advise. 
https://github.com/llvm/llvm-project/pull/118309 From openmp-commits at lists.llvm.org Mon Dec 2 07:48:26 2024 From: openmp-commits at lists.llvm.org (David Spickett via Openmp-commits) Date: Mon, 02 Dec 2024 07:48:26 -0800 (PST) Subject: [Openmp-commits] [libclc] [libcxx] [llvm] [openmp] [polly] [llvm] Move sub-project lead maintainers into their own Maintainers.md files (PR #118309) In-Reply-To: Message-ID: <674dd6ca.170a0220.30938c.08d5@mx.google.com> https://github.com/DavidSpickett updated https://github.com/llvm/llvm-project/pull/118309 >From 5912070b56eecd3eca5ef24cad67ac7ac9645b00 Mon Sep 17 00:00:00 2001 From: David Spickett Date: Mon, 2 Dec 2024 15:47:31 +0000 Subject: [PATCH 2/2] .rst -> .md --- llvm/Maintainers.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/llvm/Maintainers.md b/llvm/Maintainers.md index bb05552461b21c..cf9f182550d5ef 100644 --- a/llvm/Maintainers.md +++ b/llvm/Maintainers.md @@ -433,17 +433,17 @@ Some subprojects maintain their own list of per-component maintainers.
[Flang maintainers](https://github.com/llvm/llvm-project/blob/main/flang/Maintainers.txt) -[libc++ maintainers](https://github.com/llvm/llvm-project/blob/main/libcxx/Maintainers.rst) +[libc++ maintainers](https://github.com/llvm/llvm-project/blob/main/libcxx/Maintainers.md) -[libclc maintainers](https://github.com/llvm/llvm-project/blob/main/libclc/Maintainers.rst) +[libclc maintainers](https://github.com/llvm/llvm-project/blob/main/libclc/Maintainers.md) [LLD maintainers](https://github.com/llvm/llvm-project/blob/main/lld/CODE_OWNERS.TXT) [LLDB maintainers](https://github.com/llvm/llvm-project/blob/main/lldb/Maintainers.rst) -[LLVM OpenMP Library maintainers](https://github.com/llvm/llvm-project/blob/main/openmp/Maintainers.rst) +[LLVM OpenMP Library maintainers](https://github.com/llvm/llvm-project/blob/main/openmp/Maintainers.md) -[Polly maintainers](https://github.com/llvm/llvm-project/blob/main/polly/Maintainers.rst) +[Polly maintainers](https://github.com/llvm/llvm-project/blob/main/polly/Maintainers.md) ## Inactive Maintainers From openmp-commits at lists.llvm.org Mon Dec 2 07:48:38 2024 From: openmp-commits at lists.llvm.org (David Spickett via Openmp-commits) Date: Mon, 02 Dec 2024 07:48:38 -0800 (PST) Subject: [Openmp-commits] [libclc] [libcxx] [llvm] [openmp] [polly] [llvm] Move sub-project lead maintainers into their own Maintainers.md files (PR #118309) In-Reply-To: Message-ID: <674dd6d6.050a0220.e1073.3038@mx.google.com> ================ @@ -434,29 +433,17 @@ Others only have a lead maintainer listed here. 
[Flang maintainers](https://github.com/llvm/llvm-project/blob/main/flang/Maintainers.txt) -[LLD maintainers](https://github.com/llvm/llvm-project/blob/main/lld/CODE_OWNERS.TXT) - -[LLDB maintainers](https://github.com/llvm/llvm-project/blob/main/lldb/Maintainers.rst) - -#### libc++ - -Louis Dionne \ -ldionne.2 at gmail.com (email), [ldionne](https://github.com/ldionne) (GitHub) +[libc++ maintainers](https://github.com/llvm/llvm-project/blob/main/libcxx/Maintainers.rst) -#### libclc +[libclc maintainers](https://github.com/llvm/llvm-project/blob/main/libclc/Maintainers.rst) ---------------- DavidSpickett wrote: Fixed them all. https://github.com/llvm/llvm-project/pull/118309 From openmp-commits at lists.llvm.org Mon Dec 2 18:23:53 2024 From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits) Date: Mon, 02 Dec 2024 18:23:53 -0800 (PST) Subject: [Openmp-commits] [openmp] [OpenMP] Fix or disable NVPTX tests failing currently (PR #77844) In-Reply-To: Message-ID: <674e6bb9.050a0220.5373c.6f00@mx.google.com> ================ @@ -4,7 +4,10 @@ // UNSUPPORTED: clang-6, clang-7, clang-8, clang-9 // Fails on amdgpu with error: GPU Memory Error +// Fails on nvptx with error: an illegal memory access was encountered // XFAIL: amdgcn-amd-amdhsa +// XFAIL: nvptx64-nvidia-cuda ---------------- shiltian wrote: Probably NVPTX fixed the issue in some version; this test passes consistently on my local machine.
https://github.com/llvm/llvm-project/pull/77844 From openmp-commits at lists.llvm.org Mon Dec 2 18:24:24 2024 From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits) Date: Mon, 02 Dec 2024 18:24:24 -0800 (PST) Subject: [Openmp-commits] [openmp] [OpenMP] Fix or disable NVPTX tests failing currently (PR #77844) In-Reply-To: Message-ID: <674e6bd8.170a0220.2864a8.beee@mx.google.com> https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/77844 From openmp-commits at lists.llvm.org Mon Dec 2 18:24:38 2024 From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits) Date: Mon, 02 Dec 2024 18:24:38 -0800 (PST) Subject: [Openmp-commits] [openmp] [OpenMP] Fix or disable NVPTX tests failing currently (PR #77844) In-Reply-To: Message-ID: <674e6be6.050a0220.30dcba.69c4@mx.google.com> https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/77844 From openmp-commits at lists.llvm.org Tue Dec 3 02:15:06 2024 From: openmp-commits at lists.llvm.org (Fraser Cormack via Openmp-commits) Date: Tue, 03 Dec 2024 02:15:06 -0800 (PST) Subject: [Openmp-commits] [libclc] [libcxx] [llvm] [openmp] [polly] [llvm] Move sub-project lead maintainers into their own Maintainers.md files (PR #118309) In-Reply-To: Message-ID: <674eda2a.170a0220.65821.fbdd@mx.google.com> frasercrmck wrote: > For libclc maybe @tstellar -> @frasercrmck (or maybe both). I'm not opposed to this if people think it suitable. But having someone from Red Hat or AMD as co-maintainer(s) makes sense to me, as it's a project used (downstream) by several different groups. 
https://github.com/llvm/llvm-project/pull/118309 From openmp-commits at lists.llvm.org Tue Dec 3 10:24:17 2024 From: openmp-commits at lists.llvm.org (Nikita Popov via Openmp-commits) Date: Tue, 03 Dec 2024 10:24:17 -0800 (PST) Subject: [Openmp-commits] [libclc] [libcxx] [llvm] [openmp] [polly] [llvm] Move sub-project lead maintainers into their own Maintainers.md files (PR #118309) In-Reply-To: Message-ID: <674f4cd1.170a0220.37b61f.9be0@mx.google.com> nikic wrote: Looks like some openmp maintainers have been proposed in https://github.com/llvm/llvm-project/pull/118521. Do we want to let that land first or integrate it here? (cc @nawrinsu) https://github.com/llvm/llvm-project/pull/118309 From openmp-commits at lists.llvm.org Wed Dec 4 00:51:09 2024 From: openmp-commits at lists.llvm.org (David Spickett via Openmp-commits) Date: Wed, 04 Dec 2024 00:51:09 -0800 (PST) Subject: [Openmp-commits] [libclc] [libcxx] [llvm] [openmp] [polly] [llvm] Move sub-project lead maintainers into their own Maintainers.md files (PR #118309) In-Reply-To: Message-ID: <675017fd.170a0220.2836c6.24b5@mx.google.com> DavidSpickett wrote: It already landed, I'll rebase this PR. 
https://github.com/llvm/llvm-project/pull/118309 From openmp-commits at lists.llvm.org Wed Dec 4 01:05:30 2024 From: openmp-commits at lists.llvm.org (David Spickett via Openmp-commits) Date: Wed, 04 Dec 2024 01:05:30 -0800 (PST) Subject: [Openmp-commits] [openmp] 2137ded - [OpenMP] Add Maintainers.md file Message-ID: <67501b5a.170a0220.3221d6.24ed@mx.google.com> Author: David Spickett Date: 2024-12-04T09:04:31Z New Revision: 2137ded301adb430e0616cd835da9838e4fd79ce URL: https://github.com/llvm/llvm-project/commit/2137ded301adb430e0616cd835da9838e4fd79ce DIFF: https://github.com/llvm/llvm-project/commit/2137ded301adb430e0616cd835da9838e4fd79ce.diff LOG: [OpenMP] Add Maintainers.md file Added: openmp/Maintainers.md Modified: llvm/Maintainers.md Removed: ################################################################################ diff --git a/llvm/Maintainers.md b/llvm/Maintainers.md index e69fe62ac175b5..63bdfd42528db4 100644 --- a/llvm/Maintainers.md +++ b/llvm/Maintainers.md @@ -438,6 +438,8 @@ Others only have a lead maintainer listed here. 
[LLDB maintainers](https://github.com/llvm/llvm-project/blob/main/lldb/Maintainers.rst) +[LLVM OpenMP Library maintainers](https://github.com/llvm/llvm-project/blob/main/openmp/Maintainers.md) + #### libc++ Louis Dionne \ @@ -448,14 +450,6 @@ ldionne.2 at gmail.com (email), [ldionne](https://github.com/ldionne) (GitHub) Tom Stellard \ tstellar at redhat.com (email), [tstellar](https://github.com/tstellar) (GitHub) -#### OpenMP (runtime library) - -Michael Klemm \ -michael.klemm at amd.com (email), [mjklemm](https://github.com/mjklemm) (GitHub) - -Terry Wilmarth \ -terry.l.wilmarth at intel.com (email), [TerryLWilmarth](https://github.com/TerryLWilmarth) (GitHub) - #### Polly Tobias Grosser \ diff --git a/openmp/Maintainers.md b/openmp/Maintainers.md new file mode 100644 index 00000000000000..1892577374e6fb --- /dev/null +++ b/openmp/Maintainers.md @@ -0,0 +1,13 @@ +# LLVM OpenMP Library Maintainers + +This file is a list of the +[maintainers](https://llvm.org/docs/DeveloperPolicy.html#maintainers) for +the LLVM OpenMP library. 
+ +# Current Maintainers + +Michael Klemm \ +michael.klemm at amd.com (email), [mjklemm](https://github.com/mjklemm) (GitHub) + +Terry Wilmarth \ +terry.l.wilmarth at intel.com (email), [TerryLWilmarth](https://github.com/TerryLWilmarth) (GitHub) From openmp-commits at lists.llvm.org Wed Dec 4 01:07:12 2024 From: openmp-commits at lists.llvm.org (David Spickett via Openmp-commits) Date: Wed, 04 Dec 2024 01:07:12 -0800 (PST) Subject: [Openmp-commits] [libclc] [libcxx] [llvm] [openmp] [polly] [llvm] Move sub-project lead maintainers into their own Maintainers.md files (PR #118309) In-Reply-To: Message-ID: <67501bc0.050a0220.41148.950e@mx.google.com> DavidSpickett wrote: OpenMP part pushed directly: https://github.com/llvm/llvm-project/commit/2137ded301adb430e0616cd835da9838e4fd79ce https://github.com/llvm/llvm-project/pull/118309 From openmp-commits at lists.llvm.org Fri Dec 6 21:35:26 2024 From: openmp-commits at lists.llvm.org (Brad Smith via Openmp-commits) Date: Fri, 06 Dec 2024 21:35:26 -0800 (PST) Subject: [Openmp-commits] [openmp] [OpenMP] Fix interoperability test compilation on OpenBSD (PR #119053) Message-ID: https://github.com/brad0 created https://github.com/llvm/llvm-project/pull/119053 None >From ee2e3837e4332bfa2c8bf7bc8e8c029611c29b0c Mon Sep 17 00:00:00 2001 From: Brad Smith Date: Sat, 7 Dec 2024 00:32:16 -0500 Subject: [PATCH] [OpenMP] Fix interoperability test compilation on OpenBSD --- openmp/runtime/test/ompt/misc/interoperability.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/openmp/runtime/test/ompt/misc/interoperability.cpp b/openmp/runtime/test/ompt/misc/interoperability.cpp index 6f17a94c1f4042..d9c11461220df1 100644 --- a/openmp/runtime/test/ompt/misc/interoperability.cpp +++ b/openmp/runtime/test/ompt/misc/interoperability.cpp @@ -3,7 +3,7 @@ #include #include -#if !defined(__FreeBSD__) && !defined(__NetBSD__) +#if !defined(__FreeBSD__) && !defined(__NetBSD__) && !defined(__OpenBSD__) #include #else #include From 
openmp-commits at lists.llvm.org Sat Dec 7 12:05:06 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Sat, 07 Dec 2024 12:05:06 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6754aa72.170a0220.1e5838.d301@mx.google.com> https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/119091 >From 4c710e49eea97e542b97e0b5e78b7915acd32383 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sat, 7 Dec 2024 13:47:23 -0600 Subject: [PATCH 1/3] [OpenMP] Use generic IR for the OpenMP DeviceRTL Summary: We previously built this for every single architecture to deal with incompatibility. This patch updates it to use the 'generic' IR that `libc` and other projects use. Who knows if this will have any side-effects, probably worth testing more but it passes the tests I expect to pass on my side. --- clang/lib/Driver/ToolChains/CommonArgs.cpp | 3 +- clang/lib/Driver/ToolChains/Cuda.cpp | 1 - offload/DeviceRTL/CMakeLists.txt | 70 ++++------------- offload/DeviceRTL/src/Reduction.cpp | 89 +++++++++++----------- 4 files changed, 62 insertions(+), 101 deletions(-) diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp index 7d3d7f8f03c491..3dd90ecf8bca4c 100644 --- a/clang/lib/Driver/ToolChains/CommonArgs.cpp +++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp @@ -2800,8 +2800,7 @@ void tools::addOpenMPDeviceRTL(const Driver &D, : options::OPT_libomptarget_nvptx_bc_path_EQ; StringRef ArchPrefix = Triple.isAMDGCN() ? 
"amdgpu" : "nvptx"; - std::string LibOmpTargetName = - ("libomptarget-" + ArchPrefix + "-" + BitcodeSuffix + ".bc").str(); + std::string LibOmpTargetName = ("libomptarget-" + ArchPrefix + ".bc").str(); // First check whether user specifies bc library if (const Arg *A = DriverArgs.getLastArg(LibomptargetBCPathOpt)) { diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp index 102794829795da..214f1e5d83478f 100644 --- a/clang/lib/Driver/ToolChains/Cuda.cpp +++ b/clang/lib/Driver/ToolChains/Cuda.cpp @@ -851,7 +851,6 @@ void CudaToolChain::addClangTargetOptions( HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind); StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ); - assert(!GpuArch.empty() && "Must have an explicit GPU arch."); assert((DeviceOffloadingKind == Action::OFK_OpenMP || DeviceOffloadingKind == Action::OFK_Cuda) && "Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs."); diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt index 1bf3eb9da38aa1..cda633c41062b6 100644 --- a/offload/DeviceRTL/CMakeLists.txt +++ b/offload/DeviceRTL/CMakeLists.txt @@ -42,38 +42,6 @@ set(devicertl_base_directory ${CMAKE_CURRENT_SOURCE_DIR}) set(include_directory ${devicertl_base_directory}/include) set(source_directory ${devicertl_base_directory}/src) -set(all_amdgpu_architectures "gfx700;gfx701;gfx801;gfx803;gfx900;gfx902;gfx906" - "gfx908;gfx90a;gfx90c;gfx940;gfx941;gfx942;gfx950;gfx1010" - "gfx1012;gfx1030;gfx1031;gfx1032;gfx1033;gfx1034;gfx1035" - "gfx1036;gfx1100;gfx1101;gfx1102;gfx1103;gfx1150" - "gfx1151;gfx1152;gfx1153") -set(all_nvptx_architectures "sm_35;sm_37;sm_50;sm_52;sm_53;sm_60;sm_61;sm_62" - "sm_70;sm_72;sm_75;sm_80;sm_86;sm_87;sm_89;sm_90") -set(all_gpu_architectures - "${all_amdgpu_architectures};${all_nvptx_architectures}") - -set(LIBOMPTARGET_DEVICE_ARCHITECTURES "all" CACHE STRING - "List of device architectures to be used to compile the OpenMP 
DeviceRTL.") - -if(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "all") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_gpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "amdgpu") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_amdgpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "nvptx") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_nvptx_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "auto" OR - LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "native") - if(NOT LIBOMPTARGET_NVPTX_ARCH AND NOT LIBOMPTARGET_AMDGPU_ARCH) - message(FATAL_ERROR - "Could not find 'amdgpu-arch' and 'nvptx-arch' tools required for 'auto'") - elseif(NOT LIBOMPTARGET_FOUND_NVIDIA_GPU AND NOT LIBOMPTARGET_FOUND_AMDGPU_GPU) - message(FATAL_ERROR "No AMD or NVIDIA GPU found on the system when using 'auto'") - endif() - set(LIBOMPTARGET_DEVICE_ARCHITECTURES - "${LIBOMPTARGET_NVPTX_DETECTED_ARCH_LIST};${LIBOMPTARGET_AMDGPU_DETECTED_ARCH_LIST}") -endif() -list(REMOVE_DUPLICATES LIBOMPTARGET_DEVICE_ARCHITECTURES) - set(include_files ${include_directory}/Allocator.h ${include_directory}/Configuration.h @@ -141,20 +109,21 @@ set(bc_flags -c -foffload-lto -std=c++17 -fvisibility=hidden # first create an object target add_library(omptarget.devicertl.all_objs OBJECT IMPORTED) -function(compileDeviceRTLLibrary target_cpu target_name target_triple) +function(compileDeviceRTLLibrary target_name target_triple) set(target_bc_flags ${ARGN}) set(bc_files "") foreach(src ${src_files}) get_filename_component(infile ${src} ABSOLUTE) get_filename_component(outfile ${src} NAME) - set(outfile "${outfile}-${target_cpu}.bc") + set(outfile "${outfile}-${target_name}.bc") set(depfile "${outfile}.d") add_custom_command(OUTPUT ${outfile} COMMAND ${CLANG_TOOL} ${bc_flags} - --offload-arch=${target_cpu} + -fopenmp-targets=${target_triple} + -Xopenmp-target=${target_triple} -march= ${target_bc_flags} -MD -MF ${depfile} ${infile} -o ${outfile} @@ -177,7 +146,7 @@ 
function(compileDeviceRTLLibrary target_cpu target_name target_triple) list(APPEND bc_files ${outfile}) endforeach() - set(bclib_name "libomptarget-${target_name}-${target_cpu}.bc") + set(bclib_name "libomptarget-${target_name}.bc") # Link to a bitcode library. add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name} @@ -217,7 +186,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(bclib_target_name "omptarget-${target_name}-${target_cpu}-bc") + set(bclib_target_name "omptarget-${target_name}-bc") add_custom_target(${bclib_target_name} DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}) # Copy library to destination. @@ -239,7 +208,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) # Package the bitcode in the bitcode and embed it in an ELF for the static library add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} COMMAND ${PACKAGER_TOOL} -o ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} - "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=${target_cpu},kind=openmp" + "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=generic,kind=openmp" DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name} COMMENT "Packaging LLVM offloading binary ${bclib_name}.out" ) @@ -249,14 +218,14 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}-${target_cpu}.o") + set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}.o") add_custom_command(OUTPUT ${output_name} COMMAND ${CLANG_TOOL} --std=c++17 -c -nostdlib -Xclang -fembed-offload-object=${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} -o ${output_name} ${source_directory}/Stub.cpp DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} ${source_directory}/Stub.cpp - COMMENT "Embedding LLVM 
offloading binary in devicertl-${target_name}-${target_cpu}.o" + COMMENT "Embedding LLVM offloading binary in devicertl-${target_name}.o" VERBATIM ) if(TARGET clang) @@ -269,11 +238,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) set_property(TARGET omptarget.devicertl.all_objs APPEND PROPERTY IMPORTED_OBJECTS ${output_name}) if (CMAKE_EXPORT_COMPILE_COMMANDS) - set(ide_target_name omptarget-ide-${target_name}-${target_cpu}) + set(ide_target_name omptarget-ide-${target_name}) add_library(${ide_target_name} STATIC EXCLUDE_FROM_ALL ${src_files}) target_compile_options(${ide_target_name} PRIVATE - -fopenmp --offload-arch=${target_cpu} -fopenmp-cuda-mode - -mllvm -openmp-opt-disable + -fopenmp-targets=${target_triple} -Xopenmp-target=${target_triple} -march= + -fopenmp -fopenmp-cuda-mode -mllvm -openmp-opt-disable -foffload-lto -fvisibility=hidden --offload-device-only -nocudalib -nogpulib -nogpuinc -nostdlibinc -Wno-unknown-cuda-version ) @@ -288,18 +257,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. 
-add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) # Archive all the object files generated above into a static library add_library(omptarget.devicertl STATIC) diff --git a/offload/DeviceRTL/src/Reduction.cpp b/offload/DeviceRTL/src/Reduction.cpp index 57df159d3f28e5..d1aa9d3e474f71 100644 --- a/offload/DeviceRTL/src/Reduction.cpp +++ b/offload/DeviceRTL/src/Reduction.cpp @@ -44,7 +44,6 @@ void gpu_irregular_warp_reduce(void *reduce_data, ShuffleReductFnTy shflFct, } } -#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ < 700 static uint32_t gpu_irregular_simd_reduce(void *reduce_data, ShuffleReductFnTy shflFct) { uint32_t size, remote_id, physical_lane_id; @@ -63,7 +62,6 @@ static uint32_t gpu_irregular_simd_reduce(void *reduce_data, } while (logical_lane_id % 2 == 0 && size > 1); return (logical_lane_id == 0); } -#endif static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, ShuffleReductFnTy shflFct, @@ -74,49 +72,53 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, uint32_t NumThreads = omp_get_num_threads(); if (NumThreads == 1) return 1; - /* - * This reduce function handles reduction within a team. It handles - * parallel regions in both L1 and L2 parallelism levels. It also - * supports Generic, SPMD, and NoOMP modes. - * - * 1. Reduce within a warp. - * 2. 
Warp master copies value to warp 0 via shared memory. - * 3. Warp 0 reduces to a single value. - * 4. The reduced value is available in the thread that returns 1. - */ - -#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700 - uint32_t WarpsNeeded = - (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); - uint32_t WarpId = mapping::getWarpIdInBlock(); - - // Volta execution model: - // For the Generic execution mode a parallel region either has 1 thread and - // beyond that, always a multiple of 32. For the SPMD execution mode we may - // have any number of threads. - if ((NumThreads % mapping::getWarpSize() == 0) || (WarpId < WarpsNeeded - 1)) - gpu_regular_warp_reduce(reduce_data, shflFct); - else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. - gpu_irregular_warp_reduce(reduce_data, shflFct, - /*LaneCount=*/NumThreads % mapping::getWarpSize(), - /*LaneId=*/mapping::getThreadIdInBlock() % - mapping::getWarpSize()); - // When we have more than [mapping::getWarpSize()] number of threads - // a block reduction is performed here. // - // Only L1 parallel region can enter this if condition. - if (NumThreads > mapping::getWarpSize()) { - // Gather all the reduced values from each warp - // to the first warp. - cpyFct(reduce_data, WarpsNeeded); + // This reduce function handles reduction within a team. It handles + // parallel regions in both L1 and L2 parallelism levels. It also + // supports Generic, SPMD, and NoOMP modes. + // + // 1. Reduce within a warp. + // 2. Warp master copies value to warp 0 via shared memory. + // 3. Warp 0 reduces to a single value. + // 4. The reduced value is available in the thread that returns 1. 
+ // - if (WarpId == 0) - gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, - BlockThreadId); +#if __has_builtin(__nvvm_reflect) + if (__nvvm_reflect("__CUDA_ARCH") >= 700) { + uint32_t WarpsNeeded = + (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); + uint32_t WarpId = mapping::getWarpIdInBlock(); + + // Volta execution model: + // For the Generic execution mode a parallel region either has 1 thread and + // beyond that, always a multiple of 32. For the SPMD execution mode we may + // have any number of threads. + if ((NumThreads % mapping::getWarpSize() == 0) || + (WarpId < WarpsNeeded - 1)) + gpu_regular_warp_reduce(reduce_data, shflFct); + else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. + gpu_irregular_warp_reduce( + reduce_data, shflFct, + /*LaneCount=*/NumThreads % mapping::getWarpSize(), + /*LaneId=*/mapping::getThreadIdInBlock() % mapping::getWarpSize()); + + // When we have more than [mapping::getWarpSize()] number of threads + // a block reduction is performed here. + // + // Only L1 parallel region can enter this if condition. + if (NumThreads > mapping::getWarpSize()) { + // Gather all the reduced values from each warp + // to the first warp. + cpyFct(reduce_data, WarpsNeeded); + + if (WarpId == 0) + gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, + BlockThreadId); + } + return BlockThreadId == 0; } - return BlockThreadId == 0; -#else +#endif __kmpc_impl_lanemask_t Liveness = mapping::activemask(); if (Liveness == lanes::All) // Full warp gpu_regular_warp_reduce(reduce_data, shflFct); @@ -150,10 +152,9 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, return BlockThreadId == 0; } - // Get the OMP thread Id. This is different from BlockThreadId in the case of - // an L2 parallel region. + // Get the OMP thread Id. This is different from BlockThreadId in the case + // of an L2 parallel region. 
return BlockThreadId == 0; -#endif // __CUDA_ARCH__ >= 700 } uint32_t roundToWarpsize(uint32_t s) { >From ae742d66d1c92d4ef95bbcafa05298fb72dcfa63 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sat, 7 Dec 2024 13:56:50 -0600 Subject: [PATCH 2/3] update with comment --- offload/DeviceRTL/CMakeLists.txt | 1 + offload/DeviceRTL/src/Reduction.cpp | 20 ++++++++++---------- 2 files changed, 11 insertions(+), 10 deletions(-) diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt index cda633c41062b6..22940264f9b19a 100644 --- a/offload/DeviceRTL/CMakeLists.txt +++ b/offload/DeviceRTL/CMakeLists.txt @@ -119,6 +119,7 @@ function(compileDeviceRTLLibrary target_name target_triple) set(outfile "${outfile}-${target_name}.bc") set(depfile "${outfile}.d") + # Passing an empty CPU to -march= suppressed target specific metadata. add_custom_command(OUTPUT ${outfile} COMMAND ${CLANG_TOOL} ${bc_flags} diff --git a/offload/DeviceRTL/src/Reduction.cpp b/offload/DeviceRTL/src/Reduction.cpp index d1aa9d3e474f71..d3b4528401953c 100644 --- a/offload/DeviceRTL/src/Reduction.cpp +++ b/offload/DeviceRTL/src/Reduction.cpp @@ -73,16 +73,16 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, if (NumThreads == 1) return 1; - // - // This reduce function handles reduction within a team. It handles - // parallel regions in both L1 and L2 parallelism levels. It also - // supports Generic, SPMD, and NoOMP modes. - // - // 1. Reduce within a warp. - // 2. Warp master copies value to warp 0 via shared memory. - // 3. Warp 0 reduces to a single value. - // 4. The reduced value is available in the thread that returns 1. - // + // + // This reduce function handles reduction within a team. It handles + // parallel regions in both L1 and L2 parallelism levels. It also + // supports Generic, SPMD, and NoOMP modes. + // + // 1. Reduce within a warp. + // 2. Warp master copies value to warp 0 via shared memory. + // 3. Warp 0 reduces to a single value. + // 4. 
The reduced value is available in the thread that returns 1. + // #if __has_builtin(__nvvm_reflect) if (__nvvm_reflect("__CUDA_ARCH") >= 700) { >From f868bbfc0c724dbd985e13c4a9fbe1c66aafc46a Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sat, 7 Dec 2024 14:04:55 -0600 Subject: [PATCH 3/3] release notes --- clang/docs/ReleaseNotes.rst | 3 +++ openmp/docs/ReleaseNotes.rst | 7 +++++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index 12d8ebf3251bc0..1b2bb3b98f0dc4 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -1153,6 +1153,9 @@ OpenMP Support - Added support for 'omp assume' directive. - Added support for 'omp scope' directive. - Added support for allocator-modifier in 'allocate' clause. +- Changed the OpenMP DeviceRTL to use 'generic' IR. The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets. Improvements ^^^^^^^^^^^^ diff --git a/openmp/docs/ReleaseNotes.rst b/openmp/docs/ReleaseNotes.rst index d4a4b1a99f7813..db594bd71d7e61 100644 --- a/openmp/docs/ReleaseNotes.rst +++ b/openmp/docs/ReleaseNotes.rst @@ -1,5 +1,4 @@ -=========================== -OpenMP 20.0.0 Release Notes + =========================== @@ -19,3 +18,7 @@ from the `LLVM releases web site `_. Non-comprehensive list of changes in this release ================================================= + +- Changed the OpenMP DeviceRTL to use 'generic' IR. The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets. 
From openmp-commits at lists.llvm.org Sat Dec 7 12:23:07 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Sat, 07 Dec 2024 12:23:07 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6754aeab.170a0220.1670e0.0994@mx.google.com> https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/119091 >From 4c710e49eea97e542b97e0b5e78b7915acd32383 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sat, 7 Dec 2024 13:47:23 -0600 Subject: [PATCH 1/3] [OpenMP] Use generic IR for the OpenMP DeviceRTL Summary: We previously built this for every single architecture to deal with incompatibility. This patch updates it to use the 'generic' IR that `libc` and other projects use. Who knows if this will have any side-effects, probably worth testing more but it passes the tests I expect to pass on my side. --- clang/lib/Driver/ToolChains/CommonArgs.cpp | 3 +- clang/lib/Driver/ToolChains/Cuda.cpp | 1 - offload/DeviceRTL/CMakeLists.txt | 70 ++++------------- offload/DeviceRTL/src/Reduction.cpp | 89 +++++++++++----------- 4 files changed, 62 insertions(+), 101 deletions(-) diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp index 7d3d7f8f03c491..3dd90ecf8bca4c 100644 --- a/clang/lib/Driver/ToolChains/CommonArgs.cpp +++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp @@ -2800,8 +2800,7 @@ void tools::addOpenMPDeviceRTL(const Driver &D, : options::OPT_libomptarget_nvptx_bc_path_EQ; StringRef ArchPrefix = Triple.isAMDGCN() ? 
"amdgpu" : "nvptx"; - std::string LibOmpTargetName = - ("libomptarget-" + ArchPrefix + "-" + BitcodeSuffix + ".bc").str(); + std::string LibOmpTargetName = ("libomptarget-" + ArchPrefix + ".bc").str(); // First check whether user specifies bc library if (const Arg *A = DriverArgs.getLastArg(LibomptargetBCPathOpt)) { diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp index 102794829795da..214f1e5d83478f 100644 --- a/clang/lib/Driver/ToolChains/Cuda.cpp +++ b/clang/lib/Driver/ToolChains/Cuda.cpp @@ -851,7 +851,6 @@ void CudaToolChain::addClangTargetOptions( HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind); StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ); - assert(!GpuArch.empty() && "Must have an explicit GPU arch."); assert((DeviceOffloadingKind == Action::OFK_OpenMP || DeviceOffloadingKind == Action::OFK_Cuda) && "Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs."); diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt index 1bf3eb9da38aa1..cda633c41062b6 100644 --- a/offload/DeviceRTL/CMakeLists.txt +++ b/offload/DeviceRTL/CMakeLists.txt @@ -42,38 +42,6 @@ set(devicertl_base_directory ${CMAKE_CURRENT_SOURCE_DIR}) set(include_directory ${devicertl_base_directory}/include) set(source_directory ${devicertl_base_directory}/src) -set(all_amdgpu_architectures "gfx700;gfx701;gfx801;gfx803;gfx900;gfx902;gfx906" - "gfx908;gfx90a;gfx90c;gfx940;gfx941;gfx942;gfx950;gfx1010" - "gfx1012;gfx1030;gfx1031;gfx1032;gfx1033;gfx1034;gfx1035" - "gfx1036;gfx1100;gfx1101;gfx1102;gfx1103;gfx1150" - "gfx1151;gfx1152;gfx1153") -set(all_nvptx_architectures "sm_35;sm_37;sm_50;sm_52;sm_53;sm_60;sm_61;sm_62" - "sm_70;sm_72;sm_75;sm_80;sm_86;sm_87;sm_89;sm_90") -set(all_gpu_architectures - "${all_amdgpu_architectures};${all_nvptx_architectures}") - -set(LIBOMPTARGET_DEVICE_ARCHITECTURES "all" CACHE STRING - "List of device architectures to be used to compile the OpenMP 
DeviceRTL.") - -if(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "all") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_gpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "amdgpu") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_amdgpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "nvptx") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_nvptx_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "auto" OR - LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "native") - if(NOT LIBOMPTARGET_NVPTX_ARCH AND NOT LIBOMPTARGET_AMDGPU_ARCH) - message(FATAL_ERROR - "Could not find 'amdgpu-arch' and 'nvptx-arch' tools required for 'auto'") - elseif(NOT LIBOMPTARGET_FOUND_NVIDIA_GPU AND NOT LIBOMPTARGET_FOUND_AMDGPU_GPU) - message(FATAL_ERROR "No AMD or NVIDIA GPU found on the system when using 'auto'") - endif() - set(LIBOMPTARGET_DEVICE_ARCHITECTURES - "${LIBOMPTARGET_NVPTX_DETECTED_ARCH_LIST};${LIBOMPTARGET_AMDGPU_DETECTED_ARCH_LIST}") -endif() -list(REMOVE_DUPLICATES LIBOMPTARGET_DEVICE_ARCHITECTURES) - set(include_files ${include_directory}/Allocator.h ${include_directory}/Configuration.h @@ -141,20 +109,21 @@ set(bc_flags -c -foffload-lto -std=c++17 -fvisibility=hidden # first create an object target add_library(omptarget.devicertl.all_objs OBJECT IMPORTED) -function(compileDeviceRTLLibrary target_cpu target_name target_triple) +function(compileDeviceRTLLibrary target_name target_triple) set(target_bc_flags ${ARGN}) set(bc_files "") foreach(src ${src_files}) get_filename_component(infile ${src} ABSOLUTE) get_filename_component(outfile ${src} NAME) - set(outfile "${outfile}-${target_cpu}.bc") + set(outfile "${outfile}-${target_name}.bc") set(depfile "${outfile}.d") add_custom_command(OUTPUT ${outfile} COMMAND ${CLANG_TOOL} ${bc_flags} - --offload-arch=${target_cpu} + -fopenmp-targets=${target_triple} + -Xopenmp-target=${target_triple} -march= ${target_bc_flags} -MD -MF ${depfile} ${infile} -o ${outfile} @@ -177,7 +146,7 @@ 
function(compileDeviceRTLLibrary target_cpu target_name target_triple) list(APPEND bc_files ${outfile}) endforeach() - set(bclib_name "libomptarget-${target_name}-${target_cpu}.bc") + set(bclib_name "libomptarget-${target_name}.bc") # Link to a bitcode library. add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name} @@ -217,7 +186,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(bclib_target_name "omptarget-${target_name}-${target_cpu}-bc") + set(bclib_target_name "omptarget-${target_name}-bc") add_custom_target(${bclib_target_name} DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}) # Copy library to destination. @@ -239,7 +208,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) # Package the bitcode in the bitcode and embed it in an ELF for the static library add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} COMMAND ${PACKAGER_TOOL} -o ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} - "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=${target_cpu},kind=openmp" + "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=generic,kind=openmp" DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name} COMMENT "Packaging LLVM offloading binary ${bclib_name}.out" ) @@ -249,14 +218,14 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}-${target_cpu}.o") + set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}.o") add_custom_command(OUTPUT ${output_name} COMMAND ${CLANG_TOOL} --std=c++17 -c -nostdlib -Xclang -fembed-offload-object=${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} -o ${output_name} ${source_directory}/Stub.cpp DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} ${source_directory}/Stub.cpp - COMMENT "Embedding LLVM 
offloading binary in devicertl-${target_name}-${target_cpu}.o" + COMMENT "Embedding LLVM offloading binary in devicertl-${target_name}.o" VERBATIM ) if(TARGET clang) @@ -269,11 +238,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) set_property(TARGET omptarget.devicertl.all_objs APPEND PROPERTY IMPORTED_OBJECTS ${output_name}) if (CMAKE_EXPORT_COMPILE_COMMANDS) - set(ide_target_name omptarget-ide-${target_name}-${target_cpu}) + set(ide_target_name omptarget-ide-${target_name}) add_library(${ide_target_name} STATIC EXCLUDE_FROM_ALL ${src_files}) target_compile_options(${ide_target_name} PRIVATE - -fopenmp --offload-arch=${target_cpu} -fopenmp-cuda-mode - -mllvm -openmp-opt-disable + -fopenmp-targets=${target_triple} -Xopenmp-target=${target_triple} -march= + -fopenmp -fopenmp-cuda-mode -mllvm -openmp-opt-disable -foffload-lto -fvisibility=hidden --offload-device-only -nocudalib -nogpulib -nogpuinc -nostdlibinc -Wno-unknown-cuda-version ) @@ -288,18 +257,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. 
-add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) # Archive all the object files generated above into a static library add_library(omptarget.devicertl STATIC) diff --git a/offload/DeviceRTL/src/Reduction.cpp b/offload/DeviceRTL/src/Reduction.cpp index 57df159d3f28e5..d1aa9d3e474f71 100644 --- a/offload/DeviceRTL/src/Reduction.cpp +++ b/offload/DeviceRTL/src/Reduction.cpp @@ -44,7 +44,6 @@ void gpu_irregular_warp_reduce(void *reduce_data, ShuffleReductFnTy shflFct, } } -#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ < 700 static uint32_t gpu_irregular_simd_reduce(void *reduce_data, ShuffleReductFnTy shflFct) { uint32_t size, remote_id, physical_lane_id; @@ -63,7 +62,6 @@ static uint32_t gpu_irregular_simd_reduce(void *reduce_data, } while (logical_lane_id % 2 == 0 && size > 1); return (logical_lane_id == 0); } -#endif static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, ShuffleReductFnTy shflFct, @@ -74,49 +72,53 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, uint32_t NumThreads = omp_get_num_threads(); if (NumThreads == 1) return 1; - /* - * This reduce function handles reduction within a team. It handles - * parallel regions in both L1 and L2 parallelism levels. It also - * supports Generic, SPMD, and NoOMP modes. - * - * 1. Reduce within a warp. - * 2. 
Warp master copies value to warp 0 via shared memory. - * 3. Warp 0 reduces to a single value. - * 4. The reduced value is available in the thread that returns 1. - */ - -#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700 - uint32_t WarpsNeeded = - (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); - uint32_t WarpId = mapping::getWarpIdInBlock(); - - // Volta execution model: - // For the Generic execution mode a parallel region either has 1 thread and - // beyond that, always a multiple of 32. For the SPMD execution mode we may - // have any number of threads. - if ((NumThreads % mapping::getWarpSize() == 0) || (WarpId < WarpsNeeded - 1)) - gpu_regular_warp_reduce(reduce_data, shflFct); - else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. - gpu_irregular_warp_reduce(reduce_data, shflFct, - /*LaneCount=*/NumThreads % mapping::getWarpSize(), - /*LaneId=*/mapping::getThreadIdInBlock() % - mapping::getWarpSize()); - // When we have more than [mapping::getWarpSize()] number of threads - // a block reduction is performed here. // - // Only L1 parallel region can enter this if condition. - if (NumThreads > mapping::getWarpSize()) { - // Gather all the reduced values from each warp - // to the first warp. - cpyFct(reduce_data, WarpsNeeded); + // This reduce function handles reduction within a team. It handles + // parallel regions in both L1 and L2 parallelism levels. It also + // supports Generic, SPMD, and NoOMP modes. + // + // 1. Reduce within a warp. + // 2. Warp master copies value to warp 0 via shared memory. + // 3. Warp 0 reduces to a single value. + // 4. The reduced value is available in the thread that returns 1. 
+ // - if (WarpId == 0) - gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, - BlockThreadId); +#if __has_builtin(__nvvm_reflect) + if (__nvvm_reflect("__CUDA_ARCH") >= 700) { + uint32_t WarpsNeeded = + (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); + uint32_t WarpId = mapping::getWarpIdInBlock(); + + // Volta execution model: + // For the Generic execution mode a parallel region either has 1 thread and + // beyond that, always a multiple of 32. For the SPMD execution mode we may + // have any number of threads. + if ((NumThreads % mapping::getWarpSize() == 0) || + (WarpId < WarpsNeeded - 1)) + gpu_regular_warp_reduce(reduce_data, shflFct); + else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. + gpu_irregular_warp_reduce( + reduce_data, shflFct, + /*LaneCount=*/NumThreads % mapping::getWarpSize(), + /*LaneId=*/mapping::getThreadIdInBlock() % mapping::getWarpSize()); + + // When we have more than [mapping::getWarpSize()] number of threads + // a block reduction is performed here. + // + // Only L1 parallel region can enter this if condition. + if (NumThreads > mapping::getWarpSize()) { + // Gather all the reduced values from each warp + // to the first warp. + cpyFct(reduce_data, WarpsNeeded); + + if (WarpId == 0) + gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, + BlockThreadId); + } + return BlockThreadId == 0; } - return BlockThreadId == 0; -#else +#endif __kmpc_impl_lanemask_t Liveness = mapping::activemask(); if (Liveness == lanes::All) // Full warp gpu_regular_warp_reduce(reduce_data, shflFct); @@ -150,10 +152,9 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, return BlockThreadId == 0; } - // Get the OMP thread Id. This is different from BlockThreadId in the case of - // an L2 parallel region. + // Get the OMP thread Id. This is different from BlockThreadId in the case + // of an L2 parallel region. 
return BlockThreadId == 0; -#endif // __CUDA_ARCH__ >= 700 } uint32_t roundToWarpsize(uint32_t s) { >From ae742d66d1c92d4ef95bbcafa05298fb72dcfa63 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sat, 7 Dec 2024 13:56:50 -0600 Subject: [PATCH 2/3] update with comment --- offload/DeviceRTL/CMakeLists.txt | 1 + offload/DeviceRTL/src/Reduction.cpp | 20 ++++++++++---------- 2 files changed, 11 insertions(+), 10 deletions(-) diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt index cda633c41062b6..22940264f9b19a 100644 --- a/offload/DeviceRTL/CMakeLists.txt +++ b/offload/DeviceRTL/CMakeLists.txt @@ -119,6 +119,7 @@ function(compileDeviceRTLLibrary target_name target_triple) set(outfile "${outfile}-${target_name}.bc") set(depfile "${outfile}.d") + # Passing an empty CPU to -march= suppressed target specific metadata. add_custom_command(OUTPUT ${outfile} COMMAND ${CLANG_TOOL} ${bc_flags} diff --git a/offload/DeviceRTL/src/Reduction.cpp b/offload/DeviceRTL/src/Reduction.cpp index d1aa9d3e474f71..d3b4528401953c 100644 --- a/offload/DeviceRTL/src/Reduction.cpp +++ b/offload/DeviceRTL/src/Reduction.cpp @@ -73,16 +73,16 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, if (NumThreads == 1) return 1; - // - // This reduce function handles reduction within a team. It handles - // parallel regions in both L1 and L2 parallelism levels. It also - // supports Generic, SPMD, and NoOMP modes. - // - // 1. Reduce within a warp. - // 2. Warp master copies value to warp 0 via shared memory. - // 3. Warp 0 reduces to a single value. - // 4. The reduced value is available in the thread that returns 1. - // + // + // This reduce function handles reduction within a team. It handles + // parallel regions in both L1 and L2 parallelism levels. It also + // supports Generic, SPMD, and NoOMP modes. + // + // 1. Reduce within a warp. + // 2. Warp master copies value to warp 0 via shared memory. + // 3. Warp 0 reduces to a single value. + // 4. 
The reduced value is available in the thread that returns 1. + // #if __has_builtin(__nvvm_reflect) if (__nvvm_reflect("__CUDA_ARCH") >= 700) { >From 51c5926a6b193383b34cbfd14e499481f7a269eb Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sat, 7 Dec 2024 14:04:55 -0600 Subject: [PATCH 3/3] release notes --- clang/docs/ReleaseNotes.rst | 3 +++ openmp/docs/ReleaseNotes.rst | 9 +++++++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index 12d8ebf3251bc0..1b2bb3b98f0dc4 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -1153,6 +1153,9 @@ OpenMP Support - Added support for 'omp assume' directive. - Added support for 'omp scope' directive. - Added support for allocator-modifier in 'allocate' clause. +- Changed the OpenMP DeviceRTL to use 'generic' IR. The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets. Improvements ^^^^^^^^^^^^ diff --git a/openmp/docs/ReleaseNotes.rst b/openmp/docs/ReleaseNotes.rst index d4a4b1a99f7813..21be8b41c73aac 100644 --- a/openmp/docs/ReleaseNotes.rst +++ b/openmp/docs/ReleaseNotes.rst @@ -1,5 +1,4 @@ -=========================== -OpenMP 20.0.0 Release Notes + =========================== @@ -19,3 +18,9 @@ from the `LLVM releases web site `_. Non-comprehensive list of changes in this release ================================================= + +Device Runtime +-------------- +- Changed the OpenMP DeviceRTL to use 'generic' IR. The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets. 
From openmp-commits at lists.llvm.org Sat Dec 7 12:23:18 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Sat, 07 Dec 2024 12:23:18 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6754aeb6.170a0220.116b06.dc51@mx.google.com> jhuber6 wrote: @hidekisaito Might be relevant to your patch. https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sat Dec 7 13:27:09 2024 From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits) Date: Sat, 07 Dec 2024 13:27:09 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6754bdad.170a0220.38ce08.dd8b@mx.google.com> https://github.com/shiltian approved this pull request. It seems like there are some test failures, but I think the PR looks good. https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sat Dec 7 13:43:01 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Sat, 07 Dec 2024 13:43:01 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6754c165.170a0220.5f83.1059@mx.google.com> https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/119091 >From 4c710e49eea97e542b97e0b5e78b7915acd32383 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sat, 7 Dec 2024 13:47:23 -0600 Subject: [PATCH 1/4] [OpenMP] Use generic IR for the OpenMP DeviceRTL Summary: We previously built this for every single architecture to deal with incompatibility. This patch updates it to use the 'generic' IR that `libc` and other projects use. Who knows if this will have any side-effects, probably worth testing more but it passes the tests I expect to pass on my side.
--- clang/lib/Driver/ToolChains/CommonArgs.cpp | 3 +- clang/lib/Driver/ToolChains/Cuda.cpp | 1 - offload/DeviceRTL/CMakeLists.txt | 70 ++++------------- offload/DeviceRTL/src/Reduction.cpp | 89 +++++++++++----------- 4 files changed, 62 insertions(+), 101 deletions(-) diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp index 7d3d7f8f03c491..3dd90ecf8bca4c 100644 --- a/clang/lib/Driver/ToolChains/CommonArgs.cpp +++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp @@ -2800,8 +2800,7 @@ void tools::addOpenMPDeviceRTL(const Driver &D, : options::OPT_libomptarget_nvptx_bc_path_EQ; StringRef ArchPrefix = Triple.isAMDGCN() ? "amdgpu" : "nvptx"; - std::string LibOmpTargetName = - ("libomptarget-" + ArchPrefix + "-" + BitcodeSuffix + ".bc").str(); + std::string LibOmpTargetName = ("libomptarget-" + ArchPrefix + ".bc").str(); // First check whether user specifies bc library if (const Arg *A = DriverArgs.getLastArg(LibomptargetBCPathOpt)) { diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp index 102794829795da..214f1e5d83478f 100644 --- a/clang/lib/Driver/ToolChains/Cuda.cpp +++ b/clang/lib/Driver/ToolChains/Cuda.cpp @@ -851,7 +851,6 @@ void CudaToolChain::addClangTargetOptions( HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind); StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ); - assert(!GpuArch.empty() && "Must have an explicit GPU arch."); assert((DeviceOffloadingKind == Action::OFK_OpenMP || DeviceOffloadingKind == Action::OFK_Cuda) && "Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs."); diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt index 1bf3eb9da38aa1..cda633c41062b6 100644 --- a/offload/DeviceRTL/CMakeLists.txt +++ b/offload/DeviceRTL/CMakeLists.txt @@ -42,38 +42,6 @@ set(devicertl_base_directory ${CMAKE_CURRENT_SOURCE_DIR}) set(include_directory ${devicertl_base_directory}/include) 
set(source_directory ${devicertl_base_directory}/src) -set(all_amdgpu_architectures "gfx700;gfx701;gfx801;gfx803;gfx900;gfx902;gfx906" - "gfx908;gfx90a;gfx90c;gfx940;gfx941;gfx942;gfx950;gfx1010" - "gfx1012;gfx1030;gfx1031;gfx1032;gfx1033;gfx1034;gfx1035" - "gfx1036;gfx1100;gfx1101;gfx1102;gfx1103;gfx1150" - "gfx1151;gfx1152;gfx1153") -set(all_nvptx_architectures "sm_35;sm_37;sm_50;sm_52;sm_53;sm_60;sm_61;sm_62" - "sm_70;sm_72;sm_75;sm_80;sm_86;sm_87;sm_89;sm_90") -set(all_gpu_architectures - "${all_amdgpu_architectures};${all_nvptx_architectures}") - -set(LIBOMPTARGET_DEVICE_ARCHITECTURES "all" CACHE STRING - "List of device architectures to be used to compile the OpenMP DeviceRTL.") - -if(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "all") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_gpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "amdgpu") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_amdgpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "nvptx") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_nvptx_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "auto" OR - LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "native") - if(NOT LIBOMPTARGET_NVPTX_ARCH AND NOT LIBOMPTARGET_AMDGPU_ARCH) - message(FATAL_ERROR - "Could not find 'amdgpu-arch' and 'nvptx-arch' tools required for 'auto'") - elseif(NOT LIBOMPTARGET_FOUND_NVIDIA_GPU AND NOT LIBOMPTARGET_FOUND_AMDGPU_GPU) - message(FATAL_ERROR "No AMD or NVIDIA GPU found on the system when using 'auto'") - endif() - set(LIBOMPTARGET_DEVICE_ARCHITECTURES - "${LIBOMPTARGET_NVPTX_DETECTED_ARCH_LIST};${LIBOMPTARGET_AMDGPU_DETECTED_ARCH_LIST}") -endif() -list(REMOVE_DUPLICATES LIBOMPTARGET_DEVICE_ARCHITECTURES) - set(include_files ${include_directory}/Allocator.h ${include_directory}/Configuration.h @@ -141,20 +109,21 @@ set(bc_flags -c -foffload-lto -std=c++17 -fvisibility=hidden # first create an object target add_library(omptarget.devicertl.all_objs OBJECT IMPORTED) 
-function(compileDeviceRTLLibrary target_cpu target_name target_triple) +function(compileDeviceRTLLibrary target_name target_triple) set(target_bc_flags ${ARGN}) set(bc_files "") foreach(src ${src_files}) get_filename_component(infile ${src} ABSOLUTE) get_filename_component(outfile ${src} NAME) - set(outfile "${outfile}-${target_cpu}.bc") + set(outfile "${outfile}-${target_name}.bc") set(depfile "${outfile}.d") add_custom_command(OUTPUT ${outfile} COMMAND ${CLANG_TOOL} ${bc_flags} - --offload-arch=${target_cpu} + -fopenmp-targets=${target_triple} + -Xopenmp-target=${target_triple} -march= ${target_bc_flags} -MD -MF ${depfile} ${infile} -o ${outfile} @@ -177,7 +146,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) list(APPEND bc_files ${outfile}) endforeach() - set(bclib_name "libomptarget-${target_name}-${target_cpu}.bc") + set(bclib_name "libomptarget-${target_name}.bc") # Link to a bitcode library. add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name} @@ -217,7 +186,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(bclib_target_name "omptarget-${target_name}-${target_cpu}-bc") + set(bclib_target_name "omptarget-${target_name}-bc") add_custom_target(${bclib_target_name} DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}) # Copy library to destination. 
@@ -239,7 +208,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) # Package the bitcode in the bitcode and embed it in an ELF for the static library add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} COMMAND ${PACKAGER_TOOL} -o ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} - "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=${target_cpu},kind=openmp" + "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=generic,kind=openmp" DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name} COMMENT "Packaging LLVM offloading binary ${bclib_name}.out" ) @@ -249,14 +218,14 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}-${target_cpu}.o") + set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}.o") add_custom_command(OUTPUT ${output_name} COMMAND ${CLANG_TOOL} --std=c++17 -c -nostdlib -Xclang -fembed-offload-object=${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} -o ${output_name} ${source_directory}/Stub.cpp DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} ${source_directory}/Stub.cpp - COMMENT "Embedding LLVM offloading binary in devicertl-${target_name}-${target_cpu}.o" + COMMENT "Embedding LLVM offloading binary in devicertl-${target_name}.o" VERBATIM ) if(TARGET clang) @@ -269,11 +238,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) set_property(TARGET omptarget.devicertl.all_objs APPEND PROPERTY IMPORTED_OBJECTS ${output_name}) if (CMAKE_EXPORT_COMPILE_COMMANDS) - set(ide_target_name omptarget-ide-${target_name}-${target_cpu}) + set(ide_target_name omptarget-ide-${target_name}) add_library(${ide_target_name} STATIC EXCLUDE_FROM_ALL ${src_files}) target_compile_options(${ide_target_name} PRIVATE - -fopenmp --offload-arch=${target_cpu} -fopenmp-cuda-mode - 
-mllvm -openmp-opt-disable + -fopenmp-targets=${target_triple} -Xopenmp-target=${target_triple} -march= + -fopenmp -fopenmp-cuda-mode -mllvm -openmp-opt-disable -foffload-lto -fvisibility=hidden --offload-device-only -nocudalib -nogpulib -nogpuinc -nostdlibinc -Wno-unknown-cuda-version ) @@ -288,18 +257,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. -add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) # Archive all the object files generated above into a static library add_library(omptarget.devicertl STATIC) diff --git a/offload/DeviceRTL/src/Reduction.cpp b/offload/DeviceRTL/src/Reduction.cpp index 57df159d3f28e5..d1aa9d3e474f71 100644 --- a/offload/DeviceRTL/src/Reduction.cpp +++ b/offload/DeviceRTL/src/Reduction.cpp @@ -44,7 +44,6 @@ void gpu_irregular_warp_reduce(void *reduce_data, ShuffleReductFnTy shflFct, } } -#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ < 700 static uint32_t gpu_irregular_simd_reduce(void *reduce_data, ShuffleReductFnTy shflFct) { uint32_t size, remote_id, physical_lane_id; @@ -63,7 +62,6 @@ static uint32_t gpu_irregular_simd_reduce(void *reduce_data, } while (logical_lane_id % 2 == 0 && size > 1); return (logical_lane_id == 0); 
} -#endif static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, ShuffleReductFnTy shflFct, @@ -74,49 +72,53 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, uint32_t NumThreads = omp_get_num_threads(); if (NumThreads == 1) return 1; - /* - * This reduce function handles reduction within a team. It handles - * parallel regions in both L1 and L2 parallelism levels. It also - * supports Generic, SPMD, and NoOMP modes. - * - * 1. Reduce within a warp. - * 2. Warp master copies value to warp 0 via shared memory. - * 3. Warp 0 reduces to a single value. - * 4. The reduced value is available in the thread that returns 1. - */ - -#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700 - uint32_t WarpsNeeded = - (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); - uint32_t WarpId = mapping::getWarpIdInBlock(); - - // Volta execution model: - // For the Generic execution mode a parallel region either has 1 thread and - // beyond that, always a multiple of 32. For the SPMD execution mode we may - // have any number of threads. - if ((NumThreads % mapping::getWarpSize() == 0) || (WarpId < WarpsNeeded - 1)) - gpu_regular_warp_reduce(reduce_data, shflFct); - else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. - gpu_irregular_warp_reduce(reduce_data, shflFct, - /*LaneCount=*/NumThreads % mapping::getWarpSize(), - /*LaneId=*/mapping::getThreadIdInBlock() % - mapping::getWarpSize()); - // When we have more than [mapping::getWarpSize()] number of threads - // a block reduction is performed here. // - // Only L1 parallel region can enter this if condition. - if (NumThreads > mapping::getWarpSize()) { - // Gather all the reduced values from each warp - // to the first warp. - cpyFct(reduce_data, WarpsNeeded); + // This reduce function handles reduction within a team. It handles + // parallel regions in both L1 and L2 parallelism levels. It also + // supports Generic, SPMD, and NoOMP modes. + // + // 1. Reduce within a warp. 
+ // 2. Warp master copies value to warp 0 via shared memory. + // 3. Warp 0 reduces to a single value. + // 4. The reduced value is available in the thread that returns 1. + // - if (WarpId == 0) - gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, - BlockThreadId); +#if __has_builtin(__nvvm_reflect) + if (__nvvm_reflect("__CUDA_ARCH") >= 700) { + uint32_t WarpsNeeded = + (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); + uint32_t WarpId = mapping::getWarpIdInBlock(); + + // Volta execution model: + // For the Generic execution mode a parallel region either has 1 thread and + // beyond that, always a multiple of 32. For the SPMD execution mode we may + // have any number of threads. + if ((NumThreads % mapping::getWarpSize() == 0) || + (WarpId < WarpsNeeded - 1)) + gpu_regular_warp_reduce(reduce_data, shflFct); + else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. + gpu_irregular_warp_reduce( + reduce_data, shflFct, + /*LaneCount=*/NumThreads % mapping::getWarpSize(), + /*LaneId=*/mapping::getThreadIdInBlock() % mapping::getWarpSize()); + + // When we have more than [mapping::getWarpSize()] number of threads + // a block reduction is performed here. + // + // Only L1 parallel region can enter this if condition. + if (NumThreads > mapping::getWarpSize()) { + // Gather all the reduced values from each warp + // to the first warp. + cpyFct(reduce_data, WarpsNeeded); + + if (WarpId == 0) + gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, + BlockThreadId); + } + return BlockThreadId == 0; } - return BlockThreadId == 0; -#else +#endif __kmpc_impl_lanemask_t Liveness = mapping::activemask(); if (Liveness == lanes::All) // Full warp gpu_regular_warp_reduce(reduce_data, shflFct); @@ -150,10 +152,9 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, return BlockThreadId == 0; } - // Get the OMP thread Id. This is different from BlockThreadId in the case of - // an L2 parallel region. 
+ // Get the OMP thread Id. This is different from BlockThreadId in the case + // of an L2 parallel region. return BlockThreadId == 0; -#endif // __CUDA_ARCH__ >= 700 } uint32_t roundToWarpsize(uint32_t s) { >From ae742d66d1c92d4ef95bbcafa05298fb72dcfa63 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sat, 7 Dec 2024 13:56:50 -0600 Subject: [PATCH 2/4] update with comment --- offload/DeviceRTL/CMakeLists.txt | 1 + offload/DeviceRTL/src/Reduction.cpp | 20 ++++++++++---------- 2 files changed, 11 insertions(+), 10 deletions(-) diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt index cda633c41062b6..22940264f9b19a 100644 --- a/offload/DeviceRTL/CMakeLists.txt +++ b/offload/DeviceRTL/CMakeLists.txt @@ -119,6 +119,7 @@ function(compileDeviceRTLLibrary target_name target_triple) set(outfile "${outfile}-${target_name}.bc") set(depfile "${outfile}.d") + # Passing an empty CPU to -march= suppressed target specific metadata. add_custom_command(OUTPUT ${outfile} COMMAND ${CLANG_TOOL} ${bc_flags} diff --git a/offload/DeviceRTL/src/Reduction.cpp b/offload/DeviceRTL/src/Reduction.cpp index d1aa9d3e474f71..d3b4528401953c 100644 --- a/offload/DeviceRTL/src/Reduction.cpp +++ b/offload/DeviceRTL/src/Reduction.cpp @@ -73,16 +73,16 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, if (NumThreads == 1) return 1; - // - // This reduce function handles reduction within a team. It handles - // parallel regions in both L1 and L2 parallelism levels. It also - // supports Generic, SPMD, and NoOMP modes. - // - // 1. Reduce within a warp. - // 2. Warp master copies value to warp 0 via shared memory. - // 3. Warp 0 reduces to a single value. - // 4. The reduced value is available in the thread that returns 1. - // + // + // This reduce function handles reduction within a team. It handles + // parallel regions in both L1 and L2 parallelism levels. It also + // supports Generic, SPMD, and NoOMP modes. + // + // 1. Reduce within a warp. + // 2. 
Warp master copies value to warp 0 via shared memory. + // 3. Warp 0 reduces to a single value. + // 4. The reduced value is available in the thread that returns 1. + // #if __has_builtin(__nvvm_reflect) if (__nvvm_reflect("__CUDA_ARCH") >= 700) { >From 51c5926a6b193383b34cbfd14e499481f7a269eb Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sat, 7 Dec 2024 14:04:55 -0600 Subject: [PATCH 3/4] release notes --- clang/docs/ReleaseNotes.rst | 3 +++ openmp/docs/ReleaseNotes.rst | 9 +++++++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index 12d8ebf3251bc0..1b2bb3b98f0dc4 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -1153,6 +1153,9 @@ OpenMP Support - Added support for 'omp assume' directive. - Added support for 'omp scope' directive. - Added support for allocator-modifier in 'allocate' clause. +- Changed the OpenMP DeviceRTL to use 'generic' IR. The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets. Improvements ^^^^^^^^^^^^ diff --git a/openmp/docs/ReleaseNotes.rst b/openmp/docs/ReleaseNotes.rst index d4a4b1a99f7813..21be8b41c73aac 100644 --- a/openmp/docs/ReleaseNotes.rst +++ b/openmp/docs/ReleaseNotes.rst @@ -1,5 +1,4 @@ -=========================== -OpenMP 20.0.0 Release Notes + =========================== @@ -19,3 +18,9 @@ from the `LLVM releases web site `_. Non-comprehensive list of changes in this release ================================================= + +Device Runtime +-------------- +- Changed the OpenMP DeviceRTL to use 'generic' IR. The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets. 
>From 54660d4db72ed98d63ab5b538bc936d0fb4e29c2 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sat, 7 Dec 2024 15:42:50 -0600 Subject: [PATCH 4/4] fix --- .../{libomptarget-nvptx-sm_52.bc => libomptarget-nvptx.bc} | 0 .../{libomptarget-nvptx-sm_52.bc => libomptarget-nvptx.bc} | 0 clang/test/Driver/openmp-offload-gpu.c | 4 ++-- openmp/docs/ReleaseNotes.rst | 3 ++- 4 files changed, 4 insertions(+), 3 deletions(-) rename clang/test/Driver/Inputs/libomptarget/{libomptarget-nvptx-sm_52.bc => libomptarget-nvptx.bc} (100%) rename clang/test/Driver/Inputs/libomptarget/subdir/{libomptarget-nvptx-sm_52.bc => libomptarget-nvptx.bc} (100%) diff --git a/clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_52.bc b/clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx.bc similarity index 100% rename from clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_52.bc rename to clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx.bc diff --git a/clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_52.bc b/clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx.bc similarity index 100% rename from clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_52.bc rename to clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx.bc diff --git a/clang/test/Driver/openmp-offload-gpu.c b/clang/test/Driver/openmp-offload-gpu.c index f6e2245dcdbc05..74bd2a6aeee468 100644 --- a/clang/test/Driver/openmp-offload-gpu.c +++ b/clang/test/Driver/openmp-offload-gpu.c @@ -90,8 +90,8 @@ // RUN: %s 2>&1 | FileCheck -check-prefix=CHK-ENV-BCLIB %s // CHK-BCLIB: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget-nvptx-test.bc -// CHK-BCLIB-DIR: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget{{/|\\\\}}libomptarget-nvptx-sm_52.bc -// CHK-ENV-BCLIB: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}subdir{{/|\\\\}}libomptarget-nvptx-sm_52.bc +// 
CHK-BCLIB-DIR: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget{{/|\\\\}}libomptarget-nvptx.bc +// CHK-ENV-BCLIB: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}subdir{{/|\\\\}}libomptarget-nvptx.bc // CHK-BCLIB-NOT: {{error:|warning:}} /// ########################################################################### diff --git a/openmp/docs/ReleaseNotes.rst b/openmp/docs/ReleaseNotes.rst index 21be8b41c73aac..0089f1aa31d144 100644 --- a/openmp/docs/ReleaseNotes.rst +++ b/openmp/docs/ReleaseNotes.rst @@ -1,4 +1,5 @@ - +=========================== +OpenMP 20.0.0 Release Notes =========================== From openmp-commits at lists.llvm.org Sat Dec 7 13:49:38 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Sat, 07 Dec 2024 13:49:38 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6754c2f2.050a0220.31c451.f96f@mx.google.com> jhuber6 wrote: @ronlieb This will probably conflict a lot with downstream, should probably wait until I've talked it over with others before trying to merge it in AOMP. https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sat Dec 7 13:55:41 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Sat, 07 Dec 2024 13:55:41 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6754c45d.050a0220.1984fc.5388@mx.google.com> ronlieb wrote: > @ronlieb This will probably conflict a lot with downstream, should probably wait until I've talked it over with others before trying to merge it in AOMP. Yes please, let's wait until the internal Monday meeting to discuss. Also, I would like to help with some pre-commit performance testing using a downstream build.
https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 04:56:06 2024 From: openmp-commits at lists.llvm.org (Michał Górny via Openmp-commits) Date: Sun, 08 Dec 2024 04:56:06 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <67559766.170a0220.333b77.2c55@mx.google.com> ================ @@ -288,18 +258,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. -add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) ---------------- mgorny wrote: Can we have an option to disable either of amdgpu/nvptx specifically? If anything, because it feels weird having a choice between amdgpu and nvptx plugins, but no choice between devicertl variants.
https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 05:23:14 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Sun, 08 Dec 2024 05:23:14 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <67559dc2.630a0220.68ac5.18e4@mx.google.com> ================ @@ -288,18 +258,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. -add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) ---------------- jhuber6 wrote: I didn't feel like it was strictly necessary since if you can build one you can build the other. The only thing you save is disk space, but if you feel like it's really needed I can re-use the architectures thing or something. 
https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 05:29:28 2024 From: openmp-commits at lists.llvm.org (Michał Górny via Openmp-commits) Date: Sun, 08 Dec 2024 05:29:28 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <67559f38.050a0220.20fb4b.d804@mx.google.com> ================ @@ -288,18 +258,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. -add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) ---------------- mgorny wrote: I don't exactly feel strongly about it. It's just that we already provide a switch for amdgpu/nvptx in Gentoo, so it only feels natural for this to respect it. Also, on a semi-related matter: is there a reason we're installing both separate `.bc` files and `libomptarget.devicertl.a`?
https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 05:36:04 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Sun, 08 Dec 2024 05:36:04 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6755a0c4.630a0220.f37f9.22f6@mx.google.com> ================ @@ -288,18 +258,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. -add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) ---------------- jhuber6 wrote: This is really over-complicated, but right now we only use the `.bc` file for non-LTO NVPTX compilations which get put through each TU in a broken way via `-mlink-builtin-bitcode`. This is because it would get really slow if we didn't optimize out calls to the runtime. 
https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 05:50:27 2024 From: openmp-commits at lists.llvm.org (Michał Górny via Openmp-commits) Date: Sun, 08 Dec 2024 05:50:27 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6755a423.170a0220.3ce871.4552@mx.google.com> ================ @@ -288,18 +258,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. -add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) ---------------- mgorny wrote: Ok, so it's not accidental/obsolete — that's all I needed to know :-). Another thing I've noticed is that clang needs `libomptarget.devicertl.a` even when no GPU target is actually used. I've worked around that for now by creating an empty `.a` file locally, but perhaps that should also be handled better somehow (I can make a PR later if you tell me how you'd like it handled).
https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 05:55:29 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Sun, 08 Dec 2024 05:55:29 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6755a551.050a0220.11bcc6.01b4@mx.google.com> ================ @@ -288,18 +258,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. -add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) ---------------- jhuber6 wrote: I suppose that happens for the CPU targets? Those aren't really used anywhere real, they're just for testing, but even so we probably shouldn't leave them broken. We link the device RTL unconditionally because the link step shouldn't need to know the architectures that were used to compile it. Because these are static libraries they're not extracted if they aren't needed, so it doesn't hurt anything if they're unused. But I guess you can have a situation where the user doesn't build this intentionally, but that's non-default behavior so I've never given it much thought. 
https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 06:33:27 2024 From: openmp-commits at lists.llvm.org (Matt Arsenault via Openmp-commits) Date: Sun, 08 Dec 2024 06:33:27 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6755ae37.630a0220.68ac5.1cc4@mx.google.com> arsenm wrote: This probably should retain separate wave32/wave64 builds. Additionally, it should have extension points for subtarget-specific implementations. https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 06:34:45 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Sun, 08 Dec 2024 06:34:45 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6755ae85.170a0220.347f.74c9@mx.google.com> jhuber6 wrote: > This probably should retain separate wave32/wave64 builds. Additionally, it should have extension points for subtarget-specific implementations. That's what Shilei was talking about, since we have `__nvvm_reflect` for that on NVPTX. https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 06:36:59 2024 From: openmp-commits at lists.llvm.org (Matt Arsenault via Openmp-commits) Date: Sun, 08 Dec 2024 06:36:59 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6755af0b.170a0220.8e99.0aef@mx.google.com> arsenm wrote: > > This probably should retain separate wave32/wave64 builds. Additionally, it should have extension points for subtarget-specific implementations. > > That's what Shilei was talking about, since we have `__nvvm_reflect` for that on NVPTX. First, I will always consider NVVM reflect a giant hack.
NVVM reflect cannot actually deal with the full range of wavesize issues. It is an incompatible ABI, and the code should never be intermixed. https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 06:38:11 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Sun, 08 Dec 2024 06:38:11 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6755af53.a70a0220.28926e.258f@mx.google.com> jhuber6 wrote: > First, I will always consider NVVM reflect a giant hack. NVVM reflect cannot actually deal with the full range of wavesize issues. It is an incompatible ABI, and the code should never be intermixed. It's a hack, but still better than whatever AMD does currently. https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 07:10:37 2024 From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits) Date: Sun, 08 Dec 2024 07:10:37 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6755b6ed.170a0220.2b492.db9d@mx.google.com> shiltian wrote: > This probably should retain separate wave32/wave64 builds. Additionally, it should have extension points for subtarget-specific implementations. Does wave32/wave64 affect the lowering from front-end source code to middle-end IR, if we don't use certain functions, such as wave-level primitives?
https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 07:12:02 2024 From: openmp-commits at lists.llvm.org (Michał Górny via Openmp-commits) Date: Sun, 08 Dec 2024 07:12:02 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6755b742.050a0220.26a312.4bfd@mx.google.com> ================ @@ -288,18 +258,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. -add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) ---------------- mgorny wrote: Yeah, precisely. I noticed this while building with `-DLIBOMPTARGET_BUILD_DEVICERTL_BCLIB=OFF -DLIBOMPTARGET_PLUGINS_TO_BUILD=host` to run tests on the CPU only.
https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 12:44:04 2024 From: openmp-commits at lists.llvm.org (Matt Arsenault via Openmp-commits) Date: Sun, 08 Dec 2024 12:44:04 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <67560514.170a0220.a102c.0b3b@mx.google.com> arsenm wrote: > Does wave32/wave64 affect the lowering from front-end source code to middle-end IR, if we don't use certain functions, such as wave-level primitives? Yes. This is a fixed ABI setting that can change the binary function signature (which then has knock-on optimization effects). The IR is a representation of what goes into a final .o. It is not really intended to enable new features or make the program portable. https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 8 13:35:04 2024 From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits) Date: Sun, 08 Dec 2024 13:35:04 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <67561108.630a0220.cdac9.7818@mx.google.com> shiltian wrote: > The IR is a representation of what goes into a final .o. It is not really intended to enable new features or make the program portable. My $0.02: it depends on how you think about it. If we eliminate the "unwanted" parts (especially those that could potentially affect the ABI) at the very beginning of the middle end, they will not have optimization effects. In addition, the compiler backend lowering is essentially doing something like:
```
if (target == A)
  lower Inst to I1
else if (has_feature_that_only_compiler_can_call("..."))
  lower Inst to I2
...
```
In terms of "portability", the compiler is "portable".
The reflection is just to give the end user an interface to do similar things:
```
if (__builtin_get_target() == A)
  do something in one way
else if (has_feature_that_user_can_call("..."))
  do something in another way
...
```
Of course, the interface(s) have to be carefully designed, and none of the queried information can change during optimization. https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Mon Dec 9 07:53:48 2024 From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits) Date: Mon, 09 Dec 2024 07:53:48 -0800 (PST) Subject: [Openmp-commits] [openmp] [OpenMP] Fix interoperability test compilation on OpenBSD (PR #119053) In-Reply-To: Message-ID: <6757128c.170a0220.22c9d.d5b2@mx.google.com> https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/119053 From openmp-commits at lists.llvm.org Mon Dec 9 07:53:48 2024 From: openmp-commits at lists.llvm.org (Shilei Tian via Openmp-commits) Date: Mon, 09 Dec 2024 07:53:48 -0800 (PST) Subject: [Openmp-commits] [openmp] [OpenMP] Fix interoperability test compilation on OpenBSD (PR #119053) In-Reply-To: Message-ID: <6757128c.050a0220.31b971.be40@mx.google.com> https://github.com/shiltian approved this pull request.
https://github.com/llvm/llvm-project/pull/119053 From openmp-commits at lists.llvm.org Mon Dec 9 08:13:40 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Mon, 09 Dec 2024 08:13:40 -0800 (PST) Subject: [Openmp-commits] [openmp] ec27eb8 - [OpenMP] Fix interoperability test compilation on OpenBSD (#119053) Message-ID: <67571734.170a0220.39d029.a03a@mx.google.com> Author: Brad Smith Date: 2024-12-09T11:13:37-05:00 New Revision: ec27eb8c6b08fb9284ff7069cda85f3dd208a3f1 URL: https://github.com/llvm/llvm-project/commit/ec27eb8c6b08fb9284ff7069cda85f3dd208a3f1 DIFF: https://github.com/llvm/llvm-project/commit/ec27eb8c6b08fb9284ff7069cda85f3dd208a3f1.diff LOG: [OpenMP] Fix interoperability test compilation on OpenBSD (#119053) Added: Modified: openmp/runtime/test/ompt/misc/interoperability.cpp Removed: ################################################################################ diff --git a/openmp/runtime/test/ompt/misc/interoperability.cpp b/openmp/runtime/test/ompt/misc/interoperability.cpp index 6f17a94c1f4042..d9c11461220df1 100644 --- a/openmp/runtime/test/ompt/misc/interoperability.cpp +++ b/openmp/runtime/test/ompt/misc/interoperability.cpp @@ -3,7 +3,7 @@ #include #include -#if !defined(__FreeBSD__) && !defined(__NetBSD__) +#if !defined(__FreeBSD__) && !defined(__NetBSD__) && !defined(__OpenBSD__) #include #else #include From openmp-commits at lists.llvm.org Mon Dec 9 08:13:43 2024 From: openmp-commits at lists.llvm.org (Brad Smith via Openmp-commits) Date: Mon, 09 Dec 2024 08:13:43 -0800 (PST) Subject: [Openmp-commits] [openmp] [OpenMP] Fix interoperability test compilation on OpenBSD (PR #119053) In-Reply-To: Message-ID: <67571737.170a0220.1daac4.96f2@mx.google.com> https://github.com/brad0 closed https://github.com/llvm/llvm-project/pull/119053 From openmp-commits at lists.llvm.org Mon Dec 9 08:31:49 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Mon, 09 Dec 2024 08:31:49 -0800 (PST) Subject: 
[Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <67571b75.050a0220.337e40.bbc1@mx.google.com> https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/119091 >From ccbbc8cd83415aa56fbc3726069776255bcbc918 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sat, 7 Dec 2024 13:47:23 -0600 Subject: [PATCH] [OpenMP] Use generic IR for the OpenMP DeviceRTL Summary: We previously built this for every single architecture to deal with incompatibility. This patch updates it to use the 'generic' IR that `libc` and other projects use. Who knows if this will have any side-effects, probably worth testing more but it passes the tests I expect to pass on my side. update with comment release notes fix --- clang/docs/ReleaseNotes.rst | 3 + clang/lib/Driver/ToolChains/CommonArgs.cpp | 3 +- clang/lib/Driver/ToolChains/Cuda.cpp | 1 - ...t-nvptx-sm_52.bc => libomptarget-nvptx.bc} | 0 ...t-nvptx-sm_52.bc => libomptarget-nvptx.bc} | 0 clang/test/Driver/openmp-offload-gpu.c | 4 +- offload/DeviceRTL/CMakeLists.txt | 71 ++++----------- offload/DeviceRTL/src/Reduction.cpp | 91 ++++++++++--------- openmp/docs/ReleaseNotes.rst | 6 ++ 9 files changed, 75 insertions(+), 104 deletions(-) rename clang/test/Driver/Inputs/libomptarget/{libomptarget-nvptx-sm_52.bc => libomptarget-nvptx.bc} (100%) rename clang/test/Driver/Inputs/libomptarget/subdir/{libomptarget-nvptx-sm_52.bc => libomptarget-nvptx.bc} (100%) diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index 7d846f1d447d16..53be91337f3b23 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -1154,6 +1154,9 @@ OpenMP Support - Added support for 'omp assume' directive. - Added support for 'omp scope' directive. - Added support for allocator-modifier in 'allocate' clause. +- Changed the OpenMP DeviceRTL to use 'generic' IR. 
The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets. Improvements ^^^^^^^^^^^^ diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp index 7d3d7f8f03c491..3dd90ecf8bca4c 100644 --- a/clang/lib/Driver/ToolChains/CommonArgs.cpp +++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp @@ -2800,8 +2800,7 @@ void tools::addOpenMPDeviceRTL(const Driver &D, : options::OPT_libomptarget_nvptx_bc_path_EQ; StringRef ArchPrefix = Triple.isAMDGCN() ? "amdgpu" : "nvptx"; - std::string LibOmpTargetName = - ("libomptarget-" + ArchPrefix + "-" + BitcodeSuffix + ".bc").str(); + std::string LibOmpTargetName = ("libomptarget-" + ArchPrefix + ".bc").str(); // First check whether user specifies bc library if (const Arg *A = DriverArgs.getLastArg(LibomptargetBCPathOpt)) { diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp index 102794829795da..214f1e5d83478f 100644 --- a/clang/lib/Driver/ToolChains/Cuda.cpp +++ b/clang/lib/Driver/ToolChains/Cuda.cpp @@ -851,7 +851,6 @@ void CudaToolChain::addClangTargetOptions( HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind); StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ); - assert(!GpuArch.empty() && "Must have an explicit GPU arch."); assert((DeviceOffloadingKind == Action::OFK_OpenMP || DeviceOffloadingKind == Action::OFK_Cuda) && "Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs."); diff --git a/clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_52.bc b/clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx.bc similarity index 100% rename from clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_52.bc rename to clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx.bc diff --git a/clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_52.bc 
b/clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx.bc similarity index 100% rename from clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_52.bc rename to clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx.bc diff --git a/clang/test/Driver/openmp-offload-gpu.c b/clang/test/Driver/openmp-offload-gpu.c index f6e2245dcdbc05..74bd2a6aeee468 100644 --- a/clang/test/Driver/openmp-offload-gpu.c +++ b/clang/test/Driver/openmp-offload-gpu.c @@ -90,8 +90,8 @@ // RUN: %s 2>&1 | FileCheck -check-prefix=CHK-ENV-BCLIB %s // CHK-BCLIB: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget-nvptx-test.bc -// CHK-BCLIB-DIR: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget{{/|\\\\}}libomptarget-nvptx-sm_52.bc -// CHK-ENV-BCLIB: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}subdir{{/|\\\\}}libomptarget-nvptx-sm_52.bc +// CHK-BCLIB-DIR: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget{{/|\\\\}}libomptarget-nvptx.bc +// CHK-ENV-BCLIB: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}subdir{{/|\\\\}}libomptarget-nvptx.bc // CHK-BCLIB-NOT: {{error:|warning:}} /// ########################################################################### diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt index 1bf3eb9da38aa1..22940264f9b19a 100644 --- a/offload/DeviceRTL/CMakeLists.txt +++ b/offload/DeviceRTL/CMakeLists.txt @@ -42,38 +42,6 @@ set(devicertl_base_directory ${CMAKE_CURRENT_SOURCE_DIR}) set(include_directory ${devicertl_base_directory}/include) set(source_directory ${devicertl_base_directory}/src) -set(all_amdgpu_architectures "gfx700;gfx701;gfx801;gfx803;gfx900;gfx902;gfx906" - "gfx908;gfx90a;gfx90c;gfx940;gfx941;gfx942;gfx950;gfx1010" - "gfx1012;gfx1030;gfx1031;gfx1032;gfx1033;gfx1034;gfx1035" - "gfx1036;gfx1100;gfx1101;gfx1102;gfx1103;gfx1150" 
- "gfx1151;gfx1152;gfx1153") -set(all_nvptx_architectures "sm_35;sm_37;sm_50;sm_52;sm_53;sm_60;sm_61;sm_62" - "sm_70;sm_72;sm_75;sm_80;sm_86;sm_87;sm_89;sm_90") -set(all_gpu_architectures - "${all_amdgpu_architectures};${all_nvptx_architectures}") - -set(LIBOMPTARGET_DEVICE_ARCHITECTURES "all" CACHE STRING - "List of device architectures to be used to compile the OpenMP DeviceRTL.") - -if(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "all") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_gpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "amdgpu") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_amdgpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "nvptx") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_nvptx_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "auto" OR - LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "native") - if(NOT LIBOMPTARGET_NVPTX_ARCH AND NOT LIBOMPTARGET_AMDGPU_ARCH) - message(FATAL_ERROR - "Could not find 'amdgpu-arch' and 'nvptx-arch' tools required for 'auto'") - elseif(NOT LIBOMPTARGET_FOUND_NVIDIA_GPU AND NOT LIBOMPTARGET_FOUND_AMDGPU_GPU) - message(FATAL_ERROR "No AMD or NVIDIA GPU found on the system when using 'auto'") - endif() - set(LIBOMPTARGET_DEVICE_ARCHITECTURES - "${LIBOMPTARGET_NVPTX_DETECTED_ARCH_LIST};${LIBOMPTARGET_AMDGPU_DETECTED_ARCH_LIST}") -endif() -list(REMOVE_DUPLICATES LIBOMPTARGET_DEVICE_ARCHITECTURES) - set(include_files ${include_directory}/Allocator.h ${include_directory}/Configuration.h @@ -141,20 +109,22 @@ set(bc_flags -c -foffload-lto -std=c++17 -fvisibility=hidden # first create an object target add_library(omptarget.devicertl.all_objs OBJECT IMPORTED) -function(compileDeviceRTLLibrary target_cpu target_name target_triple) +function(compileDeviceRTLLibrary target_name target_triple) set(target_bc_flags ${ARGN}) set(bc_files "") foreach(src ${src_files}) get_filename_component(infile ${src} ABSOLUTE) get_filename_component(outfile ${src} NAME) - set(outfile 
"${outfile}-${target_cpu}.bc") + set(outfile "${outfile}-${target_name}.bc") set(depfile "${outfile}.d") + # Passing an empty CPU to -march= suppressed target specific metadata. add_custom_command(OUTPUT ${outfile} COMMAND ${CLANG_TOOL} ${bc_flags} - --offload-arch=${target_cpu} + -fopenmp-targets=${target_triple} + -Xopenmp-target=${target_triple} -march= ${target_bc_flags} -MD -MF ${depfile} ${infile} -o ${outfile} @@ -177,7 +147,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) list(APPEND bc_files ${outfile}) endforeach() - set(bclib_name "libomptarget-${target_name}-${target_cpu}.bc") + set(bclib_name "libomptarget-${target_name}.bc") # Link to a bitcode library. add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name} @@ -217,7 +187,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(bclib_target_name "omptarget-${target_name}-${target_cpu}-bc") + set(bclib_target_name "omptarget-${target_name}-bc") add_custom_target(${bclib_target_name} DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}) # Copy library to destination. 
@@ -239,7 +209,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) # Package the bitcode in the bitcode and embed it in an ELF for the static library add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} COMMAND ${PACKAGER_TOOL} -o ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} - "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=${target_cpu},kind=openmp" + "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=generic,kind=openmp" DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name} COMMENT "Packaging LLVM offloading binary ${bclib_name}.out" ) @@ -249,14 +219,14 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}-${target_cpu}.o") + set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}.o") add_custom_command(OUTPUT ${output_name} COMMAND ${CLANG_TOOL} --std=c++17 -c -nostdlib -Xclang -fembed-offload-object=${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} -o ${output_name} ${source_directory}/Stub.cpp DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} ${source_directory}/Stub.cpp - COMMENT "Embedding LLVM offloading binary in devicertl-${target_name}-${target_cpu}.o" + COMMENT "Embedding LLVM offloading binary in devicertl-${target_name}.o" VERBATIM ) if(TARGET clang) @@ -269,11 +239,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) set_property(TARGET omptarget.devicertl.all_objs APPEND PROPERTY IMPORTED_OBJECTS ${output_name}) if (CMAKE_EXPORT_COMPILE_COMMANDS) - set(ide_target_name omptarget-ide-${target_name}-${target_cpu}) + set(ide_target_name omptarget-ide-${target_name}) add_library(${ide_target_name} STATIC EXCLUDE_FROM_ALL ${src_files}) target_compile_options(${ide_target_name} PRIVATE - -fopenmp --offload-arch=${target_cpu} -fopenmp-cuda-mode - 
-mllvm -openmp-opt-disable + -fopenmp-targets=${target_triple} -Xopenmp-target=${target_triple} -march= + -fopenmp -fopenmp-cuda-mode -mllvm -openmp-opt-disable -foffload-lto -fvisibility=hidden --offload-device-only -nocudalib -nogpulib -nogpuinc -nostdlibinc -Wno-unknown-cuda-version ) @@ -288,18 +258,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. -add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) # Archive all the object files generated above into a static library add_library(omptarget.devicertl STATIC) diff --git a/offload/DeviceRTL/src/Reduction.cpp b/offload/DeviceRTL/src/Reduction.cpp index 57df159d3f28e5..d3b4528401953c 100644 --- a/offload/DeviceRTL/src/Reduction.cpp +++ b/offload/DeviceRTL/src/Reduction.cpp @@ -44,7 +44,6 @@ void gpu_irregular_warp_reduce(void *reduce_data, ShuffleReductFnTy shflFct, } } -#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ < 700 static uint32_t gpu_irregular_simd_reduce(void *reduce_data, ShuffleReductFnTy shflFct) { uint32_t size, remote_id, physical_lane_id; @@ -63,7 +62,6 @@ static uint32_t gpu_irregular_simd_reduce(void *reduce_data, } while (logical_lane_id % 2 == 0 && size > 1); return (logical_lane_id == 0); 
} -#endif static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, ShuffleReductFnTy shflFct, @@ -74,49 +72,53 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, uint32_t NumThreads = omp_get_num_threads(); if (NumThreads == 1) return 1; - /* - * This reduce function handles reduction within a team. It handles - * parallel regions in both L1 and L2 parallelism levels. It also - * supports Generic, SPMD, and NoOMP modes. - * - * 1. Reduce within a warp. - * 2. Warp master copies value to warp 0 via shared memory. - * 3. Warp 0 reduces to a single value. - * 4. The reduced value is available in the thread that returns 1. - */ - -#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700 - uint32_t WarpsNeeded = - (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); - uint32_t WarpId = mapping::getWarpIdInBlock(); - - // Volta execution model: - // For the Generic execution mode a parallel region either has 1 thread and - // beyond that, always a multiple of 32. For the SPMD execution mode we may - // have any number of threads. - if ((NumThreads % mapping::getWarpSize() == 0) || (WarpId < WarpsNeeded - 1)) - gpu_regular_warp_reduce(reduce_data, shflFct); - else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. - gpu_irregular_warp_reduce(reduce_data, shflFct, - /*LaneCount=*/NumThreads % mapping::getWarpSize(), - /*LaneId=*/mapping::getThreadIdInBlock() % - mapping::getWarpSize()); - // When we have more than [mapping::getWarpSize()] number of threads - // a block reduction is performed here. - // - // Only L1 parallel region can enter this if condition. - if (NumThreads > mapping::getWarpSize()) { - // Gather all the reduced values from each warp - // to the first warp. - cpyFct(reduce_data, WarpsNeeded); + // + // This reduce function handles reduction within a team. It handles + // parallel regions in both L1 and L2 parallelism levels. It also + // supports Generic, SPMD, and NoOMP modes. + // + // 1. 
Reduce within a warp. + // 2. Warp master copies value to warp 0 via shared memory. + // 3. Warp 0 reduces to a single value. + // 4. The reduced value is available in the thread that returns 1. + // - if (WarpId == 0) - gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, - BlockThreadId); +#if __has_builtin(__nvvm_reflect) + if (__nvvm_reflect("__CUDA_ARCH") >= 700) { + uint32_t WarpsNeeded = + (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); + uint32_t WarpId = mapping::getWarpIdInBlock(); + + // Volta execution model: + // For the Generic execution mode a parallel region either has 1 thread and + // beyond that, always a multiple of 32. For the SPMD execution mode we may + // have any number of threads. + if ((NumThreads % mapping::getWarpSize() == 0) || + (WarpId < WarpsNeeded - 1)) + gpu_regular_warp_reduce(reduce_data, shflFct); + else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. + gpu_irregular_warp_reduce( + reduce_data, shflFct, + /*LaneCount=*/NumThreads % mapping::getWarpSize(), + /*LaneId=*/mapping::getThreadIdInBlock() % mapping::getWarpSize()); + + // When we have more than [mapping::getWarpSize()] number of threads + // a block reduction is performed here. + // + // Only L1 parallel region can enter this if condition. + if (NumThreads > mapping::getWarpSize()) { + // Gather all the reduced values from each warp + // to the first warp. + cpyFct(reduce_data, WarpsNeeded); + + if (WarpId == 0) + gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, + BlockThreadId); + } + return BlockThreadId == 0; } - return BlockThreadId == 0; -#else +#endif __kmpc_impl_lanemask_t Liveness = mapping::activemask(); if (Liveness == lanes::All) // Full warp gpu_regular_warp_reduce(reduce_data, shflFct); @@ -150,10 +152,9 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, return BlockThreadId == 0; } - // Get the OMP thread Id. 
This is different from BlockThreadId in the case of - // an L2 parallel region. + // Get the OMP thread Id. This is different from BlockThreadId in the case + // of an L2 parallel region. return BlockThreadId == 0; -#endif // __CUDA_ARCH__ >= 700 } uint32_t roundToWarpsize(uint32_t s) { diff --git a/openmp/docs/ReleaseNotes.rst b/openmp/docs/ReleaseNotes.rst index d4a4b1a99f7813..0089f1aa31d144 100644 --- a/openmp/docs/ReleaseNotes.rst +++ b/openmp/docs/ReleaseNotes.rst @@ -19,3 +19,9 @@ from the `LLVM releases web site `_. Non-comprehensive list of changes in this release ================================================= + +Device Runtime +-------------- +- Changed the OpenMP DeviceRTL to use 'generic' IR. The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets. From openmp-commits at lists.llvm.org Mon Dec 9 15:49:39 2024 From: openmp-commits at lists.llvm.org (Sam James via Openmp-commits) Date: Mon, 09 Dec 2024 15:49:39 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <67578213.170a0220.aa506.44b6@mx.google.com> https://github.com/thesamesam approved this pull request. LGTM (thanks for pinging mgorny and me), notwithstanding mgorny's comment, which we could also handle as a follow-up.
https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Sun Dec 22 20:39:09 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Sun, 22 Dec 2024 20:39:09 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6768e96d.050a0220.2875e4.2370@mx.google.com> https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/119091 >From 767d34a0469aa67c2c47a35bc9bff29d20ae1222 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sat, 7 Dec 2024 13:47:23 -0600 Subject: [PATCH] [OpenMP] Use generic IR for the OpenMP DeviceRTL Summary: We previously built this for every single architecture to deal with incompatibility. This patch updates it to use the 'generic' IR that `libc` and other projects use. Who knows if this will have any side-effects, probably worth testing more but it passes the tests I expect to pass on my side. update with comment release notes fix --- clang/docs/ReleaseNotes.rst | 3 + clang/lib/Driver/ToolChains/CommonArgs.cpp | 3 +- clang/lib/Driver/ToolChains/Cuda.cpp | 1 - ...t-nvptx-sm_52.bc => libomptarget-nvptx.bc} | 0 ...t-nvptx-sm_52.bc => libomptarget-nvptx.bc} | 0 clang/test/Driver/openmp-offload-gpu.c | 4 +- offload/DeviceRTL/CMakeLists.txt | 76 ++++------------ offload/DeviceRTL/src/Reduction.cpp | 91 ++++++++++--------- openmp/docs/ReleaseNotes.rst | 6 ++ 9 files changed, 75 insertions(+), 109 deletions(-) rename clang/test/Driver/Inputs/libomptarget/{libomptarget-nvptx-sm_52.bc => libomptarget-nvptx.bc} (100%) rename clang/test/Driver/Inputs/libomptarget/subdir/{libomptarget-nvptx-sm_52.bc => libomptarget-nvptx.bc} (100%) diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index 8b984ecaefecaf..ba5aafe25cc934 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -1250,6 +1250,9 @@ OpenMP Support - Added support for 'omp assume' directive. 
- Added support for 'omp scope' directive. - Added support for allocator-modifier in 'allocate' clause. +- Changed the OpenMP DeviceRTL to use 'generic' IR. The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets. Improvements ^^^^^^^^^^^^ diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp index 47df650e5b9486..e606a7d3d44199 100644 --- a/clang/lib/Driver/ToolChains/CommonArgs.cpp +++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp @@ -2837,8 +2837,7 @@ void tools::addOpenMPDeviceRTL(const Driver &D, : options::OPT_libomptarget_nvptx_bc_path_EQ; StringRef ArchPrefix = Triple.isAMDGCN() ? "amdgpu" : "nvptx"; - std::string LibOmpTargetName = - ("libomptarget-" + ArchPrefix + "-" + BitcodeSuffix + ".bc").str(); + std::string LibOmpTargetName = ("libomptarget-" + ArchPrefix + ".bc").str(); // First check whether user specifies bc library if (const Arg *A = DriverArgs.getLastArg(LibomptargetBCPathOpt)) { diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp index 102794829795da..214f1e5d83478f 100644 --- a/clang/lib/Driver/ToolChains/Cuda.cpp +++ b/clang/lib/Driver/ToolChains/Cuda.cpp @@ -851,7 +851,6 @@ void CudaToolChain::addClangTargetOptions( HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind); StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ); - assert(!GpuArch.empty() && "Must have an explicit GPU arch."); assert((DeviceOffloadingKind == Action::OFK_OpenMP || DeviceOffloadingKind == Action::OFK_Cuda) && "Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs."); diff --git a/clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_52.bc b/clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx.bc similarity index 100% rename from clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_52.bc rename to 
clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx.bc diff --git a/clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_52.bc b/clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx.bc similarity index 100% rename from clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_52.bc rename to clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx.bc diff --git a/clang/test/Driver/openmp-offload-gpu.c b/clang/test/Driver/openmp-offload-gpu.c index f6e2245dcdbc05..74bd2a6aeee468 100644 --- a/clang/test/Driver/openmp-offload-gpu.c +++ b/clang/test/Driver/openmp-offload-gpu.c @@ -90,8 +90,8 @@ // RUN: %s 2>&1 | FileCheck -check-prefix=CHK-ENV-BCLIB %s // CHK-BCLIB: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget-nvptx-test.bc -// CHK-BCLIB-DIR: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget{{/|\\\\}}libomptarget-nvptx-sm_52.bc -// CHK-ENV-BCLIB: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}subdir{{/|\\\\}}libomptarget-nvptx-sm_52.bc +// CHK-BCLIB-DIR: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget{{/|\\\\}}libomptarget-nvptx.bc +// CHK-ENV-BCLIB: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}subdir{{/|\\\\}}libomptarget-nvptx.bc // CHK-BCLIB-NOT: {{error:|warning:}} /// ########################################################################### diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt index 11176a567a6439..22940264f9b19a 100644 --- a/offload/DeviceRTL/CMakeLists.txt +++ b/offload/DeviceRTL/CMakeLists.txt @@ -42,43 +42,6 @@ set(devicertl_base_directory ${CMAKE_CURRENT_SOURCE_DIR}) set(include_directory ${devicertl_base_directory}/include) set(source_directory ${devicertl_base_directory}/src) -set(all_amdgpu_architectures "gfx700;gfx701;gfx801;gfx803" - "gfx9-generic;gfx900;gfx902;gfx906;gfx908" - 
"gfx90a;gfx90c" - "gfx9-4-generic;gfx940;gfx941;gfx942;gfx950" - "gfx10-1-generic;gfx1010;gfx1012" - "gfx10-3-generic;gfx1030;gfx1031;gfx1032;gfx1033" - "gfx1034;gfx1035;gfx1036" - "gfx11-generic;gfx1100;gfx1101;gfx1102;gfx1103" - "gfx1150;gfx1151;gfx1152;gfx1153" - "gfx12-generic") -set(all_nvptx_architectures "sm_35;sm_37;sm_50;sm_52;sm_53;sm_60;sm_61;sm_62" - "sm_70;sm_72;sm_75;sm_80;sm_86;sm_87;sm_89;sm_90") -set(all_gpu_architectures - "${all_amdgpu_architectures};${all_nvptx_architectures}") - -set(LIBOMPTARGET_DEVICE_ARCHITECTURES "all" CACHE STRING - "List of device architectures to be used to compile the OpenMP DeviceRTL.") - -if(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "all") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_gpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "amdgpu") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_amdgpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "nvptx") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_nvptx_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "auto" OR - LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "native") - if(NOT LIBOMPTARGET_NVPTX_ARCH AND NOT LIBOMPTARGET_AMDGPU_ARCH) - message(FATAL_ERROR - "Could not find 'amdgpu-arch' and 'nvptx-arch' tools required for 'auto'") - elseif(NOT LIBOMPTARGET_FOUND_NVIDIA_GPU AND NOT LIBOMPTARGET_FOUND_AMDGPU_GPU) - message(FATAL_ERROR "No AMD or NVIDIA GPU found on the system when using 'auto'") - endif() - set(LIBOMPTARGET_DEVICE_ARCHITECTURES - "${LIBOMPTARGET_NVPTX_DETECTED_ARCH_LIST};${LIBOMPTARGET_AMDGPU_DETECTED_ARCH_LIST}") -endif() -list(REMOVE_DUPLICATES LIBOMPTARGET_DEVICE_ARCHITECTURES) - set(include_files ${include_directory}/Allocator.h ${include_directory}/Configuration.h @@ -146,20 +109,22 @@ set(bc_flags -c -foffload-lto -std=c++17 -fvisibility=hidden # first create an object target add_library(omptarget.devicertl.all_objs OBJECT IMPORTED) -function(compileDeviceRTLLibrary target_cpu target_name 
target_triple) +function(compileDeviceRTLLibrary target_name target_triple) set(target_bc_flags ${ARGN}) set(bc_files "") foreach(src ${src_files}) get_filename_component(infile ${src} ABSOLUTE) get_filename_component(outfile ${src} NAME) - set(outfile "${outfile}-${target_cpu}.bc") + set(outfile "${outfile}-${target_name}.bc") set(depfile "${outfile}.d") + # Passing an empty CPU to -march= suppressed target specific metadata. add_custom_command(OUTPUT ${outfile} COMMAND ${CLANG_TOOL} ${bc_flags} - --offload-arch=${target_cpu} + -fopenmp-targets=${target_triple} + -Xopenmp-target=${target_triple} -march= ${target_bc_flags} -MD -MF ${depfile} ${infile} -o ${outfile} @@ -182,7 +147,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) list(APPEND bc_files ${outfile}) endforeach() - set(bclib_name "libomptarget-${target_name}-${target_cpu}.bc") + set(bclib_name "libomptarget-${target_name}.bc") # Link to a bitcode library. add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name} @@ -222,7 +187,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(bclib_target_name "omptarget-${target_name}-${target_cpu}-bc") + set(bclib_target_name "omptarget-${target_name}-bc") add_custom_target(${bclib_target_name} DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}) # Copy library to destination. 
@@ -244,7 +209,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) # Package the bitcode in the bitcode and embed it in an ELF for the static library add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} COMMAND ${PACKAGER_TOOL} -o ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} - "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=${target_cpu},kind=openmp" + "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=generic,kind=openmp" DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name} COMMENT "Packaging LLVM offloading binary ${bclib_name}.out" ) @@ -254,14 +219,14 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}-${target_cpu}.o") + set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}.o") add_custom_command(OUTPUT ${output_name} COMMAND ${CLANG_TOOL} --std=c++17 -c -nostdlib -Xclang -fembed-offload-object=${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} -o ${output_name} ${source_directory}/Stub.cpp DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} ${source_directory}/Stub.cpp - COMMENT "Embedding LLVM offloading binary in devicertl-${target_name}-${target_cpu}.o" + COMMENT "Embedding LLVM offloading binary in devicertl-${target_name}.o" VERBATIM ) if(TARGET clang) @@ -274,11 +239,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) set_property(TARGET omptarget.devicertl.all_objs APPEND PROPERTY IMPORTED_OBJECTS ${output_name}) if (CMAKE_EXPORT_COMPILE_COMMANDS) - set(ide_target_name omptarget-ide-${target_name}-${target_cpu}) + set(ide_target_name omptarget-ide-${target_name}) add_library(${ide_target_name} STATIC EXCLUDE_FROM_ALL ${src_files}) target_compile_options(${ide_target_name} PRIVATE - -fopenmp --offload-arch=${target_cpu} -fopenmp-cuda-mode - 
-mllvm -openmp-opt-disable + -fopenmp-targets=${target_triple} -Xopenmp-target=${target_triple} -march= + -fopenmp -fopenmp-cuda-mode -mllvm -openmp-opt-disable -foffload-lto -fvisibility=hidden --offload-device-only -nocudalib -nogpulib -nogpuinc -nostdlibinc -Wno-unknown-cuda-version ) @@ -293,18 +258,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. -add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) # Archive all the object files generated above into a static library add_library(omptarget.devicertl STATIC) diff --git a/offload/DeviceRTL/src/Reduction.cpp b/offload/DeviceRTL/src/Reduction.cpp index 57df159d3f28e5..d3b4528401953c 100644 --- a/offload/DeviceRTL/src/Reduction.cpp +++ b/offload/DeviceRTL/src/Reduction.cpp @@ -44,7 +44,6 @@ void gpu_irregular_warp_reduce(void *reduce_data, ShuffleReductFnTy shflFct, } } -#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ < 700 static uint32_t gpu_irregular_simd_reduce(void *reduce_data, ShuffleReductFnTy shflFct) { uint32_t size, remote_id, physical_lane_id; @@ -63,7 +62,6 @@ static uint32_t gpu_irregular_simd_reduce(void *reduce_data, } while (logical_lane_id % 2 == 0 && size > 1); return (logical_lane_id == 0); 
} -#endif static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, ShuffleReductFnTy shflFct, @@ -74,49 +72,53 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, uint32_t NumThreads = omp_get_num_threads(); if (NumThreads == 1) return 1; - /* - * This reduce function handles reduction within a team. It handles - * parallel regions in both L1 and L2 parallelism levels. It also - * supports Generic, SPMD, and NoOMP modes. - * - * 1. Reduce within a warp. - * 2. Warp master copies value to warp 0 via shared memory. - * 3. Warp 0 reduces to a single value. - * 4. The reduced value is available in the thread that returns 1. - */ - -#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700 - uint32_t WarpsNeeded = - (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); - uint32_t WarpId = mapping::getWarpIdInBlock(); - - // Volta execution model: - // For the Generic execution mode a parallel region either has 1 thread and - // beyond that, always a multiple of 32. For the SPMD execution mode we may - // have any number of threads. - if ((NumThreads % mapping::getWarpSize() == 0) || (WarpId < WarpsNeeded - 1)) - gpu_regular_warp_reduce(reduce_data, shflFct); - else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. - gpu_irregular_warp_reduce(reduce_data, shflFct, - /*LaneCount=*/NumThreads % mapping::getWarpSize(), - /*LaneId=*/mapping::getThreadIdInBlock() % - mapping::getWarpSize()); - // When we have more than [mapping::getWarpSize()] number of threads - // a block reduction is performed here. - // - // Only L1 parallel region can enter this if condition. - if (NumThreads > mapping::getWarpSize()) { - // Gather all the reduced values from each warp - // to the first warp. - cpyFct(reduce_data, WarpsNeeded); + // + // This reduce function handles reduction within a team. It handles + // parallel regions in both L1 and L2 parallelism levels. It also + // supports Generic, SPMD, and NoOMP modes. + // + // 1. 
Reduce within a warp. + // 2. Warp master copies value to warp 0 via shared memory. + // 3. Warp 0 reduces to a single value. + // 4. The reduced value is available in the thread that returns 1. + // - if (WarpId == 0) - gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, - BlockThreadId); +#if __has_builtin(__nvvm_reflect) + if (__nvvm_reflect("__CUDA_ARCH") >= 700) { + uint32_t WarpsNeeded = + (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); + uint32_t WarpId = mapping::getWarpIdInBlock(); + + // Volta execution model: + // For the Generic execution mode a parallel region either has 1 thread and + // beyond that, always a multiple of 32. For the SPMD execution mode we may + // have any number of threads. + if ((NumThreads % mapping::getWarpSize() == 0) || + (WarpId < WarpsNeeded - 1)) + gpu_regular_warp_reduce(reduce_data, shflFct); + else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. + gpu_irregular_warp_reduce( + reduce_data, shflFct, + /*LaneCount=*/NumThreads % mapping::getWarpSize(), + /*LaneId=*/mapping::getThreadIdInBlock() % mapping::getWarpSize()); + + // When we have more than [mapping::getWarpSize()] number of threads + // a block reduction is performed here. + // + // Only L1 parallel region can enter this if condition. + if (NumThreads > mapping::getWarpSize()) { + // Gather all the reduced values from each warp + // to the first warp. + cpyFct(reduce_data, WarpsNeeded); + + if (WarpId == 0) + gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, + BlockThreadId); + } + return BlockThreadId == 0; } - return BlockThreadId == 0; -#else +#endif __kmpc_impl_lanemask_t Liveness = mapping::activemask(); if (Liveness == lanes::All) // Full warp gpu_regular_warp_reduce(reduce_data, shflFct); @@ -150,10 +152,9 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, return BlockThreadId == 0; } - // Get the OMP thread Id. 
This is different from BlockThreadId in the case of - // an L2 parallel region. + // Get the OMP thread Id. This is different from BlockThreadId in the case + // of an L2 parallel region. return BlockThreadId == 0; -#endif // __CUDA_ARCH__ >= 700 } uint32_t roundToWarpsize(uint32_t s) { diff --git a/openmp/docs/ReleaseNotes.rst b/openmp/docs/ReleaseNotes.rst index d4a4b1a99f7813..0089f1aa31d144 100644 --- a/openmp/docs/ReleaseNotes.rst +++ b/openmp/docs/ReleaseNotes.rst @@ -19,3 +19,9 @@ from the `LLVM releases web site `_. Non-comprehensive list of changes in this release ================================================= + +Device Runtime +-------------- +- Changed the OpenMP DeviceRTL to use 'generic' IR. The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets. From openmp-commits at lists.llvm.org Mon Dec 23 03:45:57 2024 From: openmp-commits at lists.llvm.org (Michał Górny via Openmp-commits) Date: Mon, 23 Dec 2024 03:45:57 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <67694d75.170a0220.12d179.167f@mx.google.com> mgorny wrote: @jhuber6, since I think this PR makes #119006 obsolete, should I merge that one while this one's still in the works, or close it? https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Mon Dec 23 05:53:00 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Mon, 23 Dec 2024 05:53:00 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <67696b3c.170a0220.352399.431a@mx.google.com> jhuber6 wrote: > @jhuber6, since I think this PR makes #119006 obsolete, should I merge that one while this one's still in the works, or close it? I wouldn't say it's high priority.
I would've landed this already if it weren't for the downstream fork being so divergent this isn't an easy apply. https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Mon Dec 23 09:54:55 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Mon, 23 Dec 2024 09:54:55 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <6769a3ef.170a0220.edabd.7af9@mx.google.com> jhuber6 wrote: @ronlieb https://gist.github.com/jhuber6/26e3fc8473a750410d2ca1db4a65918d https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Mon Dec 23 17:43:01 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Mon, 23 Dec 2024 17:43:01 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <676a11a5.170a0220.eb10.8f5f@mx.google.com> https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/119091 >From 767d34a0469aa67c2c47a35bc9bff29d20ae1222 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sat, 7 Dec 2024 13:47:23 -0600 Subject: [PATCH 1/2] [OpenMP] Use generic IR for the OpenMP DeviceRTL Summary: We previously built this for every single architecture to deal with incompatibility. This patch updates it to use the 'generic' IR that `libc` and other projects use. Who knows if this will have any side-effects, probably worth testing more but it passes the tests I expect to pass on my side. 
update with comment release notes fix --- clang/docs/ReleaseNotes.rst | 3 + clang/lib/Driver/ToolChains/CommonArgs.cpp | 3 +- clang/lib/Driver/ToolChains/Cuda.cpp | 1 - ...t-nvptx-sm_52.bc => libomptarget-nvptx.bc} | 0 ...t-nvptx-sm_52.bc => libomptarget-nvptx.bc} | 0 clang/test/Driver/openmp-offload-gpu.c | 4 +- offload/DeviceRTL/CMakeLists.txt | 76 ++++------------ offload/DeviceRTL/src/Reduction.cpp | 91 ++++++++++--------- openmp/docs/ReleaseNotes.rst | 6 ++ 9 files changed, 75 insertions(+), 109 deletions(-) rename clang/test/Driver/Inputs/libomptarget/{libomptarget-nvptx-sm_52.bc => libomptarget-nvptx.bc} (100%) rename clang/test/Driver/Inputs/libomptarget/subdir/{libomptarget-nvptx-sm_52.bc => libomptarget-nvptx.bc} (100%) diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index 8b984ecaefecaf..ba5aafe25cc934 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -1250,6 +1250,9 @@ OpenMP Support - Added support for 'omp assume' directive. - Added support for 'omp scope' directive. - Added support for allocator-modifier in 'allocate' clause. +- Changed the OpenMP DeviceRTL to use 'generic' IR. The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets. Improvements ^^^^^^^^^^^^ diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp index 47df650e5b9486..e606a7d3d44199 100644 --- a/clang/lib/Driver/ToolChains/CommonArgs.cpp +++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp @@ -2837,8 +2837,7 @@ void tools::addOpenMPDeviceRTL(const Driver &D, : options::OPT_libomptarget_nvptx_bc_path_EQ; StringRef ArchPrefix = Triple.isAMDGCN() ? 
"amdgpu" : "nvptx"; - std::string LibOmpTargetName = - ("libomptarget-" + ArchPrefix + "-" + BitcodeSuffix + ".bc").str(); + std::string LibOmpTargetName = ("libomptarget-" + ArchPrefix + ".bc").str(); // First check whether user specifies bc library if (const Arg *A = DriverArgs.getLastArg(LibomptargetBCPathOpt)) { diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp index 102794829795da..214f1e5d83478f 100644 --- a/clang/lib/Driver/ToolChains/Cuda.cpp +++ b/clang/lib/Driver/ToolChains/Cuda.cpp @@ -851,7 +851,6 @@ void CudaToolChain::addClangTargetOptions( HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind); StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ); - assert(!GpuArch.empty() && "Must have an explicit GPU arch."); assert((DeviceOffloadingKind == Action::OFK_OpenMP || DeviceOffloadingKind == Action::OFK_Cuda) && "Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs."); diff --git a/clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_52.bc b/clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx.bc similarity index 100% rename from clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_52.bc rename to clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx.bc diff --git a/clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_52.bc b/clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx.bc similarity index 100% rename from clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_52.bc rename to clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx.bc diff --git a/clang/test/Driver/openmp-offload-gpu.c b/clang/test/Driver/openmp-offload-gpu.c index f6e2245dcdbc05..74bd2a6aeee468 100644 --- a/clang/test/Driver/openmp-offload-gpu.c +++ b/clang/test/Driver/openmp-offload-gpu.c @@ -90,8 +90,8 @@ // RUN: %s 2>&1 | FileCheck -check-prefix=CHK-ENV-BCLIB %s // CHK-BCLIB: 
clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget-nvptx-test.bc -// CHK-BCLIB-DIR: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget{{/|\\\\}}libomptarget-nvptx-sm_52.bc -// CHK-ENV-BCLIB: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}subdir{{/|\\\\}}libomptarget-nvptx-sm_52.bc +// CHK-BCLIB-DIR: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget{{/|\\\\}}libomptarget-nvptx.bc +// CHK-ENV-BCLIB: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}subdir{{/|\\\\}}libomptarget-nvptx.bc // CHK-BCLIB-NOT: {{error:|warning:}} /// ########################################################################### diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt index 11176a567a6439..22940264f9b19a 100644 --- a/offload/DeviceRTL/CMakeLists.txt +++ b/offload/DeviceRTL/CMakeLists.txt @@ -42,43 +42,6 @@ set(devicertl_base_directory ${CMAKE_CURRENT_SOURCE_DIR}) set(include_directory ${devicertl_base_directory}/include) set(source_directory ${devicertl_base_directory}/src) -set(all_amdgpu_architectures "gfx700;gfx701;gfx801;gfx803" - "gfx9-generic;gfx900;gfx902;gfx906;gfx908" - "gfx90a;gfx90c" - "gfx9-4-generic;gfx940;gfx941;gfx942;gfx950" - "gfx10-1-generic;gfx1010;gfx1012" - "gfx10-3-generic;gfx1030;gfx1031;gfx1032;gfx1033" - "gfx1034;gfx1035;gfx1036" - "gfx11-generic;gfx1100;gfx1101;gfx1102;gfx1103" - "gfx1150;gfx1151;gfx1152;gfx1153" - "gfx12-generic") -set(all_nvptx_architectures "sm_35;sm_37;sm_50;sm_52;sm_53;sm_60;sm_61;sm_62" - "sm_70;sm_72;sm_75;sm_80;sm_86;sm_87;sm_89;sm_90") -set(all_gpu_architectures - "${all_amdgpu_architectures};${all_nvptx_architectures}") - -set(LIBOMPTARGET_DEVICE_ARCHITECTURES "all" CACHE STRING - "List of device architectures to be used to compile the OpenMP DeviceRTL.") - -if(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "all") - 
set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_gpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "amdgpu") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_amdgpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "nvptx") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_nvptx_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "auto" OR - LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "native") - if(NOT LIBOMPTARGET_NVPTX_ARCH AND NOT LIBOMPTARGET_AMDGPU_ARCH) - message(FATAL_ERROR - "Could not find 'amdgpu-arch' and 'nvptx-arch' tools required for 'auto'") - elseif(NOT LIBOMPTARGET_FOUND_NVIDIA_GPU AND NOT LIBOMPTARGET_FOUND_AMDGPU_GPU) - message(FATAL_ERROR "No AMD or NVIDIA GPU found on the system when using 'auto'") - endif() - set(LIBOMPTARGET_DEVICE_ARCHITECTURES - "${LIBOMPTARGET_NVPTX_DETECTED_ARCH_LIST};${LIBOMPTARGET_AMDGPU_DETECTED_ARCH_LIST}") -endif() -list(REMOVE_DUPLICATES LIBOMPTARGET_DEVICE_ARCHITECTURES) - set(include_files ${include_directory}/Allocator.h ${include_directory}/Configuration.h @@ -146,20 +109,22 @@ set(bc_flags -c -foffload-lto -std=c++17 -fvisibility=hidden # first create an object target add_library(omptarget.devicertl.all_objs OBJECT IMPORTED) -function(compileDeviceRTLLibrary target_cpu target_name target_triple) +function(compileDeviceRTLLibrary target_name target_triple) set(target_bc_flags ${ARGN}) set(bc_files "") foreach(src ${src_files}) get_filename_component(infile ${src} ABSOLUTE) get_filename_component(outfile ${src} NAME) - set(outfile "${outfile}-${target_cpu}.bc") + set(outfile "${outfile}-${target_name}.bc") set(depfile "${outfile}.d") + # Passing an empty CPU to -march= suppressed target specific metadata. 
add_custom_command(OUTPUT ${outfile} COMMAND ${CLANG_TOOL} ${bc_flags} - --offload-arch=${target_cpu} + -fopenmp-targets=${target_triple} + -Xopenmp-target=${target_triple} -march= ${target_bc_flags} -MD -MF ${depfile} ${infile} -o ${outfile} @@ -182,7 +147,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) list(APPEND bc_files ${outfile}) endforeach() - set(bclib_name "libomptarget-${target_name}-${target_cpu}.bc") + set(bclib_name "libomptarget-${target_name}.bc") # Link to a bitcode library. add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name} @@ -222,7 +187,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(bclib_target_name "omptarget-${target_name}-${target_cpu}-bc") + set(bclib_target_name "omptarget-${target_name}-bc") add_custom_target(${bclib_target_name} DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}) # Copy library to destination. @@ -244,7 +209,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) # Package the bitcode in the bitcode and embed it in an ELF for the static library add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} COMMAND ${PACKAGER_TOOL} -o ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} - "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=${target_cpu},kind=openmp" + "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=generic,kind=openmp" DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name} COMMENT "Packaging LLVM offloading binary ${bclib_name}.out" ) @@ -254,14 +219,14 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}-${target_cpu}.o") + set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}.o") add_custom_command(OUTPUT ${output_name} COMMAND ${CLANG_TOOL} 
--std=c++17 -c -nostdlib -Xclang -fembed-offload-object=${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} -o ${output_name} ${source_directory}/Stub.cpp DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} ${source_directory}/Stub.cpp - COMMENT "Embedding LLVM offloading binary in devicertl-${target_name}-${target_cpu}.o" + COMMENT "Embedding LLVM offloading binary in devicertl-${target_name}.o" VERBATIM ) if(TARGET clang) @@ -274,11 +239,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) set_property(TARGET omptarget.devicertl.all_objs APPEND PROPERTY IMPORTED_OBJECTS ${output_name}) if (CMAKE_EXPORT_COMPILE_COMMANDS) - set(ide_target_name omptarget-ide-${target_name}-${target_cpu}) + set(ide_target_name omptarget-ide-${target_name}) add_library(${ide_target_name} STATIC EXCLUDE_FROM_ALL ${src_files}) target_compile_options(${ide_target_name} PRIVATE - -fopenmp --offload-arch=${target_cpu} -fopenmp-cuda-mode - -mllvm -openmp-opt-disable + -fopenmp-targets=${target_triple} -Xopenmp-target=${target_triple} -march= + -fopenmp -fopenmp-cuda-mode -mllvm -openmp-opt-disable -foffload-lto -fvisibility=hidden --offload-device-only -nocudalib -nogpulib -nogpuinc -nostdlibinc -Wno-unknown-cuda-version ) @@ -293,18 +258,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. 
-add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) # Archive all the object files generated above into a static library add_library(omptarget.devicertl STATIC) diff --git a/offload/DeviceRTL/src/Reduction.cpp b/offload/DeviceRTL/src/Reduction.cpp index 57df159d3f28e5..d3b4528401953c 100644 --- a/offload/DeviceRTL/src/Reduction.cpp +++ b/offload/DeviceRTL/src/Reduction.cpp @@ -44,7 +44,6 @@ void gpu_irregular_warp_reduce(void *reduce_data, ShuffleReductFnTy shflFct, } } -#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ < 700 static uint32_t gpu_irregular_simd_reduce(void *reduce_data, ShuffleReductFnTy shflFct) { uint32_t size, remote_id, physical_lane_id; @@ -63,7 +62,6 @@ static uint32_t gpu_irregular_simd_reduce(void *reduce_data, } while (logical_lane_id % 2 == 0 && size > 1); return (logical_lane_id == 0); } -#endif static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, ShuffleReductFnTy shflFct, @@ -74,49 +72,53 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, uint32_t NumThreads = omp_get_num_threads(); if (NumThreads == 1) return 1; - /* - * This reduce function handles reduction within a team. It handles - * parallel regions in both L1 and L2 parallelism levels. It also - * supports Generic, SPMD, and NoOMP modes. - * - * 1. Reduce within a warp. - * 2. 
Warp master copies value to warp 0 via shared memory. - * 3. Warp 0 reduces to a single value. - * 4. The reduced value is available in the thread that returns 1. - */ - -#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700 - uint32_t WarpsNeeded = - (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); - uint32_t WarpId = mapping::getWarpIdInBlock(); - - // Volta execution model: - // For the Generic execution mode a parallel region either has 1 thread and - // beyond that, always a multiple of 32. For the SPMD execution mode we may - // have any number of threads. - if ((NumThreads % mapping::getWarpSize() == 0) || (WarpId < WarpsNeeded - 1)) - gpu_regular_warp_reduce(reduce_data, shflFct); - else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. - gpu_irregular_warp_reduce(reduce_data, shflFct, - /*LaneCount=*/NumThreads % mapping::getWarpSize(), - /*LaneId=*/mapping::getThreadIdInBlock() % - mapping::getWarpSize()); - // When we have more than [mapping::getWarpSize()] number of threads - // a block reduction is performed here. - // - // Only L1 parallel region can enter this if condition. - if (NumThreads > mapping::getWarpSize()) { - // Gather all the reduced values from each warp - // to the first warp. - cpyFct(reduce_data, WarpsNeeded); + // + // This reduce function handles reduction within a team. It handles + // parallel regions in both L1 and L2 parallelism levels. It also + // supports Generic, SPMD, and NoOMP modes. + // + // 1. Reduce within a warp. + // 2. Warp master copies value to warp 0 via shared memory. + // 3. Warp 0 reduces to a single value. + // 4. The reduced value is available in the thread that returns 1. 
+ // - if (WarpId == 0) - gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, - BlockThreadId); +#if __has_builtin(__nvvm_reflect) + if (__nvvm_reflect("__CUDA_ARCH") >= 700) { + uint32_t WarpsNeeded = + (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); + uint32_t WarpId = mapping::getWarpIdInBlock(); + + // Volta execution model: + // For the Generic execution mode a parallel region either has 1 thread and + // beyond that, always a multiple of 32. For the SPMD execution mode we may + // have any number of threads. + if ((NumThreads % mapping::getWarpSize() == 0) || + (WarpId < WarpsNeeded - 1)) + gpu_regular_warp_reduce(reduce_data, shflFct); + else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. + gpu_irregular_warp_reduce( + reduce_data, shflFct, + /*LaneCount=*/NumThreads % mapping::getWarpSize(), + /*LaneId=*/mapping::getThreadIdInBlock() % mapping::getWarpSize()); + + // When we have more than [mapping::getWarpSize()] number of threads + // a block reduction is performed here. + // + // Only L1 parallel region can enter this if condition. + if (NumThreads > mapping::getWarpSize()) { + // Gather all the reduced values from each warp + // to the first warp. + cpyFct(reduce_data, WarpsNeeded); + + if (WarpId == 0) + gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, + BlockThreadId); + } + return BlockThreadId == 0; } - return BlockThreadId == 0; -#else +#endif __kmpc_impl_lanemask_t Liveness = mapping::activemask(); if (Liveness == lanes::All) // Full warp gpu_regular_warp_reduce(reduce_data, shflFct); @@ -150,10 +152,9 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, return BlockThreadId == 0; } - // Get the OMP thread Id. This is different from BlockThreadId in the case of - // an L2 parallel region. + // Get the OMP thread Id. This is different from BlockThreadId in the case + // of an L2 parallel region. 
return BlockThreadId == 0; -#endif // __CUDA_ARCH__ >= 700 } uint32_t roundToWarpsize(uint32_t s) { diff --git a/openmp/docs/ReleaseNotes.rst b/openmp/docs/ReleaseNotes.rst index d4a4b1a99f7813..0089f1aa31d144 100644 --- a/openmp/docs/ReleaseNotes.rst +++ b/openmp/docs/ReleaseNotes.rst @@ -19,3 +19,9 @@ from the `LLVM releases web site `_. Non-comprehensive list of changes in this release ================================================= + +Device Runtime +-------------- +- Changed the OpenMP DeviceRTL to use 'generic' IR. The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets. >From 3c8cd44fc17b46d8689f3133ccd319838f441602 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 23 Dec 2024 19:42:50 -0600 Subject: [PATCH 2/2] No break time --- offload/DeviceRTL/src/Misc.cpp | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/offload/DeviceRTL/src/Misc.cpp b/offload/DeviceRTL/src/Misc.cpp index 00935cce05e479..ba6fbf5d5c7e3c 100644 --- a/offload/DeviceRTL/src/Misc.cpp +++ b/offload/DeviceRTL/src/Misc.cpp @@ -39,15 +39,7 @@ double getWTick() { } double getWTime() { - uint64_t NumTicks = 0; - if constexpr (__has_builtin(__builtin_amdgcn_s_sendmsg_rtnl)) - NumTicks = __builtin_amdgcn_s_sendmsg_rtnl(0x83); - else if constexpr (__has_builtin(__builtin_amdgcn_s_memrealtime)) - NumTicks = __builtin_amdgcn_s_memrealtime(); - else if constexpr (__has_builtin(__builtin_amdgcn_s_memtime)) - NumTicks = __builtin_amdgcn_s_memtime(); - - return static_cast<double>(NumTicks) * getWTick(); + return static_cast<double>(__builtin_readsteadycounter()) * getWTick(); } #pragma omp end declare variant From openmp-commits at lists.llvm.org Tue Dec 24 15:52:05 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Tue, 24 Dec 2024 15:52:05 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: 
<676b4925.170a0220.1cb2df.e8e7@mx.google.com> https://github.com/ronlieb approved this pull request. https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Tue Dec 24 16:05:32 2024 From: openmp-commits at lists.llvm.org (via Openmp-commits) Date: Tue, 24 Dec 2024 16:05:32 -0800 (PST) Subject: [Openmp-commits] [openmp] 34f8573 - [OpenMP] Use generic IR for the OpenMP DeviceRTL (#119091) Message-ID: <676b4c4c.170a0220.24b240.cd1c@mx.google.com> Author: Joseph Huber Date: 2024-12-24T18:05:28-06:00 New Revision: 34f8573a514915222630cf21e8a0c901a25f4ca0 URL: https://github.com/llvm/llvm-project/commit/34f8573a514915222630cf21e8a0c901a25f4ca0 DIFF: https://github.com/llvm/llvm-project/commit/34f8573a514915222630cf21e8a0c901a25f4ca0.diff LOG: [OpenMP] Use generic IR for the OpenMP DeviceRTL (#119091) Summary: We previously built this for every single architecture to deal with incompatibility. This patch updates it to use the 'generic' IR that `libc` and other projects use. Who knows if this will have any side-effects, probably worth testing more but it passes the tests I expect to pass on my side. 
Added: clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx.bc clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx.bc Modified: clang/docs/ReleaseNotes.rst clang/lib/Driver/ToolChains/CommonArgs.cpp clang/lib/Driver/ToolChains/Cuda.cpp clang/test/Driver/openmp-offload-gpu.c offload/DeviceRTL/CMakeLists.txt offload/DeviceRTL/src/Misc.cpp offload/DeviceRTL/src/Reduction.cpp openmp/docs/ReleaseNotes.rst Removed: clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_52.bc clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_52.bc ################################################################################ diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index 8b984ecaefecaf..ba5aafe25cc934 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -1250,6 +1250,9 @@ OpenMP Support - Added support for 'omp assume' directive. - Added support for 'omp scope' directive. - Added support for allocator-modifier in 'allocate' clause. +- Changed the OpenMP DeviceRTL to use 'generic' IR. The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets. Improvements ^^^^^^^^^^^^ diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp index f8f751cb6a66d5..bc62e8c48238b8 100644 --- a/clang/lib/Driver/ToolChains/CommonArgs.cpp +++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp @@ -2838,8 +2838,7 @@ void tools::addOpenMPDeviceRTL(const Driver &D, : options::OPT_libomptarget_nvptx_bc_path_EQ; StringRef ArchPrefix = Triple.isAMDGCN() ? 
"amdgpu" : "nvptx"; - std::string LibOmpTargetName = - ("libomptarget-" + ArchPrefix + "-" + BitcodeSuffix + ".bc").str(); + std::string LibOmpTargetName = ("libomptarget-" + ArchPrefix + ".bc").str(); // First check whether user specifies bc library if (const Arg *A = DriverArgs.getLastArg(LibomptargetBCPathOpt)) { diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp index 102794829795da..214f1e5d83478f 100644 --- a/clang/lib/Driver/ToolChains/Cuda.cpp +++ b/clang/lib/Driver/ToolChains/Cuda.cpp @@ -851,7 +851,6 @@ void CudaToolChain::addClangTargetOptions( HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind); StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ); - assert(!GpuArch.empty() && "Must have an explicit GPU arch."); assert((DeviceOffloadingKind == Action::OFK_OpenMP || DeviceOffloadingKind == Action::OFK_Cuda) && "Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs."); diff --git a/clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_52.bc b/clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx.bc similarity index 100% rename from clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_52.bc rename to clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx.bc diff --git a/clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_52.bc b/clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx.bc similarity index 100% rename from clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_52.bc rename to clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx.bc diff --git a/clang/test/Driver/openmp-offload-gpu.c b/clang/test/Driver/openmp-offload-gpu.c index f6e2245dcdbc05..74bd2a6aeee468 100644 --- a/clang/test/Driver/openmp-offload-gpu.c +++ b/clang/test/Driver/openmp-offload-gpu.c @@ -90,8 +90,8 @@ // RUN: %s 2>&1 | FileCheck -check-prefix=CHK-ENV-BCLIB %s // CHK-BCLIB: 
clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget-nvptx-test.bc -// CHK-BCLIB-DIR: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget{{/|\\\\}}libomptarget-nvptx-sm_52.bc -// CHK-ENV-BCLIB: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}subdir{{/|\\\\}}libomptarget-nvptx-sm_52.bc +// CHK-BCLIB-DIR: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget{{/|\\\\}}libomptarget-nvptx.bc +// CHK-ENV-BCLIB: clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}subdir{{/|\\\\}}libomptarget-nvptx.bc // CHK-BCLIB-NOT: {{error:|warning:}} /// ########################################################################### diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt index 11176a567a6439..22940264f9b19a 100644 --- a/offload/DeviceRTL/CMakeLists.txt +++ b/offload/DeviceRTL/CMakeLists.txt @@ -42,43 +42,6 @@ set(devicertl_base_directory ${CMAKE_CURRENT_SOURCE_DIR}) set(include_directory ${devicertl_base_directory}/include) set(source_directory ${devicertl_base_directory}/src) -set(all_amdgpu_architectures "gfx700;gfx701;gfx801;gfx803" - "gfx9-generic;gfx900;gfx902;gfx906;gfx908" - "gfx90a;gfx90c" - "gfx9-4-generic;gfx940;gfx941;gfx942;gfx950" - "gfx10-1-generic;gfx1010;gfx1012" - "gfx10-3-generic;gfx1030;gfx1031;gfx1032;gfx1033" - "gfx1034;gfx1035;gfx1036" - "gfx11-generic;gfx1100;gfx1101;gfx1102;gfx1103" - "gfx1150;gfx1151;gfx1152;gfx1153" - "gfx12-generic") -set(all_nvptx_architectures "sm_35;sm_37;sm_50;sm_52;sm_53;sm_60;sm_61;sm_62" - "sm_70;sm_72;sm_75;sm_80;sm_86;sm_87;sm_89;sm_90") -set(all_gpu_architectures - "${all_amdgpu_architectures};${all_nvptx_architectures}") - -set(LIBOMPTARGET_DEVICE_ARCHITECTURES "all" CACHE STRING - "List of device architectures to be used to compile the OpenMP DeviceRTL.") - -if(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "all") - 
set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_gpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "amdgpu") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_amdgpu_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "nvptx") - set(LIBOMPTARGET_DEVICE_ARCHITECTURES ${all_nvptx_architectures}) -elseif(LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "auto" OR - LIBOMPTARGET_DEVICE_ARCHITECTURES STREQUAL "native") - if(NOT LIBOMPTARGET_NVPTX_ARCH AND NOT LIBOMPTARGET_AMDGPU_ARCH) - message(FATAL_ERROR - "Could not find 'amdgpu-arch' and 'nvptx-arch' tools required for 'auto'") - elseif(NOT LIBOMPTARGET_FOUND_NVIDIA_GPU AND NOT LIBOMPTARGET_FOUND_AMDGPU_GPU) - message(FATAL_ERROR "No AMD or NVIDIA GPU found on the system when using 'auto'") - endif() - set(LIBOMPTARGET_DEVICE_ARCHITECTURES - "${LIBOMPTARGET_NVPTX_DETECTED_ARCH_LIST};${LIBOMPTARGET_AMDGPU_DETECTED_ARCH_LIST}") -endif() -list(REMOVE_DUPLICATES LIBOMPTARGET_DEVICE_ARCHITECTURES) - set(include_files ${include_directory}/Allocator.h ${include_directory}/Configuration.h @@ -146,20 +109,22 @@ set(bc_flags -c -foffload-lto -std=c++17 -fvisibility=hidden # first create an object target add_library(omptarget.devicertl.all_objs OBJECT IMPORTED) -function(compileDeviceRTLLibrary target_cpu target_name target_triple) +function(compileDeviceRTLLibrary target_name target_triple) set(target_bc_flags ${ARGN}) set(bc_files "") foreach(src ${src_files}) get_filename_component(infile ${src} ABSOLUTE) get_filename_component(outfile ${src} NAME) - set(outfile "${outfile}-${target_cpu}.bc") + set(outfile "${outfile}-${target_name}.bc") set(depfile "${outfile}.d") + # Passing an empty CPU to -march= suppressed target specific metadata. 
add_custom_command(OUTPUT ${outfile} COMMAND ${CLANG_TOOL} ${bc_flags} - --offload-arch=${target_cpu} + -fopenmp-targets=${target_triple} + -Xopenmp-target=${target_triple} -march= ${target_bc_flags} -MD -MF ${depfile} ${infile} -o ${outfile} @@ -182,7 +147,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) list(APPEND bc_files ${outfile}) endforeach() - set(bclib_name "libomptarget-${target_name}-${target_cpu}.bc") + set(bclib_name "libomptarget-${target_name}.bc") # Link to a bitcode library. add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name} @@ -222,7 +187,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(bclib_target_name "omptarget-${target_name}-${target_cpu}-bc") + set(bclib_target_name "omptarget-${target_name}-bc") add_custom_target(${bclib_target_name} DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}) # Copy library to destination. @@ -244,7 +209,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) # Package the bitcode in the bitcode and embed it in an ELF for the static library add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} COMMAND ${PACKAGER_TOOL} -o ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} - "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=${target_cpu},kind=openmp" + "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=generic,kind=openmp" DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name} COMMENT "Packaging LLVM offloading binary ${bclib_name}.out" ) @@ -254,14 +219,14 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) APPEND) endif() - set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}-${target_cpu}.o") + set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}.o") add_custom_command(OUTPUT ${output_name} COMMAND ${CLANG_TOOL} 
--std=c++17 -c -nostdlib -Xclang -fembed-offload-object=${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} -o ${output_name} ${source_directory}/Stub.cpp DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} ${source_directory}/Stub.cpp - COMMENT "Embedding LLVM offloading binary in devicertl-${target_name}-${target_cpu}.o" + COMMENT "Embedding LLVM offloading binary in devicertl-${target_name}.o" VERBATIM ) if(TARGET clang) @@ -274,11 +239,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) set_property(TARGET omptarget.devicertl.all_objs APPEND PROPERTY IMPORTED_OBJECTS ${output_name}) if (CMAKE_EXPORT_COMPILE_COMMANDS) - set(ide_target_name omptarget-ide-${target_name}-${target_cpu}) + set(ide_target_name omptarget-ide-${target_name}) add_library(${ide_target_name} STATIC EXCLUDE_FROM_ALL ${src_files}) target_compile_options(${ide_target_name} PRIVATE - -fopenmp --offload-arch=${target_cpu} -fopenmp-cuda-mode - -mllvm -openmp-opt-disable + -fopenmp-targets=${target_triple} -Xopenmp-target=${target_triple} -march= + -fopenmp -fopenmp-cuda-mode -mllvm -openmp-opt-disable -foffload-lto -fvisibility=hidden --offload-device-only -nocudalib -nogpulib -nogpuinc -nostdlibinc -Wno-unknown-cuda-version ) @@ -293,18 +258,11 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple) endif() endfunction() -# Generate a Bitcode library for all the gpu architectures the user requested. 
-add_custom_target(omptarget.devicertl.nvptx) add_custom_target(omptarget.devicertl.amdgpu) -foreach(gpu_arch ${LIBOMPTARGET_DEVICE_ARCHITECTURES}) - if("${gpu_arch}" IN_LIST all_amdgpu_architectures) - compileDeviceRTLLibrary(${gpu_arch} amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) - elseif("${gpu_arch}" IN_LIST all_nvptx_architectures) - compileDeviceRTLLibrary(${gpu_arch} nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) - else() - message(FATAL_ERROR "Unknown GPU architecture '${gpu_arch}'") - endif() -endforeach() +compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none) + +add_custom_target(omptarget.devicertl.nvptx) +compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63) # Archive all the object files generated above into a static library add_library(omptarget.devicertl STATIC) diff --git a/offload/DeviceRTL/src/Misc.cpp b/offload/DeviceRTL/src/Misc.cpp index 00935cce05e479..ba6fbf5d5c7e3c 100644 --- a/offload/DeviceRTL/src/Misc.cpp +++ b/offload/DeviceRTL/src/Misc.cpp @@ -39,15 +39,7 @@ double getWTick() { } double getWTime() { - uint64_t NumTicks = 0; - if constexpr (__has_builtin(__builtin_amdgcn_s_sendmsg_rtnl)) - NumTicks = __builtin_amdgcn_s_sendmsg_rtnl(0x83); - else if constexpr (__has_builtin(__builtin_amdgcn_s_memrealtime)) - NumTicks = __builtin_amdgcn_s_memrealtime(); - else if constexpr (__has_builtin(__builtin_amdgcn_s_memtime)) - NumTicks = __builtin_amdgcn_s_memtime(); - - return static_cast<double>(NumTicks) * getWTick(); + return static_cast<double>(__builtin_readsteadycounter()) * getWTick(); } #pragma omp end declare variant diff --git a/offload/DeviceRTL/src/Reduction.cpp b/offload/DeviceRTL/src/Reduction.cpp index 57df159d3f28e5..d3b4528401953c 100644 --- a/offload/DeviceRTL/src/Reduction.cpp +++ b/offload/DeviceRTL/src/Reduction.cpp @@ -44,7 +44,6 @@ void gpu_irregular_warp_reduce(void *reduce_data, ShuffleReductFnTy shflFct, } } -#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ < 700 static
uint32_t gpu_irregular_simd_reduce(void *reduce_data, ShuffleReductFnTy shflFct) { uint32_t size, remote_id, physical_lane_id; @@ -63,7 +62,6 @@ static uint32_t gpu_irregular_simd_reduce(void *reduce_data, } while (logical_lane_id % 2 == 0 && size > 1); return (logical_lane_id == 0); } -#endif static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, ShuffleReductFnTy shflFct, @@ -74,49 +72,53 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, uint32_t NumThreads = omp_get_num_threads(); if (NumThreads == 1) return 1; - /* - * This reduce function handles reduction within a team. It handles - * parallel regions in both L1 and L2 parallelism levels. It also - * supports Generic, SPMD, and NoOMP modes. - * - * 1. Reduce within a warp. - * 2. Warp master copies value to warp 0 via shared memory. - * 3. Warp 0 reduces to a single value. - * 4. The reduced value is available in the thread that returns 1. - */ - -#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700 - uint32_t WarpsNeeded = - (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); - uint32_t WarpId = mapping::getWarpIdInBlock(); - - // Volta execution model: - // For the Generic execution mode a parallel region either has 1 thread and - // beyond that, always a multiple of 32. For the SPMD execution mode we may - // have any number of threads. - if ((NumThreads % mapping::getWarpSize() == 0) || (WarpId < WarpsNeeded - 1)) - gpu_regular_warp_reduce(reduce_data, shflFct); - else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. - gpu_irregular_warp_reduce(reduce_data, shflFct, - /*LaneCount=*/NumThreads % mapping::getWarpSize(), - /*LaneId=*/mapping::getThreadIdInBlock() % - mapping::getWarpSize()); - // When we have more than [mapping::getWarpSize()] number of threads - // a block reduction is performed here. - // - // Only L1 parallel region can enter this if condition. 
- if (NumThreads > mapping::getWarpSize()) { - // Gather all the reduced values from each warp - // to the first warp. - cpyFct(reduce_data, WarpsNeeded); + // + // This reduce function handles reduction within a team. It handles + // parallel regions in both L1 and L2 parallelism levels. It also + // supports Generic, SPMD, and NoOMP modes. + // + // 1. Reduce within a warp. + // 2. Warp master copies value to warp 0 via shared memory. + // 3. Warp 0 reduces to a single value. + // 4. The reduced value is available in the thread that returns 1. + // - if (WarpId == 0) - gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, - BlockThreadId); +#if __has_builtin(__nvvm_reflect) + if (__nvvm_reflect("__CUDA_ARCH") >= 700) { + uint32_t WarpsNeeded = + (NumThreads + mapping::getWarpSize() - 1) / mapping::getWarpSize(); + uint32_t WarpId = mapping::getWarpIdInBlock(); + + // Volta execution model: + // For the Generic execution mode a parallel region either has 1 thread and + // beyond that, always a multiple of 32. For the SPMD execution mode we may + // have any number of threads. + if ((NumThreads % mapping::getWarpSize() == 0) || + (WarpId < WarpsNeeded - 1)) + gpu_regular_warp_reduce(reduce_data, shflFct); + else if (NumThreads > 1) // Only SPMD execution mode comes thru this case. + gpu_irregular_warp_reduce( + reduce_data, shflFct, + /*LaneCount=*/NumThreads % mapping::getWarpSize(), + /*LaneId=*/mapping::getThreadIdInBlock() % mapping::getWarpSize()); + + // When we have more than [mapping::getWarpSize()] number of threads + // a block reduction is performed here. + // + // Only L1 parallel region can enter this if condition. + if (NumThreads > mapping::getWarpSize()) { + // Gather all the reduced values from each warp + // to the first warp. 
+ cpyFct(reduce_data, WarpsNeeded); + + if (WarpId == 0) + gpu_irregular_warp_reduce(reduce_data, shflFct, WarpsNeeded, + BlockThreadId); + } + return BlockThreadId == 0; } - return BlockThreadId == 0; -#else +#endif __kmpc_impl_lanemask_t Liveness = mapping::activemask(); if (Liveness == lanes::All) // Full warp gpu_regular_warp_reduce(reduce_data, shflFct); @@ -150,10 +152,9 @@ static int32_t nvptx_parallel_reduce_nowait(void *reduce_data, return BlockThreadId == 0; } - // Get the OMP thread Id. This is different from BlockThreadId in the case of - // an L2 parallel region. + // Get the OMP thread Id. This is different from BlockThreadId in the case + // of an L2 parallel region. return BlockThreadId == 0; -#endif // __CUDA_ARCH__ >= 700 } uint32_t roundToWarpsize(uint32_t s) { diff --git a/openmp/docs/ReleaseNotes.rst b/openmp/docs/ReleaseNotes.rst index d4a4b1a99f7813..0089f1aa31d144 100644 --- a/openmp/docs/ReleaseNotes.rst +++ b/openmp/docs/ReleaseNotes.rst @@ -19,3 +19,9 @@ from the `LLVM releases web site `_. Non-comprehensive list of changes in this release ================================================= + +Device Runtime +-------------- +- Changed the OpenMP DeviceRTL to use 'generic' IR. The + ``LIBOMPTARGET_DEVICE_ARCHITECTURES`` CMake argument is now unused and will + always build support for AMDGPU and NVPTX targets.
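The four-step team reduction described in the `nvptx_parallel_reduce_nowait` comment (reduce within a warp, copy each warp master's value to warp 0, reduce in warp 0, return the result from one thread) can be illustrated with a small host-side C++ simulation. This is a sketch under stated assumptions, not DeviceRTL code. `warp_reduce` and `team_reduce_sum` are hypothetical names. The reduction operator is assumed to be an integer sum, the warp width is fixed at 32, and at most 32 warps (1024 threads) participate. A `std::vector` stands in for the shared-memory scratch that `cpyFct` fills. The per-warp step uses a generic bounds-checked shuffle-down tree rather than the runtime's own `gpu_irregular_warp_reduce` algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed warp width; the DeviceRTL queries mapping::getWarpSize() instead.
constexpr uint32_t kWarpSize = 32;

// Step 1 (hypothetical helper): tree reduction inside one warp using
// shuffle-down offsets 16, 8, 4, 2, 1. Only the first LaneCount lanes hold
// data; a lane whose partner is out of range keeps its own value, which
// models a partially filled (irregular) warp. Lanes are visited in ascending
// order, so Lanes[Lane + Offset] still holds its value from the previous
// round, matching the lock-step behavior of a real shuffle.
int64_t warp_reduce(std::vector<int64_t> Lanes, uint32_t LaneCount) {
  assert(LaneCount <= kWarpSize && LaneCount <= Lanes.size());
  for (uint32_t Offset = kWarpSize / 2; Offset > 0; Offset /= 2)
    for (uint32_t Lane = 0; Lane + Offset < LaneCount; ++Lane)
      Lanes[Lane] += Lanes[Lane + Offset];
  return LaneCount == 0 ? 0 : Lanes[0];
}

// Steps 2-4 (hypothetical helper): every warp reduces its own lanes, the
// warp master stores the partial result in a stand-in for shared memory,
// and warp 0 reduces the WarpsNeeded partials. The return value plays the
// role of the result held by the thread that returns 1.
int64_t team_reduce_sum(const std::vector<int64_t> &ThreadValues) {
  const uint32_t NumThreads = static_cast<uint32_t>(ThreadValues.size());
  if (NumThreads == 0)
    return 0;
  // Warp 0 can only combine one partial result per lane.
  assert(NumThreads <= kWarpSize * kWarpSize);
  const uint32_t WarpsNeeded = (NumThreads + kWarpSize - 1) / kWarpSize;

  std::vector<int64_t> Shared(kWarpSize, 0); // stand-in for shared memory
  for (uint32_t W = 0; W < WarpsNeeded; ++W) {
    const uint32_t First = W * kWarpSize;
    const uint32_t LaneCount = std::min(kWarpSize, NumThreads - First);
    std::vector<int64_t> Lanes(kWarpSize, 0);
    for (uint32_t L = 0; L < LaneCount; ++L)
      Lanes[L] = ThreadValues[First + L];
    Shared[W] = warp_reduce(Lanes, LaneCount); // warp master's copy (step 2)
  }
  return warp_reduce(Shared, WarpsNeeded); // warp 0 reduces (step 3)
}
```

For example, summing the values 1 through 100 spans four warps, the last of which holds only four lanes, and yields 5050. The bounds check `Lane + Offset < LaneCount` is what lets the same routine handle both full warps and the irregular last warp, loosely mirroring why the runtime distinguishes `gpu_regular_warp_reduce` from `gpu_irregular_warp_reduce`.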
From openmp-commits at lists.llvm.org Tue Dec 24 16:05:35 2024 From: openmp-commits at lists.llvm.org (Joseph Huber via Openmp-commits) Date: Tue, 24 Dec 2024 16:05:35 -0800 (PST) Subject: [Openmp-commits] [clang] [llvm] [openmp] [OpenMP] Use generic IR for the OpenMP DeviceRTL (PR #119091) In-Reply-To: Message-ID: <676b4c4f.050a0220.4888c.0b3c@mx.google.com> https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/119091 From openmp-commits at lists.llvm.org Thu Dec 26 19:36:59 2024 From: openmp-commits at lists.llvm.org (Ethan Luis McDonough via Openmp-commits) Date: Thu, 26 Dec 2024 19:36:59 -0800 (PST) Subject: [Openmp-commits] [clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (PR #93365) In-Reply-To: Message-ID: <676e20db.170a0220.29fa4c.c550@mx.google.com> https://github.com/EthanLuisMcDonough updated https://github.com/llvm/llvm-project/pull/93365 >From 530eb982b9770190377bb0bd09c5cb715f34d484 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 15 Dec 2023 20:38:38 -0600 Subject: [PATCH 01/39] Add profiling functions to libomptarget --- .../include/llvm/Frontend/OpenMP/OMPKinds.def | 3 +++ openmp/libomptarget/DeviceRTL/CMakeLists.txt | 2 ++ .../DeviceRTL/include/Profiling.h | 21 +++++++++++++++++++ .../libomptarget/DeviceRTL/src/Profiling.cpp | 19 +++++++++++++++++ 4 files changed, 45 insertions(+) create mode 100644 openmp/libomptarget/DeviceRTL/include/Profiling.h create mode 100644 openmp/libomptarget/DeviceRTL/src/Profiling.cpp diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def index d22d2a8e948b00..1d887d5cb58127 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def +++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def @@ -503,6 +503,9 @@ __OMP_RTL(__kmpc_barrier_simple_generic, false, Void, IdentPtr, Int32) __OMP_RTL(__kmpc_warp_active_thread_mask, false, Int64,) __OMP_RTL(__kmpc_syncwarp, false, 
Void, Int64) +__OMP_RTL(__llvm_profile_register_function, false, Void, VoidPtr) +__OMP_RTL(__llvm_profile_register_names_function, false, Void, VoidPtr, Int64) + __OMP_RTL(__last, false, Void, ) #undef __OMP_RTL diff --git a/openmp/libomptarget/DeviceRTL/CMakeLists.txt b/openmp/libomptarget/DeviceRTL/CMakeLists.txt index 1ce3e1e40a80ab..55ee15d068c67b 100644 --- a/openmp/libomptarget/DeviceRTL/CMakeLists.txt +++ b/openmp/libomptarget/DeviceRTL/CMakeLists.txt @@ -89,6 +89,7 @@ set(include_files ${include_directory}/Interface.h ${include_directory}/LibC.h ${include_directory}/Mapping.h + ${include_directory}/Profiling.h ${include_directory}/State.h ${include_directory}/Synchronization.h ${include_directory}/Types.h @@ -104,6 +105,7 @@ set(src_files ${source_directory}/Mapping.cpp ${source_directory}/Misc.cpp ${source_directory}/Parallelism.cpp + ${source_directory}/Profiling.cpp ${source_directory}/Reduction.cpp ${source_directory}/State.cpp ${source_directory}/Synchronization.cpp diff --git a/openmp/libomptarget/DeviceRTL/include/Profiling.h b/openmp/libomptarget/DeviceRTL/include/Profiling.h new file mode 100644 index 00000000000000..68c7744cd60752 --- /dev/null +++ b/openmp/libomptarget/DeviceRTL/include/Profiling.h @@ -0,0 +1,21 @@ +//===-------- Profiling.h - OpenMP interface ---------------------- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// +//===----------------------------------------------------------------------===// + +#ifndef OMPTARGET_DEVICERTL_PROFILING_H +#define OMPTARGET_DEVICERTL_PROFILING_H + +extern "C" { + +void __llvm_profile_register_function(void *ptr); +void __llvm_profile_register_names_function(void *ptr, long int i); +} + +#endif diff --git a/openmp/libomptarget/DeviceRTL/src/Profiling.cpp b/openmp/libomptarget/DeviceRTL/src/Profiling.cpp new file mode 100644 index 00000000000000..799477f5e47d27 --- /dev/null +++ b/openmp/libomptarget/DeviceRTL/src/Profiling.cpp @@ -0,0 +1,19 @@ +//===------- Profiling.cpp ---------------------------------------- C++ ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "Profiling.h" + +#pragma omp begin declare target device_type(nohost) + +extern "C" { + +void __llvm_profile_register_function(void *ptr) {} +void __llvm_profile_register_names_function(void *ptr, long int i) {} +} + +#pragma omp end declare target >From fb067d4ffe604fd68cf90b705db1942bce49dbb1 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Sat, 16 Dec 2023 01:18:41 -0600 Subject: [PATCH 02/39] Fix PGO instrumentation for GPU targets --- clang/lib/CodeGen/CodeGenPGO.cpp | 10 ++++++++-- .../lib/Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++--- 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index 81bf8ea696b164..edae6885b528ac 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -959,8 +959,14 @@ void 
CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, unsigned Counter = (*RegionCounterMap)[S]; - llvm::Value *Args[] = {FuncNameVar, - Builder.getInt64(FunctionHash), + // Make sure that pointer to global is passed in with zero addrspace + // This is relevant during GPU profiling + auto *I8Ty = llvm::Type::getInt8Ty(CGM.getLLVMContext()); + auto *I8PtrTy = llvm::PointerType::getUnqual(I8Ty); + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, I8PtrTy); + + llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), Builder.getInt32(NumRegionCounters), Builder.getInt32(Counter), StepV}; if (!StepV) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index fe5a0578bd9721..d2cb8155c17967 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -1658,10 +1658,13 @@ void InstrLowerer::emitRegistration() { IRBuilder<> IRB(BasicBlock::Create(M.getContext(), "", RegisterF)); for (Value *Data : CompilerUsedVars) if (!isa<Function>(Data)) - IRB.CreateCall(RuntimeRegisterF, Data); + // Check for addrspace cast when profiling GPU + IRB.CreateCall(RuntimeRegisterF, + IRB.CreatePointerBitCastOrAddrSpaceCast(Data, VoidPtrTy)); for (Value *Data : UsedVars) if (Data != NamesVar && !isa<Function>(Data)) - IRB.CreateCall(RuntimeRegisterF, Data); + IRB.CreateCall(RuntimeRegisterF, + IRB.CreatePointerBitCastOrAddrSpaceCast(Data, VoidPtrTy)); if (NamesVar) { Type *ParamTypes[] = {VoidPtrTy, Int64Ty}; @@ -1670,7 +1673,9 @@ void InstrLowerer::emitRegistration() { auto *NamesRegisterF = Function::Create(NamesRegisterTy, GlobalVariable::ExternalLinkage, getInstrProfNamesRegFuncName(), M); - IRB.CreateCall(NamesRegisterF, {NamesVar, IRB.getInt64(NamesSize)}); + IRB.CreateCall(NamesRegisterF, {IRB.CreatePointerBitCastOrAddrSpaceCast( + NamesVar, VoidPtrTy), + IRB.getInt64(NamesSize)}); }
IRB.CreateRetVoid(); >From 7a0e0efa178cc4de6a22a8f5cc3f53cd1c81ea3a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 21 Dec 2023 00:25:46 -0600 Subject: [PATCH 03/39] Change global visibility on GPU targets --- llvm/include/llvm/ProfileData/InstrProf.h | 4 ++++ llvm/lib/ProfileData/InstrProf.cpp | 17 +++++++++++++++-- .../Instrumentation/InstrProfiling.cpp | 15 +++++++++++---- 3 files changed, 30 insertions(+), 6 deletions(-) diff --git a/llvm/include/llvm/ProfileData/InstrProf.h b/llvm/include/llvm/ProfileData/InstrProf.h index 288dc71d756aee..bf9899d867e3dd 100644 --- a/llvm/include/llvm/ProfileData/InstrProf.h +++ b/llvm/include/llvm/ProfileData/InstrProf.h @@ -171,6 +171,10 @@ inline StringRef getInstrProfCounterBiasVarName() { /// Return the marker used to separate PGO names during serialization. inline StringRef getInstrProfNameSeparator() { return "\01"; } +/// Determines whether module targets a GPU eligable for PGO +/// instrumentation +bool isGPUProfTarget(const Module &M); + /// Return the modified name for function \c F suitable to be /// used the key for profile lookup. Variable \c InLTO indicates if this /// is called in LTO optimization passes. 
diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 649d814cfd9de0..0d6717aeb0142c 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -410,13 +410,22 @@ std::string getPGOFuncNameVarName(StringRef FuncName, return VarName; } +bool isGPUProfTarget(const Module &M) { + const auto &triple = M.getTargetTriple(); + return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 || + triple.rfind("r600", 0) == 0; +} + GlobalVariable *createPGOFuncNameVar(Module &M, GlobalValue::LinkageTypes Linkage, StringRef PGOFuncName) { + // Ensure profiling variables on GPU are visible to be read from host + if (isGPUProfTarget(M)) + Linkage = GlobalValue::ExternalLinkage; // We generally want to match the function's linkage, but available_externally // and extern_weak both have the wrong semantics, and anything that doesn't // need to link across compilation units doesn't need to be visible at all. - if (Linkage == GlobalValue::ExternalWeakLinkage) + else if (Linkage == GlobalValue::ExternalWeakLinkage) Linkage = GlobalValue::LinkOnceAnyLinkage; else if (Linkage == GlobalValue::AvailableExternallyLinkage) Linkage = GlobalValue::LinkOnceODRLinkage; @@ -430,8 +439,12 @@ GlobalVariable *createPGOFuncNameVar(Module &M, new GlobalVariable(M, Value->getType(), true, Linkage, Value, getPGOFuncNameVarName(PGOFuncName, Linkage)); + // If the target is a GPU, make the symbol protected so it can + // be read from the host device + if (isGPUProfTarget(M)) + FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility); // Hide the symbol so that we correctly get a copy for each executable. 
- if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) + else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) FuncNameVar->setVisibility(GlobalValue::HiddenVisibility); return FuncNameVar; diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index d2cb8155c17967..3b582b65190808 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -1481,6 +1481,10 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind) Int16ArrayVals[Kind] = ConstantInt::get(Int16Ty, PD.NumValueSites[Kind]); + if (isGPUProfTarget(M)) { + Linkage = GlobalValue::ExternalLinkage; + Visibility = GlobalValue::ProtectedVisibility; + } // If the data variable is not referenced by code (if we don't emit // @llvm.instrprof.value.profile, NS will be 0), and the counter keeps the // data variable live under linker GC, the data variable can be private. This @@ -1492,9 +1496,9 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { // If profd is in a deduplicate comdat, NS==0 with a hash suffix guarantees // that other copies must have the same CFG and cannot have value profiling. // If no hash suffix, other profd copies may be referenced by code. 
-  if (NS == 0 && !(DataReferencedByCode && NeedComdat && !Renamed) &&
-      (TT.isOSBinFormatELF() ||
-       (!DataReferencedByCode && TT.isOSBinFormatCOFF()))) {
+  else if (NS == 0 && !(DataReferencedByCode && NeedComdat && !Renamed) &&
+           (TT.isOSBinFormatELF() ||
+            (!DataReferencedByCode && TT.isOSBinFormatCOFF()))) {
     Linkage = GlobalValue::PrivateLinkage;
     Visibility = GlobalValue::DefaultVisibility;
   }
@@ -1696,7 +1700,10 @@ bool InstrLowerer::emitRuntimeHook() {
   auto *Var =
       new GlobalVariable(M, Int32Ty, false, GlobalValue::ExternalLinkage,
                          nullptr, getInstrProfRuntimeHookVarName());
-  Var->setVisibility(GlobalValue::HiddenVisibility);
+  if (isGPUProfTarget(M))
+    Var->setVisibility(GlobalValue::ProtectedVisibility);
+  else
+    Var->setVisibility(GlobalValue::HiddenVisibility);
 
   if (TT.isOSBinFormatELF() && !TT.isPS()) {
     // Mark the user variable as used so that it isn't stripped out.

>From fddc07908ed9aa698fe3250ddbfc5621ab4d049d Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 22 Dec 2023 23:43:29 -0600
Subject: [PATCH 04/39] Make names global public on GPU

---
 llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
index 3b582b65190808..61fba7be3ee0ee 100644
--- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
@@ -1621,6 +1621,13 @@ void InstrLowerer::emitNameData() {
   NamesVar = new GlobalVariable(M, NamesVal->getType(), true,
                                 GlobalValue::PrivateLinkage, NamesVal,
                                 getInstrProfNamesVarName());
+
+  // Make names variable public if current target is a GPU
+  if (isGPUProfTarget(M)) {
+    NamesVar->setLinkage(GlobalValue::ExternalLinkage);
+    NamesVar->setVisibility(GlobalValue::VisibilityTypes::ProtectedVisibility);
+  }
+
   NamesSize = CompressedNameStr.size();
   setGlobalVariableLargeSection(TT, *NamesVar);
   NamesVar->setSection(

>From e9db03c70bf79f4f4ddad4b48a5aa63a37e0d4f6 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 29 Dec 2023 12:54:50 -0600
Subject: [PATCH 05/39] Read and print GPU device PGO globals

---
 .../common/include/GlobalHandler.h | 27 ++++++
 .../common/src/GlobalHandler.cpp | 82 +++++++++++++++++++
 .../common/src/PluginInterface.cpp | 14 ++++
 3 files changed, 123 insertions(+)

diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
index fa079ac9660ee0..a82cd536487653 100644
--- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
+++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
@@ -14,9 +14,11 @@
 #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H
 
 #include 
+#include 
 
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/Object/ELFObjectFile.h"
+#include "llvm/ProfileData/InstrProf.h"
 
 #include "Shared/Debug.h"
 #include "Shared/Utils.h"
@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };
 
+typedef void *IntPtrT;
+struct __llvm_profile_data {
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#include "llvm/ProfileData/InstrProfData.inc"
+};
+
+/// PGO profiling data extracted from a GPU device
+struct GPUProfGlobals {
+  std::string names;
+  std::vector> counts;
+  std::vector<__llvm_profile_data> data;
+  Triple targetTriple;
+
+  void dump() const;
+};
+
 /// Subclass of GlobalTy that holds the memory for a global of \p Ty.
 template class StaticGlobalTy : public GlobalTy {
   Ty Data;
@@ -172,6 +190,15 @@ class GenericGlobalHandlerTy {
     return moveGlobalBetweenDeviceAndHost(Device, Image, HostGlobal,
                                           /* D2H */ false);
   }
+
+  /// Checks whether a given image contains profiling globals.
+  bool hasProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image);
+
+  /// Reads profiling data from a GPU image to supplied profdata struct.
+  /// Iterates through the image symbol table and stores global values
+  /// with profiling prefixes.
+  Expected readProfilingGlobals(GenericDeviceTy &Device,
+                                DeviceImageTy &Image);
 };
 
 } // namespace plugin
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index 3a272e228c7dfe..5dd5daec468ca5 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -176,3 +176,85 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
 
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  const auto *elf = getOrCreateELFObjectFile(Device, Image);
+  profdata.targetTriple = elf->makeTriple();
+  // Iterate through
+  for (auto &sym : elf->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector chars(sym.getSize() / sizeof(char), ' ');
+        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());
+        if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
+          return Err;
+        std::string names(chars.begin(), chars.end());
+        profdata.names = std::move(names);
+      } else if (name->starts_with(getInstrProfCountersVarPrefix())) {
+        // Read global variable profiling counts
+        std::vector counts(sym.getSize() / sizeof(int64_t), 0);
+        GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data());
+        if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
+          return Err;
+        profdata.counts.push_back(std::move(counts));
+      } else if (name->starts_with(getInstrProfDataVarPrefix())) {
+        // Read profiling data for this global variable
+        __llvm_profile_data data{};
+        GlobalTy DataGlobal(name->str(), sym.getSize(), &data);
+        if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
+          return Err;
+        profdata.data.push_back(std::move(data));
+      }
+    }
+  }
+  return profdata;
+}
+
+void GPUProfGlobals::dump() const {
+  llvm::outs() << "======= GPU Profile =======\nTarget: " << targetTriple.str()
+               << "\n";
+
+  llvm::outs() << "======== Counters =========\n";
+  for (const auto &count : counts) {
+    llvm::outs() << "[";
+    for (size_t i = 0; i < count.size(); i++) {
+      if (i == 0)
+        llvm::outs() << " ";
+      llvm::outs() << count[i] << " ";
+    }
+    llvm::outs() << "]\n";
+  }
+
+  llvm::outs() << "========== Data ===========\n";
+  for (const auto &d : data) {
+    llvm::outs() << "{ ";
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
+  llvm::outs() << d.Name << " ";
+#include "llvm/ProfileData/InstrProfData.inc"
+    llvm::outs() << " }\n";
+  }
+
+  llvm::outs() << "======== Functions ========\n";
+  InstrProfSymtab symtab;
+  if (Error Err = symtab.create(StringRef(names))) {
+    consumeError(std::move(Err));
+  }
+  symtab.dumpNames(llvm::outs());
+  llvm::outs() << "===========================\n";
+}
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
index 3c7d1ca8998787..84ed90f03f84f1 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
@@ -811,6 +811,20 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) {
                                DeviceMemoryPoolTracking.AllocationMax);
   }
 
+  for (auto *Image : LoadedImages) {
+    GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+    if (!Handler.hasProfilingGlobals(*this, *Image))
+      continue;
+
+    GPUProfGlobals profdata;
+    auto ProfOrErr = Handler.readProfilingGlobals(*this, *Image);
+    if (!ProfOrErr)
+      return ProfOrErr.takeError();
+
+    // TODO: write data to profiling file
+    ProfOrErr->dump();
+  }
+
   // Delete the memory manager before deinitializing the device. Otherwise,
   // we may delete device allocations after the device is deinitialized.
   if (MemoryManager)

>From e4687605d1a6ca932312025826db09dba84845a3 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 3 Jan 2024 17:06:15 -0600
Subject: [PATCH 06/39] Fix rebase bug

---
 .../plugins-nextgen/common/src/GlobalHandler.cpp | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index cb71b61f4a9c4f..86742d0f77a2fe 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -178,10 +178,12 @@ Expected
 GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
                                              DeviceImageTy &Image) {
   GPUProfGlobals profdata;
-  const auto *elf = getOrCreateELFObjectFile(Device, Image);
-  profdata.targetTriple = elf->makeTriple();
-  // Iterate through
-  for (auto &sym : elf->symbols()) {
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
     if (auto name = sym.getName()) {
       // Check if given current global is a profiling global based
       // on name

>From ec18ce94c227e1d43927955fa1c67360ecfcfca6 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 3 Jan 2024 17:10:19 -0600
Subject: [PATCH 07/39] Refactor portions to be more idiomatic

---
 clang/lib/CodeGen/CodeGenPGO.cpp | 4 +---
 llvm/lib/ProfileData/InstrProf.cpp | 5 ++---
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp
index edae6885b528ac..7bfcec43ee4c98 100644
--- a/clang/lib/CodeGen/CodeGenPGO.cpp
+++ b/clang/lib/CodeGen/CodeGenPGO.cpp
@@ -961,10 +961,8 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S,
   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
-  auto *I8Ty = llvm::Type::getInt8Ty(CGM.getLLVMContext());
-  auto *I8PtrTy = llvm::PointerType::getUnqual(I8Ty);
   auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      FuncNameVar, I8PtrTy);
+      FuncNameVar, llvm::PointerType::getUnqual(CGM.getLLVMContext()));
   llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash),
                          Builder.getInt32(NumRegionCounters),
diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp
index cdcd6840bb5108..1d88da16a5ff9c 100644
--- a/llvm/lib/ProfileData/InstrProf.cpp
+++ b/llvm/lib/ProfileData/InstrProf.cpp
@@ -429,9 +429,8 @@ std::string getPGOFuncNameVarName(StringRef FuncName,
 }
 
 bool isGPUProfTarget(const Module &M) {
-  const auto &triple = M.getTargetTriple();
-  return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 ||
-         triple.rfind("r600", 0) == 0;
+  const auto &Triple = llvm::Triple(M.getTargetTriple());
+  return Triple.isAMDGPU() || Triple.isNVPTX();
 }
 
 GlobalVariable *createPGOFuncNameVar(Module &M,

>From 0872556f597056361b0a2c23cdd0be3d9745aef3 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 3 Jan 2024 17:18:47 -0600
Subject: [PATCH 08/39] Reformat DeviceRTL prof functions

---
 openmp/libomptarget/DeviceRTL/include/Profiling.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/openmp/libomptarget/DeviceRTL/include/Profiling.h b/openmp/libomptarget/DeviceRTL/include/Profiling.h
index 68c7744cd60752..9efc1554c176bc 100644
--- a/openmp/libomptarget/DeviceRTL/include/Profiling.h
+++ b/openmp/libomptarget/DeviceRTL/include/Profiling.h
@@ -13,9 +13,8 @@
 #define OMPTARGET_DEVICERTL_PROFILING_H
 
 extern "C" {
-
-void __llvm_profile_register_function(void *ptr);
-void __llvm_profile_register_names_function(void *ptr, long int i);
+void __llvm_profile_register_function(void *Ptr);
+void __llvm_profile_register_names_function(void *Ptr, long int I);
 }
 
 #endif

>From 62f31d1c71b5d100f38d6dc584cc138b3904581b Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Tue, 9 Jan 2024 11:52:29 -0600
Subject: [PATCH 09/39] Style changes + catch name error

---
 .../common/include/GlobalHandler.h | 16 ++--
 .../common/src/GlobalHandler.cpp | 87 ++++++++++---------
 2 files changed, 56 insertions(+), 47 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
index a803b3f76d8b25..755bb23a414e37 100644
--- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
+++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
@@ -13,8 +13,7 @@
 #ifndef LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H
 #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H
 
-#include 
-#include 
+#include 
 
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/Object/ELFObjectFile.h"
@@ -60,18 +59,19 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };
 
-typedef void *IntPtrT;
+using IntPtrT = void *;
 struct __llvm_profile_data {
-#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
+  std::remove_const::type Name;
 #include "llvm/ProfileData/InstrProfData.inc"
 };
 
 /// PGO profiling data extracted from a GPU device
 struct GPUProfGlobals {
-  std::string names;
-  std::vector> counts;
-  std::vector<__llvm_profile_data> data;
-  Triple targetTriple;
+  SmallVector NamesData;
+  SmallVector> Counts;
+  SmallVector<__llvm_profile_data> Data;
+  Triple TargetTriple;
 
   void dump() const;
 };
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index 86742d0f77a2fe..7cb672e7b26839 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -19,6 +19,7 @@
 #include "llvm/Support/Error.h"
 
 #include 
+#include 
 
 using namespace llvm;
 using namespace omp;
@@ -177,73 +178,81 @@ bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
 Expected
 GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
                                              DeviceImageTy &Image) {
-  GPUProfGlobals profdata;
+  GPUProfGlobals DeviceProfileData;
   auto ELFObj = getELFObjectFile(Image);
   if (!ELFObj)
     return ELFObj.takeError();
-  profdata.targetTriple = ELFObj->makeTriple();
+  DeviceProfileData.TargetTriple = ELFObj->makeTriple();
+
   // Iterate through elf symbols
-  for (auto &sym : ELFObj->symbols()) {
-    if (auto name = sym.getName()) {
-      // Check if given current global is a profiling global based
-      // on name
-      if (name->equals(getInstrProfNamesVarName())) {
-        // Read in profiled function names
-        std::vector chars(sym.getSize() / sizeof(char), ' ');
-        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());
-        if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
-          return Err;
-        std::string names(chars.begin(), chars.end());
-        profdata.names = std::move(names);
-      } else if (name->starts_with(getInstrProfCountersVarPrefix())) {
-        // Read global variable profiling counts
-        std::vector counts(sym.getSize() / sizeof(int64_t), 0);
-        GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data());
-        if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
-          return Err;
-        profdata.counts.push_back(std::move(counts));
-      } else if (name->starts_with(getInstrProfDataVarPrefix())) {
-        // Read profiling data for this global variable
-        __llvm_profile_data data{};
-        GlobalTy DataGlobal(name->str(), sym.getSize(), &data);
-        if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
-          return Err;
-        profdata.data.push_back(std::move(data));
-      }
+  for (auto &Sym : ELFObj->symbols()) {
+    auto NameOrErr = Sym.getName();
+    if (!NameOrErr)
+      return ELFObj.takeError();
+
+    // Check if given current global is a profiling global based
+    // on name
+    if (NameOrErr->equals(getInstrProfNamesVarName())) {
+      // Read in profiled function names
+      DeviceProfileData.NamesData = SmallVector(Sym.getSize(), 0);
+      GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(),
+                           DeviceProfileData.NamesData.data());
+      if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
+        return Err;
+    } else if (NameOrErr->starts_with(getInstrProfCountersVarPrefix())) {
+      // Read global variable profiling counts
+      SmallVector Counts(Sym.getSize() / sizeof(int64_t), 0);
+      GlobalTy CountGlobal(NameOrErr->str(), Sym.getSize(), Counts.data());
+      if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
+        return Err;
+      DeviceProfileData.Counts.push_back(std::move(Counts));
+    } else if (NameOrErr->starts_with(getInstrProfDataVarPrefix())) {
+      // Read profiling data for this global variable
+      __llvm_profile_data Data{};
+      GlobalTy DataGlobal(NameOrErr->str(), Sym.getSize(), &Data);
+      if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
+        return Err;
+      DeviceProfileData.Data.push_back(std::move(Data));
     }
   }
-  return profdata;
+  return DeviceProfileData;
 }
 
 void GPUProfGlobals::dump() const {
-  llvm::outs() << "======= GPU Profile =======\nTarget: " << targetTriple.str()
+  llvm::outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str()
                << "\n";
 
   llvm::outs() << "======== Counters =========\n";
-  for (const auto &count : counts) {
+  for (const auto &Count : Counts) {
     llvm::outs() << "[";
-    for (size_t i = 0; i < count.size(); i++) {
+    for (size_t i = 0; i < Count.size(); i++) {
       if (i == 0)
         llvm::outs() << " ";
-      llvm::outs() << count[i] << " ";
+      llvm::outs() << Count[i] << " ";
     }
     llvm::outs() << "]\n";
   }
 
   llvm::outs() << "========== Data ===========\n";
-  for (const auto &d : data) {
+  for (const auto &ProfData : Data) {
     llvm::outs() << "{ ";
 #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
-  llvm::outs() << d.Name << " ";
+  llvm::outs() << ProfData.Name << " ";
 #include "llvm/ProfileData/InstrProfData.inc"
     llvm::outs() << " }\n";
   }
 
   llvm::outs() << "======== Functions ========\n";
-  InstrProfSymtab symtab;
-  if (Error Err = symtab.create(StringRef(names))) {
+  std::string s;
+  s.reserve(NamesData.size());
+  for (uint8_t Name : NamesData) {
+    s.push_back((char)Name);
+  }
+
+  InstrProfSymtab Symtab;
+  if (Error Err = Symtab.create(StringRef(s))) {
     consumeError(std::move(Err));
   }
-  symtab.dumpNames(llvm::outs());
+  Symtab.dumpNames(llvm::outs());
   llvm::outs() << "===========================\n";
 }

>From 0c4bbeb54d189c1461affd37853aa86c3e3ca7d8 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 17 Jan 2024 19:59:06 -0600
Subject: [PATCH 10/39] Add GPU PGO test

---
 .../common/src/GlobalHandler.cpp | 2 +-
 openmp/libomptarget/test/CMakeLists.txt | 6 +++
 openmp/libomptarget/test/lit.cfg | 3 ++
 openmp/libomptarget/test/lit.site.cfg.in | 2 +-
 openmp/libomptarget/test/offloading/pgo1.c | 39 +++++++++++++++++++
 5 files changed, 50 insertions(+), 2 deletions(-)
 create mode 100644 openmp/libomptarget/test/offloading/pgo1.c

diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index 7cb672e7b26839..e5eb653d022287 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -239,7 +239,7 @@ void GPUProfGlobals::dump() const {
 #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
   llvm::outs() << ProfData.Name << " ";
 #include "llvm/ProfileData/InstrProfData.inc"
-    llvm::outs() << " }\n";
+    llvm::outs() << "}\n";
   }
   llvm::outs() << "======== Functions ========\n";
diff --git a/openmp/libomptarget/test/CMakeLists.txt b/openmp/libomptarget/test/CMakeLists.txt
index a0ba233eaa5726..21233f3e252eb5 100644
--- a/openmp/libomptarget/test/CMakeLists.txt
+++ b/openmp/libomptarget/test/CMakeLists.txt
@@ -12,6 +12,12 @@ else()
   set(LIBOMPTARGET_DEBUG False)
 endif()
 
+if (OPENMP_STANDALONE_BUILD)
+  set(LIBOMPTARGET_TEST_GPU_PGO False)
+else()
+  set(LIBOMPTARGET_TEST_GPU_PGO True)
+endif()
+
 # Replace the space from user's input with ";" in case that CMake add escape
 # char into the lit command.
 string(REPLACE " " ";" LIBOMPTARGET_LIT_ARG_LIST "${LIBOMPTARGET_LIT_ARGS}")
diff --git a/openmp/libomptarget/test/lit.cfg b/openmp/libomptarget/test/lit.cfg
index 19c5e5c4572227..49743f9fed7f29 100644
--- a/openmp/libomptarget/test/lit.cfg
+++ b/openmp/libomptarget/test/lit.cfg
@@ -104,6 +104,9 @@ config.available_features.add(config.libomptarget_current_target)
 if config.libomptarget_has_libc:
     config.available_features.add('libc')
 
+if config.libomptarget_test_pgo:
+    config.available_features.add('pgo')
+
 # Determine whether the test system supports unified memory.
 # For CUDA, this is the case with compute capability 70 (Volta) or higher.
 # For all other targets, we currently assume it is.
diff --git a/openmp/libomptarget/test/lit.site.cfg.in b/openmp/libomptarget/test/lit.site.cfg.in
index 2d638118838727..494d1636af304a 100644
--- a/openmp/libomptarget/test/lit.site.cfg.in
+++ b/openmp/libomptarget/test/lit.site.cfg.in
@@ -25,6 +25,6 @@ config.libomptarget_not = "@OPENMP_NOT_EXECUTABLE@"
 config.libomptarget_debug = @LIBOMPTARGET_DEBUG@
 config.has_libomptarget_ompt = @LIBOMPTARGET_OMPT_SUPPORT@
 config.libomptarget_has_libc = @LIBOMPTARGET_GPU_LIBC_SUPPORT@
-
+config.libomptarget_test_pgo = @LIBOMPTARGET_TEST_GPU_PGO@
 # Let the main config do the real work.
 lit_config.load_config(config, "@CMAKE_CURRENT_SOURCE_DIR@/lit.cfg")
diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c
new file mode 100644
index 00000000000000..ca8a6f502a06aa
--- /dev/null
+++ b/openmp/libomptarget/test/offloading/pgo1.c
@@ -0,0 +1,39 @@
+// RUN: %libomptarget-compile-generic -fprofile-instr-generate -Xclang "-fprofile-instrument=clang"
+// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic
+
+// UNSUPPORTED: x86_64-pc-linux-gnu
+// UNSUPPORTED: x86_64-pc-linux-gnu-LTO
+// UNSUPPORTED: aarch64-unknown-linux-gnu
+// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
+// REQUIRES: pgo
+
+#ifdef _OPENMP
+#include 
+#endif
+
+int test1(int a) { return a / 2; }
+int test2(int a) { return a * 2; }
+
+int main() {
+  int m = 2;
+#pragma omp target
+  for (int i = 0; i < 10; i++) {
+    m = test1(m);
+    for (int j = 0; j < 2; j++) {
+      m = test2(m);
+    }
+  }
+}
+
+// CHECK: ======== Counters =========
+// CHECK-NEXT: [ 0 11 20 ]
+// CHECK-NEXT: [ 10 ]
+// CHECK-NEXT: [ 20 ]
+// CHECK-NEXT: ========== Data ===========
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: ======== Functions ========
+// CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
+// CHECK-NEXT: test1
+// CHECK-NEXT: test2

>From c7ae2a74daa93b05058fcc9bba64e0734359362c Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 17 Jan 2024 23:12:27 -0600
Subject: [PATCH 11/39] Fix PGO test formatting

---
 openmp/libomptarget/test/offloading/pgo1.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c
index ca8a6f502a06aa..389be19b670d76 100644
--- a/openmp/libomptarget/test/offloading/pgo1.c
+++ b/openmp/libomptarget/test/offloading/pgo1.c
@@ -1,4 +1,5 @@
-// RUN: %libomptarget-compile-generic -fprofile-instr-generate -Xclang "-fprofile-instrument=clang"
+// RUN: %libomptarget-compile-generic -fprofile-instr-generate \
+// RUN: -Xclang "-fprofile-instrument=clang"
 // RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic
 
 // UNSUPPORTED: x86_64-pc-linux-gnu
@@ -30,9 +31,18 @@ int main() {
 // CHECK-NEXT: [ 10 ]
 // CHECK-NEXT: [ 20 ]
 // CHECK-NEXT: ========== Data ===========
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
 // CHECK-NEXT: ======== Functions ========
 // CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
 // CHECK-NEXT: test1

>From 8bb22072914bbb830e2788d117aedd0e0bab66ff Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Thu, 18 Jan 2024 23:15:55 -0600
Subject: [PATCH 12/39] Refactor visibility logic

---
 llvm/lib/ProfileData/InstrProf.cpp | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp
index 511571a3eed9b0..708ea63fd95e04 100644
--- a/llvm/lib/ProfileData/InstrProf.cpp
+++ b/llvm/lib/ProfileData/InstrProf.cpp
@@ -422,6 +422,16 @@ bool isGPUProfTarget(const Module &M) {
   return Triple.isAMDGPU() || Triple.isNVPTX();
 }
 
+void setPGOFuncVisibility(Module &M, GlobalVariable *FuncNameVar) {
+  // If the target is a GPU, make the symbol protected so it can
+  // be read from the host device
+  if (isGPUProfTarget(M))
+    FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility);
+  // Hide the symbol so that we correctly get a copy for each executable.
+  else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage()))
+    FuncNameVar->setVisibility(GlobalValue::HiddenVisibility);
+}
+
 GlobalVariable *createPGOFuncNameVar(Module &M,
                                      GlobalValue::LinkageTypes Linkage,
                                      StringRef PGOFuncName) {
@@ -445,14 +455,7 @@ GlobalVariable *createPGOFuncNameVar(Module &M,
       new GlobalVariable(M, Value->getType(), true, Linkage, Value,
                          getPGOFuncNameVarName(PGOFuncName, Linkage));
 
-  // If the target is a GPU, make the symbol protected so it can
-  // be read from the host device
-  if (isGPUProfTarget(M))
-    FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility);
-  // Hide the symbol so that we correctly get a copy for each executable.
-  else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage()))
-    FuncNameVar->setVisibility(GlobalValue::HiddenVisibility);
-
+  setPGOFuncVisibility(M, FuncNameVar);
   return FuncNameVar;
 }
 

>From 9f13943f64cb16162e44902d54de53a9b1229179 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Tue, 23 Jan 2024 18:33:58 -0600
Subject: [PATCH 13/39] Add LLVM instrumentation support

This PR formerly only supported -fprofile-instrument=clang.
This commit adds support for -fprofile-instrument=llvm
---
 .../Instrumentation/PGOInstrumentation.cpp | 12 +++-
 openmp/libomptarget/test/offloading/pgo1.c | 72 +++++++++++++------
 2 files changed, 59 insertions(+), 25 deletions(-)

diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index c20fc942eaf0d5..bbc8da78fd7baf 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -862,6 +862,10 @@ static void instrumentOneFunc(
   auto Name = FuncInfo.FuncNameVar;
   auto CFGHash = ConstantInt::get(Type::getInt64Ty(M->getContext()),
                                   FuncInfo.FunctionHash);
+  // Make sure that pointer to global is passed in with zero addrspace
+  // This is relevant during GPU profiling
+  auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
+      Name, llvm::PointerType::getUnqual(M->getContext()));
   if (PGOFunctionEntryCoverage) {
     auto &EntryBB = F.getEntryBlock();
     IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt());
@@ -869,7 +873,7 @@ static void instrumentOneFunc(
     // i32 )
     Builder.CreateCall(
         Intrinsic::getDeclaration(M, Intrinsic::instrprof_cover),
-        {Name, CFGHash, Builder.getInt32(1), Builder.getInt32(0)});
+        {NormalizedPtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)});
     return;
   }
 
@@ -887,7 +891,8 @@ static void instrumentOneFunc(
     // i32 )
     Builder.CreateCall(
         Intrinsic::getDeclaration(M, Intrinsic::instrprof_timestamp),
-        {Name, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I)});
+        {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters),
+         Builder.getInt32(I)});
     I += PGOBlockCoverage ? 8 : 1;
   }
 
@@ -901,7 +906,8 @@ static void instrumentOneFunc(
         Intrinsic::getDeclaration(M, PGOBlockCoverage
                                          ? Intrinsic::instrprof_cover
                                          : Intrinsic::instrprof_increment),
-        {Name, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I++)});
+        {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters),
+         Builder.getInt32(I++)});
   }
 
   // Now instrument select instructions:
diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c
index 389be19b670d76..d95793b508dcfc 100644
--- a/openmp/libomptarget/test/offloading/pgo1.c
+++ b/openmp/libomptarget/test/offloading/pgo1.c
@@ -1,6 +1,11 @@
 // RUN: %libomptarget-compile-generic -fprofile-instr-generate \
 // RUN: -Xclang "-fprofile-instrument=clang"
-// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic
+// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
+// RUN: --check-prefix="CLANG-PGO"
+// RUN: %libomptarget-compile-generic -fprofile-generate \
+// RUN: -Xclang "-fprofile-instrument=llvm"
+// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
+// RUN: --check-prefix="LLVM-PGO"
 
 // UNSUPPORTED: x86_64-pc-linux-gnu
 // UNSUPPORTED: x86_64-pc-linux-gnu-LTO
@@ -26,24 +31,47 @@ int main() {
   }
 }
 
-// CHECK: ======== Counters =========
-// CHECK-NEXT: [ 0 11 20 ]
-// CHECK-NEXT: [ 10 ]
-// CHECK-NEXT: [ 20 ]
-// CHECK-NEXT: ========== Data ===========
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: ======== Functions ========
-// CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
-// CHECK-NEXT: test1
-// CHECK-NEXT: test2
+// CLANG-PGO: ======== Counters =========
+// CLANG-PGO-NEXT: [ 0 11 20 ]
+// CLANG-PGO-NEXT: [ 10 ]
+// CLANG-PGO-NEXT: [ 20 ]
+// CLANG-PGO-NEXT: ========== Data ===========
+// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CLANG-PGO-NEXT: ======== Functions ========
+// CLANG-PGO-NEXT: pgo1.c:
+// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
+// CLANG-PGO-NEXT: test1
+// CLANG-PGO-NEXT: test2
+
+// LLVM-PGO: ======== Counters =========
+// LLVM-PGO-NEXT: [ 20 ]
+// LLVM-PGO-NEXT: [ 10 ]
+// LLVM-PGO-NEXT: [ 20 10 1 1 ]
+// LLVM-PGO-NEXT: ========== Data ===========
+// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// LLVM-PGO-NEXT: ======== Functions ========
+// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
+// LLVM-PGO-NEXT: test1
+// LLVM-PGO-NEXT: test2

>From 0606f0dd1b32ef9ebe138bbc964b3921e22d95d1 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 14 Feb 2024 01:46:55 -0600
Subject: [PATCH 14/39] Use explicit addrspace instead of unqual

---
 clang/lib/CodeGen/CodeGenPGO.cpp | 2 +-
 llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp
index e084dda879cbc0..4c75a01222d304 100644
--- a/clang/lib/CodeGen/CodeGenPGO.cpp
+++ b/clang/lib/CodeGen/CodeGenPGO.cpp
@@ -1103,7 +1103,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S,
   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
   auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      FuncNameVar, llvm::PointerType::getUnqual(CGM.getLLVMContext()));
+      FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0));
   llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash),
                          Builder.getInt32(NumRegionCounters),
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index bbc8da78fd7baf..c63b3e4ecf786a 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -865,7 +865,7 @@ static void instrumentOneFunc(
   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
   auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      Name, llvm::PointerType::getUnqual(M->getContext()));
+      Name, llvm::PointerType::get(M->getContext(), 0));
   if (PGOFunctionEntryCoverage) {
     auto &EntryBB = F.getEntryBlock();
     IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt());

>From c1f9be321678766525141214aaab74636cafbc2c Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Thu, 15 Feb 2024 19:10:09 -0600
Subject: [PATCH 15/39] Remove redundant namespaces

---
 .../Instrumentation/PGOInstrumentation.cpp | 4 +--
 .../common/src/GlobalHandler.cpp | 26 +++++++++----------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index c63b3e4ecf786a..3058e577738fda 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -864,8 +864,8 @@ static void instrumentOneFunc(
                                   FuncInfo.FunctionHash);
   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
-  auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      Name, llvm::PointerType::get(M->getContext(), 0));
+  auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast(
+      Name, PointerType::get(M->getContext(), 0));
   if (PGOFunctionEntryCoverage) {
     auto &EntryBB = F.getEntryBlock();
     IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt());
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index e5eb653d022287..ae270c60804d26 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -219,30 +219,30 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
 }
 
 void GPUProfGlobals::dump() const {
-  llvm::outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str()
+  outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str()
                << "\n";
 
-  llvm::outs() << "======== Counters =========\n";
+  outs() << "======== Counters =========\n";
   for (const auto &Count : Counts) {
-    llvm::outs() << "[";
+    outs() << "[";
     for (size_t i = 0; i < Count.size(); i++) {
       if (i == 0)
-
llvm::outs() << " "; - llvm::outs() << Count[i] << " "; + outs() << " "; + outs() << Count[i] << " "; } - llvm::outs() << "]\n"; + outs() << "]\n"; } - llvm::outs() << "========== Data ===========\n"; + outs() << "========== Data ===========\n"; for (const auto &ProfData : Data) { - llvm::outs() << "{ "; + outs() << "{ "; #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \ - llvm::outs() << ProfData.Name << " "; + outs() << ProfData.Name << " "; #include "llvm/ProfileData/InstrProfData.inc" - llvm::outs() << "}\n"; + outs() << "}\n"; } - llvm::outs() << "======== Functions ========\n"; + outs() << "======== Functions ========\n"; std::string s; s.reserve(NamesData.size()); for (uint8_t Name : NamesData) { @@ -253,6 +253,6 @@ void GPUProfGlobals::dump() const { if (Error Err = Symtab.create(StringRef(s))) { consumeError(std::move(Err)); } - Symtab.dumpNames(llvm::outs()); - llvm::outs() << "===========================\n"; + Symtab.dumpNames(outs()); + outs() << "===========================\n"; } >From 6a3ae407e69e7524f0f808329c534f8352ee1779 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 15 Feb 2024 19:15:15 -0600 Subject: [PATCH 16/39] Clang format --- .../libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index ae270c60804d26..1fce2448922624 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -220,7 +220,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, void GPUProfGlobals::dump() const { outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str() - << "\n"; + << "\n"; outs() << "======== Counters =========\n"; for (const auto &Count : Counts) { >From 6866862d459e3c3fa65fae8ae639ddc3ff735252 Mon Sep 17 
00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 16 Feb 2024 13:13:39 -0600 Subject: [PATCH 17/39] Use getAddrSpaceCast Replace getPointerBitCastOrAddrSpaceCast with getAddrSpaceCast and allow no-op getAddrSpaceCast calls when types are identical --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ++++ llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index 8f52018445d2b0..baceeba8380ddb 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index a38b912164b130..2d89c5bbd4a4c2 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,6 +2067,10 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { + // Skip cast if types are identical + if (C->getType() == DstTy) + return C; + assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index 3058e577738fda..c0be71aa4cc004 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ 
b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 62a5ee1c75545571f81d9edd22e19e9ef7cff69f Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 27 Feb 2024 14:53:51 -0600 Subject: [PATCH 18/39] Revert "Use getAddrSpaceCast" This reverts commit 6866862d459e3c3fa65fae8ae639ddc3ff735252. --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ---- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index baceeba8380ddb..8f52018445d2b0 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 2d89c5bbd4a4c2..a38b912164b130 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,10 +2067,6 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { - // Skip cast if types are 
identical - if (C->getType() == DstTy) - return C; - assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c0be71aa4cc004..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 052394fa28c923d130bf73a07b965a9751467302 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 27 Feb 2024 15:34:34 -0600 Subject: [PATCH 19/39] Revert "Use getAddrSpaceCast" This reverts commit 6866862d459e3c3fa65fae8ae639ddc3ff735252. 
--- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ---- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index baceeba8380ddb..8f52018445d2b0 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 2d89c5bbd4a4c2..a38b912164b130 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,10 +2067,6 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { - // Skip cast if types are identical - if (C->getType() == DstTy) - return C; - assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c0be71aa4cc004..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto 
*NormalizedPtr = ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 612d5a5f6966a77e82e5591f5aea475fbf886e55 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 1 Mar 2024 02:04:00 -0600 Subject: [PATCH 20/39] Write PGO TODO: Fix tests --- compiler-rt/lib/profile/InstrProfiling.h | 11 ++ compiler-rt/lib/profile/InstrProfilingFile.c | 148 +++++++++++++++--- .../common/include/GlobalHandler.h | 14 +- .../common/src/GlobalHandler.cpp | 57 +++++-- .../common/src/PluginInterface.cpp | 6 +- 5 files changed, 200 insertions(+), 36 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h index 01239083369187..937acbd417de46 100644 --- a/compiler-rt/lib/profile/InstrProfiling.h +++ b/compiler-rt/lib/profile/InstrProfiling.h @@ -275,6 +275,17 @@ void __llvm_profile_get_padding_sizes_for_counters( */ void __llvm_profile_set_dumped(); +/*! + * \brief Write custom target-specific profiling data to a seperate file. + * Used by libomptarget for GPU PGO. + */ +int __llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, + const char *NamesEnd); + /*! * This variable is defined in InstrProfilingRuntime.cpp as a hidden * symbol. Its main purpose is to enable profile runtime user to diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index f3b457d786e6bd..4fc401bb9bebf5 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -502,27 +502,15 @@ static FILE *getFileObject(const char *OutputName) { return fopen(OutputName, "ab"); } -/* Write profile data to file \c OutputName. 
*/ -static int writeFile(const char *OutputName) { - int RetVal; - FILE *OutputFile; - - int MergeDone = 0; +/* Get file object and merge if applicable */ +static FILE *getMergeFileObject(const char *OutputName, int *MergeDone) { VPMergeHook = &lprofMergeValueProfData; if (doMerging()) - OutputFile = openFileForMerging(OutputName, &MergeDone); - else - OutputFile = getFileObject(OutputName); - - if (!OutputFile) - return -1; - - FreeHook = &free; - setupIOBuffer(); - ProfDataWriter fileWriter; - initFileWriter(&fileWriter, OutputFile); - RetVal = lprofWriteData(&fileWriter, lprofGetVPDataReader(), MergeDone); + return openFileForMerging(OutputName, MergeDone); + return getFileObject(OutputName); +} +static void closeFileObject(FILE *OutputFile) { if (OutputFile == getProfileFile()) { fflush(OutputFile); if (doMerging() && !__llvm_profile_is_continuous_mode_enabled()) { @@ -531,7 +519,23 @@ static int writeFile(const char *OutputName) { } else { fclose(OutputFile); } +} + +/* Write profile data to file \c OutputName. */ +static int writeFile(const char *OutputName) { + int RetVal, MergeDone = 0; + FILE *OutputFile = getMergeFileObject(OutputName, &MergeDone); + + if (!OutputFile) + return -1; + + FreeHook = &free; + setupIOBuffer(); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + RetVal = lprofWriteData(&fileWriter, lprofGetVPDataReader(), MergeDone); + closeFileObject(OutputFile); return RetVal; } @@ -558,10 +562,16 @@ static int writeOrderFile(const char *OutputName) { #define LPROF_INIT_ONCE_ENV "__LLVM_PROFILE_RT_INIT_ONCE" +static void forceTruncateFile(const char *Filename) { + FILE *File = fopen(Filename, "w"); + if (!File) + return; + fclose(File); +} + static void truncateCurrentFile(void) { const char *Filename; char *FilenameBuf; - FILE *File; int Length; Length = getCurFilenameLength(); @@ -591,10 +601,7 @@ static void truncateCurrentFile(void) { return; /* Truncate the file. Later we'll reopen and append. 
*/ - File = fopen(Filename, "w"); - if (!File) - return; - fclose(File); + forceTruncateFile(Filename); } /* Write a partial profile to \p Filename, which is required to be backed by @@ -1271,4 +1278,99 @@ COMPILER_RT_VISIBILITY int __llvm_profile_set_file_object(FILE *File, return 0; } +int __llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, + const char *NamesEnd) { + int ReturnValue = 0, FilenameLength, TargetLength, MergeDone; + char *FilenameBuf, *TargetFilename; + const char *Filename; + + /* Save old profile data */ + FILE *oldFile = getProfileFile(); + + // Temporarily suspend getting SIGKILL when the parent exits. + int PDeathSig = lprofSuspendSigKill(); + + if (lprofProfileDumped() || __llvm_profile_is_continuous_mode_enabled()) { + PROF_NOTE("Profile data not written to file: %s.\n", "already written"); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return 0; + } + + /* Get current filename */ + FilenameLength = getCurFilenameLength(); + FilenameBuf = (char *)COMPILER_RT_ALLOCA(FilenameLength + 1); + Filename = getCurFilename(FilenameBuf, 0); + + /* Check the filename. */ + if (!Filename) { + PROF_ERR("Failed to write file : %s\n", "Filename not set"); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + /* Allocate new space for our target-specific PGO filename */ + TargetLength = strlen(Target); + TargetFilename = + (char *)COMPILER_RT_ALLOCA(FilenameLength + TargetLength + 2); + + /* Prepend "TARGET." to current filename */ + memcpy(TargetFilename, Target, TargetLength); + TargetFilename[TargetLength] = '.'; + memcpy(TargetFilename, Target, TargetLength); + memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); + TargetFilename[FilenameLength + 1 + TargetLength] = 0; + + /* Check if there is llvm/runtime version mismatch. 
*/ + if (GET_VERSION(__llvm_profile_get_version()) != INSTR_PROF_RAW_VERSION) { + PROF_ERR("Runtime and instrumentation version mismatch : " + "expected %d, but get %d\n", + INSTR_PROF_RAW_VERSION, + (int)GET_VERSION(__llvm_profile_get_version())); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + /* Clean old target file */ + forceTruncateFile(TargetFilename); + + /* Open target-specific PGO file */ + MergeDone = 0; + FILE *OutputFile = getMergeFileObject(TargetFilename, &MergeDone); + + if (!OutputFile) { + PROF_ERR("Failed to open file : %s\n", TargetFilename); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + FreeHook = &free; + setupIOBuffer(); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + + /* Write custom data to the file */ + ReturnValue = lprofWriteDataImpl( + &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone); + + closeFileObject(OutputFile); + + // Restore SIGKILL. 
+ if (PDeathSig == 1) + lprofRestoreSigKill(); + + /* Restore old profiling file */ + setProfileFile(oldFile); + + return ReturnValue; +} + #endif diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h index f5a15ca11bfcda..af0cd4dcdf5dcf 100644 --- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h +++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h @@ -63,14 +63,24 @@ struct __llvm_profile_data { #include "llvm/ProfileData/InstrProfData.inc" }; +extern "C" { +extern int __attribute__((weak)) +__llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, const char *CountersEnd, + const char *NamesBegin, const char *NamesEnd); +} + /// PGO profiling data extracted from a GPU device struct GPUProfGlobals { - SmallVector<uint8_t> NamesData; - SmallVector<SmallVector<int64_t>> Counts; + SmallVector<int64_t> Counts; SmallVector<__llvm_profile_data> Data; + SmallVector<uint8_t> NamesData; Triple TargetTriple; void dump() const; + Error write() const; }; /// Subclass of GlobalTy that holds the memory for a global of \p Ty.
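The `write()` method declared above is implemented later in this patch by copying the counters, data records, and names into one buffer laid out as Counters | Data | Names; the patch comment says each section must be contiguous so that the compiler-rt profiling library can compute deltas between them. The C sketch below illustrates only that layout step, under stated assumptions: `struct prof_record`, `struct packed_sections`, and `pack_sections` are hypothetical names, and `struct prof_record` is a simplified stand-in for `__llvm_profile_data`, not its real definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for __llvm_profile_data. The real record
   is generated from llvm/ProfileData/InstrProfData.inc and has more fields. */
struct prof_record {
  uint64_t name_ref;
  uint64_t func_hash;
  uint32_t num_counters;
  uint32_t padding;
};

/* Begin/end pointers for each section inside one contiguous allocation. */
struct packed_sections {
  unsigned char *buffer; /* owns the allocation; release with free() */
  char *counters_begin, *counters_end;
  struct prof_record *data_begin, *data_end;
  char *names_begin, *names_end;
};

/* Copy the three sections into one buffer laid out as Counters | Data | Names.
   Returns 0 on success, -1 if the allocation fails. */
static int pack_sections(const int64_t *counts, size_t num_counts,
                         const struct prof_record *data, size_t num_data,
                         const uint8_t *names, size_t names_size,
                         struct packed_sections *out) {
  assert(out != NULL);
  size_t counts_size = num_counts * sizeof(int64_t);
  size_t data_size = num_data * sizeof(struct prof_record);

  /* malloc returns memory aligned for any object type, and counts_size is a
     multiple of 8, so the data section that follows stays 8-byte aligned.
     The +1 avoids a zero-byte request when every section is empty. */
  unsigned char *buf = malloc(counts_size + data_size + names_size + 1);
  if (!buf)
    return -1;

  out->buffer = buf;
  out->counters_begin = (char *)buf;
  out->counters_end = out->counters_begin + counts_size;
  out->data_begin = (struct prof_record *)(buf + counts_size);
  out->data_end = out->data_begin + num_data;
  out->names_begin = (char *)(buf + counts_size + data_size);
  out->names_end = out->names_begin + names_size;

  /* memcpy with a NULL source is undefined even for size 0, so skip empty
     sections explicitly. */
  if (counts_size)
    memcpy(out->counters_begin, counts, counts_size);
  if (data_size)
    memcpy(out->data_begin, data, data_size);
  if (names_size)
    memcpy(out->names_begin, names, names_size);
  return 0;
}
```

A caller would hand the six region pointers to a writer, much as `write()` passes its region pointers to `__llvm_write_custom_profile`, and then `free(buffer)`. Using a single allocation rather than three separate ones is what makes pointer differences between sections well defined in C.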
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 1fce2448922624..2f16b6e3c139e9 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -205,7 +205,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, GlobalTy CountGlobal(NameOrErr->str(), Sym.getSize(), Counts.data()); if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal)) return Err; - DeviceProfileData.Counts.push_back(std::move(Counts)); + DeviceProfileData.Counts.append(std::move(Counts)); } else if (NameOrErr->starts_with(getInstrProfDataVarPrefix())) { // Read profiling data for this global variable __llvm_profile_data Data{}; @@ -223,15 +223,14 @@ void GPUProfGlobals::dump() const { << "\n"; outs() << "======== Counters =========\n"; - for (const auto &Count : Counts) { - outs() << "["; - for (size_t i = 0; i < Count.size(); i++) { - if (i == 0) - outs() << " "; - outs() << Count[i] << " "; - } - outs() << "]\n"; + for (size_t i = 0; i < Counts.size(); i++) { + if (i > 0 && i % 10 == 0) + outs() << "\n"; + else if (i != 0) + outs() << " "; + outs() << Counts[i]; } + outs() << "\n"; outs() << "========== Data ===========\n"; for (const auto &ProfData : Data) { @@ -256,3 +255,43 @@ void GPUProfGlobals::dump() const { Symtab.dumpNames(outs()); outs() << "===========================\n"; } + +Error GPUProfGlobals::write() const { + if (!__llvm_write_custom_profile) + return Plugin::error("Could not find symbol __llvm_write_custom_profile. " + "The compiler-rt profiling library must be linked for " + "GPU PGO to work."); + + size_t DataSize = Data.size() * sizeof(__llvm_profile_data), + CountsSize = Counts.size() * sizeof(int64_t); + __llvm_profile_data *DataBegin, *DataEnd; + char *CountersBegin, *CountersEnd, *NamesBegin, *NamesEnd; + + // Initialize array of contiguous data. 
We need to make sure each section is + // contiguous so that the PGO library can compute deltas properly + SmallVector ContiguousData(NamesData.size() + DataSize + CountsSize); + + // Compute region pointers + DataBegin = (__llvm_profile_data *)(ContiguousData.data() + CountsSize); + DataEnd = + (__llvm_profile_data *)(ContiguousData.data() + CountsSize + DataSize); + CountersBegin = (char *)ContiguousData.data(); + CountersEnd = (char *)(ContiguousData.data() + CountsSize); + NamesBegin = (char *)(ContiguousData.data() + CountsSize + DataSize); + NamesEnd = (char *)(ContiguousData.data() + CountsSize + DataSize + + NamesData.size()); + + // Copy data to contiguous buffer + memcpy(DataBegin, Data.data(), DataSize); + memcpy(CountersBegin, Counts.data(), CountsSize); + memcpy(NamesBegin, NamesData.data(), NamesData.size()); + + // Invoke compiler-rt entrypoint + int result = __llvm_write_custom_profile(TargetTriple.str().c_str(), + DataBegin, DataEnd, CountersBegin, + CountersEnd, NamesBegin, NamesEnd); + if (result != 0) + return Plugin::error("Error writing GPU PGO data to file"); + + return Plugin::success(); +} diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp index 1ea93795ce8ce4..d5e6b6128152dc 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp @@ -837,8 +837,10 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); - // TODO: write data to profiling file - ProfOrErr->dump(); + // Write data to profiling file + if (auto Err = ProfOrErr->write()) { + consumeError(std::move(Err)); + } } // Delete the memory manager before deinitializing the device. 
Otherwise, >From b8c916305acf08c0bd2d51b81875be5e8fc59ff3 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 13 Mar 2024 20:05:32 -0500 Subject: [PATCH 21/39] Fix tests --- .../plugins-nextgen/common/src/PluginInterface.cpp | 3 +++ openmp/libomptarget/test/offloading/pgo1.c | 8 ++------ 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp index d5e6b6128152dc..2359ad28a25b04 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp @@ -837,6 +837,9 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); + // Dump out profdata + ProfOrErr->dump(); + // Write data to profiling file if (auto Err = ProfOrErr->write()) { consumeError(std::move(Err)); diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c index d95793b508dcfc..79e93d0f10827f 100644 --- a/openmp/libomptarget/test/offloading/pgo1.c +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -32,9 +32,7 @@ int main() { } // CLANG-PGO: ======== Counters ========= -// CLANG-PGO-NEXT: [ 0 11 20 ] -// CLANG-PGO-NEXT: [ 10 ] -// CLANG-PGO-NEXT: [ 20 ] +// CLANG-PGO-NEXT: 0 11 20 10 20 // CLANG-PGO-NEXT: ========== Data =========== // CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} // CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} @@ -55,9 +53,7 @@ int main() { // CLANG-PGO-NEXT: test2 // LLVM-PGO: ======== Counters ========= -// LLVM-PGO-NEXT: [ 20 ] -// LLVM-PGO-NEXT: [ 10 ] -// LLVM-PGO-NEXT: [ 20 10 1 1 ] +// LLVM-PGO-NEXT: 20 10 20 10 1 1 // LLVM-PGO-NEXT: ========== Data =========== // LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} // LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} >From 7770b37a5a4c40bd45887f762bd7f1e652bc0ed2 Mon Sep 17 00:00:00 2001 From: Ethan Luis 
McDonough Date: Tue, 7 May 2024 16:31:48 -0500 Subject: [PATCH 22/39] Fix params --- compiler-rt/lib/profile/InstrProfilingFile.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 466bfe480543bc..bc1d40a37a5ad6 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1360,9 +1360,10 @@ int __llvm_write_custom_profile(const char *Target, initFileWriter(&fileWriter, OutputFile); /* Write custom data to the file */ - ReturnValue = lprofWriteDataImpl( - &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, - lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone); + ReturnValue = + lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin, + CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL, + NULL, NULL, NULL, NamesBegin, NamesEnd, MergeDone); closeFileObject(OutputFile); >From aa895a1788969a0d27692057a1457074e9772c78 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 18 Mar 2024 21:31:32 -0500 Subject: [PATCH 23/39] Fix elf obj file --- offload/plugins-nextgen/common/src/GlobalHandler.cpp | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index 80cdcaff75528e..7717e19a5b6779 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -177,16 +177,19 @@ Expected<GPUProfGlobals> GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image) { GPUProfGlobals DeviceProfileData; - auto ELFObj = getELFObjectFile(Image); - if (!ELFObj) - return ELFObj.takeError(); + auto ObjFile = getELFObjectFile(Image); + if (!ObjFile) + return ObjFile.takeError(); + + std::unique_ptr ELFObj( + static_cast(ObjFile->release())); DeviceProfileData.TargetTriple =
ELFObj->makeTriple(); // Iterate through elf symbols for (auto &Sym : ELFObj->symbols()) { auto NameOrErr = Sym.getName(); if (!NameOrErr) - return ELFObj.takeError(); + return NameOrErr.takeError(); // Check if given current global is a profiling global based // on name >From 2031e49c2b26864f2dab72e629eb6cbe34928a7a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 6 May 2024 23:13:58 -0500 Subject: [PATCH 24/39] Add more addrspace casts for GPU targets --- .../Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++--- .../Instrumentation/PGOInstrumentation.cpp | 13 +++++++++---- 2 files changed, 17 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index a6b1e0d488120a..dd8c027c4bbf62 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -869,6 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { llvm::InstrProfValueKind::IPVK_MemOPSize); CallInst *Call = nullptr; auto *TLI = &GetTLI(*Ind->getFunction()); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + DataVar, PointerType::getUnqual(M.getContext())); // To support value profiling calls within Windows exception handlers, funclet // information contained within operand bundles needs to be copied over to @@ -877,11 +879,13 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { SmallVector OpBundles; Ind->getOperandBundlesAsDefs(OpBundles); if (!IsMemOpSize) { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args, OpBundles); } else { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), 
NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall( getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp), Args, OpBundles); @@ -1575,7 +1579,8 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { getInstrProfSectionName(IPSK_vals, TT.getObjectFormat())); ValuesVar->setAlignment(Align(8)); maybeSetComdat(ValuesVar, Fn, CntsVarName); - ValuesPtrExpr = ValuesVar; + ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + ValuesVar, PointerType::getUnqual(Fn->getContext())); } uint64_t NumCounters = Inc->getNumCounters()->getZExtValue(); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index 4b51396a8baa35..ee1657ba8400ee 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -1007,12 +1007,15 @@ static void instrumentOneFunc( ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty()); assert(ToProfile && "value profiling Value is of unexpected type"); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + Name, PointerType::get(M->getContext(), 0)); + SmallVector OpBundles; populateEHOperandBundle(Cand, BlockColors, OpBundles); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile), - {FuncInfo.FuncNameVar, Builder.getInt64(FuncInfo.FunctionHash), - ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, + {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile, + Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, OpBundles); } } // IPVK_First <= Kind <= IPVK_Last @@ -1685,10 +1688,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) { IRBuilder<> Builder(&SI); Type *Int64Ty = Builder.getInt64Ty(); auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, 
PointerType::get(M->getContext(), 0)); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step), - {FuncNameVar, Builder.getInt64(FuncHash), Builder.getInt32(TotalNumCtrs), - Builder.getInt32(*CurCtrIdx), Step}); + {NormalizedPtr, Builder.getInt64(FuncHash), + Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step}); ++(*CurCtrIdx); } >From be6524bb4f77de0add1e698f68115fd336f32238 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 13 May 2024 17:41:00 -0500 Subject: [PATCH 25/39] Have test read from profraw instead of dump --- offload/test/lit.cfg | 2 + offload/test/offloading/pgo1.c | 94 ++++++++++++++++------------------ 2 files changed, 46 insertions(+), 50 deletions(-) diff --git a/offload/test/lit.cfg b/offload/test/lit.cfg index 069110dc69a6e4..38e6a33b01fafc 100644 --- a/offload/test/lit.cfg +++ b/offload/test/lit.cfg @@ -391,6 +391,8 @@ if config.test_fortran_compiler: config.available_features.add('flang') config.substitutions.append(("%flang", config.test_fortran_compiler)) +config.substitutions.append(("%target_triple", config.libomptarget_current_target)) + config.substitutions.append(("%openmp_flags", config.test_openmp_flags)) if config.libomptarget_current_target.startswith('nvptx') and config.cuda_path: config.substitutions.append(("%cuda_flags", "--cuda-path=" + config.cuda_path)) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 79e93d0f10827f..d22d5340f5b3ec 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,22 +1,21 @@ -// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ -// RUN: -Xclang "-fprofile-instrument=clang" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ -// RUN: --check-prefix="CLANG-PGO" -// RUN: %libomptarget-compile-generic -fprofile-generate \ -// RUN: -Xclang "-fprofile-instrument=llvm" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: %libomptarget-compile-generic 
-Xclang "-fprofile-instrument=llvm" +// RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" +// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=clang" +// RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.clang.profraw | %fcheck-generic \ +// RUN: --check-prefix="CLANG-PGO" + // UNSUPPORTED: x86_64-pc-linux-gnu // UNSUPPORTED: x86_64-pc-linux-gnu-LTO // UNSUPPORTED: aarch64-unknown-linux-gnu // UNSUPPORTED: aarch64-unknown-linux-gnu-LTO // REQUIRES: pgo -#ifdef _OPENMP -#include -#endif - int test1(int a) { return a / 2; } int test2(int a) { return a * 2; } @@ -31,43 +30,38 @@ int main() { } } -// CLANG-PGO: ======== Counters ========= -// CLANG-PGO-NEXT: 0 11 20 10 20 -// CLANG-PGO-NEXT: ========== Data =========== -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: ======== Functions ======== -// CLANG-PGO-NEXT: pgo1.c: -// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// CLANG-PGO-NEXT: test1 -// CLANG-PGO-NEXT: test2 +// LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: 
Counters: 4 +// LLVM-PGO: Function count: 20 +// LLVM-PGO: Block counts: [10, 20, 10] + +// LLVM-PGO-LABEL: test1: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: Counters: 1 +// LLVM-PGO: Function count: 1 +// LLVM-PGO: Block counts: [] + +// LLVM-PGO-LABEL: test2: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: Counters: 1 +// LLVM-PGO: Function count: 1 +// LLVM-PGO: Block counts: [] + +// CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 3 +// CLANG-PGO: Function count: 0 +// CLANG-PGO: Block counts: [11, 20] + +// CLANG-PGO-LABEL: test1: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 1 +// CLANG-PGO: Function count: 10 +// CLANG-PGO: Block counts: [] -// LLVM-PGO: ======== Counters ========= -// LLVM-PGO-NEXT: 20 10 20 10 1 1 -// LLVM-PGO-NEXT: ========== Data =========== -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: ======== Functions ======== -// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// LLVM-PGO-NEXT: test1 -// LLVM-PGO-NEXT: test2 +// CLANG-PGO-LABEL: test2: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 1 +// CLANG-PGO: Function count: 20 +// CLANG-PGO: Block counts: [] >From 2b8eb2935ec21bf0acc5c56f45837b5976560963 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 
24 May 2024 19:59:33 -0500 Subject: [PATCH 26/39] Fix PGO test format --- offload/test/offloading/pgo1.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index d22d5340f5b3ec..0e75c684ed9263 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -33,20 +33,17 @@ int main() { // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 4 -// LLVM-PGO: Function count: 20 -// LLVM-PGO: Block counts: [10, 20, 10] +// LLVM-PGO: Block counts: [20, 10, 20, 10] // LLVM-PGO-LABEL: test1: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Function count: 1 -// LLVM-PGO: Block counts: [] +// LLVM-PGO: Block counts: [1] // LLVM-PGO-LABEL: test2: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Function count: 1 -// LLVM-PGO: Block counts: [] +// LLVM-PGO: Block counts: [1] // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} >From 67f3009173d815295f36e2b37e85add1347e3bf9 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 24 May 2024 20:45:04 -0500 Subject: [PATCH 27/39] Refactor profile writer --- compiler-rt/lib/profile/InstrProfilingFile.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index bc1d40a37a5ad6..76238214c13aa3 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1344,8 +1344,7 @@ int __llvm_write_custom_profile(const char *Target, forceTruncateFile(TargetFilename); /* Open target-specific PGO file */ - MergeDone = 0; - FILE *OutputFile = getMergeFileObject(TargetFilename, &MergeDone); + FILE *OutputFile = getFileObject(TargetFilename); if 
(!OutputFile) { PROF_ERR("Failed to open file : %s\n", TargetFilename); @@ -1356,15 +1355,11 @@ int __llvm_write_custom_profile(const char *Target, FreeHook = &free; setupIOBuffer(); - ProfDataWriter fileWriter; - initFileWriter(&fileWriter, OutputFile); - - /* Write custom data to the file */ - ReturnValue = - lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin, - CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL, - NULL, NULL, NULL, NamesBegin, NamesEnd, MergeDone); + /* Write custom data */ + ReturnValue = __llvm_profile_write_buffer_internal( + OutputFile, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + NamesBegin, NamesEnd); closeFileObject(OutputFile); // Restore SIGKILL. >From e8ad1322c557f7b48e2b28fe3a34a696a1103bba Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 27 May 2024 18:29:18 -0500 Subject: [PATCH 28/39] Fix refactor bug --- compiler-rt/lib/profile/InstrProfilingFile.c | 52 ++++++++++---------- offload/test/offloading/pgo1.c | 6 ++- 2 files changed, 29 insertions(+), 29 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 76238214c13aa3..784cb9af6169d8 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -505,14 +505,6 @@ static FILE *getFileObject(const char *OutputName) { return fopen(OutputName, "ab"); } -/* Get file object and merge if applicable */ -static FILE *getMergeFileObject(const char *OutputName, int *MergeDone) { - VPMergeHook = &lprofMergeValueProfData; - if (doMerging()) - return openFileForMerging(OutputName, MergeDone); - return getFileObject(OutputName); -} - static void closeFileObject(FILE *OutputFile) { if (OutputFile == getProfileFile()) { fflush(OutputFile); @@ -526,8 +518,15 @@ static void closeFileObject(FILE *OutputFile) { /* Write profile data to file \c OutputName. 
*/ static int writeFile(const char *OutputName) { - int RetVal, MergeDone = 0; - FILE *OutputFile = getMergeFileObject(OutputName, &MergeDone); + int RetVal; + FILE *OutputFile; + + int MergeDone = 0; + VPMergeHook = &lprofMergeValueProfData; + if (doMerging()) + OutputFile = openFileForMerging(OutputName, &MergeDone); + else + OutputFile = getFileObject(OutputName); if (!OutputFile) return -1; @@ -565,16 +564,10 @@ static int writeOrderFile(const char *OutputName) { #define LPROF_INIT_ONCE_ENV "__LLVM_PROFILE_RT_INIT_ONCE" -static void forceTruncateFile(const char *Filename) { - FILE *File = fopen(Filename, "w"); - if (!File) - return; - fclose(File); -} - static void truncateCurrentFile(void) { const char *Filename; char *FilenameBuf; + FILE *File; int Length; Length = getCurFilenameLength(); @@ -604,7 +597,10 @@ static void truncateCurrentFile(void) { return; /* Truncate the file. Later we'll reopen and append. */ - forceTruncateFile(Filename); + File = fopen(Filename, "w"); + if (!File) + return; + fclose(File); } /* Write a partial profile to \p Filename, which is required to be backed by @@ -1287,7 +1283,7 @@ int __llvm_write_custom_profile(const char *Target, const char *CountersBegin, const char *CountersEnd, const char *NamesBegin, const char *NamesEnd) { - int ReturnValue = 0, FilenameLength, TargetLength, MergeDone; + int ReturnValue = 0, FilenameLength, TargetLength; char *FilenameBuf, *TargetFilename; const char *Filename; @@ -1340,11 +1336,9 @@ int __llvm_write_custom_profile(const char *Target, return -1; } - /* Clean old target file */ - forceTruncateFile(TargetFilename); - - /* Open target-specific PGO file */ - FILE *OutputFile = getFileObject(TargetFilename); + /* Open and truncate target-specific PGO file */ + FILE *OutputFile = fopen(TargetFilename, "w"); + setProfileFile(OutputFile); if (!OutputFile) { PROF_ERR("Failed to open file : %s\n", TargetFilename); @@ -1357,9 +1351,13 @@ int __llvm_write_custom_profile(const char *Target, 
setupIOBuffer(); /* Write custom data */ - ReturnValue = __llvm_profile_write_buffer_internal( - OutputFile, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, - NamesBegin, NamesEnd); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + + /* Write custom data to the file */ + ReturnValue = lprofWriteDataImpl( + &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NULL, NULL, NULL, NULL, NamesBegin, NamesEnd, 0); closeFileObject(OutputFile); // Restore SIGKILL. diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 0e75c684ed9263..d6747113265803 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,10 +1,12 @@ -// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=llvm" +// RUN: %libomptarget-compile-generic -fprofile-generate \ +// RUN: -Xclang "-fprofile-instrument=llvm" // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" -// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ +// RUN: -Xclang "-fprofile-instrument=clang" // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.clang.profraw | %fcheck-generic \ >From 4c9f814ce14aeb6766a93f5c1d15b847b98dc29f Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 28 May 2024 12:58:43 -0500 Subject: [PATCH 29/39] Make requested clang-format change --- offload/plugins-nextgen/common/include/GlobalHandler.h | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/offload/plugins-nextgen/common/include/GlobalHandler.h b/offload/plugins-nextgen/common/include/GlobalHandler.h index 
017d7e994f07a8..1d7b9f80f9dfd3 100644 --- a/offload/plugins-nextgen/common/include/GlobalHandler.h +++ b/offload/plugins-nextgen/common/include/GlobalHandler.h @@ -64,12 +64,10 @@ struct __llvm_profile_data { }; extern "C" { -extern int __attribute__((weak)) -__llvm_write_custom_profile(const char *Target, - const __llvm_profile_data *DataBegin, - const __llvm_profile_data *DataEnd, - const char *CountersBegin, const char *CountersEnd, - const char *NamesBegin, const char *NamesEnd); +extern int __attribute__((weak)) __llvm_write_custom_profile( + const char *Target, const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, const char *NamesEnd); } /// PGO profiling data extracted from a GPU device >From 344e357de657f54c068be969dcfc3ea33f2f026e Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 20:29:20 -0500 Subject: [PATCH 30/39] Tighten PGO test requirements Require compiler-rt to be an enabled runtime --- offload/test/CMakeLists.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt index 32df1e47afaeb2..41ab339147791c 100644 --- a/offload/test/CMakeLists.txt +++ b/offload/test/CMakeLists.txt @@ -12,10 +12,10 @@ else() set(LIBOMPTARGET_DEBUG False) endif() -if (OPENMP_STANDALONE_BUILD) - set(LIBOMPTARGET_TEST_GPU_PGO False) -else() +if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES) set(LIBOMPTARGET_TEST_GPU_PGO True) +else() + set(LIBOMPTARGET_TEST_GPU_PGO False) endif() # Replace the space from user's input with ";" in case that CMake add escape >From 2f751420b9ad2ffc7c9fac4a645724b45cdae59a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 20:29:20 -0500 Subject: [PATCH 31/39] Tighten PGO test requirements Require compiler-rt to be an enabled runtime --- offload/test/CMakeLists.txt | 6 +++--- 1 file changed, 3 
insertions(+), 3 deletions(-) diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt index 32df1e47afaeb2..41ab339147791c 100644 --- a/offload/test/CMakeLists.txt +++ b/offload/test/CMakeLists.txt @@ -12,10 +12,10 @@ else() set(LIBOMPTARGET_DEBUG False) endif() -if (OPENMP_STANDALONE_BUILD) - set(LIBOMPTARGET_TEST_GPU_PGO False) -else() +if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES) set(LIBOMPTARGET_TEST_GPU_PGO True) +else() + set(LIBOMPTARGET_TEST_GPU_PGO False) endif() # Replace the space from user's input with ";" in case that CMake add escape >From 488cb4a349fdfbd73d0a78ddb2c17522c46145ba Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 26 Jun 2024 18:18:31 -0500 Subject: [PATCH 32/39] Apply requested formatting changes --- clang/lib/CodeGen/CodeGenPGO.cpp | 11 +++++----- llvm/lib/ProfileData/InstrProf.cpp | 4 ++-- .../Instrumentation/InstrProfiling.cpp | 10 ++++----- .../Instrumentation/PGOInstrumentation.cpp | 21 ++++++++++--------- offload/DeviceRTL/src/Profiling.cpp | 6 ++++-- 5 files changed, 28 insertions(+), 24 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index a7ce0b8f6a35f3..3edfbdd679c61d 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1199,12 +1199,13 @@ void CodeGenPGO::emitCounterSetOrIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); + auto *NormalizedFuncNameVarPtr = + llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); - llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), - Builder.getInt32(NumRegionCounters), - Builder.getInt32(Counter), StepV}; + llvm::Value 
*Args[] = { + NormalizedFuncNameVarPtr, Builder.getInt64(FunctionHash), + Builder.getInt32(NumRegionCounters), Builder.getInt32(Counter), StepV}; if (llvm::EnableSingleByteCoverage) Builder.CreateCall(CGM.getIntrinsic(llvm::Intrinsic::instrprof_cover), diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 1284efd4b5f4da..6742435c9d065e 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -433,8 +433,8 @@ std::string getPGOFuncNameVarName(StringRef FuncName, } bool isGPUProfTarget(const Module &M) { - const auto &Triple = llvm::Triple(M.getTargetTriple()); - return Triple.isAMDGPU() || Triple.isNVPTX(); + const auto &T = Triple(M.getTargetTriple()); + return T.isAMDGPU() || T.isNVPTX(); } void setPGOFuncVisibility(Module &M, GlobalVariable *FuncNameVar) { diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index dd8c027c4bbf62..05cef1236f0879 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -869,8 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { llvm::InstrProfValueKind::IPVK_MemOPSize); CallInst *Call = nullptr; auto *TLI = &GetTLI(*Ind->getFunction()); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - DataVar, PointerType::getUnqual(M.getContext())); + auto *NormalizedDataVarPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + DataVar, PointerType::get(M.getContext(), 0)); // To support value profiling calls within Windows exception handlers, funclet // information contained within operand bundles needs to be copied over to @@ -879,12 +879,12 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { SmallVector OpBundles; Ind->getOperandBundlesAsDefs(OpBundles); if (!IsMemOpSize) { - Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Value *Args[3] 
= {Ind->getTargetValue(), NormalizedDataVarPtr, Builder.getInt32(Index)}; Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args, OpBundles); } else { - Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Value *Args[3] = {Ind->getTargetValue(), NormalizedDataVarPtr, Builder.getInt32(Index)}; Call = Builder.CreateCall( getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp), @@ -1580,7 +1580,7 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { ValuesVar->setAlignment(Align(8)); maybeSetComdat(ValuesVar, Fn, CntsVarName); ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - ValuesVar, PointerType::getUnqual(Fn->getContext())); + ValuesVar, PointerType::get(Fn->getContext(), 0)); } uint64_t NumCounters = Inc->getNumCounters()->getZExtValue(); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index ee1657ba8400ee..f8f34ea25597f3 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -884,7 +884,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedNamePtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); @@ -893,7 +893,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_cover), - {NormalizedPtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); + {NormalizedNamePtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); return; } @@ -948,7 +948,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, 
Intrinsic::instrprof_timestamp), - {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + {NormalizedNamePtr, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I)}); I += PGOBlockCoverage ? 8 : 1; } @@ -963,7 +963,7 @@ static void instrumentOneFunc( Intrinsic::getDeclaration(M, PGOBlockCoverage ? Intrinsic::instrprof_cover : Intrinsic::instrprof_increment), - {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + {NormalizedNamePtr, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I++)}); } @@ -1007,15 +1007,15 @@ static void instrumentOneFunc( ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty()); assert(ToProfile && "value profiling Value is of unexpected type"); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedNamePtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); SmallVector OpBundles; populateEHOperandBundle(Cand, BlockColors, OpBundles); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile), - {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile, - Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, + {NormalizedNamePtr, Builder.getInt64(FuncInfo.FunctionHash), + ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, OpBundles); } } // IPVK_First <= Kind <= IPVK_Last @@ -1688,11 +1688,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) { IRBuilder<> Builder(&SI); Type *Int64Ty = Builder.getInt64Ty(); auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, PointerType::get(M->getContext(), 0)); + auto *NormalizedFuncNameVarPtr = + ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, PointerType::get(M->getContext(), 0)); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step), - {NormalizedPtr, Builder.getInt64(FuncHash), + 
{NormalizedFuncNameVarPtr, Builder.getInt64(FuncHash), Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step}); ++(*CurCtrIdx); } diff --git a/offload/DeviceRTL/src/Profiling.cpp b/offload/DeviceRTL/src/Profiling.cpp index 799477f5e47d27..639c62ceff7a69 100644 --- a/offload/DeviceRTL/src/Profiling.cpp +++ b/offload/DeviceRTL/src/Profiling.cpp @@ -12,8 +12,10 @@ extern "C" { -void __llvm_profile_register_function(void *ptr) {} -void __llvm_profile_register_names_function(void *ptr, long int i) {} +// Provides empty implementations for certain functions in compiler-rt +// that are emitted by the PGO instrumentation. +void __llvm_profile_register_function(void *Ptr) {} +void __llvm_profile_register_names_function(void *Ptr, long int I) {} } #pragma omp end declare target >From b90c01583f1893802aba0180b07a448584585365 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 26 Jun 2024 18:29:59 -0500 Subject: [PATCH 33/39] Add memop function shim to DeviceRTL This comes up sometimes when using LLVM IR level instrumentation. --- offload/DeviceRTL/include/Profiling.h | 1 + offload/DeviceRTL/src/Profiling.cpp | 1 + 2 files changed, 2 insertions(+) diff --git a/offload/DeviceRTL/include/Profiling.h b/offload/DeviceRTL/include/Profiling.h index 9efc1554c176bc..d9947522541219 100644 --- a/offload/DeviceRTL/include/Profiling.h +++ b/offload/DeviceRTL/include/Profiling.h @@ -15,6 +15,7 @@ extern "C" { void __llvm_profile_register_function(void *Ptr); void __llvm_profile_register_names_function(void *Ptr, long int I); +void __llvm_profile_instrument_memop(long int I, void *Ptr, int I2); } #endif diff --git a/offload/DeviceRTL/src/Profiling.cpp b/offload/DeviceRTL/src/Profiling.cpp index 639c62ceff7a69..bb3caaadcc03dd 100644 --- a/offload/DeviceRTL/src/Profiling.cpp +++ b/offload/DeviceRTL/src/Profiling.cpp @@ -16,6 +16,7 @@ extern "C" { // that are emitted by the PGO instrumentation. 
void __llvm_profile_register_function(void *Ptr) {} void __llvm_profile_register_names_function(void *Ptr, long int I) {} +void __llvm_profile_instrument_memop(long int I, void *Ptr, int I2) {} } #pragma omp end declare target >From c68c6e2fa98a1fe608b88ed38f7db68eae804c5b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 27 Jun 2024 02:04:27 -0500 Subject: [PATCH 34/39] Make requested changes --- compiler-rt/lib/profile/InstrProfiling.h | 2 +- compiler-rt/lib/profile/InstrProfilingFile.c | 1 - offload/plugins-nextgen/common/src/PluginInterface.cpp | 5 ++--- 3 files changed, 3 insertions(+), 5 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h index ef1292a45bf01d..eda3e9a673c1af 100644 --- a/compiler-rt/lib/profile/InstrProfiling.h +++ b/compiler-rt/lib/profile/InstrProfiling.h @@ -298,7 +298,7 @@ void __llvm_profile_set_dumped(); /*! * \brief Write custom target-specific profiling data to a seperate file. - * Used by libomptarget for GPU PGO. + * Used by offload PGO. */ int __llvm_write_custom_profile(const char *Target, const __llvm_profile_data *DataBegin, diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 784cb9af6169d8..93436ecbabb40d 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1321,7 +1321,6 @@ int __llvm_write_custom_profile(const char *Target, /* Prepend "TARGET." 
to current filename */ memcpy(TargetFilename, Target, TargetLength); TargetFilename[TargetLength] = '.'; - memcpy(TargetFilename, Target, TargetLength); memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); TargetFilename[FilenameLength + 1 + TargetLength] = 0; diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index c4e1e63777de8a..445f4ad942bd4d 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp @@ -843,9 +843,8 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { ProfOrErr->dump(); // Write data to profiling file - if (auto Err = ProfOrErr->write()) { - consumeError(std::move(Err)); - } + if (auto Err = ProfOrErr->write()) + return Err; } // Delete the memory manager before deinitializing the device. Otherwise, >From ca52c58c7fde412897cf6b10b9bbb321812f193d Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 27 Jun 2024 02:26:20 -0500 Subject: [PATCH 35/39] Only dump counters if PGODump flag is set --- offload/include/Shared/Environment.h | 1 + offload/plugins-nextgen/common/src/PluginInterface.cpp | 4 +++- openmp/docs/design/Runtimes.rst | 1 + 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/offload/include/Shared/Environment.h b/offload/include/Shared/Environment.h index d141146b6bd5a1..86f6d1c6ea2d36 100644 --- a/offload/include/Shared/Environment.h +++ b/offload/include/Shared/Environment.h @@ -30,6 +30,7 @@ enum class DeviceDebugKind : uint32_t { FunctionTracing = 1U << 1, CommonIssues = 1U << 2, AllocationTracker = 1U << 3, + PGODump = 1U << 4, }; struct DeviceEnvironmentTy { diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index 445f4ad942bd4d..35fb04863d8741 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp 
@@ -840,7 +840,9 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { return ProfOrErr.takeError(); // Dump out profdata - ProfOrErr->dump(); + if ((OMPX_DebugKind.get() & uint32_t(DeviceDebugKind::PGODump)) == + uint32_t(DeviceDebugKind::PGODump)) + ProfOrErr->dump(); // Write data to profiling file if (auto Err = ProfOrErr->write()) diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst index f8a8cb87e83e66..7fc697a838e229 100644 --- a/openmp/docs/design/Runtimes.rst +++ b/openmp/docs/design/Runtimes.rst @@ -1493,3 +1493,4 @@ debugging features are supported. * Enable debugging assertions in the device. ``0x01`` * Enable diagnosing common problems during offloading . ``0x4`` * Enable device malloc statistics (amdgpu only). ``0x8`` + * Dump device PGO counters (only if PGO on GPU is enabled). ``0x10`` >From ee4431a1b57469c7679f54f124ca5f3dd7f0433b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 9 Aug 2024 20:21:38 -0500 Subject: [PATCH 36/39] Update requirements --- offload/test/offloading/pgo1.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index d6747113265803..fbf6337374a997 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -12,10 +12,7 @@ // RUN: %target_triple.clang.profraw | %fcheck-generic \ // RUN: --check-prefix="CLANG-PGO" -// UNSUPPORTED: x86_64-pc-linux-gnu -// UNSUPPORTED: x86_64-pc-linux-gnu-LTO -// UNSUPPORTED: aarch64-unknown-linux-gnu -// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: gpu // REQUIRES: pgo int test1(int a) { return a / 2; } >From fb699b6bca72d42359a304bcbba88f3564ae9ac9 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Sat, 10 Aug 2024 00:54:36 -0500 Subject: [PATCH 37/39] Merge changes --- offload/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +- offload/test/offloading/pgo1.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git 
a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index bca66cff6558a2..d7bfbba01c8efc 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -193,7 +193,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, // Check if given current global is a profiling global based // on name - if (NameOrErr->equals(getInstrProfNamesVarName())) { + if (*NameOrErr == getInstrProfNamesVarName()) { // Read in profiled function names DeviceProfileData.NamesData = SmallVector(Sym.getSize(), 0); GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(), diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index fbf6337374a997..3270ce8f15e7dc 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -32,17 +32,17 @@ int main() { // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 4 -// LLVM-PGO: Block counts: [20, 10, 20, 10] +// LLVM-PGO: Block counts: [20, 10, 2, 1] // LLVM-PGO-LABEL: test1: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [10] // LLVM-PGO-LABEL: test2: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [20] // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} >From 1d0a961aabe488e6d09b96a80329498b8f586923 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 25 Oct 2024 13:42:19 -0500 Subject: [PATCH 38/39] Add llvm-profdata substitution to offload tests --- offload/test/lit.cfg | 2 ++ offload/test/lit.site.cfg.in | 2 +- offload/test/offloading/pgo1.c | 4 ++-- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/offload/test/lit.cfg 
b/offload/test/lit.cfg index 7994a08ba063fb..cfd1ad6c3c1eb5 100644 --- a/offload/test/lit.cfg +++ b/offload/test/lit.cfg @@ -112,8 +112,10 @@ config.available_features.add(config.libomptarget_current_target) if config.libomptarget_has_libc: config.available_features.add('libc') +profdata_path = os.path.join(config.bin_llvm_tools_dir, "llvm-profdata") if config.libomptarget_test_pgo: config.available_features.add('pgo') + config.substitutions.append(("%profdata", profdata_path)) # Determine whether the test system supports unified memory. # For CUDA, this is the case with compute capability 70 (Volta) or higher. diff --git a/offload/test/lit.site.cfg.in b/offload/test/lit.site.cfg.in index a1cb5acc38a405..d998fb0c839700 100644 --- a/offload/test/lit.site.cfg.in +++ b/offload/test/lit.site.cfg.in @@ -1,6 +1,6 @@ @AUTO_GEN_COMMENT@ -config.bin_llvm_tools_dir = "@CMAKE_BINARY_DIR@/bin" +config.bin_llvm_tools_dir = "@LLVM_RUNTIME_OUTPUT_INTDIR@" config.test_c_compiler = "@OPENMP_TEST_C_COMPILER@" config.test_cxx_compiler = "@OPENMP_TEST_CXX_COMPILER@" config.test_fortran_compiler="@OPENMP_TEST_Fortran_COMPILER@" diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 1ef540e430a27a..51671afa62b0db 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,14 +1,14 @@ // RUN: %libomptarget-compile-generic -fprofile-generate \ // RUN: -Xclang "-fprofile-instrument=llvm" // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 -// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %profdata show --all-functions --counts \ // RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" // RUN: %libomptarget-compile-generic -fprofile-instr-generate \ // RUN: -Xclang "-fprofile-instrument=clang" // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 -// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %profdata show --all-functions --counts 
\ // RUN: %target_triple.clang.profraw | %fcheck-generic \ // RUN: --check-prefix="CLANG-PGO" >From c6b34ad7a676a462955b2e7b534b12264363b430 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 28 Oct 2024 18:45:37 -0500 Subject: [PATCH 39/39] Prepend target prefix to basename --- compiler-rt/lib/profile/InstrProfilingFile.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index decafbcb1a5352..6b6f47e239714c 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1355,10 +1355,21 @@ int __llvm_write_custom_profile(const char *Target, TargetFilename = (char *)COMPILER_RT_ALLOCA(FilenameLength + TargetLength + 2); + /* Find file basename and path sizes */ + int32_t DirEnd = FilenameLength - 1; + while (DirEnd >= 0 && !IS_DIR_SEPARATOR(Filename[DirEnd])) { + DirEnd--; + } + uint32_t DirSize = DirEnd + 1, BaseSize = FilenameLength - DirSize; + /* Prepend "TARGET." to current filename */ - memcpy(TargetFilename, Target, TargetLength); - TargetFilename[TargetLength] = '.'; - memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); + if (DirSize > 0) { + memcpy(TargetFilename, Filename, DirSize); + } + memcpy(TargetFilename + DirSize, Target, TargetLength); + TargetFilename[TargetLength + DirSize] = '.'; + memcpy(TargetFilename + DirSize + 1 + TargetLength, Filename + DirSize, + BaseSize); TargetFilename[FilenameLength + 1 + TargetLength] = 0; /* Check if there is llvm/runtime version mismatch. 
*/ From openmp-commits at lists.llvm.org Thu Dec 26 23:16:39 2024 From: openmp-commits at lists.llvm.org (Ethan Luis McDonough via Openmp-commits) Date: Thu, 26 Dec 2024 23:16:39 -0800 (PST) Subject: [Openmp-commits] [clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Add GPU profiling flags to driver (PR #94268) In-Reply-To: Message-ID: <676e5457.170a0220.292a0d.6129@mx.google.com> https://github.com/EthanLuisMcDonough updated https://github.com/llvm/llvm-project/pull/94268 >From 530eb982b9770190377bb0bd09c5cb715f34d484 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 15 Dec 2023 20:38:38 -0600 Subject: [PATCH 01/58] Add profiling functions to libomptarget --- .../include/llvm/Frontend/OpenMP/OMPKinds.def | 3 +++ openmp/libomptarget/DeviceRTL/CMakeLists.txt | 2 ++ .../DeviceRTL/include/Profiling.h | 21 +++++++++++++++++++ .../libomptarget/DeviceRTL/src/Profiling.cpp | 19 +++++++++++++++++ 4 files changed, 45 insertions(+) create mode 100644 openmp/libomptarget/DeviceRTL/include/Profiling.h create mode 100644 openmp/libomptarget/DeviceRTL/src/Profiling.cpp diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def index d22d2a8e948b00..1d887d5cb58127 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def +++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def @@ -503,6 +503,9 @@ __OMP_RTL(__kmpc_barrier_simple_generic, false, Void, IdentPtr, Int32) __OMP_RTL(__kmpc_warp_active_thread_mask, false, Int64,) __OMP_RTL(__kmpc_syncwarp, false, Void, Int64) +__OMP_RTL(__llvm_profile_register_function, false, Void, VoidPtr) +__OMP_RTL(__llvm_profile_register_names_function, false, Void, VoidPtr, Int64) + __OMP_RTL(__last, false, Void, ) #undef __OMP_RTL diff --git a/openmp/libomptarget/DeviceRTL/CMakeLists.txt b/openmp/libomptarget/DeviceRTL/CMakeLists.txt index 1ce3e1e40a80ab..55ee15d068c67b 100644 --- a/openmp/libomptarget/DeviceRTL/CMakeLists.txt +++ b/openmp/libomptarget/DeviceRTL/CMakeLists.txt @@ 
-89,6 +89,7 @@ set(include_files ${include_directory}/Interface.h ${include_directory}/LibC.h ${include_directory}/Mapping.h + ${include_directory}/Profiling.h ${include_directory}/State.h ${include_directory}/Synchronization.h ${include_directory}/Types.h @@ -104,6 +105,7 @@ set(src_files ${source_directory}/Mapping.cpp ${source_directory}/Misc.cpp ${source_directory}/Parallelism.cpp + ${source_directory}/Profiling.cpp ${source_directory}/Reduction.cpp ${source_directory}/State.cpp ${source_directory}/Synchronization.cpp diff --git a/openmp/libomptarget/DeviceRTL/include/Profiling.h b/openmp/libomptarget/DeviceRTL/include/Profiling.h new file mode 100644 index 00000000000000..68c7744cd60752 --- /dev/null +++ b/openmp/libomptarget/DeviceRTL/include/Profiling.h @@ -0,0 +1,21 @@ +//===-------- Profiling.h - OpenMP interface ---------------------- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// +//===----------------------------------------------------------------------===// + +#ifndef OMPTARGET_DEVICERTL_PROFILING_H +#define OMPTARGET_DEVICERTL_PROFILING_H + +extern "C" { + +void __llvm_profile_register_function(void *ptr); +void __llvm_profile_register_names_function(void *ptr, long int i); +} + +#endif diff --git a/openmp/libomptarget/DeviceRTL/src/Profiling.cpp b/openmp/libomptarget/DeviceRTL/src/Profiling.cpp new file mode 100644 index 00000000000000..799477f5e47d27 --- /dev/null +++ b/openmp/libomptarget/DeviceRTL/src/Profiling.cpp @@ -0,0 +1,19 @@ +//===------- Profiling.cpp ---------------------------------------- C++ ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "Profiling.h" + +#pragma omp begin declare target device_type(nohost) + +extern "C" { + +void __llvm_profile_register_function(void *ptr) {} +void __llvm_profile_register_names_function(void *ptr, long int i) {} +} + +#pragma omp end declare target >From fb067d4ffe604fd68cf90b705db1942bce49dbb1 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Sat, 16 Dec 2023 01:18:41 -0600 Subject: [PATCH 02/58] Fix PGO instrumentation for GPU targets --- clang/lib/CodeGen/CodeGenPGO.cpp | 10 ++++++++-- .../lib/Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++--- 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index 81bf8ea696b164..edae6885b528ac 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -959,8 +959,14 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, unsigned Counter = (*RegionCounterMap)[S]; - llvm::Value *Args[] = {FuncNameVar, - Builder.getInt64(FunctionHash), + // Make sure that pointer to global is passed in with zero addrspace + // This is relevant during GPU profiling + auto *I8Ty = llvm::Type::getInt8Ty(CGM.getLLVMContext()); + auto *I8PtrTy = llvm::PointerType::getUnqual(I8Ty); + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, I8PtrTy); + + llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), Builder.getInt32(NumRegionCounters), Builder.getInt32(Counter), StepV}; if (!StepV) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index fe5a0578bd9721..d2cb8155c17967 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -1658,10 +1658,13 @@ 
void InstrLowerer::emitRegistration() { IRBuilder<> IRB(BasicBlock::Create(M.getContext(), "", RegisterF)); for (Value *Data : CompilerUsedVars) if (!isa(Data)) - IRB.CreateCall(RuntimeRegisterF, Data); + // Check for addrspace cast when profiling GPU + IRB.CreateCall(RuntimeRegisterF, + IRB.CreatePointerBitCastOrAddrSpaceCast(Data, VoidPtrTy)); for (Value *Data : UsedVars) if (Data != NamesVar && !isa(Data)) - IRB.CreateCall(RuntimeRegisterF, Data); + IRB.CreateCall(RuntimeRegisterF, + IRB.CreatePointerBitCastOrAddrSpaceCast(Data, VoidPtrTy)); if (NamesVar) { Type *ParamTypes[] = {VoidPtrTy, Int64Ty}; @@ -1670,7 +1673,9 @@ void InstrLowerer::emitRegistration() { auto *NamesRegisterF = Function::Create(NamesRegisterTy, GlobalVariable::ExternalLinkage, getInstrProfNamesRegFuncName(), M); - IRB.CreateCall(NamesRegisterF, {NamesVar, IRB.getInt64(NamesSize)}); + IRB.CreateCall(NamesRegisterF, {IRB.CreatePointerBitCastOrAddrSpaceCast( + NamesVar, VoidPtrTy), + IRB.getInt64(NamesSize)}); } IRB.CreateRetVoid(); >From 7a0e0efa178cc4de6a22a8f5cc3f53cd1c81ea3a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 21 Dec 2023 00:25:46 -0600 Subject: [PATCH 03/58] Change global visibility on GPU targets --- llvm/include/llvm/ProfileData/InstrProf.h | 4 ++++ llvm/lib/ProfileData/InstrProf.cpp | 17 +++++++++++++++-- .../Instrumentation/InstrProfiling.cpp | 15 +++++++++++---- 3 files changed, 30 insertions(+), 6 deletions(-) diff --git a/llvm/include/llvm/ProfileData/InstrProf.h b/llvm/include/llvm/ProfileData/InstrProf.h index 288dc71d756aee..bf9899d867e3dd 100644 --- a/llvm/include/llvm/ProfileData/InstrProf.h +++ b/llvm/include/llvm/ProfileData/InstrProf.h @@ -171,6 +171,10 @@ inline StringRef getInstrProfCounterBiasVarName() { /// Return the marker used to separate PGO names during serialization. 
inline StringRef getInstrProfNameSeparator() { return "\01"; } +/// Determines whether module targets a GPU eligable for PGO +/// instrumentation +bool isGPUProfTarget(const Module &M); + /// Return the modified name for function \c F suitable to be /// used the key for profile lookup. Variable \c InLTO indicates if this /// is called in LTO optimization passes. diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 649d814cfd9de0..0d6717aeb0142c 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -410,13 +410,22 @@ std::string getPGOFuncNameVarName(StringRef FuncName, return VarName; } +bool isGPUProfTarget(const Module &M) { + const auto &triple = M.getTargetTriple(); + return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 || + triple.rfind("r600", 0) == 0; +} + GlobalVariable *createPGOFuncNameVar(Module &M, GlobalValue::LinkageTypes Linkage, StringRef PGOFuncName) { + // Ensure profiling variables on GPU are visible to be read from host + if (isGPUProfTarget(M)) + Linkage = GlobalValue::ExternalLinkage; // We generally want to match the function's linkage, but available_externally // and extern_weak both have the wrong semantics, and anything that doesn't // need to link across compilation units doesn't need to be visible at all. 
- if (Linkage == GlobalValue::ExternalWeakLinkage) + else if (Linkage == GlobalValue::ExternalWeakLinkage) Linkage = GlobalValue::LinkOnceAnyLinkage; else if (Linkage == GlobalValue::AvailableExternallyLinkage) Linkage = GlobalValue::LinkOnceODRLinkage; @@ -430,8 +439,12 @@ GlobalVariable *createPGOFuncNameVar(Module &M, new GlobalVariable(M, Value->getType(), true, Linkage, Value, getPGOFuncNameVarName(PGOFuncName, Linkage)); + // If the target is a GPU, make the symbol protected so it can + // be read from the host device + if (isGPUProfTarget(M)) + FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility); // Hide the symbol so that we correctly get a copy for each executable. - if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) + else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) FuncNameVar->setVisibility(GlobalValue::HiddenVisibility); return FuncNameVar; diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index d2cb8155c17967..3b582b65190808 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -1481,6 +1481,10 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind) Int16ArrayVals[Kind] = ConstantInt::get(Int16Ty, PD.NumValueSites[Kind]); + if (isGPUProfTarget(M)) { + Linkage = GlobalValue::ExternalLinkage; + Visibility = GlobalValue::ProtectedVisibility; + } // If the data variable is not referenced by code (if we don't emit // @llvm.instrprof.value.profile, NS will be 0), and the counter keeps the // data variable live under linker GC, the data variable can be private. 
This @@ -1492,9 +1496,9 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { // If profd is in a deduplicate comdat, NS==0 with a hash suffix guarantees // that other copies must have the same CFG and cannot have value profiling. // If no hash suffix, other profd copies may be referenced by code. - if (NS == 0 && !(DataReferencedByCode && NeedComdat && !Renamed) && - (TT.isOSBinFormatELF() || - (!DataReferencedByCode && TT.isOSBinFormatCOFF()))) { + else if (NS == 0 && !(DataReferencedByCode && NeedComdat && !Renamed) && + (TT.isOSBinFormatELF() || + (!DataReferencedByCode && TT.isOSBinFormatCOFF()))) { Linkage = GlobalValue::PrivateLinkage; Visibility = GlobalValue::DefaultVisibility; } @@ -1696,7 +1700,10 @@ bool InstrLowerer::emitRuntimeHook() { auto *Var = new GlobalVariable(M, Int32Ty, false, GlobalValue::ExternalLinkage, nullptr, getInstrProfRuntimeHookVarName()); - Var->setVisibility(GlobalValue::HiddenVisibility); + if (isGPUProfTarget(M)) + Var->setVisibility(GlobalValue::ProtectedVisibility); + else + Var->setVisibility(GlobalValue::HiddenVisibility); if (TT.isOSBinFormatELF() && !TT.isPS()) { // Mark the user variable as used so that it isn't stripped out. 
>From fddc07908ed9aa698fe3250ddbfc5621ab4d049d Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 22 Dec 2023 23:43:29 -0600 Subject: [PATCH 04/58] Make names global public on GPU --- llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index 3b582b65190808..61fba7be3ee0ee 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -1621,6 +1621,13 @@ void InstrLowerer::emitNameData() { NamesVar = new GlobalVariable(M, NamesVal->getType(), true, GlobalValue::PrivateLinkage, NamesVal, getInstrProfNamesVarName()); + + // Make names variable public if current target is a GPU + if (isGPUProfTarget(M)) { + NamesVar->setLinkage(GlobalValue::ExternalLinkage); + NamesVar->setVisibility(GlobalValue::VisibilityTypes::ProtectedVisibility); + } + NamesSize = CompressedNameStr.size(); setGlobalVariableLargeSection(TT, *NamesVar); NamesVar->setSection( >From e9db03c70bf79f4f4ddad4b48a5aa63a37e0d4f6 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 29 Dec 2023 12:54:50 -0600 Subject: [PATCH 05/58] Read and print GPU device PGO globals --- .../common/include/GlobalHandler.h | 27 ++++++ .../common/src/GlobalHandler.cpp | 82 +++++++++++++++++++ .../common/src/PluginInterface.cpp | 14 ++++ 3 files changed, 123 insertions(+) diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h index fa079ac9660ee0..a82cd536487653 100644 --- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h +++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h @@ -14,9 +14,11 @@ #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H #include +#include #include "llvm/ADT/DenseMap.h" #include 
"llvm/Object/ELFObjectFile.h" +#include "llvm/ProfileData/InstrProf.h" #include "Shared/Debug.h" #include "Shared/Utils.h" @@ -58,6 +60,22 @@ class GlobalTy { void setPtr(void *P) { Ptr = P; } }; +typedef void *IntPtrT; +struct __llvm_profile_data { +#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name; +#include "llvm/ProfileData/InstrProfData.inc" +}; + +/// PGO profiling data extracted from a GPU device +struct GPUProfGlobals { + std::string names; + std::vector> counts; + std::vector<__llvm_profile_data> data; + Triple targetTriple; + + void dump() const; +}; + /// Subclass of GlobalTy that holds the memory for a global of \p Ty. template class StaticGlobalTy : public GlobalTy { Ty Data; @@ -172,6 +190,15 @@ class GenericGlobalHandlerTy { return moveGlobalBetweenDeviceAndHost(Device, Image, HostGlobal, /* D2H */ false); } + + /// Checks whether a given image contains profiling globals. + bool hasProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image); + + /// Reads profiling data from a GPU image to supplied profdata struct. + /// Iterates through the image symbol table and stores global values + /// with profiling prefixes. 
+ Expected readProfilingGlobals(GenericDeviceTy &Device, + DeviceImageTy &Image); }; } // namespace plugin diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 3a272e228c7dfe..5dd5daec468ca5 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -176,3 +176,85 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device, return Plugin::success(); } + +bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device, + DeviceImageTy &Image) { + GlobalTy global(getInstrProfNamesVarName().str(), 0); + if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) { + consumeError(std::move(Err)); + return false; + } + return true; +} + +Expected +GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, + DeviceImageTy &Image) { + GPUProfGlobals profdata; + const auto *elf = getOrCreateELFObjectFile(Device, Image); + profdata.targetTriple = elf->makeTriple(); + // Iterate through + for (auto &sym : elf->symbols()) { + if (auto name = sym.getName()) { + // Check if given current global is a profiling global based + // on name + if (name->equals(getInstrProfNamesVarName())) { + // Read in profiled function names + std::vector chars(sym.getSize() / sizeof(char), ' '); + GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data()); + if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal)) + return Err; + std::string names(chars.begin(), chars.end()); + profdata.names = std::move(names); + } else if (name->starts_with(getInstrProfCountersVarPrefix())) { + // Read global variable profiling counts + std::vector counts(sym.getSize() / sizeof(int64_t), 0); + GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data()); + if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal)) + return Err; + 
profdata.counts.push_back(std::move(counts)); + } else if (name->starts_with(getInstrProfDataVarPrefix())) { + // Read profiling data for this global variable + __llvm_profile_data data{}; + GlobalTy DataGlobal(name->str(), sym.getSize(), &data); + if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal)) + return Err; + profdata.data.push_back(std::move(data)); + } + } + } + return profdata; +} + +void GPUProfGlobals::dump() const { + llvm::outs() << "======= GPU Profile =======\nTarget: " << targetTriple.str() + << "\n"; + + llvm::outs() << "======== Counters =========\n"; + for (const auto &count : counts) { + llvm::outs() << "["; + for (size_t i = 0; i < count.size(); i++) { + if (i == 0) + llvm::outs() << " "; + llvm::outs() << count[i] << " "; + } + llvm::outs() << "]\n"; + } + + llvm::outs() << "========== Data ===========\n"; + for (const auto &d : data) { + llvm::outs() << "{ "; +#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \ + llvm::outs() << d.Name << " "; +#include "llvm/ProfileData/InstrProfData.inc" + llvm::outs() << " }\n"; + } + + llvm::outs() << "======== Functions ========\n"; + InstrProfSymtab symtab; + if (Error Err = symtab.create(StringRef(names))) { + consumeError(std::move(Err)); + } + symtab.dumpNames(llvm::outs()); + llvm::outs() << "===========================\n"; +} diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp index 3c7d1ca8998787..84ed90f03f84f1 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp @@ -811,6 +811,20 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { DeviceMemoryPoolTracking.AllocationMax); } + for (auto *Image : LoadedImages) { + GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); + if (!Handler.hasProfilingGlobals(*this, *Image)) + continue; + + GPUProfGlobals profdata; + auto 
ProfOrErr = Handler.readProfilingGlobals(*this, *Image); + if (!ProfOrErr) + return ProfOrErr.takeError(); + + // TODO: write data to profiling file + ProfOrErr->dump(); + } + // Delete the memory manager before deinitializing the device. Otherwise, // we may delete device allocations after the device is deinitialized. if (MemoryManager) >From e4687605d1a6ca932312025826db09dba84845a3 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 3 Jan 2024 17:06:15 -0600 Subject: [PATCH 06/58] Fix rebase bug --- .../plugins-nextgen/common/src/GlobalHandler.cpp | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index cb71b61f4a9c4f..86742d0f77a2fe 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -178,10 +178,12 @@ Expected GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image) { GPUProfGlobals profdata; - const auto *elf = getOrCreateELFObjectFile(Device, Image); - profdata.targetTriple = elf->makeTriple(); - // Iterate through - for (auto &sym : elf->symbols()) { + auto ELFObj = getELFObjectFile(Image); + if (!ELFObj) + return ELFObj.takeError(); + profdata.targetTriple = ELFObj->makeTriple(); + // Iterate through elf symbols + for (auto &sym : ELFObj->symbols()) { if (auto name = sym.getName()) { // Check if given current global is a profiling global based // on name >From ec18ce94c227e1d43927955fa1c67360ecfcfca6 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 3 Jan 2024 17:10:19 -0600 Subject: [PATCH 07/58] Refactor portions to be more idiomatic --- clang/lib/CodeGen/CodeGenPGO.cpp | 4 +--- llvm/lib/ProfileData/InstrProf.cpp | 5 ++--- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp 
b/clang/lib/CodeGen/CodeGenPGO.cpp index edae6885b528ac..7bfcec43ee4c98 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -961,10 +961,8 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *I8Ty = llvm::Type::getInt8Ty(CGM.getLLVMContext()); - auto *I8PtrTy = llvm::PointerType::getUnqual(I8Ty); auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, I8PtrTy); + FuncNameVar, llvm::PointerType::getUnqual(CGM.getLLVMContext())); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), Builder.getInt32(NumRegionCounters), diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index cdcd6840bb5108..1d88da16a5ff9c 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -429,9 +429,8 @@ std::string getPGOFuncNameVarName(StringRef FuncName, } bool isGPUProfTarget(const Module &M) { - const auto &triple = M.getTargetTriple(); - return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 || - triple.rfind("r600", 0) == 0; + const auto &Triple = llvm::Triple(M.getTargetTriple()); + return Triple.isAMDGPU() || Triple.isNVPTX(); } GlobalVariable *createPGOFuncNameVar(Module &M, >From 0872556f597056361b0a2c23cdd0be3d9745aef3 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 3 Jan 2024 17:18:47 -0600 Subject: [PATCH 08/58] Reformat DeviceRTL prof functions --- openmp/libomptarget/DeviceRTL/include/Profiling.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/openmp/libomptarget/DeviceRTL/include/Profiling.h b/openmp/libomptarget/DeviceRTL/include/Profiling.h index 68c7744cd60752..9efc1554c176bc 100644 --- a/openmp/libomptarget/DeviceRTL/include/Profiling.h +++ b/openmp/libomptarget/DeviceRTL/include/Profiling.h @@ -13,9 +13,8 @@ #define 
OMPTARGET_DEVICERTL_PROFILING_H extern "C" { - -void __llvm_profile_register_function(void *ptr); -void __llvm_profile_register_names_function(void *ptr, long int i); +void __llvm_profile_register_function(void *Ptr); +void __llvm_profile_register_names_function(void *Ptr, long int I); } #endif >From 62f31d1c71b5d100f38d6dc584cc138b3904581b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 9 Jan 2024 11:52:29 -0600 Subject: [PATCH 09/58] Style changes + catch name error --- .../common/include/GlobalHandler.h | 16 ++-- .../common/src/GlobalHandler.cpp | 87 ++++++++++--------- 2 files changed, 56 insertions(+), 47 deletions(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h index a803b3f76d8b25..755bb23a414e37 100644 --- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h +++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h @@ -13,8 +13,7 @@ #ifndef LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H -#include -#include +#include #include "llvm/ADT/DenseMap.h" #include "llvm/Object/ELFObjectFile.h" @@ -60,18 +59,19 @@ class GlobalTy { void setPtr(void *P) { Ptr = P; } }; -typedef void *IntPtrT; +using IntPtrT = void *; struct __llvm_profile_data { -#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name; +#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \ + std::remove_const::type Name; #include "llvm/ProfileData/InstrProfData.inc" }; /// PGO profiling data extracted from a GPU device struct GPUProfGlobals { - std::string names; - std::vector> counts; - std::vector<__llvm_profile_data> data; - Triple targetTriple; + SmallVector NamesData; + SmallVector> Counts; + SmallVector<__llvm_profile_data> Data; + Triple TargetTriple; void dump() const; }; diff --git 
a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 86742d0f77a2fe..7cb672e7b26839 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -19,6 +19,7 @@ #include "llvm/Support/Error.h" #include +#include using namespace llvm; using namespace omp; @@ -177,73 +178,81 @@ bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device, Expected GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image) { - GPUProfGlobals profdata; + GPUProfGlobals DeviceProfileData; auto ELFObj = getELFObjectFile(Image); if (!ELFObj) return ELFObj.takeError(); - profdata.targetTriple = ELFObj->makeTriple(); + DeviceProfileData.TargetTriple = ELFObj->makeTriple(); + // Iterate through elf symbols - for (auto &sym : ELFObj->symbols()) { - if (auto name = sym.getName()) { - // Check if given current global is a profiling global based - // on name - if (name->equals(getInstrProfNamesVarName())) { - // Read in profiled function names - std::vector chars(sym.getSize() / sizeof(char), ' '); - GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data()); - if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal)) - return Err; - std::string names(chars.begin(), chars.end()); - profdata.names = std::move(names); - } else if (name->starts_with(getInstrProfCountersVarPrefix())) { - // Read global variable profiling counts - std::vector counts(sym.getSize() / sizeof(int64_t), 0); - GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data()); - if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal)) - return Err; - profdata.counts.push_back(std::move(counts)); - } else if (name->starts_with(getInstrProfDataVarPrefix())) { - // Read profiling data for this global variable - __llvm_profile_data data{}; - GlobalTy DataGlobal(name->str(), sym.getSize(), 
&data); - if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal)) - return Err; - profdata.data.push_back(std::move(data)); - } + for (auto &Sym : ELFObj->symbols()) { + auto NameOrErr = Sym.getName(); + if (!NameOrErr) + return ELFObj.takeError(); + + // Check if given current global is a profiling global based + // on name + if (NameOrErr->equals(getInstrProfNamesVarName())) { + // Read in profiled function names + DeviceProfileData.NamesData = SmallVector(Sym.getSize(), 0); + GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(), + DeviceProfileData.NamesData.data()); + if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal)) + return Err; + } else if (NameOrErr->starts_with(getInstrProfCountersVarPrefix())) { + // Read global variable profiling counts + SmallVector Counts(Sym.getSize() / sizeof(int64_t), 0); + GlobalTy CountGlobal(NameOrErr->str(), Sym.getSize(), Counts.data()); + if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal)) + return Err; + DeviceProfileData.Counts.push_back(std::move(Counts)); + } else if (NameOrErr->starts_with(getInstrProfDataVarPrefix())) { + // Read profiling data for this global variable + __llvm_profile_data Data{}; + GlobalTy DataGlobal(NameOrErr->str(), Sym.getSize(), &Data); + if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal)) + return Err; + DeviceProfileData.Data.push_back(std::move(Data)); } } - return profdata; + return DeviceProfileData; } void GPUProfGlobals::dump() const { - llvm::outs() << "======= GPU Profile =======\nTarget: " << targetTriple.str() + llvm::outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str() << "\n"; llvm::outs() << "======== Counters =========\n"; - for (const auto &count : counts) { + for (const auto &Count : Counts) { llvm::outs() << "["; - for (size_t i = 0; i < count.size(); i++) { + for (size_t i = 0; i < Count.size(); i++) { if (i == 0) llvm::outs() << " "; - llvm::outs() << count[i] << " "; + llvm::outs() << Count[i] << " "; } 
llvm::outs() << "]\n"; } llvm::outs() << "========== Data ===========\n"; - for (const auto &d : data) { + for (const auto &ProfData : Data) { llvm::outs() << "{ "; #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \ - llvm::outs() << d.Name << " "; + llvm::outs() << ProfData.Name << " "; #include "llvm/ProfileData/InstrProfData.inc" llvm::outs() << " }\n"; } llvm::outs() << "======== Functions ========\n"; - InstrProfSymtab symtab; - if (Error Err = symtab.create(StringRef(names))) { + std::string s; + s.reserve(NamesData.size()); + for (uint8_t Name : NamesData) { + s.push_back((char)Name); + } + + InstrProfSymtab Symtab; + if (Error Err = Symtab.create(StringRef(s))) { consumeError(std::move(Err)); } - symtab.dumpNames(llvm::outs()); + Symtab.dumpNames(llvm::outs()); llvm::outs() << "===========================\n"; } >From 0c4bbeb54d189c1461affd37853aa86c3e3ca7d8 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 17 Jan 2024 19:59:06 -0600 Subject: [PATCH 10/58] Add GPU PGO test --- .../common/src/GlobalHandler.cpp | 2 +- openmp/libomptarget/test/CMakeLists.txt | 6 +++ openmp/libomptarget/test/lit.cfg | 3 ++ openmp/libomptarget/test/lit.site.cfg.in | 2 +- openmp/libomptarget/test/offloading/pgo1.c | 39 +++++++++++++++++++ 5 files changed, 50 insertions(+), 2 deletions(-) create mode 100644 openmp/libomptarget/test/offloading/pgo1.c diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 7cb672e7b26839..e5eb653d022287 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -239,7 +239,7 @@ void GPUProfGlobals::dump() const { #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \ llvm::outs() << ProfData.Name << " "; #include "llvm/ProfileData/InstrProfData.inc" - llvm::outs() << " }\n"; + llvm::outs() << "}\n"; } llvm::outs() << "======== Functions 
========\n"; diff --git a/openmp/libomptarget/test/CMakeLists.txt b/openmp/libomptarget/test/CMakeLists.txt index a0ba233eaa5726..21233f3e252eb5 100644 --- a/openmp/libomptarget/test/CMakeLists.txt +++ b/openmp/libomptarget/test/CMakeLists.txt @@ -12,6 +12,12 @@ else() set(LIBOMPTARGET_DEBUG False) endif() +if (OPENMP_STANDALONE_BUILD) + set(LIBOMPTARGET_TEST_GPU_PGO False) +else() + set(LIBOMPTARGET_TEST_GPU_PGO True) +endif() + # Replace the space from user's input with ";" in case that CMake add escape # char into the lit command. string(REPLACE " " ";" LIBOMPTARGET_LIT_ARG_LIST "${LIBOMPTARGET_LIT_ARGS}") diff --git a/openmp/libomptarget/test/lit.cfg b/openmp/libomptarget/test/lit.cfg index 19c5e5c4572227..49743f9fed7f29 100644 --- a/openmp/libomptarget/test/lit.cfg +++ b/openmp/libomptarget/test/lit.cfg @@ -104,6 +104,9 @@ config.available_features.add(config.libomptarget_current_target) if config.libomptarget_has_libc: config.available_features.add('libc') +if config.libomptarget_test_pgo: + config.available_features.add('pgo') + # Determine whether the test system supports unified memory. # For CUDA, this is the case with compute capability 70 (Volta) or higher. # For all other targets, we currently assume it is. diff --git a/openmp/libomptarget/test/lit.site.cfg.in b/openmp/libomptarget/test/lit.site.cfg.in index 2d638118838727..494d1636af304a 100644 --- a/openmp/libomptarget/test/lit.site.cfg.in +++ b/openmp/libomptarget/test/lit.site.cfg.in @@ -25,6 +25,6 @@ config.libomptarget_not = "@OPENMP_NOT_EXECUTABLE@" config.libomptarget_debug = @LIBOMPTARGET_DEBUG@ config.has_libomptarget_ompt = @LIBOMPTARGET_OMPT_SUPPORT@ config.libomptarget_has_libc = @LIBOMPTARGET_GPU_LIBC_SUPPORT@ - +config.libomptarget_test_pgo = @LIBOMPTARGET_TEST_GPU_PGO@ # Let the main config do the real work. 
lit_config.load_config(config, "@CMAKE_CURRENT_SOURCE_DIR@/lit.cfg") diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c new file mode 100644 index 00000000000000..ca8a6f502a06aa --- /dev/null +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -0,0 +1,39 @@ +// RUN: %libomptarget-compile-generic -fprofile-instr-generate -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic + +// UNSUPPORTED: x86_64-pc-linux-gnu +// UNSUPPORTED: x86_64-pc-linux-gnu-LTO +// UNSUPPORTED: aarch64-unknown-linux-gnu +// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: pgo + +#ifdef _OPENMP +#include <omp.h> +#endif + +int test1(int a) { return a / 2; } +int test2(int a) { return a * 2; } + +int main() { + int m = 2; +#pragma omp target + for (int i = 0; i < 10; i++) { + m = test1(m); + for (int j = 0; j < 2; j++) { + m = test2(m); + } + } +} + +// CHECK: ======== Counters ========= +// CHECK-NEXT: [ 0 11 20 ] +// CHECK-NEXT: [ 10 ] +// CHECK-NEXT: [ 20 ] +// CHECK-NEXT: ========== Data =========== +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: ======== Functions ======== +// CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} +// CHECK-NEXT: test1 +// CHECK-NEXT: test2 >From c7ae2a74daa93b05058fcc9bba64e0734359362c Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 17 Jan 2024 23:12:27 -0600 Subject: [PATCH 11/58] Fix PGO test formatting --- openmp/libomptarget/test/offloading/pgo1.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4
deletions(-) diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c index ca8a6f502a06aa..389be19b670d76 100644 --- a/openmp/libomptarget/test/offloading/pgo1.c +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -1,4 +1,5 @@ -// RUN: %libomptarget-compile-generic -fprofile-instr-generate -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ +// RUN: -Xclang "-fprofile-instrument=clang" // RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic // UNSUPPORTED: x86_64-pc-linux-gnu @@ -30,9 +31,18 @@ int main() { // CHECK-NEXT: [ 10 ] // CHECK-NEXT: [ 20 ] // CHECK-NEXT: ========== Data =========== -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } // CHECK-NEXT: ======== Functions ======== // CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} // CHECK-NEXT: test1 >From 8bb22072914bbb830e2788d117aedd0e0bab66ff Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: 
Thu, 18 Jan 2024 23:15:55 -0600 Subject: [PATCH 12/58] Refactor visibility logic --- llvm/lib/ProfileData/InstrProf.cpp | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 511571a3eed9b0..708ea63fd95e04 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -422,6 +422,16 @@ bool isGPUProfTarget(const Module &M) { return Triple.isAMDGPU() || Triple.isNVPTX(); } +void setPGOFuncVisibility(Module &M, GlobalVariable *FuncNameVar) { + // If the target is a GPU, make the symbol protected so it can + // be read from the host device + if (isGPUProfTarget(M)) + FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility); + // Hide the symbol so that we correctly get a copy for each executable. + else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) + FuncNameVar->setVisibility(GlobalValue::HiddenVisibility); +} + GlobalVariable *createPGOFuncNameVar(Module &M, GlobalValue::LinkageTypes Linkage, StringRef PGOFuncName) { @@ -445,14 +455,7 @@ GlobalVariable *createPGOFuncNameVar(Module &M, new GlobalVariable(M, Value->getType(), true, Linkage, Value, getPGOFuncNameVarName(PGOFuncName, Linkage)); - // If the target is a GPU, make the symbol protected so it can - // be read from the host device - if (isGPUProfTarget(M)) - FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility); - // Hide the symbol so that we correctly get a copy for each executable. - else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) - FuncNameVar->setVisibility(GlobalValue::HiddenVisibility); - + setPGOFuncVisibility(M, FuncNameVar); return FuncNameVar; } >From 9f13943f64cb16162e44902d54de53a9b1229179 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 23 Jan 2024 18:33:58 -0600 Subject: [PATCH 13/58] Add LLVM instrumentation support This PR formerly only supported -fprofile-instrument=clang. 
This commit adds support for -fprofile-instrument=llvm --- .../Instrumentation/PGOInstrumentation.cpp | 12 +++- openmp/libomptarget/test/offloading/pgo1.c | 72 +++++++++++++------ 2 files changed, 59 insertions(+), 25 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c20fc942eaf0d5..bbc8da78fd7baf 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -862,6 +862,10 @@ static void instrumentOneFunc( auto Name = FuncInfo.FuncNameVar; auto CFGHash = ConstantInt::get(Type::getInt64Ty(M->getContext()), FuncInfo.FunctionHash); + // Make sure that pointer to global is passed in with zero addrspace + // This is relevant during GPU profiling + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + Name, llvm::PointerType::getUnqual(M->getContext())); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt()); @@ -869,7 +873,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_cover), - {Name, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); + {NormalizedPtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); return; } @@ -887,7 +891,8 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_timestamp), - {Name, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I)}); + {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + Builder.getInt32(I)}); I += PGOBlockCoverage ? 8 : 1; } @@ -901,7 +906,8 @@ static void instrumentOneFunc( Intrinsic::getDeclaration(M, PGOBlockCoverage ? 
Intrinsic::instrprof_cover : Intrinsic::instrprof_increment), - {Name, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I++)}); + {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + Builder.getInt32(I++)}); } // Now instrument select instructions: diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c index 389be19b670d76..d95793b508dcfc 100644 --- a/openmp/libomptarget/test/offloading/pgo1.c +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -1,6 +1,11 @@ // RUN: %libomptarget-compile-generic -fprofile-instr-generate \ // RUN: -Xclang "-fprofile-instrument=clang" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic +// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: --check-prefix="CLANG-PGO" +// RUN: %libomptarget-compile-generic -fprofile-generate \ +// RUN: -Xclang "-fprofile-instrument=llvm" +// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: --check-prefix="LLVM-PGO" // UNSUPPORTED: x86_64-pc-linux-gnu // UNSUPPORTED: x86_64-pc-linux-gnu-LTO @@ -26,24 +31,47 @@ int main() { } } -// CHECK: ======== Counters ========= -// CHECK-NEXT: [ 0 11 20 ] -// CHECK-NEXT: [ 10 ] -// CHECK-NEXT: [ 20 ] -// CHECK-NEXT: ========== Data =========== -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CHECK-NEXT: ======== Functions ======== -// CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// CHECK-NEXT: test1 
-// CHECK-NEXT: test2 +// CLANG-PGO: ======== Counters ========= +// CLANG-PGO-NEXT: [ 0 11 20 ] +// CLANG-PGO-NEXT: [ 10 ] +// CLANG-PGO-NEXT: [ 20 ] +// CLANG-PGO-NEXT: ========== Data =========== +// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// CLANG-PGO-NEXT: ======== Functions ======== +// CLANG-PGO-NEXT: pgo1.c: +// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} +// CLANG-PGO-NEXT: test1 +// CLANG-PGO-NEXT: test2 + +// LLVM-PGO: ======== Counters ========= +// LLVM-PGO-NEXT: [ 20 ] +// LLVM-PGO-NEXT: [ 10 ] +// LLVM-PGO-NEXT: [ 20 10 1 1 ] +// LLVM-PGO-NEXT: ========== Data =========== +// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} +// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } +// LLVM-PGO-NEXT: ======== Functions ======== +// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} +// 
LLVM-PGO-NEXT: test1 +// LLVM-PGO-NEXT: test2 >From 0606f0dd1b32ef9ebe138bbc964b3921e22d95d1 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 14 Feb 2024 01:46:55 -0600 Subject: [PATCH 14/58] Use explicit addrspace instead of unqual --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index e084dda879cbc0..4c75a01222d304 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1103,7 +1103,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, llvm::PointerType::getUnqual(CGM.getLLVMContext())); + FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), Builder.getInt32(NumRegionCounters), diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index bbc8da78fd7baf..c63b3e4ecf786a 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -865,7 +865,7 @@ static void instrumentOneFunc( // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - Name, llvm::PointerType::getUnqual(M->getContext())); + Name, llvm::PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt()); >From c1f9be321678766525141214aaab74636cafbc2c Mon Sep 17 00:00:00 2001 From: Ethan 
Luis McDonough Date: Thu, 15 Feb 2024 19:10:09 -0600 Subject: [PATCH 15/58] Remove redundant namespaces --- .../Instrumentation/PGOInstrumentation.cpp | 4 +-- .../common/src/GlobalHandler.cpp | 26 +++++++++---------- 2 files changed, 15 insertions(+), 15 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c63b3e4ecf786a..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,8 +864,8 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - Name, llvm::PointerType::get(M->getContext(), 0)); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt()); diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index e5eb653d022287..ae270c60804d26 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -219,30 +219,30 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, } void GPUProfGlobals::dump() const { - llvm::outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str() + outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str() << "\n"; - llvm::outs() << "======== Counters =========\n"; + outs() << "======== Counters =========\n"; for (const auto &Count : Counts) { - llvm::outs() << "["; + outs() << "["; for (size_t i = 0; i < Count.size(); i++) { if (i == 0) - 
llvm::outs() << " "; - llvm::outs() << Count[i] << " "; + outs() << " "; + outs() << Count[i] << " "; } - llvm::outs() << "]\n"; + outs() << "]\n"; } - llvm::outs() << "========== Data ===========\n"; + outs() << "========== Data ===========\n"; for (const auto &ProfData : Data) { - llvm::outs() << "{ "; + outs() << "{ "; #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \ - llvm::outs() << ProfData.Name << " "; + outs() << ProfData.Name << " "; #include "llvm/ProfileData/InstrProfData.inc" - llvm::outs() << "}\n"; + outs() << "}\n"; } - llvm::outs() << "======== Functions ========\n"; + outs() << "======== Functions ========\n"; std::string s; s.reserve(NamesData.size()); for (uint8_t Name : NamesData) { @@ -253,6 +253,6 @@ void GPUProfGlobals::dump() const { if (Error Err = Symtab.create(StringRef(s))) { consumeError(std::move(Err)); } - Symtab.dumpNames(llvm::outs()); - llvm::outs() << "===========================\n"; + Symtab.dumpNames(outs()); + outs() << "===========================\n"; } >From 6a3ae407e69e7524f0f808329c534f8352ee1779 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 15 Feb 2024 19:15:15 -0600 Subject: [PATCH 16/58] Clang format --- .../libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index ae270c60804d26..1fce2448922624 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -220,7 +220,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, void GPUProfGlobals::dump() const { outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str() - << "\n"; + << "\n"; outs() << "======== Counters =========\n"; for (const auto &Count : Counts) { >From 6866862d459e3c3fa65fae8ae639ddc3ff735252 Mon Sep 17 
00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 16 Feb 2024 13:13:39 -0600 Subject: [PATCH 17/58] Use getAddrSpaceCast Replace getPointerBitCastOrAddrSpaceCast with getAddrSpaceCast and allow no-op getAddrSpaceCast calls when types are identical --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ++++ llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index 8f52018445d2b0..baceeba8380ddb 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index a38b912164b130..2d89c5bbd4a4c2 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,6 +2067,10 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { + // Skip cast if types are identical + if (C->getType() == DstTy) + return C; + assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index 3058e577738fda..c0be71aa4cc004 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ 
b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 62a5ee1c75545571f81d9edd22e19e9ef7cff69f Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 27 Feb 2024 14:53:51 -0600 Subject: [PATCH 18/58] Revert "Use getAddrSpaceCast" This reverts commit 6866862d459e3c3fa65fae8ae639ddc3ff735252. --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ---- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index baceeba8380ddb..8f52018445d2b0 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 2d89c5bbd4a4c2..a38b912164b130 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,10 +2067,6 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { - // Skip cast if types are 
identical - if (C->getType() == DstTy) - return C; - assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c0be71aa4cc004..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 052394fa28c923d130bf73a07b965a9751467302 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 27 Feb 2024 15:34:34 -0600 Subject: [PATCH 19/58] Revert "Use getAddrSpaceCast" This reverts commit 6866862d459e3c3fa65fae8ae639ddc3ff735252. 
--- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ---- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index baceeba8380ddb..8f52018445d2b0 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 2d89c5bbd4a4c2..a38b912164b130 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,10 +2067,6 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { - // Skip cast if types are identical - if (C->getType() == DstTy) - return C; - assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c0be71aa4cc004..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto 
*NormalizedPtr = ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 612d5a5f6966a77e82e5591f5aea475fbf886e55 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 1 Mar 2024 02:04:00 -0600 Subject: [PATCH 20/58] Write PGO TODO: Fix tests --- compiler-rt/lib/profile/InstrProfiling.h | 11 ++ compiler-rt/lib/profile/InstrProfilingFile.c | 148 +++++++++++++++--- .../common/include/GlobalHandler.h | 14 +- .../common/src/GlobalHandler.cpp | 57 +++++-- .../common/src/PluginInterface.cpp | 6 +- 5 files changed, 200 insertions(+), 36 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h index 01239083369187..937acbd417de46 100644 --- a/compiler-rt/lib/profile/InstrProfiling.h +++ b/compiler-rt/lib/profile/InstrProfiling.h @@ -275,6 +275,17 @@ void __llvm_profile_get_padding_sizes_for_counters( */ void __llvm_profile_set_dumped(); +/*! + * \brief Write custom target-specific profiling data to a separate file. + * Used by libomptarget for GPU PGO. + */ +int __llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, + const char *NamesEnd); + /*! * This variable is defined in InstrProfilingRuntime.cpp as a hidden * symbol. Its main purpose is to enable profile runtime user to diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index f3b457d786e6bd..4fc401bb9bebf5 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -502,27 +502,15 @@ static FILE *getFileObject(const char *OutputName) { return fopen(OutputName, "ab"); } -/* Write profile data to file \c OutputName.
*/ -static int writeFile(const char *OutputName) { - int RetVal; - FILE *OutputFile; - - int MergeDone = 0; +/* Get file object and merge if applicable */ +static FILE *getMergeFileObject(const char *OutputName, int *MergeDone) { VPMergeHook = &lprofMergeValueProfData; if (doMerging()) - OutputFile = openFileForMerging(OutputName, &MergeDone); - else - OutputFile = getFileObject(OutputName); - - if (!OutputFile) - return -1; - - FreeHook = &free; - setupIOBuffer(); - ProfDataWriter fileWriter; - initFileWriter(&fileWriter, OutputFile); - RetVal = lprofWriteData(&fileWriter, lprofGetVPDataReader(), MergeDone); + return openFileForMerging(OutputName, MergeDone); + return getFileObject(OutputName); +} +static void closeFileObject(FILE *OutputFile) { if (OutputFile == getProfileFile()) { fflush(OutputFile); if (doMerging() && !__llvm_profile_is_continuous_mode_enabled()) { @@ -531,7 +519,23 @@ static int writeFile(const char *OutputName) { } else { fclose(OutputFile); } +} + +/* Write profile data to file \c OutputName. */ +static int writeFile(const char *OutputName) { + int RetVal, MergeDone = 0; + FILE *OutputFile = getMergeFileObject(OutputName, &MergeDone); + + if (!OutputFile) + return -1; + + FreeHook = &free; + setupIOBuffer(); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + RetVal = lprofWriteData(&fileWriter, lprofGetVPDataReader(), MergeDone); + closeFileObject(OutputFile); return RetVal; } @@ -558,10 +562,16 @@ static int writeOrderFile(const char *OutputName) { #define LPROF_INIT_ONCE_ENV "__LLVM_PROFILE_RT_INIT_ONCE" +static void forceTruncateFile(const char *Filename) { + FILE *File = fopen(Filename, "w"); + if (!File) + return; + fclose(File); +} + static void truncateCurrentFile(void) { const char *Filename; char *FilenameBuf; - FILE *File; int Length; Length = getCurFilenameLength(); @@ -591,10 +601,7 @@ static void truncateCurrentFile(void) { return; /* Truncate the file. Later we'll reopen and append. 
*/ - File = fopen(Filename, "w"); - if (!File) - return; - fclose(File); + forceTruncateFile(Filename); } /* Write a partial profile to \p Filename, which is required to be backed by @@ -1271,4 +1278,99 @@ COMPILER_RT_VISIBILITY int __llvm_profile_set_file_object(FILE *File, return 0; } +int __llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, + const char *NamesEnd) { + int ReturnValue = 0, FilenameLength, TargetLength, MergeDone; + char *FilenameBuf, *TargetFilename; + const char *Filename; + + /* Save old profile data */ + FILE *oldFile = getProfileFile(); + + // Temporarily suspend getting SIGKILL when the parent exits. + int PDeathSig = lprofSuspendSigKill(); + + if (lprofProfileDumped() || __llvm_profile_is_continuous_mode_enabled()) { + PROF_NOTE("Profile data not written to file: %s.\n", "already written"); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return 0; + } + + /* Get current filename */ + FilenameLength = getCurFilenameLength(); + FilenameBuf = (char *)COMPILER_RT_ALLOCA(FilenameLength + 1); + Filename = getCurFilename(FilenameBuf, 0); + + /* Check the filename. */ + if (!Filename) { + PROF_ERR("Failed to write file : %s\n", "Filename not set"); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + /* Allocate new space for our target-specific PGO filename */ + TargetLength = strlen(Target); + TargetFilename = + (char *)COMPILER_RT_ALLOCA(FilenameLength + TargetLength + 2); + + /* Prepend "TARGET." to current filename */ + memcpy(TargetFilename, Target, TargetLength); + TargetFilename[TargetLength] = '.'; + memcpy(TargetFilename, Target, TargetLength); + memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); + TargetFilename[FilenameLength + 1 + TargetLength] = 0; + + /* Check if there is llvm/runtime version mismatch. 
*/ + if (GET_VERSION(__llvm_profile_get_version()) != INSTR_PROF_RAW_VERSION) { + PROF_ERR("Runtime and instrumentation version mismatch : " + "expected %d, but get %d\n", + INSTR_PROF_RAW_VERSION, + (int)GET_VERSION(__llvm_profile_get_version())); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + /* Clean old target file */ + forceTruncateFile(TargetFilename); + + /* Open target-specific PGO file */ + MergeDone = 0; + FILE *OutputFile = getMergeFileObject(TargetFilename, &MergeDone); + + if (!OutputFile) { + PROF_ERR("Failed to open file : %s\n", TargetFilename); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + FreeHook = &free; + setupIOBuffer(); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + + /* Write custom data to the file */ + ReturnValue = lprofWriteDataImpl( + &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone); + + closeFileObject(OutputFile); + + // Restore SIGKILL. 
+ if (PDeathSig == 1) + lprofRestoreSigKill(); + + /* Restore old profiling file */ + setProfileFile(oldFile); + + return ReturnValue; +} + #endif diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h index f5a15ca11bfcda..af0cd4dcdf5dcf 100644 --- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h +++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h @@ -63,14 +63,24 @@ struct __llvm_profile_data { #include "llvm/ProfileData/InstrProfData.inc" }; +extern "C" { +extern int __attribute__((weak)) +__llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, const char *CountersEnd, + const char *NamesBegin, const char *NamesEnd); +} + /// PGO profiling data extracted from a GPU device struct GPUProfGlobals { - SmallVector NamesData; - SmallVector> Counts; + SmallVector Counts; SmallVector<__llvm_profile_data> Data; + SmallVector NamesData; Triple TargetTriple; void dump() const; + Error write() const; }; /// Subclass of GlobalTy that holds the memory for a global of \p Ty. 
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 1fce2448922624..2f16b6e3c139e9 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -205,7 +205,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, GlobalTy CountGlobal(NameOrErr->str(), Sym.getSize(), Counts.data()); if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal)) return Err; - DeviceProfileData.Counts.push_back(std::move(Counts)); + DeviceProfileData.Counts.append(std::move(Counts)); } else if (NameOrErr->starts_with(getInstrProfDataVarPrefix())) { // Read profiling data for this global variable __llvm_profile_data Data{}; @@ -223,15 +223,14 @@ void GPUProfGlobals::dump() const { << "\n"; outs() << "======== Counters =========\n"; - for (const auto &Count : Counts) { - outs() << "["; - for (size_t i = 0; i < Count.size(); i++) { - if (i == 0) - outs() << " "; - outs() << Count[i] << " "; - } - outs() << "]\n"; + for (size_t i = 0; i < Counts.size(); i++) { + if (i > 0 && i % 10 == 0) + outs() << "\n"; + else if (i != 0) + outs() << " "; + outs() << Counts[i]; } + outs() << "\n"; outs() << "========== Data ===========\n"; for (const auto &ProfData : Data) { @@ -256,3 +255,43 @@ void GPUProfGlobals::dump() const { Symtab.dumpNames(outs()); outs() << "===========================\n"; } + +Error GPUProfGlobals::write() const { + if (!__llvm_write_custom_profile) + return Plugin::error("Could not find symbol __llvm_write_custom_profile. " + "The compiler-rt profiling library must be linked for " + "GPU PGO to work."); + + size_t DataSize = Data.size() * sizeof(__llvm_profile_data), + CountsSize = Counts.size() * sizeof(int64_t); + __llvm_profile_data *DataBegin, *DataEnd; + char *CountersBegin, *CountersEnd, *NamesBegin, *NamesEnd; + + // Initialize array of contiguous data. 
We need to make sure each section is + // contiguous so that the PGO library can compute deltas properly + SmallVector ContiguousData(NamesData.size() + DataSize + CountsSize); + + // Compute region pointers + DataBegin = (__llvm_profile_data *)(ContiguousData.data() + CountsSize); + DataEnd = + (__llvm_profile_data *)(ContiguousData.data() + CountsSize + DataSize); + CountersBegin = (char *)ContiguousData.data(); + CountersEnd = (char *)(ContiguousData.data() + CountsSize); + NamesBegin = (char *)(ContiguousData.data() + CountsSize + DataSize); + NamesEnd = (char *)(ContiguousData.data() + CountsSize + DataSize + + NamesData.size()); + + // Copy data to contiguous buffer + memcpy(DataBegin, Data.data(), DataSize); + memcpy(CountersBegin, Counts.data(), CountsSize); + memcpy(NamesBegin, NamesData.data(), NamesData.size()); + + // Invoke compiler-rt entrypoint + int result = __llvm_write_custom_profile(TargetTriple.str().c_str(), + DataBegin, DataEnd, CountersBegin, + CountersEnd, NamesBegin, NamesEnd); + if (result != 0) + return Plugin::error("Error writing GPU PGO data to file"); + + return Plugin::success(); +} diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp index 1ea93795ce8ce4..d5e6b6128152dc 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp @@ -837,8 +837,10 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); - // TODO: write data to profiling file - ProfOrErr->dump(); + // Write data to profiling file + if (auto Err = ProfOrErr->write()) { + consumeError(std::move(Err)); + } } // Delete the memory manager before deinitializing the device. 
Otherwise, >From b8c916305acf08c0bd2d51b81875be5e8fc59ff3 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 13 Mar 2024 20:05:32 -0500 Subject: [PATCH 21/58] Fix tests --- .../plugins-nextgen/common/src/PluginInterface.cpp | 3 +++ openmp/libomptarget/test/offloading/pgo1.c | 8 ++------ 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp index d5e6b6128152dc..2359ad28a25b04 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp @@ -837,6 +837,9 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); + // Dump out profdata + ProfOrErr->dump(); + // Write data to profiling file if (auto Err = ProfOrErr->write()) { consumeError(std::move(Err)); diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c index d95793b508dcfc..79e93d0f10827f 100644 --- a/openmp/libomptarget/test/offloading/pgo1.c +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -32,9 +32,7 @@ int main() { } // CLANG-PGO: ======== Counters ========= -// CLANG-PGO-NEXT: [ 0 11 20 ] -// CLANG-PGO-NEXT: [ 10 ] -// CLANG-PGO-NEXT: [ 20 ] +// CLANG-PGO-NEXT: 0 11 20 10 20 // CLANG-PGO-NEXT: ========== Data =========== // CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} // CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} @@ -55,9 +53,7 @@ int main() { // CLANG-PGO-NEXT: test2 // LLVM-PGO: ======== Counters ========= -// LLVM-PGO-NEXT: [ 20 ] -// LLVM-PGO-NEXT: [ 10 ] -// LLVM-PGO-NEXT: [ 20 10 1 1 ] +// LLVM-PGO-NEXT: 20 10 20 10 1 1 // LLVM-PGO-NEXT: ========== Data =========== // LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} // LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} >From 4568c4244d11010aacf9f1fe20bb1197008b057f Mon Sep 17 00:00:00 2001 From: Ethan Luis 
McDonough Date: Wed, 13 Mar 2024 21:40:20 -0500 Subject: [PATCH 22/58] Fix arguments --- compiler-rt/lib/profile/InstrProfilingFile.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 466bfe480543bc..6570bc0d74caa1 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1360,10 +1360,10 @@ int __llvm_write_custom_profile(const char *Target, initFileWriter(&fileWriter, OutputFile); /* Write custom data to the file */ - ReturnValue = lprofWriteDataImpl( - &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, - lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone); - + ReturnValue = lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, + CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NamesBegin, NamesEnd, + NULL, NULL, NULL, NULL, MergeDone); closeFileObject(OutputFile); // Restore SIGKILL. 
>From 1fc4cb9c01f251432f4a6748e69b1d8cf74cc4fb Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 18 Mar 2024 20:05:52 -0500 Subject: [PATCH 23/58] Add GPU prof flags --- clang/include/clang/Driver/Options.td | 6 +++ clang/lib/Driver/ToolChain.cpp | 69 +++++++++++++-------------- clang/lib/Driver/ToolChains/Clang.cpp | 39 +++++++++++++-- 3 files changed, 74 insertions(+), 40 deletions(-) diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 4b1fcf1db1ad09..aab445906fa347 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -1716,6 +1716,9 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling", def fprofile_instr_generate : Flag<["-"], "fprofile-instr-generate">, Group, Visibility<[ClangOption, CLOption]>, HelpText<"Generate instrumented code to collect execution counts into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">; +def fprofile_instr_generate_gpu : Flag<["-"], "fprofile-instr-generate-gpu">, + Group, Visibility<[ClangOption, CLOption]>, + HelpText<"Generate instrumented GPU device code to collect execution counts into GPU_TARGET.default.profraw (overridden by LLVM_PROFILE_FILE env var)">; def fprofile_instr_generate_EQ : Joined<["-"], "fprofile-instr-generate=">, Group, Visibility<[ClangOption, CLOption]>, MetaVarName<"">, HelpText<"Generate instrumented code to collect execution counts into (overridden by LLVM_PROFILE_FILE env var)">; @@ -1744,6 +1747,9 @@ defm mcdc_coverage : BoolFOption<"coverage-mcdc", def fprofile_generate : Flag<["-"], "fprofile-generate">, Group, Visibility<[ClangOption, CLOption]>, HelpText<"Generate instrumented code to collect execution counts into default.profraw (overridden by LLVM_PROFILE_FILE env var)">; +def fprofile_generate_gpu : Flag<["-"], "fprofile-generate-gpu">, + Group, Visibility<[ClangOption, CLOption]>, + HelpText<"Generate instrumented GPU device code to 
collect execution counts into GPU_TARGET.default.profraw (overridden by LLVM_PROFILE_FILE env var)">; def fprofile_generate_EQ : Joined<["-"], "fprofile-generate=">, Group, Visibility<[ClangOption, CLOption]>, MetaVarName<"">, diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index 03450fc0f57b93..f4c8aafbbd0e64 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -184,10 +184,9 @@ bool ToolChain::defaultToIEEELongDouble() const { return PPC_LINUX_DEFAULT_IEEELONGDOUBLE && getTriple().isOSLinux(); } -static void getAArch64MultilibFlags(const Driver &D, - const llvm::Triple &Triple, - const llvm::opt::ArgList &Args, - Multilib::flags_list &Result) { +static void getAArch64MultilibFlags(const Driver &D, const llvm::Triple &Triple, + const llvm::opt::ArgList &Args, + Multilib::flags_list &Result) { std::vector Features; tools::aarch64::getAArch64TargetFeatures(D, Triple, Args, Features, false); const auto UnifiedFeatures = tools::unifyTargetFeatures(Features); @@ -209,10 +208,9 @@ static void getAArch64MultilibFlags(const Driver &D, Result.push_back(llvm::join(MArch, "+")); } -static void getARMMultilibFlags(const Driver &D, - const llvm::Triple &Triple, - const llvm::opt::ArgList &Args, - Multilib::flags_list &Result) { +static void getARMMultilibFlags(const Driver &D, const llvm::Triple &Triple, + const llvm::opt::ArgList &Args, + Multilib::flags_list &Result) { std::vector Features; llvm::ARM::FPUKind FPUKind = tools::arm::getARMTargetFeatures( D, Triple, Args, Features, false /*ForAs*/, true /*ForMultilib*/); @@ -303,7 +301,7 @@ ToolChain::getSanitizerArgs(const llvm::opt::ArgList &JobArgs) const { return SanArgs; } -const XRayArgs& ToolChain::getXRayArgs() const { +const XRayArgs &ToolChain::getXRayArgs() const { if (!XRayArguments) XRayArguments.reset(new XRayArgs(*this, Args)); return *XRayArguments; @@ -394,8 +392,7 @@ static const DriverSuffix *parseDriverSuffix(StringRef ProgName, size_t &Pos) { 
return DS; } -ParsedClangName -ToolChain::getTargetAndModeFromProgramName(StringRef PN) { +ParsedClangName ToolChain::getTargetAndModeFromProgramName(StringRef PN) { std::string ProgName = normalizeProgramName(PN); size_t SuffixPos; const DriverSuffix *DS = parseDriverSuffix(ProgName, SuffixPos); @@ -406,8 +403,8 @@ ToolChain::getTargetAndModeFromProgramName(StringRef PN) { size_t LastComponent = ProgName.rfind('-', SuffixPos); if (LastComponent == std::string::npos) return ParsedClangName(ProgName.substr(0, SuffixEnd), DS->ModeFlag); - std::string ModeSuffix = ProgName.substr(LastComponent + 1, - SuffixEnd - LastComponent - 1); + std::string ModeSuffix = + ProgName.substr(LastComponent + 1, SuffixEnd - LastComponent - 1); // Infer target from the prefix. StringRef Prefix(ProgName); @@ -465,9 +462,7 @@ Tool *ToolChain::getFlang() const { return Flang.get(); } -Tool *ToolChain::buildAssembler() const { - return new tools::ClangAs(*this); -} +Tool *ToolChain::buildAssembler() const { return new tools::ClangAs(*this); } Tool *ToolChain::buildLinker() const { llvm_unreachable("Linking is not supported by this toolchain"); @@ -826,10 +821,12 @@ bool ToolChain::needsProfileRT(const ArgList &Args) { return false; return Args.hasArg(options::OPT_fprofile_generate) || + Args.hasArg(options::OPT_fprofile_generate_gpu) || Args.hasArg(options::OPT_fprofile_generate_EQ) || Args.hasArg(options::OPT_fcs_profile_generate) || Args.hasArg(options::OPT_fcs_profile_generate_EQ) || Args.hasArg(options::OPT_fprofile_instr_generate) || + Args.hasArg(options::OPT_fprofile_instr_generate_gpu) || Args.hasArg(options::OPT_fprofile_instr_generate_EQ) || Args.hasArg(options::OPT_fcreate_profile) || Args.hasArg(options::OPT_forder_file_instrumentation); @@ -842,8 +839,10 @@ bool ToolChain::needsGCovInstrumentation(const llvm::opt::ArgList &Args) { } Tool *ToolChain::SelectTool(const JobAction &JA) const { - if (D.IsFlangMode() && getDriver().ShouldUseFlangCompiler(JA)) return getFlang(); - if 
(getDriver().ShouldUseClangCompiler(JA)) return getClang(); + if (D.IsFlangMode() && getDriver().ShouldUseFlangCompiler(JA)) + return getFlang(); + if (getDriver().ShouldUseClangCompiler(JA)) + return getClang(); Action::ActionClass AC = JA.getKind(); if (AC == Action::AssembleJobClass && useIntegratedAs() && !getTriple().isOSAIX()) @@ -865,7 +864,7 @@ std::string ToolChain::GetLinkerPath(bool *LinkerIsLLD) const { // Get -fuse-ld= first to prevent -Wunused-command-line-argument. -fuse-ld= is // considered as the linker flavor, e.g. "bfd", "gold", or "lld". - const Arg* A = Args.getLastArg(options::OPT_fuse_ld_EQ); + const Arg *A = Args.getLastArg(options::OPT_fuse_ld_EQ); StringRef UseLinker = A ? A->getValue() : CLANG_DEFAULT_LINKER; // --ld-path= takes precedence over -fuse-ld= and specifies the executable @@ -950,9 +949,7 @@ types::ID ToolChain::LookupTypeForExtension(StringRef Ext) const { return id; } -bool ToolChain::HasNativeLLVMSupport() const { - return false; -} +bool ToolChain::HasNativeLLVMSupport() const { return false; } bool ToolChain::isCrossCompiling() const { llvm::Triple HostTriple(LLVM_HOST_TRIPLE); @@ -964,7 +961,8 @@ bool ToolChain::isCrossCompiling() const { case llvm::Triple::thumb: case llvm::Triple::thumbeb: return getArch() != llvm::Triple::arm && getArch() != llvm::Triple::thumb && - getArch() != llvm::Triple::armeb && getArch() != llvm::Triple::thumbeb; + getArch() != llvm::Triple::armeb && + getArch() != llvm::Triple::thumbeb; default: return HostTriple.getArch() != getArch(); } @@ -1046,9 +1044,7 @@ std::string ToolChain::ComputeEffectiveClangTriple(const ArgList &Args, return ComputeLLVMTriple(Args, InputType); } -std::string ToolChain::computeSysRoot() const { - return D.SysRoot; -} +std::string ToolChain::computeSysRoot() const { return D.SysRoot; } void ToolChain::AddClangSystemIncludeArgs(const ArgList &DriverArgs, ArgStringList &CC1Args) const { @@ -1072,12 +1068,12 @@ void ToolChain::addProfileRTLibs(const llvm::opt::ArgList 
&Args, CmdArgs.push_back(getCompilerRTArgString(Args, "profile")); } -ToolChain::RuntimeLibType ToolChain::GetRuntimeLibType( - const ArgList &Args) const { +ToolChain::RuntimeLibType +ToolChain::GetRuntimeLibType(const ArgList &Args) const { if (runtimeLibType) return *runtimeLibType; - const Arg* A = Args.getLastArg(options::OPT_rtlib_EQ); + const Arg *A = Args.getLastArg(options::OPT_rtlib_EQ); StringRef LibName = A ? A->getValue() : CLANG_DEFAULT_RTLIB; // Only use "platform" in tests to override CLANG_DEFAULT_RTLIB! @@ -1098,8 +1094,8 @@ ToolChain::RuntimeLibType ToolChain::GetRuntimeLibType( return *runtimeLibType; } -ToolChain::UnwindLibType ToolChain::GetUnwindLibType( - const ArgList &Args) const { +ToolChain::UnwindLibType +ToolChain::GetUnwindLibType(const ArgList &Args) const { if (unwindLibType) return *unwindLibType; @@ -1134,7 +1130,8 @@ ToolChain::UnwindLibType ToolChain::GetUnwindLibType( return *unwindLibType; } -ToolChain::CXXStdlibType ToolChain::GetCXXStdlibType(const ArgList &Args) const{ +ToolChain::CXXStdlibType +ToolChain::GetCXXStdlibType(const ArgList &Args) const { if (cxxStdlibType) return *cxxStdlibType; @@ -1290,7 +1287,7 @@ void ToolChain::AddCXXStdlibLibArgs(const ArgList &Args, void ToolChain::AddFilePathLibArgs(const ArgList &Args, ArgStringList &CmdArgs) const { for (const auto &LibPath : getFilePaths()) - if(LibPath.length() > 0) + if (LibPath.length() > 0) CmdArgs.push_back(Args.MakeArgString(StringRef("-L") + LibPath)); } @@ -1306,9 +1303,9 @@ bool ToolChain::isFastMathRuntimeAvailable(const ArgList &Args, if (!isOptimizationLevelFast(Args)) { // Check if -ffast-math or -funsafe-math. 
Arg *A = - Args.getLastArg(options::OPT_ffast_math, options::OPT_fno_fast_math, - options::OPT_funsafe_math_optimizations, - options::OPT_fno_unsafe_math_optimizations); + Args.getLastArg(options::OPT_ffast_math, options::OPT_fno_fast_math, + options::OPT_funsafe_math_optimizations, + options::OPT_fno_unsafe_math_optimizations); if (!A || A->getOption().getID() == options::OPT_fno_fast_math || A->getOption().getID() == options::OPT_fno_unsafe_math_optimizations) diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 055884d275ce1b..106a612135f93f 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -578,6 +578,35 @@ static void addDashXForInput(const ArgList &Args, const InputInfo &Input, } } +static void addPGOFlagsGPU(const ToolChain &TC, const ArgList &Args, + ArgStringList &CmdArgs) { + const Driver &D = TC.getDriver(); + auto *ProfileClangArg = Args.getLastArg(options::OPT_fprofile_generate_gpu, + options::OPT_fno_profile_generate); + auto *ProfileLLVMArg = + Args.getLastArg(options::OPT_fprofile_instr_generate_gpu, + options::OPT_fno_profile_generate); + if (ProfileClangArg && + ProfileClangArg->getOption().matches(options::OPT_fno_profile_generate)) + ProfileClangArg = nullptr; + + if (ProfileLLVMArg && + ProfileLLVMArg->getOption().matches(options::OPT_fno_profile_generate)) + ProfileLLVMArg = nullptr; + + if (ProfileClangArg && ProfileLLVMArg) { + D.Diag(diag::err_drv_argument_not_allowed_with) + << ProfileClangArg->getSpelling() << ProfileLLVMArg->getSpelling(); + return; + } + + if (ProfileClangArg) + CmdArgs.push_back("-fprofile-instrument=clang"); + + if (ProfileLLVMArg) + CmdArgs.push_back("-fprofile-instrument=llvm"); +} + static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C, const JobAction &JA, const InputInfo &Output, const ArgList &Args, SanitizerArgs &SanArgs, @@ -6049,10 +6078,12 @@ void Clang::ConstructJob(Compilation &C, const JobAction 
&JA, options::OPT_finstrument_functions_after_inlining, options::OPT_finstrument_function_entry_bare); - // NVPTX/AMDGCN doesn't support PGO or coverage. There's no runtime support - // for sampling, overhead of call arc collection is way too high and there's - // no way to collect the output. - if (!Triple.isNVPTX() && !Triple.isAMDGCN()) + // NVPTX/AMDGCN PGO is handled separately + // GPU targets don't have their own profiling libraries and are + // collected/handled by the host's profiling library + if (Triple.isNVPTX() || Triple.isAMDGCN()) + addPGOFlagsGPU(TC, Args, CmdArgs); + else addPGOAndCoverageFlags(TC, C, JA, Output, Args, SanitizeArgs, CmdArgs); Args.AddLastArg(CmdArgs, options::OPT_fclang_abi_compat_EQ); >From 849b244ea29ac15cae7ddaa973356cecfb0e4792 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 18 Mar 2024 21:31:32 -0500 Subject: [PATCH 24/58] Fix elf obj file --- .../plugins-nextgen/common/src/GlobalHandler.cpp | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 88e4bee506ba89..bca66cff6558a2 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -177,16 +177,19 @@ Expected GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image) { GPUProfGlobals DeviceProfileData; - auto ELFObj = getELFObjectFile(Image); - if (!ELFObj) - return ELFObj.takeError(); + auto ObjFile = getELFObjectFile(Image); + if (!ObjFile) + return ObjFile.takeError(); + + std::unique_ptr ELFObj( + static_cast(ObjFile->release())); DeviceProfileData.TargetTriple = ELFObj->makeTriple(); // Iterate through elf symbols for (auto &Sym : ELFObj->symbols()) { auto NameOrErr = Sym.getName(); if (!NameOrErr) - return ELFObj.takeError(); + return NameOrErr.takeError(); // Check if 
given current global is a profiling global based // on name >From 55bd8d21a6224e0872002b0d1d77361eb75a3419 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 19 Mar 2024 14:50:54 -0500 Subject: [PATCH 25/58] Add GPU use profile option --- clang/include/clang/Driver/Options.td | 5 +++++ clang/lib/Driver/ToolChains/Clang.cpp | 22 ++++++++++++++++++++++ 2 files changed, 27 insertions(+) diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index aab445906fa347..b317d4e85b9571 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -1768,6 +1768,11 @@ def fprofile_use_EQ : Joined<["-"], "fprofile-use=">, Visibility<[ClangOption, CLOption]>, MetaVarName<"">, HelpText<"Use instrumentation data for profile-guided optimization. If pathname is a directory, it reads from /default.profdata. Otherwise, it reads from file .">; +def fprofile_use_gpu_EQ : Joined<["-"], "fprofile-use-gpu=">, + Group, + Visibility<[ClangOption, CLOption]>, + MetaVarName<"">, + HelpText<"Use instrumentation data for profile-guided optimization targeting GPU">; def fno_profile_instr_generate : Flag<["-"], "fno-profile-instr-generate">, Group, Visibility<[ClangOption, CLOption]>, HelpText<"Disable generation of profile instrumentation.">; diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 106a612135f93f..1ea55011469509 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -586,6 +586,8 @@ static void addPGOFlagsGPU(const ToolChain &TC, const ArgList &Args, auto *ProfileLLVMArg = Args.getLastArg(options::OPT_fprofile_instr_generate_gpu, options::OPT_fno_profile_generate); + auto *ProfileUseArg = Args.getLastArg(options::OPT_fprofile_use_gpu_EQ, + options::OPT_fno_profile_instr_use); if (ProfileClangArg && ProfileClangArg->getOption().matches(options::OPT_fno_profile_generate)) ProfileClangArg = nullptr; @@ -594,17 +596,37 @@ 
static void addPGOFlagsGPU(const ToolChain &TC, const ArgList &Args, ProfileLLVMArg->getOption().matches(options::OPT_fno_profile_generate)) ProfileLLVMArg = nullptr; + if (ProfileUseArg && + ProfileUseArg->getOption().matches(options::OPT_fno_profile_instr_use)) + ProfileUseArg = nullptr; + if (ProfileClangArg && ProfileLLVMArg) { D.Diag(diag::err_drv_argument_not_allowed_with) << ProfileClangArg->getSpelling() << ProfileLLVMArg->getSpelling(); return; } + if (ProfileUseArg && ProfileClangArg) { + D.Diag(diag::err_drv_argument_not_allowed_with) + << ProfileClangArg->getSpelling() << ProfileUseArg->getSpelling(); + return; + } + + if (ProfileUseArg && ProfileLLVMArg) { + D.Diag(diag::err_drv_argument_not_allowed_with) + << ProfileLLVMArg->getSpelling() << ProfileUseArg->getSpelling(); + return; + } + if (ProfileClangArg) CmdArgs.push_back("-fprofile-instrument=clang"); if (ProfileLLVMArg) CmdArgs.push_back("-fprofile-instrument=llvm"); + + if (ProfileUseArg) + CmdArgs.push_back(Args.MakeArgString( + Twine("-fprofile-instrument-use-path=") + ProfileUseArg->getValue())); } static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C, >From 4ebbb45baa24b52eb0f94ebaf16b6b9eb671420a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 6 May 2024 23:13:58 -0500 Subject: [PATCH 26/58] Add more addrspace casts for GPU targets --- .../Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++--- .../Instrumentation/PGOInstrumentation.cpp | 13 +++++++++---- 2 files changed, 17 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index a6b1e0d488120a..dd8c027c4bbf62 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -869,6 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { llvm::InstrProfValueKind::IPVK_MemOPSize); CallInst *Call = nullptr;
auto *TLI = &GetTLI(*Ind->getFunction()); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + DataVar, PointerType::getUnqual(M.getContext())); // To support value profiling calls within Windows exception handlers, funclet // information contained within operand bundles needs to be copied over to @@ -877,11 +879,13 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { SmallVector OpBundles; Ind->getOperandBundlesAsDefs(OpBundles); if (!IsMemOpSize) { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args, OpBundles); } else { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall( getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp), Args, OpBundles); @@ -1575,7 +1579,8 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { getInstrProfSectionName(IPSK_vals, TT.getObjectFormat())); ValuesVar->setAlignment(Align(8)); maybeSetComdat(ValuesVar, Fn, CntsVarName); - ValuesPtrExpr = ValuesVar; + ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + ValuesVar, PointerType::getUnqual(Fn->getContext())); } uint64_t NumCounters = Inc->getNumCounters()->getZExtValue(); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index f662c89a378bee..fbe969f4a9c16c 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -955,12 +955,15 @@ static void instrumentOneFunc( ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty()); assert(ToProfile && "value profiling Value is of unexpected type"); + auto *NormalizedPtr = 
ConstantExpr::getPointerBitCastOrAddrSpaceCast( + Name, PointerType::get(M->getContext(), 0)); + SmallVector OpBundles; populateEHOperandBundle(Cand, BlockColors, OpBundles); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile), - {FuncInfo.FuncNameVar, Builder.getInt64(FuncInfo.FunctionHash), - ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, + {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile, + Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, OpBundles); } } // IPVK_First <= Kind <= IPVK_Last @@ -1632,10 +1635,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) { IRBuilder<> Builder(&SI); Type *Int64Ty = Builder.getInt64Ty(); auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, PointerType::get(M->getContext(), 0)); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step), - {FuncNameVar, Builder.getInt64(FuncHash), Builder.getInt32(TotalNumCtrs), - Builder.getInt32(*CurCtrIdx), Step}); + {NormalizedPtr, Builder.getInt64(FuncHash), + Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step}); ++(*CurCtrIdx); } >From 7770b37a5a4c40bd45887f762bd7f1e652bc0ed2 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 7 May 2024 16:31:48 -0500 Subject: [PATCH 27/58] Fix params --- compiler-rt/lib/profile/InstrProfilingFile.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 466bfe480543bc..bc1d40a37a5ad6 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1360,9 +1360,10 @@ int __llvm_write_custom_profile(const char *Target, initFileWriter(&fileWriter, OutputFile); /* Write custom data to the file */ - ReturnValue = lprofWriteDataImpl( - &fileWriter, DataBegin, 
DataEnd, CountersBegin, CountersEnd, NULL, NULL,
- lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone);
+ ReturnValue =
+ lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin,
+ CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL,
+ NULL, NULL, NULL, NamesBegin, NamesEnd, MergeDone);
 closeFileObject(OutputFile);
>From 619fb6918560f0b5d0b8137d392dfb27255a7d32 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Tue, 7 May 2024 17:20:39 -0500
Subject: [PATCH 28/58] Resolve merge conflict

---
 clang/lib/Driver/ToolChain.cpp | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index fce4168761a6d2..e29f1ccf44b690 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -1313,17 +1313,10 @@ bool ToolChain::isFastMathRuntimeAvailable(const ArgList &Args,
 // (to keep the linker options consistent with gcc and clang itself).
 if (Default && !isOptimizationLevelFast(Args)) {
 // Check if -ffast-math or -funsafe-math.
-<<<<<<< HEAD
- Arg *A =
- Args.getLastArg(options::OPT_ffast_math, options::OPT_fno_fast_math,
- options::OPT_funsafe_math_optimizations,
- options::OPT_fno_unsafe_math_optimizations);
-=======
 Arg *A = Args.getLastArg(
 options::OPT_ffast_math, options::OPT_fno_fast_math,
 options::OPT_funsafe_math_optimizations,
 options::OPT_fno_unsafe_math_optimizations, options::OPT_ffp_model_EQ);
->>>>>>> main
 if (!A || A->getOption().getID() == options::OPT_fno_fast_math ||
 A->getOption().getID() == options::OPT_fno_unsafe_math_optimizations)
>From 3f08ae9d560dbaeba4c547186c85a8c34f3dee97 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Sat, 11 May 2024 02:31:28 -0500
Subject: [PATCH 29/58] Have test read from profraw instead of dump

---
 offload/test/lit.cfg | 2 +-
 offload/test/offloading/pgo1.c | 90 +++++++++++++++++-----------------
 2 files changed, 45 insertions(+), 47 deletions(-)

diff --git a/offload/test/lit.cfg b/offload/test/lit.cfg
index 069110dc69a6e4..94a0bc8a2b43ff 100644
--- a/offload/test/lit.cfg
+++ b/offload/test/lit.cfg
@@ -390,7 +390,7 @@ config.substitutions.append(("%clang", config.test_c_compiler))
 if config.test_fortran_compiler:
 config.available_features.add('flang')
 config.substitutions.append(("%flang", config.test_fortran_compiler))
-
+config.substitutions.append(("%target_triple", config.libomptarget_current_target))
 config.substitutions.append(("%openmp_flags", config.test_openmp_flags))
 if config.libomptarget_current_target.startswith('nvptx') and config.cuda_path:
 config.substitutions.append(("%cuda_flags", "--cuda-path=" + config.cuda_path))
diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c
index 79e93d0f10827f..9fe231e7c6716b 100644
--- a/offload/test/offloading/pgo1.c
+++ b/offload/test/offloading/pgo1.c
@@ -1,12 +1,15 @@
-// RUN: %libomptarget-compile-generic -fprofile-instr-generate \
-// RUN: -Xclang "-fprofile-instrument=clang"
-// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
-// RUN:
--check-prefix="CLANG-PGO"
-// RUN: %libomptarget-compile-generic -fprofile-generate \
-// RUN: -Xclang "-fprofile-instrument=llvm"
-// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
+// RUN: %libomptarget-compile-generic -fprofile-instr-generate-gpu
+// RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1
+// RUN: llvm-profdata show --all-functions --counts \
+// RUN: %target_triple.llvm.profraw | %fcheck-generic \
 // RUN: --check-prefix="LLVM-PGO"
+// RUN: %libomptarget-compile-generic -fprofile-generate-gpu
+// RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1
+// RUN: llvm-profdata show --all-functions --counts \
+// RUN: %target_triple.clang.profraw | %fcheck-generic \
+// RUN: --check-prefix="CLANG-PGO"
+
 // UNSUPPORTED: x86_64-pc-linux-gnu
 // UNSUPPORTED: x86_64-pc-linux-gnu-LTO
 // UNSUPPORTED: aarch64-unknown-linux-gnu
@@ -31,43 +34,38 @@ int main() {
 }
 }
-// CLANG-PGO: ======== Counters =========
-// CLANG-PGO-NEXT: 0 11 20 10 20
-// CLANG-PGO-NEXT: ========== Data ===========
-// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CLANG-PGO-NEXT: ======== Functions ========
-// CLANG-PGO-NEXT: pgo1.c:
-// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
-// CLANG-PGO-NEXT: test1
-// CLANG-PGO-NEXT: test2
+// LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}:
+// LLVM-PGO: Hash:
{{0[xX][0-9a-fA-F]+}}
+// LLVM-PGO: Counters: 4
+// LLVM-PGO: Function count: 20
+// LLVM-PGO: Block counts: [10, 20, 10]
+
+// LLVM-PGO-LABEL: test1:
+// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// LLVM-PGO: Counters: 1
+// LLVM-PGO: Function count: 1
+// LLVM-PGO: Block counts: []
+
+// LLVM-PGO-LABEL: test2:
+// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// LLVM-PGO: Counters: 1
+// LLVM-PGO: Function count: 1
+// LLVM-PGO: Block counts: []
+
+// CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}:
+// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// CLANG-PGO: Counters: 3
+// CLANG-PGO: Function count: 0
+// CLANG-PGO: Block counts: [11, 20]
+
+// CLANG-PGO-LABEL: test1:
+// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// CLANG-PGO: Counters: 1
+// CLANG-PGO: Function count: 10
+// CLANG-PGO: Block counts: []
-// LLVM-PGO: ======== Counters =========
-// LLVM-PGO-NEXT: 20 10 20 10 1 1
-// LLVM-PGO-NEXT: ========== Data ===========
-// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// LLVM-PGO-NEXT: ======== Functions ========
-// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
-// LLVM-PGO-NEXT: test1
-// LLVM-PGO-NEXT: test2
+// CLANG-PGO-LABEL: test2:
+// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// CLANG-PGO: Counters: 1
+// CLANG-PGO: Function count: 20
+// CLANG-PGO: Block counts: []
>From 09f2b39beaa9e325655d5569e5107827c1e7e955 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Sat, 11 May 2024 02:39:16 -0500
Subject: [PATCH 30/58] Remove debug dump

---
 offload/plugins-nextgen/common/src/PluginInterface.cpp | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index a410deed1654f4..e7559f9e6cec83 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -840,9 +840,6 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) {
 if (!ProfOrErr)
 return ProfOrErr.takeError();
- // Dump out profdata
- ProfOrErr->dump();
-
 // Write data to profiling file
 if (auto Err = ProfOrErr->write()) {
 consumeError(std::move(Err));
>From aa895a1788969a0d27692057a1457074e9772c78 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Mon, 18 Mar 2024 21:31:32 -0500
Subject: [PATCH 31/58] Fix elf obj file

---
 offload/plugins-nextgen/common/src/GlobalHandler.cpp | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp
index 80cdcaff75528e..7717e19a5b6779 100644
--- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -177,16 +177,19 @@ Expected GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
 DeviceImageTy &Image) {
 GPUProfGlobals DeviceProfileData;
- auto ELFObj = getELFObjectFile(Image);
- if (!ELFObj)
- return ELFObj.takeError();
+ auto ObjFile = getELFObjectFile(Image);
+ if (!ObjFile)
+ return ObjFile.takeError();
+
+ std::unique_ptr ELFObj(
+ static_cast(ObjFile->release()));
 DeviceProfileData.TargetTriple = ELFObj->makeTriple();
 // Iterate through elf symbols
 for (auto &Sym : ELFObj->symbols()) {
 auto NameOrErr = Sym.getName();
 if (!NameOrErr)
- return ELFObj.takeError();
+ return NameOrErr.takeError();
 // Check if given current
global is a profiling global based
 // on name
>From 2031e49c2b26864f2dab72e629eb6cbe34928a7a Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Mon, 6 May 2024 23:13:58 -0500
Subject: [PATCH 32/58] Add more addrspace casts for GPU targets

---
 .../Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++---
 .../Instrumentation/PGOInstrumentation.cpp | 13 +++++++++----
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
index a6b1e0d488120a..dd8c027c4bbf62 100644
--- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
@@ -869,6 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) {
 llvm::InstrProfValueKind::IPVK_MemOPSize);
 CallInst *Call = nullptr;
 auto *TLI = &GetTLI(*Ind->getFunction());
+ auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast(
+ DataVar, PointerType::getUnqual(M.getContext()));
 // To support value profiling calls within Windows exception handlers, funclet
 // information contained within operand bundles needs to be copied over to
@@ -877,11 +879,13 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) {
 SmallVector OpBundles;
 Ind->getOperandBundlesAsDefs(OpBundles);
 if (!IsMemOpSize) {
- Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)};
+ Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr,
+ Builder.getInt32(Index)};
 Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args,
 OpBundles);
 } else {
- Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)};
+ Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr,
+ Builder.getInt32(Index)};
 Call = Builder.CreateCall(
 getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp),
 Args, OpBundles);
@@ -1575,7 +1579,8 @@ void
InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) {
 getInstrProfSectionName(IPSK_vals, TT.getObjectFormat()));
 ValuesVar->setAlignment(Align(8));
 maybeSetComdat(ValuesVar, Fn, CntsVarName);
- ValuesPtrExpr = ValuesVar;
+ ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast(
+ ValuesVar, PointerType::getUnqual(Fn->getContext()));
 }
 uint64_t NumCounters = Inc->getNumCounters()->getZExtValue();
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index 4b51396a8baa35..ee1657ba8400ee 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -1007,12 +1007,15 @@ static void instrumentOneFunc(
 ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty());
 assert(ToProfile && "value profiling Value is of unexpected type");
+ auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast(
+ Name, PointerType::get(M->getContext(), 0));
+
 SmallVector OpBundles;
 populateEHOperandBundle(Cand, BlockColors, OpBundles);
 Builder.CreateCall(
 Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile),
- {FuncInfo.FuncNameVar, Builder.getInt64(FuncInfo.FunctionHash),
- ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)},
+ {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile,
+ Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)},
 OpBundles);
 }
 } // IPVK_First <= Kind <= IPVK_Last
@@ -1685,10 +1688,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) {
 IRBuilder<> Builder(&SI);
 Type *Int64Ty = Builder.getInt64Ty();
 auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty);
+ auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast(
+ FuncNameVar, PointerType::get(M->getContext(), 0));
 Builder.CreateCall(
 Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step),
- {FuncNameVar, Builder.getInt64(FuncHash),
Builder.getInt32(TotalNumCtrs),
- Builder.getInt32(*CurCtrIdx), Step});
+ {NormalizedPtr, Builder.getInt64(FuncHash),
+ Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step});
 ++(*CurCtrIdx);
 }
>From be6524bb4f77de0add1e698f68115fd336f32238 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Mon, 13 May 2024 17:41:00 -0500
Subject: [PATCH 33/58] Have test read from profraw instead of dump

---
 offload/test/lit.cfg | 2 +
 offload/test/offloading/pgo1.c | 94 ++++++++++++++++++------------------
 2 files changed, 46 insertions(+), 50 deletions(-)

diff --git a/offload/test/lit.cfg b/offload/test/lit.cfg
index 069110dc69a6e4..38e6a33b01fafc 100644
--- a/offload/test/lit.cfg
+++ b/offload/test/lit.cfg
@@ -391,6 +391,8 @@ if config.test_fortran_compiler:
 config.available_features.add('flang')
 config.substitutions.append(("%flang", config.test_fortran_compiler))
+config.substitutions.append(("%target_triple", config.libomptarget_current_target))
+
 config.substitutions.append(("%openmp_flags", config.test_openmp_flags))
 if config.libomptarget_current_target.startswith('nvptx') and config.cuda_path:
 config.substitutions.append(("%cuda_flags", "--cuda-path=" + config.cuda_path))
diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c
index 79e93d0f10827f..d22d5340f5b3ec 100644
--- a/offload/test/offloading/pgo1.c
+++ b/offload/test/offloading/pgo1.c
@@ -1,22 +1,21 @@
-// RUN: %libomptarget-compile-generic -fprofile-instr-generate \
-// RUN: -Xclang "-fprofile-instrument=clang"
-// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
-// RUN: --check-prefix="CLANG-PGO"
-// RUN: %libomptarget-compile-generic -fprofile-generate \
-// RUN: -Xclang "-fprofile-instrument=llvm"
-// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
+// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=llvm"
+// RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1
+// RUN: llvm-profdata show --all-functions --counts \
+//
RUN: %target_triple.llvm.profraw | %fcheck-generic \
 // RUN: --check-prefix="LLVM-PGO"
+// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=clang"
+// RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1
+// RUN: llvm-profdata show --all-functions --counts \
+// RUN: %target_triple.clang.profraw | %fcheck-generic \
+// RUN: --check-prefix="CLANG-PGO"
+
 // UNSUPPORTED: x86_64-pc-linux-gnu
 // UNSUPPORTED: x86_64-pc-linux-gnu-LTO
 // UNSUPPORTED: aarch64-unknown-linux-gnu
 // UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
 // REQUIRES: pgo
-#ifdef _OPENMP
-#include
-#endif
-
 int test1(int a) { return a / 2; }
 int test2(int a) { return a * 2; }
@@ -31,43 +30,38 @@ int main() {
 }
 }
-// CLANG-PGO: ======== Counters =========
-// CLANG-PGO-NEXT: 0 11 20 10 20
-// CLANG-PGO-NEXT: ========== Data ===========
-// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CLANG-PGO-NEXT: ======== Functions ========
-// CLANG-PGO-NEXT: pgo1.c:
-// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
-// CLANG-PGO-NEXT: test1
-// CLANG-PGO-NEXT: test2
+// LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}:
+// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// LLVM-PGO: Counters: 4
+// LLVM-PGO: Function count: 20
+// LLVM-PGO: Block counts: [10, 20, 10]
+
+// LLVM-PGO-LABEL: test1:
+// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// LLVM-PGO:
Counters: 1
+// LLVM-PGO: Function count: 1
+// LLVM-PGO: Block counts: []
+
+// LLVM-PGO-LABEL: test2:
+// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// LLVM-PGO: Counters: 1
+// LLVM-PGO: Function count: 1
+// LLVM-PGO: Block counts: []
+
+// CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}:
+// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// CLANG-PGO: Counters: 3
+// CLANG-PGO: Function count: 0
+// CLANG-PGO: Block counts: [11, 20]
+
+// CLANG-PGO-LABEL: test1:
+// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// CLANG-PGO: Counters: 1
+// CLANG-PGO: Function count: 10
+// CLANG-PGO: Block counts: []
-// LLVM-PGO: ======== Counters =========
-// LLVM-PGO-NEXT: 20 10 20 10 1 1
-// LLVM-PGO-NEXT: ========== Data ===========
-// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// LLVM-PGO-NEXT: ======== Functions ========
-// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
-// LLVM-PGO-NEXT: test1
-// LLVM-PGO-NEXT: test2
+// CLANG-PGO-LABEL: test2:
+// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// CLANG-PGO: Counters: 1
+// CLANG-PGO: Function count: 20
+// CLANG-PGO: Block counts: []
>From e266cc7190b4639c8273d49d39e78aa644bf032b Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Thu, 16 May 2024 23:21:20 -0500
Subject: [PATCH 34/58] Fix GPU PGO names

---
 clang/lib/Driver/ToolChains/Clang.cpp | 25 ++++++++++++++++++++++---
 offload/test/offloading/pgo1.c | 4 ++--
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index f0c2481145d0bd..5a4dc1295360fc 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -585,13 +585,20 @@ static void addDashXForInput(const ArgList &Args, const InputInfo &Input,
 static void addPGOFlagsGPU(const ToolChain &TC, const ArgList &Args,
 ArgStringList &CmdArgs) {
 const Driver &D = TC.getDriver();
- auto *ProfileClangArg = Args.getLastArg(options::OPT_fprofile_generate_gpu,
- options::OPT_fno_profile_generate);
- auto *ProfileLLVMArg =
+ auto *ProfileClangArg =
 Args.getLastArg(options::OPT_fprofile_instr_generate_gpu,
 options::OPT_fno_profile_generate);
+ auto *ProfileLLVMArg = Args.getLastArg(options::OPT_fprofile_generate_gpu,
+ options::OPT_fno_profile_generate);
 auto *ProfileUseArg = Args.getLastArg(options::OPT_fprofile_use_gpu_EQ,
 options::OPT_fno_profile_instr_use);
+
+ auto *HostLLVMArg = Args.getLastArgNoClaim(options::OPT_fprofile_generate,
+ options::OPT_fprofile_generate_EQ);
+ auto *HostClangArg =
+ Args.getLastArgNoClaim(options::OPT_fprofile_instr_generate,
+ options::OPT_fprofile_instr_generate_EQ);
+
 if (ProfileClangArg &&
 ProfileClangArg->getOption().matches(options::OPT_fno_profile_generate))
 ProfileClangArg = nullptr;
@@ -622,6 +629,18 @@ static void addPGOFlagsGPU(const ToolChain &TC, const ArgList &Args,
 return;
 }
+ if (HostLLVMArg && ProfileClangArg) {
+ D.Diag(diag::err_drv_argument_not_allowed_with)
+ << HostLLVMArg->getSpelling() << ProfileClangArg->getSpelling();
+ return;
+ }
+
+ if (HostClangArg && ProfileLLVMArg) {
+ D.Diag(diag::err_drv_argument_not_allowed_with)
+ << HostClangArg->getSpelling() << ProfileLLVMArg->getSpelling();
+ return;
+ }
+
 if (ProfileClangArg)
 CmdArgs.push_back("-fprofile-instrument=clang");
diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c
index
ec93cce2c86207..b9fc95c89791a8 100644
--- a/offload/test/offloading/pgo1.c
+++ b/offload/test/offloading/pgo1.c
@@ -1,10 +1,10 @@
-// RUN: %libomptarget-compile-generic -fprofile-instr-generate-gpu
+// RUN: %libomptarget-compile-generic -fprofile-generate-gpu
 // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1
 // RUN: llvm-profdata show --all-functions --counts \
 // RUN: %target_triple.llvm.profraw | %fcheck-generic \
 // RUN: --check-prefix="LLVM-PGO"
-// RUN: %libomptarget-compile-generic -fprofile-generate-gpu
+// RUN: %libomptarget-compile-generic -fprofile-instr-generate-gpu
 // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1
 // RUN: llvm-profdata show --all-functions --counts \
 // RUN: %target_triple.clang.profraw | %fcheck-generic \
>From 2b8eb2935ec21bf0acc5c56f45837b5976560963 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 24 May 2024 19:59:33 -0500
Subject: [PATCH 35/58] Fix PGO test format

---
 offload/test/offloading/pgo1.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c
index d22d5340f5b3ec..0e75c684ed9263 100644
--- a/offload/test/offloading/pgo1.c
+++ b/offload/test/offloading/pgo1.c
@@ -33,20 +33,17 @@ int main() {
 // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}:
 // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
 // LLVM-PGO: Counters: 4
-// LLVM-PGO: Function count: 20
-// LLVM-PGO: Block counts: [10, 20, 10]
+// LLVM-PGO: Block counts: [20, 10, 20, 10]
 // LLVM-PGO-LABEL: test1:
 // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
 // LLVM-PGO: Counters: 1
-// LLVM-PGO: Function count: 1
-// LLVM-PGO: Block counts: []
+// LLVM-PGO: Block counts: [1]
 // LLVM-PGO-LABEL: test2:
 // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
 // LLVM-PGO: Counters: 1
-// LLVM-PGO: Function count: 1
-// LLVM-PGO: Block counts: []
+// LLVM-PGO: Block counts: [1]
 // CLANG-PGO-LABEL:
__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}:
 // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
>From 67f3009173d815295f36e2b37e85add1347e3bf9 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 24 May 2024 20:45:04 -0500
Subject: [PATCH 36/58] Refactor profile writer

---
 compiler-rt/lib/profile/InstrProfilingFile.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c
index bc1d40a37a5ad6..76238214c13aa3 100644
--- a/compiler-rt/lib/profile/InstrProfilingFile.c
+++ b/compiler-rt/lib/profile/InstrProfilingFile.c
@@ -1344,8 +1344,7 @@ int __llvm_write_custom_profile(const char *Target,
 forceTruncateFile(TargetFilename);
 /* Open target-specific PGO file */
- MergeDone = 0;
- FILE *OutputFile = getMergeFileObject(TargetFilename, &MergeDone);
+ FILE *OutputFile = getFileObject(TargetFilename);
 if (!OutputFile) {
 PROF_ERR("Failed to open file : %s\n", TargetFilename);
@@ -1356,15 +1355,11 @@ int __llvm_write_custom_profile(const char *Target,
 FreeHook = &free;
 setupIOBuffer();
- ProfDataWriter fileWriter;
- initFileWriter(&fileWriter, OutputFile);
-
- /* Write custom data to the file */
- ReturnValue =
- lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin,
- CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL,
- NULL, NULL, NULL, NamesBegin, NamesEnd, MergeDone);
+ /* Write custom data */
+ ReturnValue = __llvm_profile_write_buffer_internal(
+ OutputFile, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL,
+ NamesBegin, NamesEnd);
 closeFileObject(OutputFile);
 // Restore SIGKILL.
>From e8ad1322c557f7b48e2b28fe3a34a696a1103bba Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Mon, 27 May 2024 18:29:18 -0500
Subject: [PATCH 37/58] Fix refactor bug

---
 compiler-rt/lib/profile/InstrProfilingFile.c | 52 ++++++++++----------
 offload/test/offloading/pgo1.c | 6 ++-
 2 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c
index 76238214c13aa3..784cb9af6169d8 100644
--- a/compiler-rt/lib/profile/InstrProfilingFile.c
+++ b/compiler-rt/lib/profile/InstrProfilingFile.c
@@ -505,14 +505,6 @@ static FILE *getFileObject(const char *OutputName) {
 return fopen(OutputName, "ab");
 }
-/* Get file object and merge if applicable */
-static FILE *getMergeFileObject(const char *OutputName, int *MergeDone) {
- VPMergeHook = &lprofMergeValueProfData;
- if (doMerging())
- return openFileForMerging(OutputName, MergeDone);
- return getFileObject(OutputName);
-}
-
 static void closeFileObject(FILE *OutputFile) {
 if (OutputFile == getProfileFile()) {
 fflush(OutputFile);
@@ -526,8 +518,15 @@ static void closeFileObject(FILE *OutputFile) {
 /* Write profile data to file \c OutputName.
*/
 static int writeFile(const char *OutputName) {
- int RetVal, MergeDone = 0;
- FILE *OutputFile = getMergeFileObject(OutputName, &MergeDone);
+ int RetVal;
+ FILE *OutputFile;
+
+ int MergeDone = 0;
+ VPMergeHook = &lprofMergeValueProfData;
+ if (doMerging())
+ OutputFile = openFileForMerging(OutputName, &MergeDone);
+ else
+ OutputFile = getFileObject(OutputName);
 if (!OutputFile)
 return -1;
@@ -565,16 +564,10 @@ static int writeOrderFile(const char *OutputName) {
 #define LPROF_INIT_ONCE_ENV "__LLVM_PROFILE_RT_INIT_ONCE"
-static void forceTruncateFile(const char *Filename) {
- FILE *File = fopen(Filename, "w");
- if (!File)
- return;
- fclose(File);
-}
-
 static void truncateCurrentFile(void) {
 const char *Filename;
 char *FilenameBuf;
+ FILE *File;
 int Length;
 Length = getCurFilenameLength();
@@ -604,7 +597,10 @@ static void truncateCurrentFile(void) {
 return;
 /* Truncate the file. Later we'll reopen and append. */
- forceTruncateFile(Filename);
+ File = fopen(Filename, "w");
+ if (!File)
+ return;
+ fclose(File);
 }
 /* Write a partial profile to \p Filename, which is required to be backed by
@@ -1287,7 +1283,7 @@ int __llvm_write_custom_profile(const char *Target,
 const char *CountersBegin,
 const char *CountersEnd, const char *NamesBegin,
 const char *NamesEnd) {
- int ReturnValue = 0, FilenameLength, TargetLength, MergeDone;
+ int ReturnValue = 0, FilenameLength, TargetLength;
 char *FilenameBuf, *TargetFilename;
 const char *Filename;
@@ -1340,11 +1336,9 @@ int __llvm_write_custom_profile(const char *Target,
setupIOBuffer(); /* Write custom data */ - ReturnValue = __llvm_profile_write_buffer_internal( - OutputFile, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, - NamesBegin, NamesEnd); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + + /* Write custom data to the file */ + ReturnValue = lprofWriteDataImpl( + &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NULL, NULL, NULL, NULL, NamesBegin, NamesEnd, 0); closeFileObject(OutputFile); // Restore SIGKILL. diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 0e75c684ed9263..d6747113265803 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,10 +1,12 @@ -// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=llvm" +// RUN: %libomptarget-compile-generic -fprofile-generate \ +// RUN: -Xclang "-fprofile-instrument=llvm" // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" -// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ +// RUN: -Xclang "-fprofile-instrument=clang" // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.clang.profraw | %fcheck-generic \ >From 79bf08e0bea8ab32781f201cdfc096a59156f270 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 28 May 2024 00:45:47 -0500 Subject: [PATCH 38/58] Check for level in test case TODO: Actually ensure the right level is used in the profraw file when only GPU flags are supplied --- offload/test/offloading/pgo1.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 
5d1a41ccab0f24..a3a242c703d96a 100644
--- a/offload/test/offloading/pgo1.c
+++ b/offload/test/offloading/pgo1.c
@@ -45,6 +45,16 @@ int main() {
 // LLVM-PGO: Counters: 1
 // LLVM-PGO: Block counts: [1]
+// LLVM-PGO-LABEL: Instrumentation level:
+// LLVM-PGO-SAME: IR
+// LLVM-PGO-SAME: entry_first = 0
+// LLVM-PGO-LABEL: Functions shown:
+// LLVM-PGO-SAME: 3
+// LLVM-PGO-LABEL: Maximum function count:
+// LLVM-PGO-SAME: 20
+// LLVM-PGO-LABEL: Maximum internal block count:
+// LLVM-PGO-SAME: 20
+
 // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}:
 // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
 // CLANG-PGO: Counters: 3
@@ -62,3 +72,12 @@ int main() {
 // CLANG-PGO: Counters: 1
 // CLANG-PGO: Function count: 20
 // CLANG-PGO: Block counts: []
+
+// CLANG-PGO-LABEL: Instrumentation level:
+// CLANG-PGO-SAME: Front-end
+// CLANG-PGO-LABEL: Functions shown:
+// CLANG-PGO-SAME: 3
+// CLANG-PGO-LABEL: Maximum function count:
+// CLANG-PGO-SAME: 20
+// CLANG-PGO-LABEL: Maximum internal block count:
+// CLANG-PGO-SAME: 20
>From 4c9f814ce14aeb6766a93f5c1d15b847b98dc29f Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Tue, 28 May 2024 12:58:43 -0500
Subject: [PATCH 39/58] Make requested clang-format change

---
 offload/plugins-nextgen/common/include/GlobalHandler.h | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/offload/plugins-nextgen/common/include/GlobalHandler.h b/offload/plugins-nextgen/common/include/GlobalHandler.h
index 017d7e994f07a8..1d7b9f80f9dfd3 100644
--- a/offload/plugins-nextgen/common/include/GlobalHandler.h
+++ b/offload/plugins-nextgen/common/include/GlobalHandler.h
@@ -64,12 +64,10 @@ struct __llvm_profile_data {
 };
 extern "C" {
-extern int __attribute__((weak))
-__llvm_write_custom_profile(const char *Target,
- const __llvm_profile_data *DataBegin,
- const __llvm_profile_data *DataEnd,
- const char *CountersBegin, const char *CountersEnd,
- const char *NamesBegin, const char *NamesEnd);
+extern int
__attribute__((weak)) __llvm_write_custom_profile(
+ const char *Target, const __llvm_profile_data *DataBegin,
+ const __llvm_profile_data *DataEnd, const char *CountersBegin,
+ const char *CountersEnd, const char *NamesBegin, const char *NamesEnd);
 }
 /// PGO profiling data extracted from a GPU device
>From cfe166091ca91623d356d2dde41b64cefe98e472 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Thu, 30 May 2024 18:15:55 -0500
Subject: [PATCH 40/58] Check for version global on GPU

---
 compiler-rt/include/profile/InstrProfData.inc | 2 +-
 compiler-rt/lib/profile/InstrProfiling.h | 3 ++-
 .../lib/profile/InstrProfilingBuffer.c | 3 ++-
 compiler-rt/lib/profile/InstrProfilingFile.c | 14 +++++++++----
 .../lib/profile/InstrProfilingInternal.h | 3 ++-
 .../lib/profile/InstrProfilingWriter.c | 20 +++++++++----------
 .../llvm/ProfileData/InstrProfData.inc | 2 +-
 .../Instrumentation/PGOInstrumentation.cpp | 5 ++++-
 .../llvm-profdata/binary-ids-padding.test | 2 +-
 ...alformed-not-space-for-another-header.test | 2 +-
 .../malformed-num-counters-zero.test | 2 +-
 .../malformed-ptr-to-counter-array.test | 2 +-
 .../common/include/GlobalHandler.h | 13 ++++++++----
 .../common/src/GlobalHandler.cpp | 16 ++++++++++++---
 14 files changed, 58 insertions(+), 31 deletions(-)

diff --git a/compiler-rt/include/profile/InstrProfData.inc b/compiler-rt/include/profile/InstrProfData.inc
index e9866d94b762c1..f0a260483429c3 100644
--- a/compiler-rt/include/profile/InstrProfData.inc
+++ b/compiler-rt/include/profile/InstrProfData.inc
@@ -152,7 +152,7 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \
 #define INSTR_PROF_DATA_DEFINED
 #endif
 INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
 INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
 INSTR_PROF_RAW_HEADER(uint64_t, NumData, NumData)
INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesBeforeCounters, PaddingBytesBeforeCounters) diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h index ef1292a45bf01d..34b7d85ad66845 100644 --- a/compiler-rt/lib/profile/InstrProfiling.h +++ b/compiler-rt/lib/profile/InstrProfiling.h @@ -305,7 +305,8 @@ int __llvm_write_custom_profile(const char *Target, const __llvm_profile_data *DataEnd, const char *CountersBegin, const char *CountersEnd, const char *NamesBegin, - const char *NamesEnd); + const char *NamesEnd, + const uint64_t *VersionOverride); /*! * This variable is defined in InstrProfilingRuntime.cpp as a hidden diff --git a/compiler-rt/lib/profile/InstrProfilingBuffer.c b/compiler-rt/lib/profile/InstrProfilingBuffer.c index 1c451d7ec75637..b406e8db74f3f0 100644 --- a/compiler-rt/lib/profile/InstrProfilingBuffer.c +++ b/compiler-rt/lib/profile/InstrProfilingBuffer.c @@ -252,5 +252,6 @@ COMPILER_RT_VISIBILITY int __llvm_profile_write_buffer_internal( &BufferWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, BitmapBegin, BitmapEnd, /*VPDataReader=*/0, NamesBegin, NamesEnd, /*VTableBegin=*/NULL, /*VTableEnd=*/NULL, /*VNamesBegin=*/NULL, - /*VNamesEnd=*/NULL, /*SkipNameDataWrite=*/0); + /*VNamesEnd=*/NULL, /*SkipNameDataWrite=*/0, + __llvm_profile_get_version()); } diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 784cb9af6169d8..947a3ff89bc1fc 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1282,7 +1282,8 @@ int __llvm_write_custom_profile(const char *Target, const __llvm_profile_data *DataEnd, const char *CountersBegin, const char *CountersEnd, const char *NamesBegin, - const char *NamesEnd) { + const char *NamesEnd, + const uint64_t *VersionOverride) { int ReturnValue = 0, FilenameLength, TargetLength; char *FilenameBuf, *TargetFilename; const char *Filename; @@ -1354,10 +1355,15 @@ int 
__llvm_write_custom_profile(const char *Target, ProfDataWriter fileWriter; initFileWriter(&fileWriter, OutputFile); + uint64_t Version = __llvm_profile_get_version(); + if (VersionOverride) + Version = *VersionOverride; + /* Write custom data to the file */ - ReturnValue = lprofWriteDataImpl( - &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, - lprofGetVPDataReader(), NULL, NULL, NULL, NULL, NamesBegin, NamesEnd, 0); + ReturnValue = + lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin, + CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL, + NULL, NULL, NULL, NamesBegin, NamesEnd, 0, Version); closeFileObject(OutputFile); // Restore SIGKILL. diff --git a/compiler-rt/lib/profile/InstrProfilingInternal.h b/compiler-rt/lib/profile/InstrProfilingInternal.h index d5bd0e41fb1291..2b9f687e7f8860 100644 --- a/compiler-rt/lib/profile/InstrProfilingInternal.h +++ b/compiler-rt/lib/profile/InstrProfilingInternal.h @@ -160,7 +160,8 @@ int lprofWriteDataImpl(ProfDataWriter *Writer, VPDataReaderType *VPDataReader, const char *NamesBegin, const char *NamesEnd, const VTableProfData *VTableBegin, const VTableProfData *VTableEnd, const char *VNamesBegin, - const char *VNamesEnd, int SkipNameDataWrite); + const char *VNamesEnd, int SkipNameDataWrite, + uint64_t Version); /* Merge value profile data pointed to by SrcValueProfData into * in-memory profile counters pointed by to DstData. 
*/ diff --git a/compiler-rt/lib/profile/InstrProfilingWriter.c b/compiler-rt/lib/profile/InstrProfilingWriter.c index 8816a71155511b..bcd88b30d050d4 100644 --- a/compiler-rt/lib/profile/InstrProfilingWriter.c +++ b/compiler-rt/lib/profile/InstrProfilingWriter.c @@ -254,21 +254,21 @@ COMPILER_RT_VISIBILITY int lprofWriteData(ProfDataWriter *Writer, const VTableProfData *VTableEnd = __llvm_profile_end_vtables(); const char *VNamesBegin = __llvm_profile_begin_vtabnames(); const char *VNamesEnd = __llvm_profile_end_vtabnames(); + uint64_t Version = __llvm_profile_get_version(); return lprofWriteDataImpl(Writer, DataBegin, DataEnd, CountersBegin, CountersEnd, BitmapBegin, BitmapEnd, VPDataReader, NamesBegin, NamesEnd, VTableBegin, VTableEnd, - VNamesBegin, VNamesEnd, SkipNameDataWrite); + VNamesBegin, VNamesEnd, SkipNameDataWrite, Version); } -COMPILER_RT_VISIBILITY int -lprofWriteDataImpl(ProfDataWriter *Writer, const __llvm_profile_data *DataBegin, - const __llvm_profile_data *DataEnd, - const char *CountersBegin, const char *CountersEnd, - const char *BitmapBegin, const char *BitmapEnd, - VPDataReaderType *VPDataReader, const char *NamesBegin, - const char *NamesEnd, const VTableProfData *VTableBegin, - const VTableProfData *VTableEnd, const char *VNamesBegin, - const char *VNamesEnd, int SkipNameDataWrite) { +COMPILER_RT_VISIBILITY int lprofWriteDataImpl( + ProfDataWriter *Writer, const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, const char *CountersBegin, + const char *CountersEnd, const char *BitmapBegin, const char *BitmapEnd, + VPDataReaderType *VPDataReader, const char *NamesBegin, + const char *NamesEnd, const VTableProfData *VTableBegin, + const VTableProfData *VTableEnd, const char *VNamesBegin, + const char *VNamesEnd, int SkipNameDataWrite, uint64_t Version) { /* Calculate size of sections. 
*/ const uint64_t DataSectionSize = __llvm_profile_get_data_size(DataBegin, DataEnd); diff --git a/llvm/include/llvm/ProfileData/InstrProfData.inc b/llvm/include/llvm/ProfileData/InstrProfData.inc index e9866d94b762c1..f0a260483429c3 100644 --- a/llvm/include/llvm/ProfileData/InstrProfData.inc +++ b/llvm/include/llvm/ProfileData/InstrProfData.inc @@ -152,7 +152,7 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \ #define INSTR_PROF_DATA_DEFINED #endif INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic()) -INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version()) +INSTR_PROF_RAW_HEADER(uint64_t, Version, Version) INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL)) INSTR_PROF_RAW_HEADER(uint64_t, NumData, NumData) INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesBeforeCounters, PaddingBytesBeforeCounters) diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index ee1657ba8400ee..f3c68fb17ce7cd 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -408,7 +408,10 @@ static GlobalVariable *createIRLevelProfileFlagVar(Module &M, bool IsCS) { auto IRLevelVersionVariable = new GlobalVariable( M, IntTy64, true, GlobalValue::WeakAnyLinkage, Constant::getIntegerValue(IntTy64, APInt(64, ProfileVersion)), VarName); - IRLevelVersionVariable->setVisibility(GlobalValue::HiddenVisibility); + if (isGPUProfTarget(M)) + IRLevelVersionVariable->setVisibility(GlobalValue::ProtectedVisibility); + else + IRLevelVersionVariable->setVisibility(GlobalValue::HiddenVisibility); Triple TT(M.getTargetTriple()); if (TT.supportsCOMDAT()) { IRLevelVersionVariable->setLinkage(GlobalValue::ExternalLinkage); diff --git a/llvm/test/tools/llvm-profdata/binary-ids-padding.test b/llvm/test/tools/llvm-profdata/binary-ids-padding.test index 
292c582b45c52d..f31aa15bfe6c97 100644 --- a/llvm/test/tools/llvm-profdata/binary-ids-padding.test +++ b/llvm/test/tools/llvm-profdata/binary-ids-padding.test @@ -1,7 +1,7 @@ // Header // // INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic()) -// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version()) +// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version) // INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL)) // INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize) // INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize) diff --git a/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test b/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test index 705e5efaf58759..44be2980bb2f25 100644 --- a/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test +++ b/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test @@ -1,7 +1,7 @@ // Header // // INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic()) -// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version()) +// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version) // INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL)) // INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize) // INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize) diff --git a/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test b/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test index 157c13b926a7ed..9af9d65a6bdba1 100644 --- a/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test +++ b/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test @@ -1,7 +1,7 @@ // Header // // INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic()) -// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version()) +// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version) // INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, 
__llvm_write_binary_ids(NULL)) // INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize) // INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize) diff --git a/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test b/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test index 83cf76f68fb635..49c5ae9b0931d6 100644 --- a/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test +++ b/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test @@ -1,7 +1,7 @@ // Header // // INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic()) -// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version()) +// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version) // INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL)) // INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize) // INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize) diff --git a/offload/plugins-nextgen/common/include/GlobalHandler.h b/offload/plugins-nextgen/common/include/GlobalHandler.h index 1d7b9f80f9dfd3..6daa8b1b854131 100644 --- a/offload/plugins-nextgen/common/include/GlobalHandler.h +++ b/offload/plugins-nextgen/common/include/GlobalHandler.h @@ -13,6 +13,7 @@ #ifndef LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H +#include <optional> #include #include "llvm/ADT/DenseMap.h" @@ -64,10 +65,13 @@ struct __llvm_profile_data { }; extern "C" { -extern int __attribute__((weak)) __llvm_write_custom_profile( - const char *Target, const __llvm_profile_data *DataBegin, - const __llvm_profile_data *DataEnd, const char *CountersBegin, - const char *CountersEnd, const char *NamesBegin, const char *NamesEnd); +extern int __attribute__((weak)) +__llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, const char *CountersEnd, + const char *NamesBegin, const char
*NamesEnd, + const uint64_t *VersionOverride); } /// PGO profiling data extracted from a GPU device @@ -76,6 +80,7 @@ struct GPUProfGlobals { SmallVector<__llvm_profile_data> Data; SmallVector NamesData; Triple TargetTriple; + std::optional<uint64_t> Version; void dump() const; Error write() const; diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index bca66cff6558a2..93abd0a5cea365 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -16,6 +16,7 @@ #include "Shared/Utils.h" +#include "llvm/ProfileData/InstrProfData.inc" #include "llvm/Support/Error.h" #include @@ -214,6 +215,13 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal)) return Err; DeviceProfileData.Data.push_back(std::move(Data)); + } else if (*NameOrErr == INSTR_PROF_QUOTE(INSTR_PROF_RAW_VERSION_VAR)) { + uint64_t RawVersionData; + GlobalTy RawVersionGlobal(NameOrErr->str(), Sym.getSize(), + &RawVersionData); + if (auto Err = readGlobalFromDevice(Device, Image, RawVersionGlobal)) + return Err; + DeviceProfileData.Version = RawVersionData; } } return DeviceProfileData; @@ -267,6 +275,8 @@ Error GPUProfGlobals::write() const { CountsSize = Counts.size() * sizeof(int64_t); __llvm_profile_data *DataBegin, *DataEnd; char *CountersBegin, *CountersEnd, *NamesBegin, *NamesEnd; + const uint64_t *VersionOverride = + Version.has_value() ? &Version.value() : nullptr; // Initialize array of contiguous data.
We need to make sure each section is // contiguous so that the PGO library can compute deltas properly @@ -288,9 +298,9 @@ Error GPUProfGlobals::write() const { memcpy(NamesBegin, NamesData.data(), NamesData.size()); // Invoke compiler-rt entrypoint - int result = __llvm_write_custom_profile(TargetTriple.str().c_str(), - DataBegin, DataEnd, CountersBegin, - CountersEnd, NamesBegin, NamesEnd); + int result = __llvm_write_custom_profile( + TargetTriple.str().c_str(), DataBegin, DataEnd, CountersBegin, + CountersEnd, NamesBegin, NamesEnd, VersionOverride); if (result != 0) return Plugin::error("Error writing GPU PGO data to file"); >From 5bf437618c91c882543c97d34b468d74070218fa Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 12:57:07 -0500 Subject: [PATCH 41/58] Add host/device combination test --- offload/test/offloading/{ => gpupgo}/pgo1.c | 14 ++-- offload/test/offloading/gpupgo/pgo2.c | 73 +++++++++++++++++++++ 2 files changed, 81 insertions(+), 6 deletions(-) rename offload/test/offloading/{ => gpupgo}/pgo1.c (83%) create mode 100644 offload/test/offloading/gpupgo/pgo2.c diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/gpupgo/pgo1.c similarity index 83% rename from offload/test/offloading/pgo1.c rename to offload/test/offloading/gpupgo/pgo1.c index a3a242c703d96a..7c6e55f6546424 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/gpupgo/pgo1.c @@ -1,14 +1,16 @@ // RUN: %libomptarget-compile-generic -fprofile-generate-gpu -// RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 +// RUN: env LLVM_PROFILE_FILE=%basename_t.llvm.profraw \ +// RUN: %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ -// RUN: %target_triple.llvm.profraw | %fcheck-generic \ -// RUN: --check-prefix="LLVM-PGO" +// RUN: %target_triple.%basename_t.llvm.profraw | \ +// RUN: %fcheck-generic --check-prefix="LLVM-PGO" // RUN: %libomptarget-compile-generic 
-fprofile-instr-generate-gpu -// RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 +// RUN: env LLVM_PROFILE_FILE=%basename_t.clang.profraw \ +// RUN: %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ -// RUN: %target_triple.clang.profraw | %fcheck-generic \ -// RUN: --check-prefix="CLANG-PGO" +// RUN: %target_triple.%basename_t.clang.profraw | \ +// RUN: %fcheck-generic --check-prefix="CLANG-PGO" // UNSUPPORTED: x86_64-pc-linux-gnu // UNSUPPORTED: x86_64-pc-linux-gnu-LTO diff --git a/offload/test/offloading/gpupgo/pgo2.c b/offload/test/offloading/gpupgo/pgo2.c new file mode 100644 index 00000000000000..1819573d55024a --- /dev/null +++ b/offload/test/offloading/gpupgo/pgo2.c @@ -0,0 +1,73 @@ +// RUN: %libomptarget-compile-generic -fprofile-generate \ +// RUN: -fprofile-generate-gpu +// RUN: env LLVM_PROFILE_FILE=%basename_t.llvm.profraw \ +// RUN: %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %basename_t.llvm.profraw | %fcheck-generic \ +// RUN: --check-prefix="LLVM-HOST" +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.%basename_t.llvm.profraw \ +// RUN: | %fcheck-generic --check-prefix="LLVM-DEVICE" + +// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ +// RUN: -fprofile-instr-generate-gpu +// RUN: env LLVM_PROFILE_FILE=%basename_t.clang.profraw \ +// RUN: %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %basename_t.clang.profraw | %fcheck-generic \ +// RUN: --check-prefix="CLANG-HOST" +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.%basename_t.clang.profraw | \ +// RUN: %fcheck-generic --check-prefix="CLANG-DEV" + +// UNSUPPORTED: x86_64-pc-linux-gnu +// UNSUPPORTED: x86_64-pc-linux-gnu-LTO +// UNSUPPORTED: aarch64-unknown-linux-gnu +// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: pgo + +int main() { + int host_var = 0; + for 
(int i = 0; i < 20; i++) { + host_var += i; + } + + int device_var = 1; +#pragma omp target + for (int i = 0; i < 10; i++) { + device_var *= i; + } +} + +// LLVM-HOST-LABEL: main: +// LLVM-HOST: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-HOST: Counters: 3 +// LLVM-HOST: Block counts: [20, 1, 0] + +// LLVM-HOST-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// LLVM-HOST: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-HOST: Counters: 2 +// LLVM-HOST: Block counts: [0, 0] + +// LLVM-DEVICE-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// LLVM-DEVICE: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-DEVICE: Counters: 3 +// LLVM-DEVICE: Block counts: [10, 1, 1] + +// CLANG-HOST-LABEL: main: +// CLANG-HOST: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-HOST: Counters: 2 +// CLANG-HOST: Function count: 1 +// CLANG-HOST: Block counts: [20] + +// CLANG-HOST-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// CLANG-HOST: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-HOST: Counters: 2 +// CLANG-HOST: Function count: 0 +// CLANG-HOST: Block counts: [0] + +// CLANG-DEV-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// CLANG-DEV: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-DEV: Counters: 2 +// CLANG-DEV: Function count: 0 +// CLANG-DEV: Block counts: [11] >From 253013792cb7137b11893e701497e8f62143123a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 16:59:22 -0500 Subject: [PATCH 42/58] Add PGO dump debug option --- offload/include/Shared/Environment.h | 1 + offload/plugins-nextgen/common/src/PluginInterface.cpp | 4 ++++ 2 files changed, 5 insertions(+) diff --git a/offload/include/Shared/Environment.h b/offload/include/Shared/Environment.h index d141146b6bd5a1..86f6d1c6ea2d36 100644 --- a/offload/include/Shared/Environment.h +++ b/offload/include/Shared/Environment.h @@ -30,6 +30,7 @@ enum class DeviceDebugKind : uint32_t { FunctionTracing = 1U << 1, CommonIssues = 1U << 2, AllocationTracker = 1U << 3, + PGODump = 1U << 4, 
}; struct DeviceEnvironmentTy { diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index a479235e2c36cb..a68875ea7748ea 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp @@ -839,6 +839,10 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); + if ((OMPX_DebugKind.get() & uint32_t(DeviceDebugKind::PGODump)) == + uint32_t(DeviceDebugKind::PGODump)) + ProfOrErr->dump(); + // Write data to profiling file if (auto Err = ProfOrErr->write()) { consumeError(std::move(Err)); >From 344e357de657f54c068be969dcfc3ea33f2f026e Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 20:29:20 -0500 Subject: [PATCH 43/58] Tighten PGO test requirements Require compiler-rt to be an enabled runtime --- offload/test/CMakeLists.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt index 32df1e47afaeb2..41ab339147791c 100644 --- a/offload/test/CMakeLists.txt +++ b/offload/test/CMakeLists.txt @@ -12,10 +12,10 @@ else() set(LIBOMPTARGET_DEBUG False) endif() -if (OPENMP_STANDALONE_BUILD) - set(LIBOMPTARGET_TEST_GPU_PGO False) -else() +if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES) set(LIBOMPTARGET_TEST_GPU_PGO True) +else() + set(LIBOMPTARGET_TEST_GPU_PGO False) endif() # Replace the space from user's input with ";" in case that CMake add escape >From 2f751420b9ad2ffc7c9fac4a645724b45cdae59a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 20:29:20 -0500 Subject: [PATCH 44/58] Tighten PGO test requirements Require compiler-rt to be an enabled runtime --- offload/test/CMakeLists.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt index 
32df1e47afaeb2..41ab339147791c 100644 --- a/offload/test/CMakeLists.txt +++ b/offload/test/CMakeLists.txt @@ -12,10 +12,10 @@ else() set(LIBOMPTARGET_DEBUG False) endif() -if (OPENMP_STANDALONE_BUILD) - set(LIBOMPTARGET_TEST_GPU_PGO False) -else() +if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES) set(LIBOMPTARGET_TEST_GPU_PGO True) +else() + set(LIBOMPTARGET_TEST_GPU_PGO False) endif() # Replace the space from user's input with ";" in case that CMake add escape >From 79ceacb6559a3f6ecf3fd7ec1abf768ddeb97d13 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 20:29:20 -0500 Subject: [PATCH 45/58] Tighten PGO test requirements Require compiler-rt to be an enabled runtime --- offload/test/CMakeLists.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt index 32df1e47afaeb2..41ab339147791c 100644 --- a/offload/test/CMakeLists.txt +++ b/offload/test/CMakeLists.txt @@ -12,10 +12,10 @@ else() set(LIBOMPTARGET_DEBUG False) endif() -if (OPENMP_STANDALONE_BUILD) - set(LIBOMPTARGET_TEST_GPU_PGO False) -else() +if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES) set(LIBOMPTARGET_TEST_GPU_PGO True) +else() + set(LIBOMPTARGET_TEST_GPU_PGO False) endif() # Replace the space from user's input with ";" in case that CMake add escape >From ff0dd62cf1b236f1373fd3b70ec2875c3719ca04 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 22:07:20 -0500 Subject: [PATCH 46/58] Add note about PGO debug flag --- openmp/docs/design/Runtimes.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst index f8a8cb87e83e66..7fc697a838e229 100644 --- a/openmp/docs/design/Runtimes.rst +++ b/openmp/docs/design/Runtimes.rst @@ -1493,3 +1493,4 @@ debugging features are supported. * Enable debugging assertions in the device. 
``0x01`` * Enable diagnosing common problems during offloading . ``0x4`` * Enable device malloc statistics (amdgpu only). ``0x8`` + * Dump device PGO counters (only if PGO on GPU is enabled). ``0x10`` >From 0b9cc35d686f03fb8f835b2be2c4e16b630bd426 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 3 Jun 2024 22:15:46 -0500 Subject: [PATCH 47/58] Fix clang format --- .../plugins-nextgen/common/include/GlobalHandler.h | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/offload/plugins-nextgen/common/include/GlobalHandler.h b/offload/plugins-nextgen/common/include/GlobalHandler.h index 6daa8b1b854131..1b38ce65216dde 100644 --- a/offload/plugins-nextgen/common/include/GlobalHandler.h +++ b/offload/plugins-nextgen/common/include/GlobalHandler.h @@ -65,13 +65,11 @@ struct __llvm_profile_data { }; extern "C" { -extern int __attribute__((weak)) -__llvm_write_custom_profile(const char *Target, - const __llvm_profile_data *DataBegin, - const __llvm_profile_data *DataEnd, - const char *CountersBegin, const char *CountersEnd, - const char *NamesBegin, const char *NamesEnd, - const uint64_t *VersionOverride); +extern int __attribute__((weak)) __llvm_write_custom_profile( + const char *Target, const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, const char *NamesEnd, + const uint64_t *VersionOverride); } /// PGO profiling data extracted from a GPU device >From 488cb4a349fdfbd73d0a78ddb2c17522c46145ba Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 26 Jun 2024 18:18:31 -0500 Subject: [PATCH 48/58] Apply requested formatting changes --- clang/lib/CodeGen/CodeGenPGO.cpp | 11 +++++----- llvm/lib/ProfileData/InstrProf.cpp | 4 ++-- .../Instrumentation/InstrProfiling.cpp | 10 ++++----- .../Instrumentation/PGOInstrumentation.cpp | 21 ++++++++++--------- offload/DeviceRTL/src/Profiling.cpp | 6 ++++-- 5 files changed, 28 insertions(+), 
24 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index a7ce0b8f6a35f3..3edfbdd679c61d 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1199,12 +1199,13 @@ void CodeGenPGO::emitCounterSetOrIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); + auto *NormalizedFuncNameVarPtr = + llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); - llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), - Builder.getInt32(NumRegionCounters), - Builder.getInt32(Counter), StepV}; + llvm::Value *Args[] = { + NormalizedFuncNameVarPtr, Builder.getInt64(FunctionHash), + Builder.getInt32(NumRegionCounters), Builder.getInt32(Counter), StepV}; if (llvm::EnableSingleByteCoverage) Builder.CreateCall(CGM.getIntrinsic(llvm::Intrinsic::instrprof_cover), diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 1284efd4b5f4da..6742435c9d065e 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -433,8 +433,8 @@ std::string getPGOFuncNameVarName(StringRef FuncName, } bool isGPUProfTarget(const Module &M) { - const auto &Triple = llvm::Triple(M.getTargetTriple()); - return Triple.isAMDGPU() || Triple.isNVPTX(); + const auto &T = Triple(M.getTargetTriple()); + return T.isAMDGPU() || T.isNVPTX(); } void setPGOFuncVisibility(Module &M, GlobalVariable *FuncNameVar) { diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index dd8c027c4bbf62..05cef1236f0879 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ 
b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -869,8 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { llvm::InstrProfValueKind::IPVK_MemOPSize); CallInst *Call = nullptr; auto *TLI = &GetTLI(*Ind->getFunction()); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - DataVar, PointerType::getUnqual(M.getContext())); + auto *NormalizedDataVarPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + DataVar, PointerType::get(M.getContext(), 0)); // To support value profiling calls within Windows exception handlers, funclet // information contained within operand bundles needs to be copied over to @@ -879,12 +879,12 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { SmallVector OpBundles; Ind->getOperandBundlesAsDefs(OpBundles); if (!IsMemOpSize) { - Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Value *Args[3] = {Ind->getTargetValue(), NormalizedDataVarPtr, Builder.getInt32(Index)}; Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args, OpBundles); } else { - Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Value *Args[3] = {Ind->getTargetValue(), NormalizedDataVarPtr, Builder.getInt32(Index)}; Call = Builder.CreateCall( getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp), @@ -1580,7 +1580,7 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { ValuesVar->setAlignment(Align(8)); maybeSetComdat(ValuesVar, Fn, CntsVarName); ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - ValuesVar, PointerType::getUnqual(Fn->getContext())); + ValuesVar, PointerType::get(Fn->getContext(), 0)); } uint64_t NumCounters = Inc->getNumCounters()->getZExtValue(); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index ee1657ba8400ee..f8f34ea25597f3 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ 
b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -884,7 +884,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedNamePtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); @@ -893,7 +893,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_cover), - {NormalizedPtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); + {NormalizedNamePtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); return; } @@ -948,7 +948,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_timestamp), - {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + {NormalizedNamePtr, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I)}); I += PGOBlockCoverage ? 8 : 1; } @@ -963,7 +963,7 @@ static void instrumentOneFunc( Intrinsic::getDeclaration(M, PGOBlockCoverage ? 
Intrinsic::instrprof_cover : Intrinsic::instrprof_increment), - {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + {NormalizedNamePtr, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I++)}); } @@ -1007,15 +1007,15 @@ static void instrumentOneFunc( ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty()); assert(ToProfile && "value profiling Value is of unexpected type"); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedNamePtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); SmallVector OpBundles; populateEHOperandBundle(Cand, BlockColors, OpBundles); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile), - {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile, - Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, + {NormalizedNamePtr, Builder.getInt64(FuncInfo.FunctionHash), + ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, OpBundles); } } // IPVK_First <= Kind <= IPVK_Last @@ -1688,11 +1688,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) { IRBuilder<> Builder(&SI); Type *Int64Ty = Builder.getInt64Ty(); auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, PointerType::get(M->getContext(), 0)); + auto *NormalizedFuncNameVarPtr = + ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, PointerType::get(M->getContext(), 0)); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step), - {NormalizedPtr, Builder.getInt64(FuncHash), + {NormalizedFuncNameVarPtr, Builder.getInt64(FuncHash), Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step}); ++(*CurCtrIdx); } diff --git a/offload/DeviceRTL/src/Profiling.cpp b/offload/DeviceRTL/src/Profiling.cpp index 799477f5e47d27..639c62ceff7a69 100644 --- a/offload/DeviceRTL/src/Profiling.cpp 
+++ b/offload/DeviceRTL/src/Profiling.cpp @@ -12,8 +12,10 @@ extern "C" { -void __llvm_profile_register_function(void *ptr) {} -void __llvm_profile_register_names_function(void *ptr, long int i) {} +// Provides empty implementations for certain functions in compiler-rt +// that are emitted by the PGO instrumentation. +void __llvm_profile_register_function(void *Ptr) {} +void __llvm_profile_register_names_function(void *Ptr, long int I) {} } #pragma omp end declare target >From b90c01583f1893802aba0180b07a448584585365 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 26 Jun 2024 18:29:59 -0500 Subject: [PATCH 49/58] Add memop function shim to DeviceRTL This comes up sometimes when using LLVM IR level instrumentation. --- offload/DeviceRTL/include/Profiling.h | 1 + offload/DeviceRTL/src/Profiling.cpp | 1 + 2 files changed, 2 insertions(+) diff --git a/offload/DeviceRTL/include/Profiling.h b/offload/DeviceRTL/include/Profiling.h index 9efc1554c176bc..d9947522541219 100644 --- a/offload/DeviceRTL/include/Profiling.h +++ b/offload/DeviceRTL/include/Profiling.h @@ -15,6 +15,7 @@ extern "C" { void __llvm_profile_register_function(void *Ptr); void __llvm_profile_register_names_function(void *Ptr, long int I); +void __llvm_profile_instrument_memop(long int I, void *Ptr, int I2); } #endif diff --git a/offload/DeviceRTL/src/Profiling.cpp b/offload/DeviceRTL/src/Profiling.cpp index 639c62ceff7a69..bb3caaadcc03dd 100644 --- a/offload/DeviceRTL/src/Profiling.cpp +++ b/offload/DeviceRTL/src/Profiling.cpp @@ -16,6 +16,7 @@ extern "C" { // that are emitted by the PGO instrumentation. 
void __llvm_profile_register_function(void *Ptr) {} void __llvm_profile_register_names_function(void *Ptr, long int I) {} +void __llvm_profile_instrument_memop(long int I, void *Ptr, int I2) {} } #pragma omp end declare target >From c68c6e2fa98a1fe608b88ed38f7db68eae804c5b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 27 Jun 2024 02:04:27 -0500 Subject: [PATCH 50/58] Make requested changes --- compiler-rt/lib/profile/InstrProfiling.h | 2 +- compiler-rt/lib/profile/InstrProfilingFile.c | 1 - offload/plugins-nextgen/common/src/PluginInterface.cpp | 5 ++--- 3 files changed, 3 insertions(+), 5 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h index ef1292a45bf01d..eda3e9a673c1af 100644 --- a/compiler-rt/lib/profile/InstrProfiling.h +++ b/compiler-rt/lib/profile/InstrProfiling.h @@ -298,7 +298,7 @@ void __llvm_profile_set_dumped(); /*! * \brief Write custom target-specific profiling data to a seperate file. - * Used by libomptarget for GPU PGO. + * Used by offload PGO. */ int __llvm_write_custom_profile(const char *Target, const __llvm_profile_data *DataBegin, diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 784cb9af6169d8..93436ecbabb40d 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1321,7 +1321,6 @@ int __llvm_write_custom_profile(const char *Target, /* Prepend "TARGET." 
to current filename */ memcpy(TargetFilename, Target, TargetLength); TargetFilename[TargetLength] = '.'; - memcpy(TargetFilename, Target, TargetLength); memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); TargetFilename[FilenameLength + 1 + TargetLength] = 0; diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index c4e1e63777de8a..445f4ad942bd4d 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp @@ -843,9 +843,8 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { ProfOrErr->dump(); // Write data to profiling file - if (auto Err = ProfOrErr->write()) { - consumeError(std::move(Err)); - } + if (auto Err = ProfOrErr->write()) + return Err; } // Delete the memory manager before deinitializing the device. Otherwise, >From ca52c58c7fde412897cf6b10b9bbb321812f193d Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 27 Jun 2024 02:26:20 -0500 Subject: [PATCH 51/58] Only dump counters if PGODump flag is set --- offload/include/Shared/Environment.h | 1 + offload/plugins-nextgen/common/src/PluginInterface.cpp | 4 +++- openmp/docs/design/Runtimes.rst | 1 + 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/offload/include/Shared/Environment.h b/offload/include/Shared/Environment.h index d141146b6bd5a1..86f6d1c6ea2d36 100644 --- a/offload/include/Shared/Environment.h +++ b/offload/include/Shared/Environment.h @@ -30,6 +30,7 @@ enum class DeviceDebugKind : uint32_t { FunctionTracing = 1U << 1, CommonIssues = 1U << 2, AllocationTracker = 1U << 3, + PGODump = 1U << 4, }; struct DeviceEnvironmentTy { diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index 445f4ad942bd4d..35fb04863d8741 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp 
@@ -840,7 +840,9 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { return ProfOrErr.takeError(); // Dump out profdata - ProfOrErr->dump(); + if ((OMPX_DebugKind.get() & uint32_t(DeviceDebugKind::PGODump)) == + uint32_t(DeviceDebugKind::PGODump)) + ProfOrErr->dump(); // Write data to profiling file if (auto Err = ProfOrErr->write()) diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst index f8a8cb87e83e66..7fc697a838e229 100644 --- a/openmp/docs/design/Runtimes.rst +++ b/openmp/docs/design/Runtimes.rst @@ -1493,3 +1493,4 @@ debugging features are supported. * Enable debugging assertions in the device. ``0x01`` * Enable diagnosing common problems during offloading . ``0x4`` * Enable device malloc statistics (amdgpu only). ``0x8`` + * Dump device PGO counters (only if PGO on GPU is enabled). ``0x10`` >From ee4431a1b57469c7679f54f124ca5f3dd7f0433b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 9 Aug 2024 20:21:38 -0500 Subject: [PATCH 52/58] Update requirements --- offload/test/offloading/pgo1.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index d6747113265803..fbf6337374a997 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -12,10 +12,7 @@ // RUN: %target_triple.clang.profraw | %fcheck-generic \ // RUN: --check-prefix="CLANG-PGO" -// UNSUPPORTED: x86_64-pc-linux-gnu -// UNSUPPORTED: x86_64-pc-linux-gnu-LTO -// UNSUPPORTED: aarch64-unknown-linux-gnu -// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: gpu // REQUIRES: pgo int test1(int a) { return a / 2; } >From f9a24e35dfce2b18d0c4acefdaa0e71561bb875d Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 9 Aug 2024 20:30:58 -0500 Subject: [PATCH 53/58] Update test requirements --- offload/test/offloading/gpupgo/pgo1.c | 5 +---- offload/test/offloading/gpupgo/pgo2.c | 5 +---- 2 files changed, 2 insertions(+), 8 deletions(-) diff 
--git a/offload/test/offloading/gpupgo/pgo1.c b/offload/test/offloading/gpupgo/pgo1.c index 7c6e55f6546424..f5d8aee7908be0 100644 --- a/offload/test/offloading/gpupgo/pgo1.c +++ b/offload/test/offloading/gpupgo/pgo1.c @@ -12,10 +12,7 @@ // RUN: %target_triple.%basename_t.clang.profraw | \ // RUN: %fcheck-generic --check-prefix="CLANG-PGO" -// UNSUPPORTED: x86_64-pc-linux-gnu -// UNSUPPORTED: x86_64-pc-linux-gnu-LTO -// UNSUPPORTED: aarch64-unknown-linux-gnu -// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: gpu // REQUIRES: pgo int test1(int a) { return a / 2; } diff --git a/offload/test/offloading/gpupgo/pgo2.c b/offload/test/offloading/gpupgo/pgo2.c index 1819573d55024a..b5d0f2120754af 100644 --- a/offload/test/offloading/gpupgo/pgo2.c +++ b/offload/test/offloading/gpupgo/pgo2.c @@ -20,10 +20,7 @@ // RUN: %target_triple.%basename_t.clang.profraw | \ // RUN: %fcheck-generic --check-prefix="CLANG-DEV" -// UNSUPPORTED: x86_64-pc-linux-gnu -// UNSUPPORTED: x86_64-pc-linux-gnu-LTO -// UNSUPPORTED: aarch64-unknown-linux-gnu -// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: gpu // REQUIRES: pgo int main() { >From fb699b6bca72d42359a304bcbba88f3564ae9ac9 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Sat, 10 Aug 2024 00:54:36 -0500 Subject: [PATCH 54/58] Merge changes --- offload/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +- offload/test/offloading/pgo1.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index bca66cff6558a2..d7bfbba01c8efc 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -193,7 +193,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, // Check if given current global is a profiling global based // on name - if (NameOrErr->equals(getInstrProfNamesVarName())) { + if (*NameOrErr == 
getInstrProfNamesVarName()) { // Read in profiled function names DeviceProfileData.NamesData = SmallVector(Sym.getSize(), 0); GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(), diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index fbf6337374a997..3270ce8f15e7dc 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -32,17 +32,17 @@ int main() { // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 4 -// LLVM-PGO: Block counts: [20, 10, 20, 10] +// LLVM-PGO: Block counts: [20, 10, 2, 1] // LLVM-PGO-LABEL: test1: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [10] // LLVM-PGO-LABEL: test2: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [20] // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} >From 5727284c17e1a0eadfbcbc544d06e0dca0a4384b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 12 Aug 2024 16:50:43 -0500 Subject: [PATCH 55/58] Merge changes --- offload/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +- offload/test/offloading/gpupgo/pgo1.c | 8 ++++---- offload/test/offloading/gpupgo/pgo2.c | 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index 93abd0a5cea365..0627b7f4a7f5b7 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -194,7 +194,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, // Check if given current global is a profiling global based // on name - if (NameOrErr->equals(getInstrProfNamesVarName())) { + if (*NameOrErr == 
getInstrProfNamesVarName()) { // Read in profiled function names DeviceProfileData.NamesData = SmallVector(Sym.getSize(), 0); GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(), diff --git a/offload/test/offloading/gpupgo/pgo1.c b/offload/test/offloading/gpupgo/pgo1.c index f5d8aee7908be0..7196663fcfc908 100644 --- a/offload/test/offloading/gpupgo/pgo1.c +++ b/offload/test/offloading/gpupgo/pgo1.c @@ -32,17 +32,17 @@ int main() { // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 4 -// LLVM-PGO: Block counts: [20, 10, 20, 10] +// LLVM-PGO: Block counts: [20, 10, 2, 1] // LLVM-PGO-LABEL: test1: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [10] // LLVM-PGO-LABEL: test2: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [20] // LLVM-PGO-LABEL: Instrumentation level: // LLVM-PGO-SAME: IR @@ -52,7 +52,7 @@ int main() { // LLVM-PGO-LABEL: Maximum function count: // LLVM-PGO-SAME: 20 // LLVM-PGO-LABEL: Maximum internal block count: -// LLVM-PGO-SAME: 20 +// LLVM-PGO-SAME: 10 // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} diff --git a/offload/test/offloading/gpupgo/pgo2.c b/offload/test/offloading/gpupgo/pgo2.c index b5d0f2120754af..7f5c9ab7449074 100644 --- a/offload/test/offloading/gpupgo/pgo2.c +++ b/offload/test/offloading/gpupgo/pgo2.c @@ -49,7 +49,7 @@ int main() { // LLVM-DEVICE-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-DEVICE: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-DEVICE: Counters: 3 -// LLVM-DEVICE: Block counts: [10, 1, 1] +// LLVM-DEVICE: Block counts: [10, 2, 1] // CLANG-HOST-LABEL: main: // CLANG-HOST: Hash: {{0[xX][0-9a-fA-F]+}} >From f5d9f550cd711bd86c0d7b7bf70b7f5ecd9772e0 Mon Sep 17 00:00:00 2001 From: Ethan 
Luis McDonough Date: Fri, 27 Sep 2024 19:37:45 -0500 Subject: [PATCH 56/58] Rebase updates --- compiler-rt/lib/profile/InstrProfilingFile.c | 1 - .../common/src/PluginInterface.cpp | 5 +- offload/test/offloading/pgo1.c | 74 ------------------- 3 files changed, 2 insertions(+), 78 deletions(-) delete mode 100644 offload/test/offloading/pgo1.c diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index a912de79da03a6..c40942e37cb072 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1359,7 +1359,6 @@ int __llvm_write_custom_profile(const char *Target, /* Prepend "TARGET." to current filename */ memcpy(TargetFilename, Target, TargetLength); TargetFilename[TargetLength] = '.'; - memcpy(TargetFilename, Target, TargetLength); memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); TargetFilename[FilenameLength + 1 + TargetLength] = 0; diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index 0153da6cbf6773..dc6f50cab9d587 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp @@ -859,9 +859,8 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { ProfOrErr->dump(); // Write data to profiling file - if (auto Err = ProfOrErr->write()) { - consumeError(std::move(Err)); - } + if (auto Err = ProfOrErr->write()) + return Err; } // Delete the memory manager before deinitializing the device. 
Otherwise, diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c deleted file mode 100644 index c0d698323adf06..00000000000000 --- a/offload/test/offloading/pgo1.c +++ /dev/null @@ -1,74 +0,0 @@ -// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ -// RUN: -Xclang "-fprofile-instrument=clang" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ -// RUN: --check-prefix="CLANG-PGO" -// RUN: %libomptarget-compile-generic -fprofile-generate \ -// RUN: -Xclang "-fprofile-instrument=llvm" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ -// RUN: --check-prefix="LLVM-PGO" - -// REQUIRES: gpu -// REQUIRES: pgo - -#ifdef _OPENMP -#include <omp.h> -#endif - -int test1(int a) { return a / 2; } -int test2(int a) { return a * 2; } - -int main() { - int m = 2; -#pragma omp target - for (int i = 0; i < 10; i++) { - m = test1(m); - for (int j = 0; j < 2; j++) { - m = test2(m); - } - } -} - -// CLANG-PGO: ======== Counters ========= -// CLANG-PGO-NEXT: [ 0 11 20 ] -// CLANG-PGO-NEXT: [ 10 ] -// CLANG-PGO-NEXT: [ 20 ] -// CLANG-PGO-NEXT: ========== Data =========== -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: ======== Functions ======== -// CLANG-PGO-NEXT: pgo1.c: -// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// CLANG-PGO-NEXT: test1 -// CLANG-PGO-NEXT: test2 - -// LLVM-PGO: ======== Counters
========= -// LLVM-PGO-NEXT: [ 20 10 2 1 ] -// LLVM-PGO-NEXT: [ 10 ] -// LLVM-PGO-NEXT: [ 20 ] -// LLVM-PGO-NEXT: ========== Data =========== -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: ======== Functions ======== -// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// LLVM-PGO-NEXT: test1 -// LLVM-PGO-NEXT: test2 >From 1d0a961aabe488e6d09b96a80329498b8f586923 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 25 Oct 2024 13:42:19 -0500 Subject: [PATCH 57/58] Add llvm-profdata substitution to offload tests --- offload/test/lit.cfg | 2 ++ offload/test/lit.site.cfg.in | 2 +- offload/test/offloading/pgo1.c | 4 ++-- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/offload/test/lit.cfg b/offload/test/lit.cfg index 7994a08ba063fb..cfd1ad6c3c1eb5 100644 --- a/offload/test/lit.cfg +++ b/offload/test/lit.cfg @@ -112,8 +112,10 @@ config.available_features.add(config.libomptarget_current_target) if config.libomptarget_has_libc: config.available_features.add('libc') +profdata_path = os.path.join(config.bin_llvm_tools_dir, "llvm-profdata") if config.libomptarget_test_pgo: config.available_features.add('pgo') + config.substitutions.append(("%profdata", profdata_path)) # Determine whether the test system supports unified memory. # For CUDA, this is the case with compute capability 70 (Volta) or higher. 
diff --git a/offload/test/lit.site.cfg.in b/offload/test/lit.site.cfg.in index a1cb5acc38a405..d998fb0c839700 100644 --- a/offload/test/lit.site.cfg.in +++ b/offload/test/lit.site.cfg.in @@ -1,6 +1,6 @@ @AUTO_GEN_COMMENT@ -config.bin_llvm_tools_dir = "@CMAKE_BINARY_DIR@/bin" +config.bin_llvm_tools_dir = "@LLVM_RUNTIME_OUTPUT_INTDIR@" config.test_c_compiler = "@OPENMP_TEST_C_COMPILER@" config.test_cxx_compiler = "@OPENMP_TEST_CXX_COMPILER@" config.test_fortran_compiler="@OPENMP_TEST_Fortran_COMPILER@" diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 1ef540e430a27a..51671afa62b0db 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,14 +1,14 @@ // RUN: %libomptarget-compile-generic -fprofile-generate \ // RUN: -Xclang "-fprofile-instrument=llvm" // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 -// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %profdata show --all-functions --counts \ // RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" // RUN: %libomptarget-compile-generic -fprofile-instr-generate \ // RUN: -Xclang "-fprofile-instrument=clang" // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 -// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %profdata show --all-functions --counts \ // RUN: %target_triple.clang.profraw | %fcheck-generic \ // RUN: --check-prefix="CLANG-PGO" >From c6b34ad7a676a462955b2e7b534b12264363b430 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 28 Oct 2024 18:45:37 -0500 Subject: [PATCH 58/58] Prepend target prefix to basename --- compiler-rt/lib/profile/InstrProfilingFile.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index decafbcb1a5352..6b6f47e239714c 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ 
b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1355,10 +1355,21 @@ int __llvm_write_custom_profile(const char *Target, TargetFilename = (char *)COMPILER_RT_ALLOCA(FilenameLength + TargetLength + 2); + /* Find file basename and path sizes */ + int32_t DirEnd = FilenameLength - 1; + while (DirEnd >= 0 && !IS_DIR_SEPARATOR(Filename[DirEnd])) { + DirEnd--; + } + uint32_t DirSize = DirEnd + 1, BaseSize = FilenameLength - DirSize; + /* Prepend "TARGET." to current filename */ - memcpy(TargetFilename, Target, TargetLength); - TargetFilename[TargetLength] = '.'; - memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); + if (DirSize > 0) { + memcpy(TargetFilename, Filename, DirSize); + } + memcpy(TargetFilename + DirSize, Target, TargetLength); + TargetFilename[TargetLength + DirSize] = '.'; + memcpy(TargetFilename + DirSize + 1 + TargetLength, Filename + DirSize, + BaseSize); TargetFilename[FilenameLength + 1 + TargetLength] = 0; /* Check if there is llvm/runtime version mismatch. 
*/ From openmp-commits at lists.llvm.org Fri Dec 27 10:41:45 2024 From: openmp-commits at lists.llvm.org (Ethan Luis McDonough via Openmp-commits) Date: Fri, 27 Dec 2024 10:41:45 -0800 (PST) Subject: [Openmp-commits] [clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Add GPU profiling flags to driver (PR #94268) In-Reply-To: Message-ID: <676ef4e9.170a0220.352a47.99de@mx.google.com> https://github.com/EthanLuisMcDonough updated https://github.com/llvm/llvm-project/pull/94268 >From 530eb982b9770190377bb0bd09c5cb715f34d484 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 15 Dec 2023 20:38:38 -0600 Subject: [PATCH 01/58] Add profiling functions to libomptarget --- .../include/llvm/Frontend/OpenMP/OMPKinds.def | 3 +++ openmp/libomptarget/DeviceRTL/CMakeLists.txt | 2 ++ .../DeviceRTL/include/Profiling.h | 21 +++++++++++++++++++ .../libomptarget/DeviceRTL/src/Profiling.cpp | 19 +++++++++++++++++ 4 files changed, 45 insertions(+) create mode 100644 openmp/libomptarget/DeviceRTL/include/Profiling.h create mode 100644 openmp/libomptarget/DeviceRTL/src/Profiling.cpp diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def index d22d2a8e948b00..1d887d5cb58127 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def +++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def @@ -503,6 +503,9 @@ __OMP_RTL(__kmpc_barrier_simple_generic, false, Void, IdentPtr, Int32) __OMP_RTL(__kmpc_warp_active_thread_mask, false, Int64,) __OMP_RTL(__kmpc_syncwarp, false, Void, Int64) +__OMP_RTL(__llvm_profile_register_function, false, Void, VoidPtr) +__OMP_RTL(__llvm_profile_register_names_function, false, Void, VoidPtr, Int64) + __OMP_RTL(__last, false, Void, ) #undef __OMP_RTL diff --git a/openmp/libomptarget/DeviceRTL/CMakeLists.txt b/openmp/libomptarget/DeviceRTL/CMakeLists.txt index 1ce3e1e40a80ab..55ee15d068c67b 100644 --- a/openmp/libomptarget/DeviceRTL/CMakeLists.txt +++ b/openmp/libomptarget/DeviceRTL/CMakeLists.txt @@ 
-89,6 +89,7 @@ set(include_files ${include_directory}/Interface.h ${include_directory}/LibC.h ${include_directory}/Mapping.h + ${include_directory}/Profiling.h ${include_directory}/State.h ${include_directory}/Synchronization.h ${include_directory}/Types.h @@ -104,6 +105,7 @@ set(src_files ${source_directory}/Mapping.cpp ${source_directory}/Misc.cpp ${source_directory}/Parallelism.cpp + ${source_directory}/Profiling.cpp ${source_directory}/Reduction.cpp ${source_directory}/State.cpp ${source_directory}/Synchronization.cpp diff --git a/openmp/libomptarget/DeviceRTL/include/Profiling.h b/openmp/libomptarget/DeviceRTL/include/Profiling.h new file mode 100644 index 00000000000000..68c7744cd60752 --- /dev/null +++ b/openmp/libomptarget/DeviceRTL/include/Profiling.h @@ -0,0 +1,21 @@ +//===-------- Profiling.h - OpenMP interface ---------------------- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// +//===----------------------------------------------------------------------===// + +#ifndef OMPTARGET_DEVICERTL_PROFILING_H +#define OMPTARGET_DEVICERTL_PROFILING_H + +extern "C" { + +void __llvm_profile_register_function(void *ptr); +void __llvm_profile_register_names_function(void *ptr, long int i); +} + +#endif diff --git a/openmp/libomptarget/DeviceRTL/src/Profiling.cpp b/openmp/libomptarget/DeviceRTL/src/Profiling.cpp new file mode 100644 index 00000000000000..799477f5e47d27 --- /dev/null +++ b/openmp/libomptarget/DeviceRTL/src/Profiling.cpp @@ -0,0 +1,19 @@ +//===------- Profiling.cpp ---------------------------------------- C++ ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "Profiling.h" + +#pragma omp begin declare target device_type(nohost) + +extern "C" { + +void __llvm_profile_register_function(void *ptr) {} +void __llvm_profile_register_names_function(void *ptr, long int i) {} +} + +#pragma omp end declare target >From fb067d4ffe604fd68cf90b705db1942bce49dbb1 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Sat, 16 Dec 2023 01:18:41 -0600 Subject: [PATCH 02/58] Fix PGO instrumentation for GPU targets --- clang/lib/CodeGen/CodeGenPGO.cpp | 10 ++++++++-- .../lib/Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++--- 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index 81bf8ea696b164..edae6885b528ac 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -959,8 +959,14 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, unsigned Counter = (*RegionCounterMap)[S]; - llvm::Value *Args[] = {FuncNameVar, - Builder.getInt64(FunctionHash), + // Make sure that pointer to global is passed in with zero addrspace + // This is relevant during GPU profiling + auto *I8Ty = llvm::Type::getInt8Ty(CGM.getLLVMContext()); + auto *I8PtrTy = llvm::PointerType::getUnqual(I8Ty); + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, I8PtrTy); + + llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), Builder.getInt32(NumRegionCounters), Builder.getInt32(Counter), StepV}; if (!StepV) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index fe5a0578bd9721..d2cb8155c17967 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -1658,10 +1658,13 @@ 
void InstrLowerer::emitRegistration() { IRBuilder<> IRB(BasicBlock::Create(M.getContext(), "", RegisterF)); for (Value *Data : CompilerUsedVars) if (!isa<Function>(Data)) - IRB.CreateCall(RuntimeRegisterF, Data); + // Check for addrspace cast when profiling GPU + IRB.CreateCall(RuntimeRegisterF, + IRB.CreatePointerBitCastOrAddrSpaceCast(Data, VoidPtrTy)); for (Value *Data : UsedVars) if (Data != NamesVar && !isa<Function>(Data)) - IRB.CreateCall(RuntimeRegisterF, Data); + IRB.CreateCall(RuntimeRegisterF, + IRB.CreatePointerBitCastOrAddrSpaceCast(Data, VoidPtrTy)); if (NamesVar) { Type *ParamTypes[] = {VoidPtrTy, Int64Ty}; @@ -1670,7 +1673,9 @@ void InstrLowerer::emitRegistration() { auto *NamesRegisterF = Function::Create(NamesRegisterTy, GlobalVariable::ExternalLinkage, getInstrProfNamesRegFuncName(), M); - IRB.CreateCall(NamesRegisterF, {NamesVar, IRB.getInt64(NamesSize)}); + IRB.CreateCall(NamesRegisterF, {IRB.CreatePointerBitCastOrAddrSpaceCast( + NamesVar, VoidPtrTy), + IRB.getInt64(NamesSize)}); } IRB.CreateRetVoid(); >From 7a0e0efa178cc4de6a22a8f5cc3f53cd1c81ea3a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 21 Dec 2023 00:25:46 -0600 Subject: [PATCH 03/58] Change global visibility on GPU targets --- llvm/include/llvm/ProfileData/InstrProf.h | 4 ++++ llvm/lib/ProfileData/InstrProf.cpp | 17 +++++++++++++++-- .../Instrumentation/InstrProfiling.cpp | 15 +++++++++++---- 3 files changed, 30 insertions(+), 6 deletions(-) diff --git a/llvm/include/llvm/ProfileData/InstrProf.h b/llvm/include/llvm/ProfileData/InstrProf.h index 288dc71d756aee..bf9899d867e3dd 100644 --- a/llvm/include/llvm/ProfileData/InstrProf.h +++ b/llvm/include/llvm/ProfileData/InstrProf.h @@ -171,6 +171,10 @@ inline StringRef getInstrProfCounterBiasVarName() { /// Return the marker used to separate PGO names during serialization.
inline StringRef getInstrProfNameSeparator() { return "\01"; } +/// Determines whether module targets a GPU eligible for PGO +/// instrumentation +bool isGPUProfTarget(const Module &M); + /// Return the modified name for function \c F suitable to be /// used the key for profile lookup. Variable \c InLTO indicates if this /// is called in LTO optimization passes. diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 649d814cfd9de0..0d6717aeb0142c 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -410,13 +410,22 @@ std::string getPGOFuncNameVarName(StringRef FuncName, return VarName; } +bool isGPUProfTarget(const Module &M) { + const auto &triple = M.getTargetTriple(); + return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 || + triple.rfind("r600", 0) == 0; +} + GlobalVariable *createPGOFuncNameVar(Module &M, GlobalValue::LinkageTypes Linkage, StringRef PGOFuncName) { + // Ensure profiling variables on GPU are visible to be read from host + if (isGPUProfTarget(M)) + Linkage = GlobalValue::ExternalLinkage; // We generally want to match the function's linkage, but available_externally // and extern_weak both have the wrong semantics, and anything that doesn't // need to link across compilation units doesn't need to be visible at all.
- if (Linkage == GlobalValue::ExternalWeakLinkage) + else if (Linkage == GlobalValue::ExternalWeakLinkage) Linkage = GlobalValue::LinkOnceAnyLinkage; else if (Linkage == GlobalValue::AvailableExternallyLinkage) Linkage = GlobalValue::LinkOnceODRLinkage; @@ -430,8 +439,12 @@ GlobalVariable *createPGOFuncNameVar(Module &M, new GlobalVariable(M, Value->getType(), true, Linkage, Value, getPGOFuncNameVarName(PGOFuncName, Linkage)); + // If the target is a GPU, make the symbol protected so it can + // be read from the host device + if (isGPUProfTarget(M)) + FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility); // Hide the symbol so that we correctly get a copy for each executable. - if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) + else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage())) FuncNameVar->setVisibility(GlobalValue::HiddenVisibility); return FuncNameVar; diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index d2cb8155c17967..3b582b65190808 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -1481,6 +1481,10 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind) Int16ArrayVals[Kind] = ConstantInt::get(Int16Ty, PD.NumValueSites[Kind]); + if (isGPUProfTarget(M)) { + Linkage = GlobalValue::ExternalLinkage; + Visibility = GlobalValue::ProtectedVisibility; + } // If the data variable is not referenced by code (if we don't emit // @llvm.instrprof.value.profile, NS will be 0), and the counter keeps the // data variable live under linker GC, the data variable can be private. 
This @@ -1492,9 +1496,9 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { // If profd is in a deduplicate comdat, NS==0 with a hash suffix guarantees // that other copies must have the same CFG and cannot have value profiling. // If no hash suffix, other profd copies may be referenced by code. - if (NS == 0 && !(DataReferencedByCode && NeedComdat && !Renamed) && - (TT.isOSBinFormatELF() || - (!DataReferencedByCode && TT.isOSBinFormatCOFF()))) { + else if (NS == 0 && !(DataReferencedByCode && NeedComdat && !Renamed) && + (TT.isOSBinFormatELF() || + (!DataReferencedByCode && TT.isOSBinFormatCOFF()))) { Linkage = GlobalValue::PrivateLinkage; Visibility = GlobalValue::DefaultVisibility; } @@ -1696,7 +1700,10 @@ bool InstrLowerer::emitRuntimeHook() { auto *Var = new GlobalVariable(M, Int32Ty, false, GlobalValue::ExternalLinkage, nullptr, getInstrProfRuntimeHookVarName()); - Var->setVisibility(GlobalValue::HiddenVisibility); + if (isGPUProfTarget(M)) + Var->setVisibility(GlobalValue::ProtectedVisibility); + else + Var->setVisibility(GlobalValue::HiddenVisibility); if (TT.isOSBinFormatELF() && !TT.isPS()) { // Mark the user variable as used so that it isn't stripped out. 
>From fddc07908ed9aa698fe3250ddbfc5621ab4d049d Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 22 Dec 2023 23:43:29 -0600
Subject: [PATCH 04/58] Make names global public on GPU

---
 llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
index 3b582b65190808..61fba7be3ee0ee 100644
--- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
@@ -1621,6 +1621,13 @@ void InstrLowerer::emitNameData() {
   NamesVar = new GlobalVariable(M, NamesVal->getType(), true,
                                 GlobalValue::PrivateLinkage, NamesVal,
                                 getInstrProfNamesVarName());
+
+  // Make names variable public if current target is a GPU
+  if (isGPUProfTarget(M)) {
+    NamesVar->setLinkage(GlobalValue::ExternalLinkage);
+    NamesVar->setVisibility(GlobalValue::VisibilityTypes::ProtectedVisibility);
+  }
+
   NamesSize = CompressedNameStr.size();
   setGlobalVariableLargeSection(TT, *NamesVar);
   NamesVar->setSection(
>From e9db03c70bf79f4f4ddad4b48a5aa63a37e0d4f6 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Fri, 29 Dec 2023 12:54:50 -0600
Subject: [PATCH 05/58] Read and print GPU device PGO globals

---
 .../common/include/GlobalHandler.h | 27 ++++++
 .../common/src/GlobalHandler.cpp | 82 +++++++++++++++++++
 .../common/src/PluginInterface.cpp | 14 ++++
 3 files changed, 123 insertions(+)

diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
index fa079ac9660ee0..a82cd536487653 100644
--- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
+++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
@@ -14,9 +14,11 @@
 #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H

 #include
+#include

 #include "llvm/ADT/DenseMap.h"
 #include "llvm/Object/ELFObjectFile.h"
+#include "llvm/ProfileData/InstrProf.h"

 #include "Shared/Debug.h"
 #include "Shared/Utils.h"
@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };

+typedef void *IntPtrT;
+struct __llvm_profile_data {
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#include "llvm/ProfileData/InstrProfData.inc"
+};
+
+/// PGO profiling data extracted from a GPU device
+struct GPUProfGlobals {
+  std::string names;
+  std::vector> counts;
+  std::vector<__llvm_profile_data> data;
+  Triple targetTriple;
+
+  void dump() const;
+};
+
 /// Subclass of GlobalTy that holds the memory for a global of \p Ty.
 template class StaticGlobalTy : public GlobalTy {
   Ty Data;
@@ -172,6 +190,15 @@ class GenericGlobalHandlerTy {
     return moveGlobalBetweenDeviceAndHost(Device, Image, HostGlobal,
                                           /* D2H */ false);
   }
+
+  /// Checks whether a given image contains profiling globals.
+  bool hasProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image);
+
+  /// Reads profiling data from a GPU image to supplied profdata struct.
+  /// Iterates through the image symbol table and stores global values
+  /// with profiling prefixes.
+  Expected readProfilingGlobals(GenericDeviceTy &Device,
+                                DeviceImageTy &Image);
 };

 } // namespace plugin
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index 3a272e228c7dfe..5dd5daec468ca5 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -176,3 +176,85 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,

   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  const auto *elf = getOrCreateELFObjectFile(Device, Image);
+  profdata.targetTriple = elf->makeTriple();
+  // Iterate through
+  for (auto &sym : elf->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector chars(sym.getSize() / sizeof(char), ' ');
+        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());
+        if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
+          return Err;
+        std::string names(chars.begin(), chars.end());
+        profdata.names = std::move(names);
+      } else if (name->starts_with(getInstrProfCountersVarPrefix())) {
+        // Read global variable profiling counts
+        std::vector counts(sym.getSize() / sizeof(int64_t), 0);
+        GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data());
+        if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
+          return Err;
+        profdata.counts.push_back(std::move(counts));
+      } else if (name->starts_with(getInstrProfDataVarPrefix())) {
+        // Read profiling data for this global variable
+        __llvm_profile_data data{};
+        GlobalTy DataGlobal(name->str(), sym.getSize(), &data);
+        if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
+          return Err;
+        profdata.data.push_back(std::move(data));
+      }
+    }
+  }
+  return profdata;
+}
+
+void GPUProfGlobals::dump() const {
+  llvm::outs() << "======= GPU Profile =======\nTarget: " << targetTriple.str()
+               << "\n";
+
+  llvm::outs() << "======== Counters =========\n";
+  for (const auto &count : counts) {
+    llvm::outs() << "[";
+    for (size_t i = 0; i < count.size(); i++) {
+      if (i == 0)
+        llvm::outs() << " ";
+      llvm::outs() << count[i] << " ";
+    }
+    llvm::outs() << "]\n";
+  }
+
+  llvm::outs() << "========== Data ===========\n";
+  for (const auto &d : data) {
+    llvm::outs() << "{ ";
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
+  llvm::outs() << d.Name << " ";
+#include "llvm/ProfileData/InstrProfData.inc"
+    llvm::outs() << " }\n";
+  }
+
+  llvm::outs() << "======== Functions ========\n";
+  InstrProfSymtab symtab;
+  if (Error Err = symtab.create(StringRef(names))) {
+    consumeError(std::move(Err));
+  }
+  symtab.dumpNames(llvm::outs());
+  llvm::outs() << "===========================\n";
+}
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
index 3c7d1ca8998787..84ed90f03f84f1 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
@@ -811,6 +811,20 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) {
            DeviceMemoryPoolTracking.AllocationMax);
   }

+  for (auto *Image : LoadedImages) {
+    GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+    if (!Handler.hasProfilingGlobals(*this, *Image))
+      continue;
+
+    GPUProfGlobals profdata;
+    auto ProfOrErr = Handler.readProfilingGlobals(*this, *Image);
+    if (!ProfOrErr)
+      return ProfOrErr.takeError();
+
+    // TODO: write data to profiling file
+    ProfOrErr->dump();
+  }
+
   // Delete the memory manager before deinitializing the device. Otherwise,
   // we may delete device allocations after the device is deinitialized.
   if (MemoryManager)
>From e4687605d1a6ca932312025826db09dba84845a3 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 3 Jan 2024 17:06:15 -0600
Subject: [PATCH 06/58] Fix rebase bug

---
 .../plugins-nextgen/common/src/GlobalHandler.cpp | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index cb71b61f4a9c4f..86742d0f77a2fe 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -178,10 +178,12 @@ Expected
 GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
                                              DeviceImageTy &Image) {
   GPUProfGlobals profdata;
-  const auto *elf = getOrCreateELFObjectFile(Device, Image);
-  profdata.targetTriple = elf->makeTriple();
-  // Iterate through
-  for (auto &sym : elf->symbols()) {
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
     if (auto name = sym.getName()) {
       // Check if given current global is a profiling global based
       // on name
>From ec18ce94c227e1d43927955fa1c67360ecfcfca6 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 3 Jan 2024 17:10:19 -0600
Subject: [PATCH 07/58] Refactor portions to be more idiomatic

---
 clang/lib/CodeGen/CodeGenPGO.cpp | 4 +---
 llvm/lib/ProfileData/InstrProf.cpp | 5 ++---
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp
index edae6885b528ac..7bfcec43ee4c98 100644
--- a/clang/lib/CodeGen/CodeGenPGO.cpp
+++ b/clang/lib/CodeGen/CodeGenPGO.cpp
@@ -961,10 +961,8 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S,

   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
-  auto *I8Ty = llvm::Type::getInt8Ty(CGM.getLLVMContext());
-  auto *I8PtrTy = llvm::PointerType::getUnqual(I8Ty);
   auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      FuncNameVar, I8PtrTy);
+      FuncNameVar, llvm::PointerType::getUnqual(CGM.getLLVMContext()));

   llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash),
                          Builder.getInt32(NumRegionCounters),
diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp
index cdcd6840bb5108..1d88da16a5ff9c 100644
--- a/llvm/lib/ProfileData/InstrProf.cpp
+++ b/llvm/lib/ProfileData/InstrProf.cpp
@@ -429,9 +429,8 @@ std::string getPGOFuncNameVarName(StringRef FuncName,
 }

 bool isGPUProfTarget(const Module &M) {
-  const auto &triple = M.getTargetTriple();
-  return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 ||
-         triple.rfind("r600", 0) == 0;
+  const auto &Triple = llvm::Triple(M.getTargetTriple());
+  return Triple.isAMDGPU() || Triple.isNVPTX();
 }

 GlobalVariable *createPGOFuncNameVar(Module &M,
>From 0872556f597056361b0a2c23cdd0be3d9745aef3 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 3 Jan 2024 17:18:47 -0600
Subject: [PATCH 08/58] Reformat DeviceRTL prof functions

---
 openmp/libomptarget/DeviceRTL/include/Profiling.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/openmp/libomptarget/DeviceRTL/include/Profiling.h b/openmp/libomptarget/DeviceRTL/include/Profiling.h
index 68c7744cd60752..9efc1554c176bc 100644
--- a/openmp/libomptarget/DeviceRTL/include/Profiling.h
+++ b/openmp/libomptarget/DeviceRTL/include/Profiling.h
@@ -13,9 +13,8 @@
 #define OMPTARGET_DEVICERTL_PROFILING_H

 extern "C" {
-
-void __llvm_profile_register_function(void *ptr);
-void __llvm_profile_register_names_function(void *ptr, long int i);
+void __llvm_profile_register_function(void *Ptr);
+void __llvm_profile_register_names_function(void *Ptr, long int I);
 }

 #endif
>From 62f31d1c71b5d100f38d6dc584cc138b3904581b Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Tue, 9 Jan 2024 11:52:29 -0600
Subject: [PATCH 09/58] Style changes + catch name error

---
 .../common/include/GlobalHandler.h | 16 ++--
 .../common/src/GlobalHandler.cpp | 87 ++++++++++---------
 2 files changed, 56 insertions(+), 47 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
index a803b3f76d8b25..755bb23a414e37 100644
--- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
+++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h
@@ -13,8 +13,7 @@
 #ifndef LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H
 #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H

-#include
-#include
+#include

 #include "llvm/ADT/DenseMap.h"
 #include "llvm/Object/ELFObjectFile.h"
@@ -60,18 +59,19 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };

-typedef void *IntPtrT;
+using IntPtrT = void *;
 struct __llvm_profile_data {
-#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
+  std::remove_const::type Name;
 #include "llvm/ProfileData/InstrProfData.inc"
 };

 /// PGO profiling data extracted from a GPU device
 struct GPUProfGlobals {
-  std::string names;
-  std::vector> counts;
-  std::vector<__llvm_profile_data> data;
-  Triple targetTriple;
+  SmallVector NamesData;
+  SmallVector> Counts;
+  SmallVector<__llvm_profile_data> Data;
+  Triple TargetTriple;

   void dump() const;
 };
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index 86742d0f77a2fe..7cb672e7b26839 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -19,6 +19,7 @@
 #include "llvm/Support/Error.h"

 #include
+#include

 using namespace llvm;
 using namespace omp;
@@ -177,73 +178,81 @@ bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
 Expected
 GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
                                              DeviceImageTy &Image) {
-  GPUProfGlobals profdata;
+  GPUProfGlobals DeviceProfileData;
   auto ELFObj = getELFObjectFile(Image);
   if (!ELFObj)
     return ELFObj.takeError();
-  profdata.targetTriple = ELFObj->makeTriple();
+  DeviceProfileData.TargetTriple = ELFObj->makeTriple();
+
   // Iterate through elf symbols
-  for (auto &sym : ELFObj->symbols()) {
-    if (auto name = sym.getName()) {
-      // Check if given current global is a profiling global based
-      // on name
-      if (name->equals(getInstrProfNamesVarName())) {
-        // Read in profiled function names
-        std::vector chars(sym.getSize() / sizeof(char), ' ');
-        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());
-        if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
-          return Err;
-        std::string names(chars.begin(), chars.end());
-        profdata.names = std::move(names);
-      } else if (name->starts_with(getInstrProfCountersVarPrefix())) {
-        // Read global variable profiling counts
-        std::vector counts(sym.getSize() / sizeof(int64_t), 0);
-        GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data());
-        if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
-          return Err;
-        profdata.counts.push_back(std::move(counts));
-      } else if (name->starts_with(getInstrProfDataVarPrefix())) {
-        // Read profiling data for this global variable
-        __llvm_profile_data data{};
-        GlobalTy DataGlobal(name->str(), sym.getSize(), &data);
-        if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
-          return Err;
-        profdata.data.push_back(std::move(data));
-      }
+  for (auto &Sym : ELFObj->symbols()) {
+    auto NameOrErr = Sym.getName();
+    if (!NameOrErr)
+      return ELFObj.takeError();
+
+    // Check if given current global is a profiling global based
+    // on name
+    if (NameOrErr->equals(getInstrProfNamesVarName())) {
+      // Read in profiled function names
+      DeviceProfileData.NamesData = SmallVector(Sym.getSize(), 0);
+      GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(),
+                           DeviceProfileData.NamesData.data());
+      if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
+        return Err;
+    } else if (NameOrErr->starts_with(getInstrProfCountersVarPrefix())) {
+      // Read global variable profiling counts
+      SmallVector Counts(Sym.getSize() / sizeof(int64_t), 0);
+      GlobalTy CountGlobal(NameOrErr->str(), Sym.getSize(), Counts.data());
+      if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
+        return Err;
+      DeviceProfileData.Counts.push_back(std::move(Counts));
+    } else if (NameOrErr->starts_with(getInstrProfDataVarPrefix())) {
+      // Read profiling data for this global variable
+      __llvm_profile_data Data{};
+      GlobalTy DataGlobal(NameOrErr->str(), Sym.getSize(), &Data);
+      if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
+        return Err;
+      DeviceProfileData.Data.push_back(std::move(Data));
     }
   }
-  return profdata;
+  return DeviceProfileData;
 }

 void GPUProfGlobals::dump() const {
-  llvm::outs() << "======= GPU Profile =======\nTarget: " << targetTriple.str()
+  llvm::outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str()
                << "\n";

   llvm::outs() << "======== Counters =========\n";
-  for (const auto &count : counts) {
+  for (const auto &Count : Counts) {
     llvm::outs() << "[";
-    for (size_t i = 0; i < count.size(); i++) {
+    for (size_t i = 0; i < Count.size(); i++) {
       if (i == 0)
         llvm::outs() << " ";
-      llvm::outs() << count[i] << " ";
+      llvm::outs() << Count[i] << " ";
     }
     llvm::outs() << "]\n";
   }

   llvm::outs() << "========== Data ===========\n";
-  for (const auto &d : data) {
+  for (const auto &ProfData : Data) {
     llvm::outs() << "{ ";
 #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
-  llvm::outs() << d.Name << " ";
+  llvm::outs() << ProfData.Name << " ";
 #include "llvm/ProfileData/InstrProfData.inc"
     llvm::outs() << " }\n";
   }

   llvm::outs() << "======== Functions ========\n";
-  InstrProfSymtab symtab;
-  if (Error Err = symtab.create(StringRef(names))) {
+  std::string s;
+  s.reserve(NamesData.size());
+  for (uint8_t Name : NamesData) {
+    s.push_back((char)Name);
+  }
+
+  InstrProfSymtab Symtab;
+  if (Error Err = Symtab.create(StringRef(s))) {
     consumeError(std::move(Err));
   }
-  symtab.dumpNames(llvm::outs());
+  Symtab.dumpNames(llvm::outs());
   llvm::outs() << "===========================\n";
 }
>From 0c4bbeb54d189c1461affd37853aa86c3e3ca7d8 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 17 Jan 2024 19:59:06 -0600
Subject: [PATCH 10/58] Add GPU PGO test

---
 .../common/src/GlobalHandler.cpp | 2 +-
 openmp/libomptarget/test/CMakeLists.txt | 6 +++
 openmp/libomptarget/test/lit.cfg | 3 ++
 openmp/libomptarget/test/lit.site.cfg.in | 2 +-
 openmp/libomptarget/test/offloading/pgo1.c | 39 +++++++++++++++++++
 5 files changed, 50 insertions(+), 2 deletions(-)
 create mode 100644 openmp/libomptarget/test/offloading/pgo1.c

diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index 7cb672e7b26839..e5eb653d022287 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -239,7 +239,7 @@ void GPUProfGlobals::dump() const {
 #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \
   llvm::outs() << ProfData.Name << " ";
 #include "llvm/ProfileData/InstrProfData.inc"
-    llvm::outs() << " }\n";
+    llvm::outs() << "}\n";
   }

   llvm::outs() << "======== Functions ========\n";
diff --git a/openmp/libomptarget/test/CMakeLists.txt b/openmp/libomptarget/test/CMakeLists.txt
index a0ba233eaa5726..21233f3e252eb5 100644
--- a/openmp/libomptarget/test/CMakeLists.txt
+++ b/openmp/libomptarget/test/CMakeLists.txt
@@ -12,6 +12,12 @@ else()
   set(LIBOMPTARGET_DEBUG False)
 endif()

+if (OPENMP_STANDALONE_BUILD)
+  set(LIBOMPTARGET_TEST_GPU_PGO False)
+else()
+  set(LIBOMPTARGET_TEST_GPU_PGO True)
+endif()
+
 # Replace the space from user's input with ";" in case that CMake add escape
 # char into the lit command.
 string(REPLACE " " ";" LIBOMPTARGET_LIT_ARG_LIST "${LIBOMPTARGET_LIT_ARGS}")
diff --git a/openmp/libomptarget/test/lit.cfg b/openmp/libomptarget/test/lit.cfg
index 19c5e5c4572227..49743f9fed7f29 100644
--- a/openmp/libomptarget/test/lit.cfg
+++ b/openmp/libomptarget/test/lit.cfg
@@ -104,6 +104,9 @@ config.available_features.add(config.libomptarget_current_target)
 if config.libomptarget_has_libc:
     config.available_features.add('libc')

+if config.libomptarget_test_pgo:
+    config.available_features.add('pgo')
+
 # Determine whether the test system supports unified memory.
 # For CUDA, this is the case with compute capability 70 (Volta) or higher.
 # For all other targets, we currently assume it is.
diff --git a/openmp/libomptarget/test/lit.site.cfg.in b/openmp/libomptarget/test/lit.site.cfg.in
index 2d638118838727..494d1636af304a 100644
--- a/openmp/libomptarget/test/lit.site.cfg.in
+++ b/openmp/libomptarget/test/lit.site.cfg.in
@@ -25,6 +25,6 @@ config.libomptarget_not = "@OPENMP_NOT_EXECUTABLE@"
 config.libomptarget_debug = @LIBOMPTARGET_DEBUG@
 config.has_libomptarget_ompt = @LIBOMPTARGET_OMPT_SUPPORT@
 config.libomptarget_has_libc = @LIBOMPTARGET_GPU_LIBC_SUPPORT@
-
+config.libomptarget_test_pgo = @LIBOMPTARGET_TEST_GPU_PGO@
 # Let the main config do the real work.
 lit_config.load_config(config, "@CMAKE_CURRENT_SOURCE_DIR@/lit.cfg")
diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c
new file mode 100644
index 00000000000000..ca8a6f502a06aa
--- /dev/null
+++ b/openmp/libomptarget/test/offloading/pgo1.c
@@ -0,0 +1,39 @@
+// RUN: %libomptarget-compile-generic -fprofile-instr-generate -Xclang "-fprofile-instrument=clang"
+// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic
+
+// UNSUPPORTED: x86_64-pc-linux-gnu
+// UNSUPPORTED: x86_64-pc-linux-gnu-LTO
+// UNSUPPORTED: aarch64-unknown-linux-gnu
+// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
+// REQUIRES: pgo
+
+#ifdef _OPENMP
+#include
+#endif
+
+int test1(int a) { return a / 2; }
+int test2(int a) { return a * 2; }
+
+int main() {
+  int m = 2;
+#pragma omp target
+  for (int i = 0; i < 10; i++) {
+    m = test1(m);
+    for (int j = 0; j < 2; j++) {
+      m = test2(m);
+    }
+  }
+}
+
+// CHECK: ======== Counters =========
+// CHECK-NEXT: [ 0 11 20 ]
+// CHECK-NEXT: [ 10 ]
+// CHECK-NEXT: [ 20 ]
+// CHECK-NEXT: ========== Data ===========
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: ======== Functions ========
+// CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
+// CHECK-NEXT: test1
+// CHECK-NEXT: test2
>From c7ae2a74daa93b05058fcc9bba64e0734359362c Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 17 Jan 2024 23:12:27 -0600
Subject: [PATCH 11/58] Fix PGO test formatting

---
 openmp/libomptarget/test/offloading/pgo1.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c
index ca8a6f502a06aa..389be19b670d76 100644
--- a/openmp/libomptarget/test/offloading/pgo1.c
+++ b/openmp/libomptarget/test/offloading/pgo1.c
@@ -1,4 +1,5 @@
-// RUN: %libomptarget-compile-generic -fprofile-instr-generate -Xclang "-fprofile-instrument=clang"
+// RUN: %libomptarget-compile-generic -fprofile-instr-generate \
+// RUN: -Xclang "-fprofile-instrument=clang"
 // RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic

 // UNSUPPORTED: x86_64-pc-linux-gnu
@@ -30,9 +31,18 @@ int main() {
 // CHECK-NEXT: [ 10 ]
 // CHECK-NEXT: [ 20 ]
 // CHECK-NEXT: ========== Data ===========
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
 // CHECK-NEXT: ======== Functions ========
 // CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
 // CHECK-NEXT: test1
>From 8bb22072914bbb830e2788d117aedd0e0bab66ff Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Thu, 18 Jan 2024 23:15:55 -0600
Subject: [PATCH 12/58] Refactor visibility logic

---
 llvm/lib/ProfileData/InstrProf.cpp | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp
index 511571a3eed9b0..708ea63fd95e04 100644
--- a/llvm/lib/ProfileData/InstrProf.cpp
+++ b/llvm/lib/ProfileData/InstrProf.cpp
@@ -422,6 +422,16 @@ bool isGPUProfTarget(const Module &M) {
   return Triple.isAMDGPU() || Triple.isNVPTX();
 }

+void setPGOFuncVisibility(Module &M, GlobalVariable *FuncNameVar) {
+  // If the target is a GPU, make the symbol protected so it can
+  // be read from the host device
+  if (isGPUProfTarget(M))
+    FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility);
+  // Hide the symbol so that we correctly get a copy for each executable.
+  else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage()))
+    FuncNameVar->setVisibility(GlobalValue::HiddenVisibility);
+}
+
 GlobalVariable *createPGOFuncNameVar(Module &M,
                                      GlobalValue::LinkageTypes Linkage,
                                      StringRef PGOFuncName) {
@@ -445,14 +455,7 @@ GlobalVariable *createPGOFuncNameVar(Module &M,
       new GlobalVariable(M, Value->getType(), true, Linkage, Value,
                          getPGOFuncNameVarName(PGOFuncName, Linkage));

-  // If the target is a GPU, make the symbol protected so it can
-  // be read from the host device
-  if (isGPUProfTarget(M))
-    FuncNameVar->setVisibility(GlobalValue::ProtectedVisibility);
-  // Hide the symbol so that we correctly get a copy for each executable.
-  else if (!GlobalValue::isLocalLinkage(FuncNameVar->getLinkage()))
-    FuncNameVar->setVisibility(GlobalValue::HiddenVisibility);
-
+  setPGOFuncVisibility(M, FuncNameVar);
   return FuncNameVar;
 }

>From 9f13943f64cb16162e44902d54de53a9b1229179 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Tue, 23 Jan 2024 18:33:58 -0600
Subject: [PATCH 13/58] Add LLVM instrumentation support

This PR formerly only supported -fprofile-instrument=clang.
This commit adds support for -fprofile-instrument=llvm
---
 .../Instrumentation/PGOInstrumentation.cpp | 12 +++-
 openmp/libomptarget/test/offloading/pgo1.c | 72 +++++++++++++------
 2 files changed, 59 insertions(+), 25 deletions(-)

diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index c20fc942eaf0d5..bbc8da78fd7baf 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -862,6 +862,10 @@ static void instrumentOneFunc(
   auto Name = FuncInfo.FuncNameVar;
   auto CFGHash = ConstantInt::get(Type::getInt64Ty(M->getContext()),
                                   FuncInfo.FunctionHash);
+  // Make sure that pointer to global is passed in with zero addrspace
+  // This is relevant during GPU profiling
+  auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
+      Name, llvm::PointerType::getUnqual(M->getContext()));
   if (PGOFunctionEntryCoverage) {
     auto &EntryBB = F.getEntryBlock();
     IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt());
@@ -869,7 +873,7 @@ static void instrumentOneFunc(
     //                      i32 )
     Builder.CreateCall(
         Intrinsic::getDeclaration(M, Intrinsic::instrprof_cover),
-        {Name, CFGHash, Builder.getInt32(1), Builder.getInt32(0)});
+        {NormalizedPtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)});
     return;
   }

@@ -887,7 +891,8 @@ static void instrumentOneFunc(
       //                      i32 )
       Builder.CreateCall(
           Intrinsic::getDeclaration(M, Intrinsic::instrprof_timestamp),
-          {Name, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I)});
+          {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters),
+           Builder.getInt32(I)});
       I += PGOBlockCoverage ? 8 : 1;
     }

@@ -901,7 +906,8 @@ static void instrumentOneFunc(
         Intrinsic::getDeclaration(M, PGOBlockCoverage
                                          ? Intrinsic::instrprof_cover
                                          : Intrinsic::instrprof_increment),
-        {Name, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I++)});
+        {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters),
+         Builder.getInt32(I++)});
   }

   // Now instrument select instructions:
diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c
index 389be19b670d76..d95793b508dcfc 100644
--- a/openmp/libomptarget/test/offloading/pgo1.c
+++ b/openmp/libomptarget/test/offloading/pgo1.c
@@ -1,6 +1,11 @@
 // RUN: %libomptarget-compile-generic -fprofile-instr-generate \
 // RUN: -Xclang "-fprofile-instrument=clang"
-// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic
+// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
+// RUN: --check-prefix="CLANG-PGO"
+// RUN: %libomptarget-compile-generic -fprofile-generate \
+// RUN: -Xclang "-fprofile-instrument=llvm"
+// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
+// RUN: --check-prefix="LLVM-PGO"

 // UNSUPPORTED: x86_64-pc-linux-gnu
 // UNSUPPORTED: x86_64-pc-linux-gnu-LTO
@@ -26,24 +31,47 @@ int main() {
   }
 }

-// CHECK: ======== Counters =========
-// CHECK-NEXT: [ 0 11 20 ]
-// CHECK-NEXT: [ 10 ]
-// CHECK-NEXT: [ 20 ]
-// CHECK-NEXT: ========== Data ===========
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: { {{[0-9]*}} {{[0-9]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
-// CHECK-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
-// CHECK-NEXT: ======== Functions ========
-// CHECK-NEXT: pgo1.c:__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
-// CHECK-NEXT: test1
-// CHECK-NEXT: test2
+// CLANG-PGO: ======== Counters =========
+// CLANG-PGO-NEXT: [ 0 11 20 ]
+// CLANG-PGO-NEXT: [ 10 ]
+// CLANG-PGO-NEXT: [ 20 ]
+// CLANG-PGO-NEXT: ========== Data ===========
+// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// CLANG-PGO-NEXT: ======== Functions ========
+// CLANG-PGO-NEXT: pgo1.c:
+// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
+// CLANG-PGO-NEXT: test1
+// CLANG-PGO-NEXT: test2
+
+// LLVM-PGO: ======== Counters =========
+// LLVM-PGO-NEXT: [ 20 ]
+// LLVM-PGO-NEXT: [ 10 ]
+// LLVM-PGO-NEXT: [ 20 10 1 1 ]
+// LLVM-PGO-NEXT: ========== Data ===========
+// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}}
+// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} }
+// LLVM-PGO-NEXT: ======== Functions ========
+// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}
+// LLVM-PGO-NEXT: test1
+// LLVM-PGO-NEXT: test2
>From 0606f0dd1b32ef9ebe138bbc964b3921e22d95d1 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Wed, 14 Feb 2024 01:46:55 -0600
Subject: [PATCH 14/58] Use explicit addrspace instead of unqual

---
 clang/lib/CodeGen/CodeGenPGO.cpp | 2 +-
 llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp
index e084dda879cbc0..4c75a01222d304 100644
--- a/clang/lib/CodeGen/CodeGenPGO.cpp
+++ b/clang/lib/CodeGen/CodeGenPGO.cpp
@@ -1103,7 +1103,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S,
   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
   auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      FuncNameVar, llvm::PointerType::getUnqual(CGM.getLLVMContext()));
+      FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0));

   llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash),
                          Builder.getInt32(NumRegionCounters),
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index bbc8da78fd7baf..c63b3e4ecf786a 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -865,7 +865,7 @@ static void instrumentOneFunc(
   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
   auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      Name, llvm::PointerType::getUnqual(M->getContext()));
+      Name, llvm::PointerType::get(M->getContext(), 0));
   if (PGOFunctionEntryCoverage) {
     auto &EntryBB = F.getEntryBlock();
     IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt());
>From c1f9be321678766525141214aaab74636cafbc2c Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Thu, 15 Feb 2024 19:10:09 -0600
Subject: [PATCH 15/58] Remove redundant namespaces

---
 .../Instrumentation/PGOInstrumentation.cpp | 4 +--
 .../common/src/GlobalHandler.cpp | 26 +++++++++----------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index c63b3e4ecf786a..3058e577738fda 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -864,8 +864,8 @@ static void instrumentOneFunc(
                                   FuncInfo.FunctionHash);
   // Make sure that pointer to global is passed in with zero addrspace
   // This is relevant during GPU profiling
-  auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
-      Name, llvm::PointerType::get(M->getContext(), 0));
+  auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast(
+      Name, PointerType::get(M->getContext(), 0));
   if (PGOFunctionEntryCoverage) {
     auto &EntryBB = F.getEntryBlock();
     IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt());
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
index e5eb653d022287..ae270c60804d26 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -219,30 +219,30 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
 }

 void GPUProfGlobals::dump() const {
-  llvm::outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str()
+  outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str()
                << "\n";

-  llvm::outs() << "======== Counters =========\n";
+  outs() << "======== Counters =========\n";
   for (const auto &Count : Counts) {
-    llvm::outs() << "[";
+    outs() << "[";
     for (size_t i = 0; i < Count.size(); i++) {
       if (i == 0)
-
llvm::outs() << " "; - llvm::outs() << Count[i] << " "; + outs() << " "; + outs() << Count[i] << " "; } - llvm::outs() << "]\n"; + outs() << "]\n"; } - llvm::outs() << "========== Data ===========\n"; + outs() << "========== Data ===========\n"; for (const auto &ProfData : Data) { - llvm::outs() << "{ "; + outs() << "{ "; #define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) \ - llvm::outs() << ProfData.Name << " "; + outs() << ProfData.Name << " "; #include "llvm/ProfileData/InstrProfData.inc" - llvm::outs() << "}\n"; + outs() << "}\n"; } - llvm::outs() << "======== Functions ========\n"; + outs() << "======== Functions ========\n"; std::string s; s.reserve(NamesData.size()); for (uint8_t Name : NamesData) { @@ -253,6 +253,6 @@ void GPUProfGlobals::dump() const { if (Error Err = Symtab.create(StringRef(s))) { consumeError(std::move(Err)); } - Symtab.dumpNames(llvm::outs()); - llvm::outs() << "===========================\n"; + Symtab.dumpNames(outs()); + outs() << "===========================\n"; } >From 6a3ae407e69e7524f0f808329c534f8352ee1779 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 15 Feb 2024 19:15:15 -0600 Subject: [PATCH 16/58] Clang format --- .../libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index ae270c60804d26..1fce2448922624 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -220,7 +220,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, void GPUProfGlobals::dump() const { outs() << "======= GPU Profile =======\nTarget: " << TargetTriple.str() - << "\n"; + << "\n"; outs() << "======== Counters =========\n"; for (const auto &Count : Counts) { >From 6866862d459e3c3fa65fae8ae639ddc3ff735252 Mon Sep 17 
00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 16 Feb 2024 13:13:39 -0600 Subject: [PATCH 17/58] Use getAddrSpaceCast Replace getPointerBitCastOrAddrSpaceCast with getAddrSpaceCast and allow no-op getAddrSpaceCast calls when types are identical --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ++++ llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index 8f52018445d2b0..baceeba8380ddb 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index a38b912164b130..2d89c5bbd4a4c2 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,6 +2067,10 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { + // Skip cast if types are identical + if (C->getType() == DstTy) + return C; + assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index 3058e577738fda..c0be71aa4cc004 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ 
b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 62a5ee1c75545571f81d9edd22e19e9ef7cff69f Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 27 Feb 2024 14:53:51 -0600 Subject: [PATCH 18/58] Revert "Use getAddrSpaceCast" This reverts commit 6866862d459e3c3fa65fae8ae639ddc3ff735252. --- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ---- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index baceeba8380ddb..8f52018445d2b0 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 2d89c5bbd4a4c2..a38b912164b130 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,10 +2067,6 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { - // Skip cast if types are 
identical - if (C->getType() == DstTy) - return C; - assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c0be71aa4cc004..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 052394fa28c923d130bf73a07b965a9751467302 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 27 Feb 2024 15:34:34 -0600 Subject: [PATCH 19/58] Revert "Use getAddrSpaceCast" This reverts commit 6866862d459e3c3fa65fae8ae639ddc3ff735252. 
--- clang/lib/CodeGen/CodeGenPGO.cpp | 2 +- llvm/lib/IR/Constants.cpp | 4 ---- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | 2 +- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index baceeba8380ddb..8f52018445d2b0 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1099,7 +1099,7 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 2d89c5bbd4a4c2..a38b912164b130 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2067,10 +2067,6 @@ Constant *ConstantExpr::getBitCast(Constant *C, Type *DstTy, Constant *ConstantExpr::getAddrSpaceCast(Constant *C, Type *DstTy, bool OnlyIfReduced) { - // Skip cast if types are identical - if (C->getType() == DstTy) - return C; - assert(CastInst::castIsValid(Instruction::AddrSpaceCast, C, DstTy) && "Invalid constantexpr addrspacecast!"); return getFoldedCast(Instruction::AddrSpaceCast, C, DstTy, OnlyIfReduced); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index c0be71aa4cc004..3058e577738fda 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -864,7 +864,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto 
*NormalizedPtr = ConstantExpr::getAddrSpaceCast( + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); >From 612d5a5f6966a77e82e5591f5aea475fbf886e55 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 1 Mar 2024 02:04:00 -0600 Subject: [PATCH 20/58] Write PGO TODO: Fix tests --- compiler-rt/lib/profile/InstrProfiling.h | 11 ++ compiler-rt/lib/profile/InstrProfilingFile.c | 148 +++++++++++++++--- .../common/include/GlobalHandler.h | 14 +- .../common/src/GlobalHandler.cpp | 57 +++++-- .../common/src/PluginInterface.cpp | 6 +- 5 files changed, 200 insertions(+), 36 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h index 01239083369187..937acbd417de46 100644 --- a/compiler-rt/lib/profile/InstrProfiling.h +++ b/compiler-rt/lib/profile/InstrProfiling.h @@ -275,6 +275,17 @@ void __llvm_profile_get_padding_sizes_for_counters( */ void __llvm_profile_set_dumped(); +/*! + * \brief Write custom target-specific profiling data to a seperate file. + * Used by libomptarget for GPU PGO. + */ +int __llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, + const char *NamesEnd); + /*! * This variable is defined in InstrProfilingRuntime.cpp as a hidden * symbol. Its main purpose is to enable profile runtime user to diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index f3b457d786e6bd..4fc401bb9bebf5 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -502,27 +502,15 @@ static FILE *getFileObject(const char *OutputName) { return fopen(OutputName, "ab"); } -/* Write profile data to file \c OutputName. 
*/ -static int writeFile(const char *OutputName) { - int RetVal; - FILE *OutputFile; - - int MergeDone = 0; +/* Get file object and merge if applicable */ +static FILE *getMergeFileObject(const char *OutputName, int *MergeDone) { VPMergeHook = &lprofMergeValueProfData; if (doMerging()) - OutputFile = openFileForMerging(OutputName, &MergeDone); - else - OutputFile = getFileObject(OutputName); - - if (!OutputFile) - return -1; - - FreeHook = &free; - setupIOBuffer(); - ProfDataWriter fileWriter; - initFileWriter(&fileWriter, OutputFile); - RetVal = lprofWriteData(&fileWriter, lprofGetVPDataReader(), MergeDone); + return openFileForMerging(OutputName, MergeDone); + return getFileObject(OutputName); +} +static void closeFileObject(FILE *OutputFile) { if (OutputFile == getProfileFile()) { fflush(OutputFile); if (doMerging() && !__llvm_profile_is_continuous_mode_enabled()) { @@ -531,7 +519,23 @@ static int writeFile(const char *OutputName) { } else { fclose(OutputFile); } +} + +/* Write profile data to file \c OutputName. */ +static int writeFile(const char *OutputName) { + int RetVal, MergeDone = 0; + FILE *OutputFile = getMergeFileObject(OutputName, &MergeDone); + + if (!OutputFile) + return -1; + + FreeHook = &free; + setupIOBuffer(); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + RetVal = lprofWriteData(&fileWriter, lprofGetVPDataReader(), MergeDone); + closeFileObject(OutputFile); return RetVal; } @@ -558,10 +562,16 @@ static int writeOrderFile(const char *OutputName) { #define LPROF_INIT_ONCE_ENV "__LLVM_PROFILE_RT_INIT_ONCE" +static void forceTruncateFile(const char *Filename) { + FILE *File = fopen(Filename, "w"); + if (!File) + return; + fclose(File); +} + static void truncateCurrentFile(void) { const char *Filename; char *FilenameBuf; - FILE *File; int Length; Length = getCurFilenameLength(); @@ -591,10 +601,7 @@ static void truncateCurrentFile(void) { return; /* Truncate the file. Later we'll reopen and append. 
*/ - File = fopen(Filename, "w"); - if (!File) - return; - fclose(File); + forceTruncateFile(Filename); } /* Write a partial profile to \p Filename, which is required to be backed by @@ -1271,4 +1278,99 @@ COMPILER_RT_VISIBILITY int __llvm_profile_set_file_object(FILE *File, return 0; } +int __llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, + const char *NamesEnd) { + int ReturnValue = 0, FilenameLength, TargetLength, MergeDone; + char *FilenameBuf, *TargetFilename; + const char *Filename; + + /* Save old profile data */ + FILE *oldFile = getProfileFile(); + + // Temporarily suspend getting SIGKILL when the parent exits. + int PDeathSig = lprofSuspendSigKill(); + + if (lprofProfileDumped() || __llvm_profile_is_continuous_mode_enabled()) { + PROF_NOTE("Profile data not written to file: %s.\n", "already written"); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return 0; + } + + /* Get current filename */ + FilenameLength = getCurFilenameLength(); + FilenameBuf = (char *)COMPILER_RT_ALLOCA(FilenameLength + 1); + Filename = getCurFilename(FilenameBuf, 0); + + /* Check the filename. */ + if (!Filename) { + PROF_ERR("Failed to write file : %s\n", "Filename not set"); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + /* Allocate new space for our target-specific PGO filename */ + TargetLength = strlen(Target); + TargetFilename = + (char *)COMPILER_RT_ALLOCA(FilenameLength + TargetLength + 2); + + /* Prepend "TARGET." to current filename */ + memcpy(TargetFilename, Target, TargetLength); + TargetFilename[TargetLength] = '.'; + memcpy(TargetFilename, Target, TargetLength); + memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); + TargetFilename[FilenameLength + 1 + TargetLength] = 0; + + /* Check if there is llvm/runtime version mismatch. 
*/ + if (GET_VERSION(__llvm_profile_get_version()) != INSTR_PROF_RAW_VERSION) { + PROF_ERR("Runtime and instrumentation version mismatch : " + "expected %d, but get %d\n", + INSTR_PROF_RAW_VERSION, + (int)GET_VERSION(__llvm_profile_get_version())); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + /* Clean old target file */ + forceTruncateFile(TargetFilename); + + /* Open target-specific PGO file */ + MergeDone = 0; + FILE *OutputFile = getMergeFileObject(TargetFilename, &MergeDone); + + if (!OutputFile) { + PROF_ERR("Failed to open file : %s\n", TargetFilename); + if (PDeathSig == 1) + lprofRestoreSigKill(); + return -1; + } + + FreeHook = &free; + setupIOBuffer(); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + + /* Write custom data to the file */ + ReturnValue = lprofWriteDataImpl( + &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone); + + closeFileObject(OutputFile); + + // Restore SIGKILL. 
+ if (PDeathSig == 1) + lprofRestoreSigKill(); + + /* Restore old profiling file */ + setProfileFile(oldFile); + + return ReturnValue; +} + #endif diff --git a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h index f5a15ca11bfcda..af0cd4dcdf5dcf 100644 --- a/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h +++ b/openmp/libomptarget/plugins-nextgen/common/include/GlobalHandler.h @@ -63,14 +63,24 @@ struct __llvm_profile_data { #include "llvm/ProfileData/InstrProfData.inc" }; +extern "C" { +extern int __attribute__((weak)) +__llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, const char *CountersEnd, + const char *NamesBegin, const char *NamesEnd); +} + /// PGO profiling data extracted from a GPU device struct GPUProfGlobals { - SmallVector<uint8_t> NamesData; - SmallVector<SmallVector<int64_t>> Counts; + SmallVector<int64_t> Counts; SmallVector<__llvm_profile_data> Data; + SmallVector<uint8_t> NamesData; Triple TargetTriple; void dump() const; + Error write() const; }; /// Subclass of GlobalTy that holds the memory for a global of \p Ty. 
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 1fce2448922624..2f16b6e3c139e9 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -205,7 +205,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, GlobalTy CountGlobal(NameOrErr->str(), Sym.getSize(), Counts.data()); if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal)) return Err; - DeviceProfileData.Counts.push_back(std::move(Counts)); + DeviceProfileData.Counts.append(std::move(Counts)); } else if (NameOrErr->starts_with(getInstrProfDataVarPrefix())) { // Read profiling data for this global variable __llvm_profile_data Data{}; @@ -223,15 +223,14 @@ void GPUProfGlobals::dump() const { << "\n"; outs() << "======== Counters =========\n"; - for (const auto &Count : Counts) { - outs() << "["; - for (size_t i = 0; i < Count.size(); i++) { - if (i == 0) - outs() << " "; - outs() << Count[i] << " "; - } - outs() << "]\n"; + for (size_t i = 0; i < Counts.size(); i++) { + if (i > 0 && i % 10 == 0) + outs() << "\n"; + else if (i != 0) + outs() << " "; + outs() << Counts[i]; } + outs() << "\n"; outs() << "========== Data ===========\n"; for (const auto &ProfData : Data) { @@ -256,3 +255,43 @@ void GPUProfGlobals::dump() const { Symtab.dumpNames(outs()); outs() << "===========================\n"; } + +Error GPUProfGlobals::write() const { + if (!__llvm_write_custom_profile) + return Plugin::error("Could not find symbol __llvm_write_custom_profile. " + "The compiler-rt profiling library must be linked for " + "GPU PGO to work."); + + size_t DataSize = Data.size() * sizeof(__llvm_profile_data), + CountsSize = Counts.size() * sizeof(int64_t); + __llvm_profile_data *DataBegin, *DataEnd; + char *CountersBegin, *CountersEnd, *NamesBegin, *NamesEnd; + + // Initialize array of contiguous data. 
We need to make sure each section is + // contiguous so that the PGO library can compute deltas properly + SmallVector<uint8_t> ContiguousData(NamesData.size() + DataSize + CountsSize); + + // Compute region pointers + DataBegin = (__llvm_profile_data *)(ContiguousData.data() + CountsSize); + DataEnd = + (__llvm_profile_data *)(ContiguousData.data() + CountsSize + DataSize); + CountersBegin = (char *)ContiguousData.data(); + CountersEnd = (char *)(ContiguousData.data() + CountsSize); + NamesBegin = (char *)(ContiguousData.data() + CountsSize + DataSize); + NamesEnd = (char *)(ContiguousData.data() + CountsSize + DataSize + + NamesData.size()); + + // Copy data to contiguous buffer + memcpy(DataBegin, Data.data(), DataSize); + memcpy(CountersBegin, Counts.data(), CountsSize); + memcpy(NamesBegin, NamesData.data(), NamesData.size()); + + // Invoke compiler-rt entrypoint + int result = __llvm_write_custom_profile(TargetTriple.str().c_str(), + DataBegin, DataEnd, CountersBegin, + CountersEnd, NamesBegin, NamesEnd); + if (result != 0) + return Plugin::error("Error writing GPU PGO data to file"); + + return Plugin::success(); +} diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp index 1ea93795ce8ce4..d5e6b6128152dc 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp @@ -837,8 +837,10 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); - // TODO: write data to profiling file - ProfOrErr->dump(); + // Write data to profiling file + if (auto Err = ProfOrErr->write()) { + consumeError(std::move(Err)); + } } // Delete the memory manager before deinitializing the device. 
Otherwise, >From b8c916305acf08c0bd2d51b81875be5e8fc59ff3 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 13 Mar 2024 20:05:32 -0500 Subject: [PATCH 21/58] Fix tests --- .../plugins-nextgen/common/src/PluginInterface.cpp | 3 +++ openmp/libomptarget/test/offloading/pgo1.c | 8 ++------ 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp index d5e6b6128152dc..2359ad28a25b04 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp @@ -837,6 +837,9 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); + // Dump out profdata + ProfOrErr->dump(); + // Write data to profiling file if (auto Err = ProfOrErr->write()) { consumeError(std::move(Err)); diff --git a/openmp/libomptarget/test/offloading/pgo1.c b/openmp/libomptarget/test/offloading/pgo1.c index d95793b508dcfc..79e93d0f10827f 100644 --- a/openmp/libomptarget/test/offloading/pgo1.c +++ b/openmp/libomptarget/test/offloading/pgo1.c @@ -32,9 +32,7 @@ int main() { } // CLANG-PGO: ======== Counters ========= -// CLANG-PGO-NEXT: [ 0 11 20 ] -// CLANG-PGO-NEXT: [ 10 ] -// CLANG-PGO-NEXT: [ 20 ] +// CLANG-PGO-NEXT: 0 11 20 10 20 // CLANG-PGO-NEXT: ========== Data =========== // CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} // CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} @@ -55,9 +53,7 @@ int main() { // CLANG-PGO-NEXT: test2 // LLVM-PGO: ======== Counters ========= -// LLVM-PGO-NEXT: [ 20 ] -// LLVM-PGO-NEXT: [ 10 ] -// LLVM-PGO-NEXT: [ 20 10 1 1 ] +// LLVM-PGO-NEXT: 20 10 20 10 1 1 // LLVM-PGO-NEXT: ========== Data =========== // LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} // LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} >From 4568c4244d11010aacf9f1fe20bb1197008b057f Mon Sep 17 00:00:00 2001 From: Ethan Luis 
McDonough Date: Wed, 13 Mar 2024 21:40:20 -0500 Subject: [PATCH 22/58] Fix arguments --- compiler-rt/lib/profile/InstrProfilingFile.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 466bfe480543bc..6570bc0d74caa1 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1360,10 +1360,10 @@ int __llvm_write_custom_profile(const char *Target, initFileWriter(&fileWriter, OutputFile); /* Write custom data to the file */ - ReturnValue = lprofWriteDataImpl( - &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, - lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone); - + ReturnValue = lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, + CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NamesBegin, NamesEnd, + NULL, NULL, NULL, NULL, MergeDone); closeFileObject(OutputFile); // Restore SIGKILL. 
>From 1fc4cb9c01f251432f4a6748e69b1d8cf74cc4fb Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 18 Mar 2024 20:05:52 -0500 Subject: [PATCH 23/58] Add GPU prof flags --- clang/include/clang/Driver/Options.td | 6 +++ clang/lib/Driver/ToolChain.cpp | 69 +++++++++++++-------------- clang/lib/Driver/ToolChains/Clang.cpp | 39 +++++++++++++-- 3 files changed, 74 insertions(+), 40 deletions(-) diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 4b1fcf1db1ad09..aab445906fa347 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -1716,6 +1716,9 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling", def fprofile_instr_generate : Flag<["-"], "fprofile-instr-generate">, Group, Visibility<[ClangOption, CLOption]>, HelpText<"Generate instrumented code to collect execution counts into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">; +def fprofile_instr_generate_gpu : Flag<["-"], "fprofile-instr-generate-gpu">, + Group, Visibility<[ClangOption, CLOption]>, + HelpText<"Generate instrumented GPU device code to collect execution counts into GPU_TARGET.default.profraw (overridden by LLVM_PROFILE_FILE env var)">; def fprofile_instr_generate_EQ : Joined<["-"], "fprofile-instr-generate=">, Group, Visibility<[ClangOption, CLOption]>, MetaVarName<"">, HelpText<"Generate instrumented code to collect execution counts into (overridden by LLVM_PROFILE_FILE env var)">; @@ -1744,6 +1747,9 @@ defm mcdc_coverage : BoolFOption<"coverage-mcdc", def fprofile_generate : Flag<["-"], "fprofile-generate">, Group, Visibility<[ClangOption, CLOption]>, HelpText<"Generate instrumented code to collect execution counts into default.profraw (overridden by LLVM_PROFILE_FILE env var)">; +def fprofile_generate_gpu : Flag<["-"], "fprofile-generate-gpu">, + Group, Visibility<[ClangOption, CLOption]>, + HelpText<"Generate instrumented GPU device code to 
collect execution counts into GPU_TARGET.default.profraw (overridden by LLVM_PROFILE_FILE env var)">; def fprofile_generate_EQ : Joined<["-"], "fprofile-generate=">, Group, Visibility<[ClangOption, CLOption]>, MetaVarName<"">, diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index 03450fc0f57b93..f4c8aafbbd0e64 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -184,10 +184,9 @@ bool ToolChain::defaultToIEEELongDouble() const { return PPC_LINUX_DEFAULT_IEEELONGDOUBLE && getTriple().isOSLinux(); } -static void getAArch64MultilibFlags(const Driver &D, - const llvm::Triple &Triple, - const llvm::opt::ArgList &Args, - Multilib::flags_list &Result) { +static void getAArch64MultilibFlags(const Driver &D, const llvm::Triple &Triple, + const llvm::opt::ArgList &Args, + Multilib::flags_list &Result) { std::vector Features; tools::aarch64::getAArch64TargetFeatures(D, Triple, Args, Features, false); const auto UnifiedFeatures = tools::unifyTargetFeatures(Features); @@ -209,10 +208,9 @@ static void getAArch64MultilibFlags(const Driver &D, Result.push_back(llvm::join(MArch, "+")); } -static void getARMMultilibFlags(const Driver &D, - const llvm::Triple &Triple, - const llvm::opt::ArgList &Args, - Multilib::flags_list &Result) { +static void getARMMultilibFlags(const Driver &D, const llvm::Triple &Triple, + const llvm::opt::ArgList &Args, + Multilib::flags_list &Result) { std::vector Features; llvm::ARM::FPUKind FPUKind = tools::arm::getARMTargetFeatures( D, Triple, Args, Features, false /*ForAs*/, true /*ForMultilib*/); @@ -303,7 +301,7 @@ ToolChain::getSanitizerArgs(const llvm::opt::ArgList &JobArgs) const { return SanArgs; } -const XRayArgs& ToolChain::getXRayArgs() const { +const XRayArgs &ToolChain::getXRayArgs() const { if (!XRayArguments) XRayArguments.reset(new XRayArgs(*this, Args)); return *XRayArguments; @@ -394,8 +392,7 @@ static const DriverSuffix *parseDriverSuffix(StringRef ProgName, size_t &Pos) { 
return DS; } -ParsedClangName -ToolChain::getTargetAndModeFromProgramName(StringRef PN) { +ParsedClangName ToolChain::getTargetAndModeFromProgramName(StringRef PN) { std::string ProgName = normalizeProgramName(PN); size_t SuffixPos; const DriverSuffix *DS = parseDriverSuffix(ProgName, SuffixPos); @@ -406,8 +403,8 @@ ToolChain::getTargetAndModeFromProgramName(StringRef PN) { size_t LastComponent = ProgName.rfind('-', SuffixPos); if (LastComponent == std::string::npos) return ParsedClangName(ProgName.substr(0, SuffixEnd), DS->ModeFlag); - std::string ModeSuffix = ProgName.substr(LastComponent + 1, - SuffixEnd - LastComponent - 1); + std::string ModeSuffix = + ProgName.substr(LastComponent + 1, SuffixEnd - LastComponent - 1); // Infer target from the prefix. StringRef Prefix(ProgName); @@ -465,9 +462,7 @@ Tool *ToolChain::getFlang() const { return Flang.get(); } -Tool *ToolChain::buildAssembler() const { - return new tools::ClangAs(*this); -} +Tool *ToolChain::buildAssembler() const { return new tools::ClangAs(*this); } Tool *ToolChain::buildLinker() const { llvm_unreachable("Linking is not supported by this toolchain"); @@ -826,10 +821,12 @@ bool ToolChain::needsProfileRT(const ArgList &Args) { return false; return Args.hasArg(options::OPT_fprofile_generate) || + Args.hasArg(options::OPT_fprofile_generate_gpu) || Args.hasArg(options::OPT_fprofile_generate_EQ) || Args.hasArg(options::OPT_fcs_profile_generate) || Args.hasArg(options::OPT_fcs_profile_generate_EQ) || Args.hasArg(options::OPT_fprofile_instr_generate) || + Args.hasArg(options::OPT_fprofile_instr_generate_gpu) || Args.hasArg(options::OPT_fprofile_instr_generate_EQ) || Args.hasArg(options::OPT_fcreate_profile) || Args.hasArg(options::OPT_forder_file_instrumentation); @@ -842,8 +839,10 @@ bool ToolChain::needsGCovInstrumentation(const llvm::opt::ArgList &Args) { } Tool *ToolChain::SelectTool(const JobAction &JA) const { - if (D.IsFlangMode() && getDriver().ShouldUseFlangCompiler(JA)) return getFlang(); - if 
(getDriver().ShouldUseClangCompiler(JA)) return getClang(); + if (D.IsFlangMode() && getDriver().ShouldUseFlangCompiler(JA)) + return getFlang(); + if (getDriver().ShouldUseClangCompiler(JA)) + return getClang(); Action::ActionClass AC = JA.getKind(); if (AC == Action::AssembleJobClass && useIntegratedAs() && !getTriple().isOSAIX()) @@ -865,7 +864,7 @@ std::string ToolChain::GetLinkerPath(bool *LinkerIsLLD) const { // Get -fuse-ld= first to prevent -Wunused-command-line-argument. -fuse-ld= is // considered as the linker flavor, e.g. "bfd", "gold", or "lld". - const Arg* A = Args.getLastArg(options::OPT_fuse_ld_EQ); + const Arg *A = Args.getLastArg(options::OPT_fuse_ld_EQ); StringRef UseLinker = A ? A->getValue() : CLANG_DEFAULT_LINKER; // --ld-path= takes precedence over -fuse-ld= and specifies the executable @@ -950,9 +949,7 @@ types::ID ToolChain::LookupTypeForExtension(StringRef Ext) const { return id; } -bool ToolChain::HasNativeLLVMSupport() const { - return false; -} +bool ToolChain::HasNativeLLVMSupport() const { return false; } bool ToolChain::isCrossCompiling() const { llvm::Triple HostTriple(LLVM_HOST_TRIPLE); @@ -964,7 +961,8 @@ bool ToolChain::isCrossCompiling() const { case llvm::Triple::thumb: case llvm::Triple::thumbeb: return getArch() != llvm::Triple::arm && getArch() != llvm::Triple::thumb && - getArch() != llvm::Triple::armeb && getArch() != llvm::Triple::thumbeb; + getArch() != llvm::Triple::armeb && + getArch() != llvm::Triple::thumbeb; default: return HostTriple.getArch() != getArch(); } @@ -1046,9 +1044,7 @@ std::string ToolChain::ComputeEffectiveClangTriple(const ArgList &Args, return ComputeLLVMTriple(Args, InputType); } -std::string ToolChain::computeSysRoot() const { - return D.SysRoot; -} +std::string ToolChain::computeSysRoot() const { return D.SysRoot; } void ToolChain::AddClangSystemIncludeArgs(const ArgList &DriverArgs, ArgStringList &CC1Args) const { @@ -1072,12 +1068,12 @@ void ToolChain::addProfileRTLibs(const llvm::opt::ArgList 
&Args, CmdArgs.push_back(getCompilerRTArgString(Args, "profile")); } -ToolChain::RuntimeLibType ToolChain::GetRuntimeLibType( - const ArgList &Args) const { +ToolChain::RuntimeLibType +ToolChain::GetRuntimeLibType(const ArgList &Args) const { if (runtimeLibType) return *runtimeLibType; - const Arg* A = Args.getLastArg(options::OPT_rtlib_EQ); + const Arg *A = Args.getLastArg(options::OPT_rtlib_EQ); StringRef LibName = A ? A->getValue() : CLANG_DEFAULT_RTLIB; // Only use "platform" in tests to override CLANG_DEFAULT_RTLIB! @@ -1098,8 +1094,8 @@ ToolChain::RuntimeLibType ToolChain::GetRuntimeLibType( return *runtimeLibType; } -ToolChain::UnwindLibType ToolChain::GetUnwindLibType( - const ArgList &Args) const { +ToolChain::UnwindLibType +ToolChain::GetUnwindLibType(const ArgList &Args) const { if (unwindLibType) return *unwindLibType; @@ -1134,7 +1130,8 @@ ToolChain::UnwindLibType ToolChain::GetUnwindLibType( return *unwindLibType; } -ToolChain::CXXStdlibType ToolChain::GetCXXStdlibType(const ArgList &Args) const{ +ToolChain::CXXStdlibType +ToolChain::GetCXXStdlibType(const ArgList &Args) const { if (cxxStdlibType) return *cxxStdlibType; @@ -1290,7 +1287,7 @@ void ToolChain::AddCXXStdlibLibArgs(const ArgList &Args, void ToolChain::AddFilePathLibArgs(const ArgList &Args, ArgStringList &CmdArgs) const { for (const auto &LibPath : getFilePaths()) - if(LibPath.length() > 0) + if (LibPath.length() > 0) CmdArgs.push_back(Args.MakeArgString(StringRef("-L") + LibPath)); } @@ -1306,9 +1303,9 @@ bool ToolChain::isFastMathRuntimeAvailable(const ArgList &Args, if (!isOptimizationLevelFast(Args)) { // Check if -ffast-math or -funsafe-math. 
Arg *A = - Args.getLastArg(options::OPT_ffast_math, options::OPT_fno_fast_math, - options::OPT_funsafe_math_optimizations, - options::OPT_fno_unsafe_math_optimizations); + Args.getLastArg(options::OPT_ffast_math, options::OPT_fno_fast_math, + options::OPT_funsafe_math_optimizations, + options::OPT_fno_unsafe_math_optimizations); if (!A || A->getOption().getID() == options::OPT_fno_fast_math || A->getOption().getID() == options::OPT_fno_unsafe_math_optimizations) diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 055884d275ce1b..106a612135f93f 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -578,6 +578,35 @@ static void addDashXForInput(const ArgList &Args, const InputInfo &Input, } } +static void addPGOFlagsGPU(const ToolChain &TC, const ArgList &Args, + ArgStringList &CmdArgs) { + const Driver &D = TC.getDriver(); + auto *ProfileClangArg = Args.getLastArg(options::OPT_fprofile_generate_gpu, + options::OPT_fno_profile_generate); + auto *ProfileLLVMArg = + Args.getLastArg(options::OPT_fprofile_instr_generate_gpu, + options::OPT_fno_profile_generate); + if (ProfileClangArg && + ProfileClangArg->getOption().matches(options::OPT_fno_profile_generate)) + ProfileClangArg = nullptr; + + if (ProfileLLVMArg && + ProfileLLVMArg->getOption().matches(options::OPT_fno_profile_generate)) + ProfileLLVMArg = nullptr; + + if (ProfileClangArg && ProfileLLVMArg) { + D.Diag(diag::err_drv_argument_not_allowed_with) + << ProfileClangArg->getSpelling() << ProfileLLVMArg->getSpelling(); + return; + } + + if (ProfileClangArg) + CmdArgs.push_back("-fprofile-instrument=clang"); + + if (ProfileLLVMArg) + CmdArgs.push_back("-fprofile-instrument=llvm"); +} + static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C, const JobAction &JA, const InputInfo &Output, const ArgList &Args, SanitizerArgs &SanArgs, @@ -6049,10 +6078,12 @@ void Clang::ConstructJob(Compilation &C, const JobAction 
&JA, options::OPT_finstrument_functions_after_inlining, options::OPT_finstrument_function_entry_bare); - // NVPTX/AMDGCN doesn't support PGO or coverage. There's no runtime support - // for sampling, overhead of call arc collection is way too high and there's - // no way to collect the output. - if (!Triple.isNVPTX() && !Triple.isAMDGCN()) + // NVPTX/AMDGCN PGO is handled separately + // GPU targets don't have their own profiling libraries and are + // collected/handled by the host's profiling library + if (Triple.isNVPTX() || Triple.isAMDGCN()) + addPGOFlagsGPU(TC, Args, CmdArgs); + else addPGOAndCoverageFlags(TC, C, JA, Output, Args, SanitizeArgs, CmdArgs); Args.AddLastArg(CmdArgs, options::OPT_fclang_abi_compat_EQ); >From 849b244ea29ac15cae7ddaa973356cecfb0e4792 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 18 Mar 2024 21:31:32 -0500 Subject: [PATCH 24/58] Fix elf obj file --- .../plugins-nextgen/common/src/GlobalHandler.cpp | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp index 88e4bee506ba89..bca66cff6558a2 100644 --- a/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp @@ -177,16 +177,19 @@ Expected GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image) { GPUProfGlobals DeviceProfileData; - auto ELFObj = getELFObjectFile(Image); - if (!ELFObj) - return ELFObj.takeError(); + auto ObjFile = getELFObjectFile(Image); + if (!ObjFile) + return ObjFile.takeError(); + + std::unique_ptr ELFObj( + static_cast(ObjFile->release())); DeviceProfileData.TargetTriple = ELFObj->makeTriple(); // Iterate through elf symbols for (auto &Sym : ELFObj->symbols()) { auto NameOrErr = Sym.getName(); if (!NameOrErr) - return ELFObj.takeError(); + return NameOrErr.takeError(); // Check if 
given current global is a profiling global based // on name >From 55bd8d21a6224e0872002b0d1d77361eb75a3419 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 19 Mar 2024 14:50:54 -0500 Subject: [PATCH 25/58] Add GPU use profile option --- clang/include/clang/Driver/Options.td | 5 +++++ clang/lib/Driver/ToolChains/Clang.cpp | 22 ++++++++++++++++++++++ 2 files changed, 27 insertions(+) diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index aab445906fa347..b317d4e85b9571 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -1768,6 +1768,11 @@ def fprofile_use_EQ : Joined<["-"], "fprofile-use=">, Visibility<[ClangOption, CLOption]>, MetaVarName<"">, HelpText<"Use instrumentation data for profile-guided optimization. If pathname is a directory, it reads from /default.profdata. Otherwise, it reads from file .">; +def fprofile_use_gpu_EQ : Joined<["-"], "fprofile-use-gpu=">, + Group, + Visibility<[ClangOption, CLOption]>, + MetaVarName<"">, + HelpText<"Use instrumentation data for profile-guided optimization targeting GPU">; def fno_profile_instr_generate : Flag<["-"], "fno-profile-instr-generate">, Group, Visibility<[ClangOption, CLOption]>, HelpText<"Disable generation of profile instrumentation.">; diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 106a612135f93f..1ea55011469509 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -586,6 +586,8 @@ static void addPGOFlagsGPU(const ToolChain &TC, const ArgList &Args, auto *ProfileLLVMArg = Args.getLastArg(options::OPT_fprofile_instr_generate_gpu, options::OPT_fno_profile_generate); + auto *ProfileUseArg = Args.getLastArg(options::OPT_fprofile_use_gpu_EQ, + options::OPT_fno_profile_instr_use); if (ProfileClangArg && ProfileClangArg->getOption().matches(options::OPT_fno_profile_generate)) ProfileClangArg = nullptr; @@ -594,17 +596,37 @@ 
static void addPGOFlagsGPU(const ToolChain &TC, const ArgList &Args, ProfileLLVMArg->getOption().matches(options::OPT_fno_profile_generate)) ProfileLLVMArg = nullptr; + if (ProfileUseArg && + ProfileUseArg->getOption().matches(options::OPT_fno_profile_generate)) + ProfileUseArg = nullptr; + if (ProfileClangArg && ProfileLLVMArg) { D.Diag(diag::err_drv_argument_not_allowed_with) << ProfileClangArg->getSpelling() << ProfileLLVMArg->getSpelling(); return; } + if (ProfileUseArg && ProfileClangArg) { + D.Diag(diag::err_drv_argument_not_allowed_with) + << ProfileClangArg->getSpelling() << ProfileUseArg->getSpelling(); + return; + } + + if (ProfileUseArg && ProfileLLVMArg) { + D.Diag(diag::err_drv_argument_not_allowed_with) + << ProfileLLVMArg->getSpelling() << ProfileUseArg->getSpelling(); + return; + } + if (ProfileClangArg) CmdArgs.push_back("-fprofile-instrument=clang"); if (ProfileLLVMArg) CmdArgs.push_back("-fprofile-instrument=llvm"); + + if (ProfileUseArg) + CmdArgs.push_back(Args.MakeArgString( + Twine("-fprofile-instrument-use-path=") + ProfileUseArg->getValue())); } static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C, >From 4ebbb45baa24b52eb0f94ebaf16b6b9eb671420a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 6 May 2024 23:13:58 -0500 Subject: [PATCH 26/58] Add more addrspace casts for GPU targets --- .../Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++--- .../Instrumentation/PGOInstrumentation.cpp | 13 +++++++++---- 2 files changed, 17 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index a6b1e0d488120a..dd8c027c4bbf62 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -869,6 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { llvm::InstrProfValueKind::IPVK_MemOPSize); CallInst *Call = nullptr; 
auto *TLI = &GetTLI(*Ind->getFunction()); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + DataVar, PointerType::getUnqual(M.getContext())); // To support value profiling calls within Windows exception handlers, funclet // information contained within operand bundles needs to be copied over to @@ -877,11 +879,13 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { SmallVector OpBundles; Ind->getOperandBundlesAsDefs(OpBundles); if (!IsMemOpSize) { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args, OpBundles); } else { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall( getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp), Args, OpBundles); @@ -1575,7 +1579,8 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { getInstrProfSectionName(IPSK_vals, TT.getObjectFormat())); ValuesVar->setAlignment(Align(8)); maybeSetComdat(ValuesVar, Fn, CntsVarName); - ValuesPtrExpr = ValuesVar; + ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + ValuesVar, PointerType::getUnqual(Fn->getContext())); } uint64_t NumCounters = Inc->getNumCounters()->getZExtValue(); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index f662c89a378bee..fbe969f4a9c16c 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -955,12 +955,15 @@ static void instrumentOneFunc( ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty()); assert(ToProfile && "value profiling Value is of unexpected type"); + auto *NormalizedPtr = 
ConstantExpr::getPointerBitCastOrAddrSpaceCast( + Name, PointerType::get(M->getContext(), 0)); + SmallVector OpBundles; populateEHOperandBundle(Cand, BlockColors, OpBundles); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile), - {FuncInfo.FuncNameVar, Builder.getInt64(FuncInfo.FunctionHash), - ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, + {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile, + Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, OpBundles); } } // IPVK_First <= Kind <= IPVK_Last @@ -1632,10 +1635,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) { IRBuilder<> Builder(&SI); Type *Int64Ty = Builder.getInt64Ty(); auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, PointerType::get(M->getContext(), 0)); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step), - {FuncNameVar, Builder.getInt64(FuncHash), Builder.getInt32(TotalNumCtrs), - Builder.getInt32(*CurCtrIdx), Step}); + {NormalizedPtr, Builder.getInt64(FuncHash), + Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step}); ++(*CurCtrIdx); } >From 7770b37a5a4c40bd45887f762bd7f1e652bc0ed2 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 7 May 2024 16:31:48 -0500 Subject: [PATCH 27/58] Fix params --- compiler-rt/lib/profile/InstrProfilingFile.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 466bfe480543bc..bc1d40a37a5ad6 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1360,9 +1360,10 @@ int __llvm_write_custom_profile(const char *Target, initFileWriter(&fileWriter, OutputFile); /* Write custom data to the file */ - ReturnValue = lprofWriteDataImpl( - &fileWriter, DataBegin, 
DataEnd, CountersBegin, CountersEnd, NULL, NULL, - lprofGetVPDataReader(), NamesBegin, NamesEnd, MergeDone); + ReturnValue = + lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin, + CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL, + NULL, NULL, NULL, NamesBegin, NamesEnd, MergeDone); closeFileObject(OutputFile); >From 619fb6918560f0b5d0b8137d392dfb27255a7d32 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 7 May 2024 17:20:39 -0500 Subject: [PATCH 28/58] Resolve merge conflict --- clang/lib/Driver/ToolChain.cpp | 7 ------- 1 file changed, 7 deletions(-) diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index fce4168761a6d2..e29f1ccf44b690 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -1313,17 +1313,10 @@ bool ToolChain::isFastMathRuntimeAvailable(const ArgList &Args, // (to keep the linker options consistent with gcc and clang itself). if (Default && !isOptimizationLevelFast(Args)) { // Check if -ffast-math or -funsafe-math. 
-<<<<<<< HEAD - Arg *A = - Args.getLastArg(options::OPT_ffast_math, options::OPT_fno_fast_math, - options::OPT_funsafe_math_optimizations, - options::OPT_fno_unsafe_math_optimizations); -======= Arg *A = Args.getLastArg( options::OPT_ffast_math, options::OPT_fno_fast_math, options::OPT_funsafe_math_optimizations, options::OPT_fno_unsafe_math_optimizations, options::OPT_ffp_model_EQ); ->>>>>>> main if (!A || A->getOption().getID() == options::OPT_fno_fast_math || A->getOption().getID() == options::OPT_fno_unsafe_math_optimizations) >From 3f08ae9d560dbaeba4c547186c85a8c34f3dee97 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Sat, 11 May 2024 02:31:28 -0500 Subject: [PATCH 29/58] Have test read from profraw instead of dump --- offload/test/lit.cfg | 2 +- offload/test/offloading/pgo1.c | 90 +++++++++++++++++----------------- 2 files changed, 45 insertions(+), 47 deletions(-) diff --git a/offload/test/lit.cfg b/offload/test/lit.cfg index 069110dc69a6e4..94a0bc8a2b43ff 100644 --- a/offload/test/lit.cfg +++ b/offload/test/lit.cfg @@ -390,7 +390,7 @@ config.substitutions.append(("%clang", config.test_c_compiler)) if config.test_fortran_compiler: config.available_features.add('flang') config.substitutions.append(("%flang", config.test_fortran_compiler)) - +config.substitutions.append(("%target_triple", config.libomptarget_current_target)) config.substitutions.append(("%openmp_flags", config.test_openmp_flags)) if config.libomptarget_current_target.startswith('nvptx') and config.cuda_path: config.substitutions.append(("%cuda_flags", "--cuda-path=" + config.cuda_path)) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 79e93d0f10827f..9fe231e7c6716b 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,12 +1,15 @@ -// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ -// RUN: -Xclang "-fprofile-instrument=clang" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ -// RUN: 
--check-prefix="CLANG-PGO" -// RUN: %libomptarget-compile-generic -fprofile-generate \ -// RUN: -Xclang "-fprofile-instrument=llvm" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: %libomptarget-compile-generic -fprofile-instr-generate-gpu +// RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" +// RUN: %libomptarget-compile-generic -fprofile-generate-gpu +// RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.clang.profraw | %fcheck-generic \ +// RUN: --check-prefix="CLANG-PGO" + // UNSUPPORTED: x86_64-pc-linux-gnu // UNSUPPORTED: x86_64-pc-linux-gnu-LTO // UNSUPPORTED: aarch64-unknown-linux-gnu @@ -31,43 +34,38 @@ int main() { } } -// CLANG-PGO: ======== Counters ========= -// CLANG-PGO-NEXT: 0 11 20 10 20 -// CLANG-PGO-NEXT: ========== Data =========== -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: ======== Functions ======== -// CLANG-PGO-NEXT: pgo1.c: -// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// CLANG-PGO-NEXT: test1 -// CLANG-PGO-NEXT: test2 +// LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// LLVM-PGO: Hash: 
{{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: Counters: 4 +// LLVM-PGO: Function count: 20 +// LLVM-PGO: Block counts: [10, 20, 10] + +// LLVM-PGO-LABEL: test1: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: Counters: 1 +// LLVM-PGO: Function count: 1 +// LLVM-PGO: Block counts: [] + +// LLVM-PGO-LABEL: test2: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: Counters: 1 +// LLVM-PGO: Function count: 1 +// LLVM-PGO: Block counts: [] + +// CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 3 +// CLANG-PGO: Function count: 0 +// CLANG-PGO: Block counts: [11, 20] + +// CLANG-PGO-LABEL: test1: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 1 +// CLANG-PGO: Function count: 10 +// CLANG-PGO: Block counts: [] -// LLVM-PGO: ======== Counters ========= -// LLVM-PGO-NEXT: 20 10 20 10 1 1 -// LLVM-PGO-NEXT: ========== Data =========== -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: ======== Functions ======== -// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// LLVM-PGO-NEXT: test1 -// LLVM-PGO-NEXT: test2 +// CLANG-PGO-LABEL: test2: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 1 +// CLANG-PGO: Function count: 20 +// CLANG-PGO: Block counts: [] >From 09f2b39beaa9e325655d5569e5107827c1e7e955 Mon Sep 17 00:00:00 2001 
From: Ethan Luis McDonough Date: Sat, 11 May 2024 02:39:16 -0500 Subject: [PATCH 30/58] Remove debug dump --- offload/plugins-nextgen/common/src/PluginInterface.cpp | 3 --- 1 file changed, 3 deletions(-) diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index a410deed1654f4..e7559f9e6cec83 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp @@ -840,9 +840,6 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); - // Dump out profdata - ProfOrErr->dump(); - // Write data to profiling file if (auto Err = ProfOrErr->write()) { consumeError(std::move(Err)); >From aa895a1788969a0d27692057a1457074e9772c78 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 18 Mar 2024 21:31:32 -0500 Subject: [PATCH 31/58] Fix elf obj file --- offload/plugins-nextgen/common/src/GlobalHandler.cpp | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index 80cdcaff75528e..7717e19a5b6779 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -177,16 +177,19 @@ Expected GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, DeviceImageTy &Image) { GPUProfGlobals DeviceProfileData; - auto ELFObj = getELFObjectFile(Image); - if (!ELFObj) - return ELFObj.takeError(); + auto ObjFile = getELFObjectFile(Image); + if (!ObjFile) + return ObjFile.takeError(); + + std::unique_ptr ELFObj( + static_cast(ObjFile->release())); DeviceProfileData.TargetTriple = ELFObj->makeTriple(); // Iterate through elf symbols for (auto &Sym : ELFObj->symbols()) { auto NameOrErr = Sym.getName(); if (!NameOrErr) - return ELFObj.takeError(); + return NameOrErr.takeError(); // Check if given current 
global is a profiling global based // on name >From 2031e49c2b26864f2dab72e629eb6cbe34928a7a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 6 May 2024 23:13:58 -0500 Subject: [PATCH 32/58] Add more addrspace casts for GPU targets --- .../Transforms/Instrumentation/InstrProfiling.cpp | 11 ++++++++--- .../Instrumentation/PGOInstrumentation.cpp | 13 +++++++++---- 2 files changed, 17 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index a6b1e0d488120a..dd8c027c4bbf62 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -869,6 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { llvm::InstrProfValueKind::IPVK_MemOPSize); CallInst *Call = nullptr; auto *TLI = &GetTLI(*Ind->getFunction()); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + DataVar, PointerType::getUnqual(M.getContext())); // To support value profiling calls within Windows exception handlers, funclet // information contained within operand bundles needs to be copied over to @@ -877,11 +879,13 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { SmallVector OpBundles; Ind->getOperandBundlesAsDefs(OpBundles); if (!IsMemOpSize) { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args, OpBundles); } else { - Value *Args[3] = {Ind->getTargetValue(), DataVar, Builder.getInt32(Index)}; + Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Builder.getInt32(Index)}; Call = Builder.CreateCall( getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp), Args, OpBundles); @@ -1575,7 +1579,8 @@ void 
InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { getInstrProfSectionName(IPSK_vals, TT.getObjectFormat())); ValuesVar->setAlignment(Align(8)); maybeSetComdat(ValuesVar, Fn, CntsVarName); - ValuesPtrExpr = ValuesVar; + ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + ValuesVar, PointerType::getUnqual(Fn->getContext())); } uint64_t NumCounters = Inc->getNumCounters()->getZExtValue(); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index 4b51396a8baa35..ee1657ba8400ee 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -1007,12 +1007,15 @@ static void instrumentOneFunc( ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty()); assert(ToProfile && "value profiling Value is of unexpected type"); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + Name, PointerType::get(M->getContext(), 0)); + SmallVector OpBundles; populateEHOperandBundle(Cand, BlockColors, OpBundles); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile), - {FuncInfo.FuncNameVar, Builder.getInt64(FuncInfo.FunctionHash), - ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, + {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile, + Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, OpBundles); } } // IPVK_First <= Kind <= IPVK_Last @@ -1685,10 +1688,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) { IRBuilder<> Builder(&SI); Type *Int64Ty = Builder.getInt64Ty(); auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty); + auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, PointerType::get(M->getContext(), 0)); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step), - {FuncNameVar, Builder.getInt64(FuncHash), 
Builder.getInt32(TotalNumCtrs), - Builder.getInt32(*CurCtrIdx), Step}); + {NormalizedPtr, Builder.getInt64(FuncHash), + Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step}); ++(*CurCtrIdx); } >From be6524bb4f77de0add1e698f68115fd336f32238 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 13 May 2024 17:41:00 -0500 Subject: [PATCH 33/58] Have test read from profraw instead of dump --- offload/test/lit.cfg | 2 + offload/test/offloading/pgo1.c | 94 ++++++++++++++++------------------ 2 files changed, 46 insertions(+), 50 deletions(-) diff --git a/offload/test/lit.cfg b/offload/test/lit.cfg index 069110dc69a6e4..38e6a33b01fafc 100644 --- a/offload/test/lit.cfg +++ b/offload/test/lit.cfg @@ -391,6 +391,8 @@ if config.test_fortran_compiler: config.available_features.add('flang') config.substitutions.append(("%flang", config.test_fortran_compiler)) +config.substitutions.append(("%target_triple", config.libomptarget_current_target)) + config.substitutions.append(("%openmp_flags", config.test_openmp_flags)) if config.libomptarget_current_target.startswith('nvptx') and config.cuda_path: config.substitutions.append(("%cuda_flags", "--cuda-path=" + config.cuda_path)) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 79e93d0f10827f..d22d5340f5b3ec 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,22 +1,21 @@ -// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ -// RUN: -Xclang "-fprofile-instrument=clang" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ -// RUN: --check-prefix="CLANG-PGO" -// RUN: %libomptarget-compile-generic -fprofile-generate \ -// RUN: -Xclang "-fprofile-instrument=llvm" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ +// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=llvm" +// RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// 
RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" +// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=clang" +// RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.clang.profraw | %fcheck-generic \ +// RUN: --check-prefix="CLANG-PGO" + // UNSUPPORTED: x86_64-pc-linux-gnu // UNSUPPORTED: x86_64-pc-linux-gnu-LTO // UNSUPPORTED: aarch64-unknown-linux-gnu // UNSUPPORTED: aarch64-unknown-linux-gnu-LTO // REQUIRES: pgo -#ifdef _OPENMP -#include <omp.h> -#endif - int test1(int a) { return a / 2; } int test2(int a) { return a * 2; } @@ -31,43 +30,38 @@ int main() { } } -// CLANG-PGO: ======== Counters ========= -// CLANG-PGO-NEXT: 0 11 20 10 20 -// CLANG-PGO-NEXT: ========== Data =========== -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: ======== Functions ======== -// CLANG-PGO-NEXT: pgo1.c: -// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// CLANG-PGO-NEXT: test1 -// CLANG-PGO-NEXT: test2 +// LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: Counters: 4 +// LLVM-PGO: Function count: 20 +// LLVM-PGO: Block counts: [10, 20, 10] + +// LLVM-PGO-LABEL: test1: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO:
Counters: 1 +// LLVM-PGO: Function count: 1 +// LLVM-PGO: Block counts: [] + +// LLVM-PGO-LABEL: test2: +// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-PGO: Counters: 1 +// LLVM-PGO: Function count: 1 +// LLVM-PGO: Block counts: [] + +// CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 3 +// CLANG-PGO: Function count: 0 +// CLANG-PGO: Block counts: [11, 20] + +// CLANG-PGO-LABEL: test1: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 1 +// CLANG-PGO: Function count: 10 +// CLANG-PGO: Block counts: [] -// LLVM-PGO: ======== Counters ========= -// LLVM-PGO-NEXT: 20 10 20 10 1 1 -// LLVM-PGO-NEXT: ========== Data =========== -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: ======== Functions ======== -// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// LLVM-PGO-NEXT: test1 -// LLVM-PGO-NEXT: test2 +// CLANG-PGO-LABEL: test2: +// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-PGO: Counters: 1 +// CLANG-PGO: Function count: 20 +// CLANG-PGO: Block counts: [] >From e266cc7190b4639c8273d49d39e78aa644bf032b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 16 May 2024 23:21:20 -0500 Subject: [PATCH 34/58] Fix GPU PGO names --- clang/lib/Driver/ToolChains/Clang.cpp | 25 ++++++++++++++++++++++--- 
offload/test/offloading/pgo1.c | 4 ++-- 2 files changed, 24 insertions(+), 5 deletions(-) diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index f0c2481145d0bd..5a4dc1295360fc 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -585,13 +585,20 @@ static void addDashXForInput(const ArgList &Args, const InputInfo &Input, static void addPGOFlagsGPU(const ToolChain &TC, const ArgList &Args, ArgStringList &CmdArgs) { const Driver &D = TC.getDriver(); - auto *ProfileClangArg = Args.getLastArg(options::OPT_fprofile_generate_gpu, - options::OPT_fno_profile_generate); - auto *ProfileLLVMArg = + auto *ProfileClangArg = Args.getLastArg(options::OPT_fprofile_instr_generate_gpu, options::OPT_fno_profile_generate); + auto *ProfileLLVMArg = Args.getLastArg(options::OPT_fprofile_generate_gpu, + options::OPT_fno_profile_generate); auto *ProfileUseArg = Args.getLastArg(options::OPT_fprofile_use_gpu_EQ, options::OPT_fno_profile_instr_use); + + auto *HostLLVMArg = Args.getLastArgNoClaim(options::OPT_fprofile_generate, + options::OPT_fprofile_generate_EQ); + auto *HostClangArg = + Args.getLastArgNoClaim(options::OPT_fprofile_instr_generate, + options::OPT_fprofile_instr_generate_EQ); + if (ProfileClangArg && ProfileClangArg->getOption().matches(options::OPT_fno_profile_generate)) ProfileClangArg = nullptr; @@ -622,6 +629,18 @@ static void addPGOFlagsGPU(const ToolChain &TC, const ArgList &Args, return; } + if (HostLLVMArg && ProfileClangArg) { + D.Diag(diag::err_drv_argument_not_allowed_with) + << HostLLVMArg->getSpelling() << ProfileClangArg->getSpelling(); + return; + } + + if (HostClangArg && ProfileLLVMArg) { + D.Diag(diag::err_drv_argument_not_allowed_with) + << HostClangArg->getSpelling() << ProfileLLVMArg->getSpelling(); + return; + } + if (ProfileClangArg) CmdArgs.push_back("-fprofile-instrument=clang"); diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 
ec93cce2c86207..b9fc95c89791a8 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,10 +1,10 @@ -// RUN: %libomptarget-compile-generic -fprofile-instr-generate-gpu +// RUN: %libomptarget-compile-generic -fprofile-generate-gpu // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" -// RUN: %libomptarget-compile-generic -fprofile-generate-gpu +// RUN: %libomptarget-compile-generic -fprofile-instr-generate-gpu // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.clang.profraw | %fcheck-generic \ >From 2b8eb2935ec21bf0acc5c56f45837b5976560963 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 24 May 2024 19:59:33 -0500 Subject: [PATCH 35/58] Fix PGO test format --- offload/test/offloading/pgo1.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index d22d5340f5b3ec..0e75c684ed9263 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -33,20 +33,17 @@ int main() { // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 4 -// LLVM-PGO: Function count: 20 -// LLVM-PGO: Block counts: [10, 20, 10] +// LLVM-PGO: Block counts: [20, 10, 20, 10] // LLVM-PGO-LABEL: test1: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Function count: 1 -// LLVM-PGO: Block counts: [] +// LLVM-PGO: Block counts: [1] // LLVM-PGO-LABEL: test2: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Function count: 1 -// LLVM-PGO: Block counts: [] +// LLVM-PGO: Block counts: [1] // CLANG-PGO-LABEL: 
__omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} >From 67f3009173d815295f36e2b37e85add1347e3bf9 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 24 May 2024 20:45:04 -0500 Subject: [PATCH 36/58] Refactor profile writer --- compiler-rt/lib/profile/InstrProfilingFile.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index bc1d40a37a5ad6..76238214c13aa3 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1344,8 +1344,7 @@ int __llvm_write_custom_profile(const char *Target, forceTruncateFile(TargetFilename); /* Open target-specific PGO file */ - MergeDone = 0; - FILE *OutputFile = getMergeFileObject(TargetFilename, &MergeDone); + FILE *OutputFile = getFileObject(TargetFilename); if (!OutputFile) { PROF_ERR("Failed to open file : %s\n", TargetFilename); @@ -1356,15 +1355,11 @@ int __llvm_write_custom_profile(const char *Target, FreeHook = &free; setupIOBuffer(); - ProfDataWriter fileWriter; - initFileWriter(&fileWriter, OutputFile); - - /* Write custom data to the file */ - ReturnValue = - lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin, - CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL, - NULL, NULL, NULL, NamesBegin, NamesEnd, MergeDone); + /* Write custom data */ + ReturnValue = __llvm_profile_write_buffer_internal( + OutputFile, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + NamesBegin, NamesEnd); closeFileObject(OutputFile); // Restore SIGKILL. 
>From e8ad1322c557f7b48e2b28fe3a34a696a1103bba Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 27 May 2024 18:29:18 -0500 Subject: [PATCH 37/58] Fix refactor bug --- compiler-rt/lib/profile/InstrProfilingFile.c | 52 ++++++++++---------- offload/test/offloading/pgo1.c | 6 ++- 2 files changed, 29 insertions(+), 29 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 76238214c13aa3..784cb9af6169d8 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -505,14 +505,6 @@ static FILE *getFileObject(const char *OutputName) { return fopen(OutputName, "ab"); } -/* Get file object and merge if applicable */ -static FILE *getMergeFileObject(const char *OutputName, int *MergeDone) { - VPMergeHook = &lprofMergeValueProfData; - if (doMerging()) - return openFileForMerging(OutputName, MergeDone); - return getFileObject(OutputName); -} - static void closeFileObject(FILE *OutputFile) { if (OutputFile == getProfileFile()) { fflush(OutputFile); @@ -526,8 +518,15 @@ static void closeFileObject(FILE *OutputFile) { /* Write profile data to file \c OutputName. 
*/ static int writeFile(const char *OutputName) { - int RetVal, MergeDone = 0; - FILE *OutputFile = getMergeFileObject(OutputName, &MergeDone); + int RetVal; + FILE *OutputFile; + + int MergeDone = 0; + VPMergeHook = &lprofMergeValueProfData; + if (doMerging()) + OutputFile = openFileForMerging(OutputName, &MergeDone); + else + OutputFile = getFileObject(OutputName); if (!OutputFile) return -1; @@ -565,16 +564,10 @@ static int writeOrderFile(const char *OutputName) { #define LPROF_INIT_ONCE_ENV "__LLVM_PROFILE_RT_INIT_ONCE" -static void forceTruncateFile(const char *Filename) { - FILE *File = fopen(Filename, "w"); - if (!File) - return; - fclose(File); -} - static void truncateCurrentFile(void) { const char *Filename; char *FilenameBuf; + FILE *File; int Length; Length = getCurFilenameLength(); @@ -604,7 +597,10 @@ static void truncateCurrentFile(void) { return; /* Truncate the file. Later we'll reopen and append. */ - forceTruncateFile(Filename); + File = fopen(Filename, "w"); + if (!File) + return; + fclose(File); } /* Write a partial profile to \p Filename, which is required to be backed by @@ -1287,7 +1283,7 @@ int __llvm_write_custom_profile(const char *Target, const char *CountersBegin, const char *CountersEnd, const char *NamesBegin, const char *NamesEnd) { - int ReturnValue = 0, FilenameLength, TargetLength, MergeDone; + int ReturnValue = 0, FilenameLength, TargetLength; char *FilenameBuf, *TargetFilename; const char *Filename; @@ -1340,11 +1336,9 @@ int __llvm_write_custom_profile(const char *Target, return -1; } - /* Clean old target file */ - forceTruncateFile(TargetFilename); - - /* Open target-specific PGO file */ - FILE *OutputFile = getFileObject(TargetFilename); + /* Open and truncate target-specific PGO file */ + FILE *OutputFile = fopen(TargetFilename, "w"); + setProfileFile(OutputFile); if (!OutputFile) { PROF_ERR("Failed to open file : %s\n", TargetFilename); @@ -1357,9 +1351,13 @@ int __llvm_write_custom_profile(const char *Target, 
setupIOBuffer(); /* Write custom data */ - ReturnValue = __llvm_profile_write_buffer_internal( - OutputFile, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, - NamesBegin, NamesEnd); + ProfDataWriter fileWriter; + initFileWriter(&fileWriter, OutputFile); + + /* Write custom data to the file */ + ReturnValue = lprofWriteDataImpl( + &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, + lprofGetVPDataReader(), NULL, NULL, NULL, NULL, NamesBegin, NamesEnd, 0); closeFileObject(OutputFile); // Restore SIGKILL. diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 0e75c684ed9263..d6747113265803 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -1,10 +1,12 @@ -// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=llvm" +// RUN: %libomptarget-compile-generic -fprofile-generate \ +// RUN: -Xclang "-fprofile-instrument=llvm" // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.llvm.profraw | %fcheck-generic \ // RUN: --check-prefix="LLVM-PGO" -// RUN: %libomptarget-compile-generic -Xclang "-fprofile-instrument=clang" +// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ +// RUN: -Xclang "-fprofile-instrument=clang" // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ // RUN: %target_triple.clang.profraw | %fcheck-generic \ >From 79bf08e0bea8ab32781f201cdfc096a59156f270 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 28 May 2024 00:45:47 -0500 Subject: [PATCH 38/58] Check for level in test case TODO: Actually ensure the right level is used in the profraw file when only GPU flags are supplied --- offload/test/offloading/pgo1.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index 
5d1a41ccab0f24..a3a242c703d96a 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -45,6 +45,16 @@ int main() { // LLVM-PGO: Counters: 1 // LLVM-PGO: Block counts: [1] +// LLVM-PGO-LABEL: Instrumentation level: +// LLVM-PGO-SAME: IR +// LLVM-PGO-SAME: entry_first = 0 +// LLVM-PGO-LABEL: Functions shown: +// LLVM-PGO-SAME: 3 +// LLVM-PGO-LABEL: Maximum function count: +// LLVM-PGO-SAME: 20 +// LLVM-PGO-LABEL: Maximum internal block count: +// LLVM-PGO-SAME: 20 + // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // CLANG-PGO: Counters: 3 @@ -62,3 +72,12 @@ int main() { // CLANG-PGO: Counters: 1 // CLANG-PGO: Function count: 20 // CLANG-PGO: Block counts: [] + +// CLANG-PGO-LABEL: Instrumentation level: +// CLANG-PGO-SAME: Front-end +// CLANG-PGO-LABEL: Functions shown: +// CLANG-PGO-SAME: 3 +// CLANG-PGO-LABEL: Maximum function count: +// CLANG-PGO-SAME: 20 +// CLANG-PGO-LABEL: Maximum internal block count: +// CLANG-PGO-SAME: 20 >From 4c9f814ce14aeb6766a93f5c1d15b847b98dc29f Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Tue, 28 May 2024 12:58:43 -0500 Subject: [PATCH 39/58] Make requested clang-format change --- offload/plugins-nextgen/common/include/GlobalHandler.h | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/offload/plugins-nextgen/common/include/GlobalHandler.h b/offload/plugins-nextgen/common/include/GlobalHandler.h index 017d7e994f07a8..1d7b9f80f9dfd3 100644 --- a/offload/plugins-nextgen/common/include/GlobalHandler.h +++ b/offload/plugins-nextgen/common/include/GlobalHandler.h @@ -64,12 +64,10 @@ struct __llvm_profile_data { }; extern "C" { -extern int __attribute__((weak)) -__llvm_write_custom_profile(const char *Target, - const __llvm_profile_data *DataBegin, - const __llvm_profile_data *DataEnd, - const char *CountersBegin, const char *CountersEnd, - const char *NamesBegin, const char *NamesEnd); +extern int 
__attribute__((weak)) __llvm_write_custom_profile( + const char *Target, const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, const char *NamesEnd); } /// PGO profiling data extracted from a GPU device >From cfe166091ca91623d356d2dde41b64cefe98e472 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 30 May 2024 18:15:55 -0500 Subject: [PATCH 40/58] Check for version global on GPU --- compiler-rt/include/profile/InstrProfData.inc | 2 +- compiler-rt/lib/profile/InstrProfiling.h | 3 ++- .../lib/profile/InstrProfilingBuffer.c | 3 ++- compiler-rt/lib/profile/InstrProfilingFile.c | 14 +++++++++---- .../lib/profile/InstrProfilingInternal.h | 3 ++- .../lib/profile/InstrProfilingWriter.c | 20 +++++++++---------- .../llvm/ProfileData/InstrProfData.inc | 2 +- .../Instrumentation/PGOInstrumentation.cpp | 5 ++++- .../llvm-profdata/binary-ids-padding.test | 2 +- ...alformed-not-space-for-another-header.test | 2 +- .../malformed-num-counters-zero.test | 2 +- .../malformed-ptr-to-counter-array.test | 2 +- .../common/include/GlobalHandler.h | 13 ++++++++---- .../common/src/GlobalHandler.cpp | 16 ++++++++++++--- 14 files changed, 58 insertions(+), 31 deletions(-) diff --git a/compiler-rt/include/profile/InstrProfData.inc b/compiler-rt/include/profile/InstrProfData.inc index e9866d94b762c1..f0a260483429c3 100644 --- a/compiler-rt/include/profile/InstrProfData.inc +++ b/compiler-rt/include/profile/InstrProfData.inc @@ -152,7 +152,7 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \ #define INSTR_PROF_DATA_DEFINED #endif INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic()) -INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version()) +INSTR_PROF_RAW_HEADER(uint64_t, Version, Version) INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL)) INSTR_PROF_RAW_HEADER(uint64_t, NumData, NumData) 
INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesBeforeCounters, PaddingBytesBeforeCounters) diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h index ef1292a45bf01d..34b7d85ad66845 100644 --- a/compiler-rt/lib/profile/InstrProfiling.h +++ b/compiler-rt/lib/profile/InstrProfiling.h @@ -305,7 +305,8 @@ int __llvm_write_custom_profile(const char *Target, const __llvm_profile_data *DataEnd, const char *CountersBegin, const char *CountersEnd, const char *NamesBegin, - const char *NamesEnd); + const char *NamesEnd, + const uint64_t *VersionOverride); /*! * This variable is defined in InstrProfilingRuntime.cpp as a hidden diff --git a/compiler-rt/lib/profile/InstrProfilingBuffer.c b/compiler-rt/lib/profile/InstrProfilingBuffer.c index 1c451d7ec75637..b406e8db74f3f0 100644 --- a/compiler-rt/lib/profile/InstrProfilingBuffer.c +++ b/compiler-rt/lib/profile/InstrProfilingBuffer.c @@ -252,5 +252,6 @@ COMPILER_RT_VISIBILITY int __llvm_profile_write_buffer_internal( &BufferWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, BitmapBegin, BitmapEnd, /*VPDataReader=*/0, NamesBegin, NamesEnd, /*VTableBegin=*/NULL, /*VTableEnd=*/NULL, /*VNamesBegin=*/NULL, - /*VNamesEnd=*/NULL, /*SkipNameDataWrite=*/0); + /*VNamesEnd=*/NULL, /*SkipNameDataWrite=*/0, + __llvm_profile_get_version()); } diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 784cb9af6169d8..947a3ff89bc1fc 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1282,7 +1282,8 @@ int __llvm_write_custom_profile(const char *Target, const __llvm_profile_data *DataEnd, const char *CountersBegin, const char *CountersEnd, const char *NamesBegin, - const char *NamesEnd) { + const char *NamesEnd, + const uint64_t *VersionOverride) { int ReturnValue = 0, FilenameLength, TargetLength; char *FilenameBuf, *TargetFilename; const char *Filename; @@ -1354,10 +1355,15 @@ int 
__llvm_write_custom_profile(const char *Target, ProfDataWriter fileWriter; initFileWriter(&fileWriter, OutputFile); + uint64_t Version = __llvm_profile_get_version(); + if (VersionOverride) + Version = *VersionOverride; + /* Write custom data to the file */ - ReturnValue = lprofWriteDataImpl( - &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL, - lprofGetVPDataReader(), NULL, NULL, NULL, NULL, NamesBegin, NamesEnd, 0); + ReturnValue = + lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin, + CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL, + NULL, NULL, NULL, NamesBegin, NamesEnd, 0, Version); closeFileObject(OutputFile); // Restore SIGKILL. diff --git a/compiler-rt/lib/profile/InstrProfilingInternal.h b/compiler-rt/lib/profile/InstrProfilingInternal.h index d5bd0e41fb1291..2b9f687e7f8860 100644 --- a/compiler-rt/lib/profile/InstrProfilingInternal.h +++ b/compiler-rt/lib/profile/InstrProfilingInternal.h @@ -160,7 +160,8 @@ int lprofWriteDataImpl(ProfDataWriter *Writer, VPDataReaderType *VPDataReader, const char *NamesBegin, const char *NamesEnd, const VTableProfData *VTableBegin, const VTableProfData *VTableEnd, const char *VNamesBegin, - const char *VNamesEnd, int SkipNameDataWrite); + const char *VNamesEnd, int SkipNameDataWrite, + uint64_t Version); /* Merge value profile data pointed to by SrcValueProfData into * in-memory profile counters pointed by to DstData. 
*/ diff --git a/compiler-rt/lib/profile/InstrProfilingWriter.c b/compiler-rt/lib/profile/InstrProfilingWriter.c index 8816a71155511b..bcd88b30d050d4 100644 --- a/compiler-rt/lib/profile/InstrProfilingWriter.c +++ b/compiler-rt/lib/profile/InstrProfilingWriter.c @@ -254,21 +254,21 @@ COMPILER_RT_VISIBILITY int lprofWriteData(ProfDataWriter *Writer, const VTableProfData *VTableEnd = __llvm_profile_end_vtables(); const char *VNamesBegin = __llvm_profile_begin_vtabnames(); const char *VNamesEnd = __llvm_profile_end_vtabnames(); + uint64_t Version = __llvm_profile_get_version(); return lprofWriteDataImpl(Writer, DataBegin, DataEnd, CountersBegin, CountersEnd, BitmapBegin, BitmapEnd, VPDataReader, NamesBegin, NamesEnd, VTableBegin, VTableEnd, - VNamesBegin, VNamesEnd, SkipNameDataWrite); + VNamesBegin, VNamesEnd, SkipNameDataWrite, Version); } -COMPILER_RT_VISIBILITY int -lprofWriteDataImpl(ProfDataWriter *Writer, const __llvm_profile_data *DataBegin, - const __llvm_profile_data *DataEnd, - const char *CountersBegin, const char *CountersEnd, - const char *BitmapBegin, const char *BitmapEnd, - VPDataReaderType *VPDataReader, const char *NamesBegin, - const char *NamesEnd, const VTableProfData *VTableBegin, - const VTableProfData *VTableEnd, const char *VNamesBegin, - const char *VNamesEnd, int SkipNameDataWrite) { +COMPILER_RT_VISIBILITY int lprofWriteDataImpl( + ProfDataWriter *Writer, const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, const char *CountersBegin, + const char *CountersEnd, const char *BitmapBegin, const char *BitmapEnd, + VPDataReaderType *VPDataReader, const char *NamesBegin, + const char *NamesEnd, const VTableProfData *VTableBegin, + const VTableProfData *VTableEnd, const char *VNamesBegin, + const char *VNamesEnd, int SkipNameDataWrite, uint64_t Version) { /* Calculate size of sections. 
*/ const uint64_t DataSectionSize = __llvm_profile_get_data_size(DataBegin, DataEnd); diff --git a/llvm/include/llvm/ProfileData/InstrProfData.inc b/llvm/include/llvm/ProfileData/InstrProfData.inc index e9866d94b762c1..f0a260483429c3 100644 --- a/llvm/include/llvm/ProfileData/InstrProfData.inc +++ b/llvm/include/llvm/ProfileData/InstrProfData.inc @@ -152,7 +152,7 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \ #define INSTR_PROF_DATA_DEFINED #endif INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic()) -INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version()) +INSTR_PROF_RAW_HEADER(uint64_t, Version, Version) INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL)) INSTR_PROF_RAW_HEADER(uint64_t, NumData, NumData) INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesBeforeCounters, PaddingBytesBeforeCounters) diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index ee1657ba8400ee..f3c68fb17ce7cd 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -408,7 +408,10 @@ static GlobalVariable *createIRLevelProfileFlagVar(Module &M, bool IsCS) { auto IRLevelVersionVariable = new GlobalVariable( M, IntTy64, true, GlobalValue::WeakAnyLinkage, Constant::getIntegerValue(IntTy64, APInt(64, ProfileVersion)), VarName); - IRLevelVersionVariable->setVisibility(GlobalValue::HiddenVisibility); + if (isGPUProfTarget(M)) + IRLevelVersionVariable->setVisibility(GlobalValue::ProtectedVisibility); + else + IRLevelVersionVariable->setVisibility(GlobalValue::HiddenVisibility); Triple TT(M.getTargetTriple()); if (TT.supportsCOMDAT()) { IRLevelVersionVariable->setLinkage(GlobalValue::ExternalLinkage); diff --git a/llvm/test/tools/llvm-profdata/binary-ids-padding.test b/llvm/test/tools/llvm-profdata/binary-ids-padding.test index 
292c582b45c52d..f31aa15bfe6c97 100644 --- a/llvm/test/tools/llvm-profdata/binary-ids-padding.test +++ b/llvm/test/tools/llvm-profdata/binary-ids-padding.test @@ -1,7 +1,7 @@ // Header // // INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic()) -// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version()) +// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version) // INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL)) // INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize) // INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize) diff --git a/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test b/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test index 705e5efaf58759..44be2980bb2f25 100644 --- a/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test +++ b/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test @@ -1,7 +1,7 @@ // Header // // INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic()) -// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version()) +// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version) // INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL)) // INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize) // INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize) diff --git a/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test b/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test index 157c13b926a7ed..9af9d65a6bdba1 100644 --- a/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test +++ b/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test @@ -1,7 +1,7 @@ // Header // // INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic()) -// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version()) +// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version) // INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, 
__llvm_write_binary_ids(NULL)) // INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize) // INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize) diff --git a/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test b/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test index 83cf76f68fb635..49c5ae9b0931d6 100644 --- a/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test +++ b/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test @@ -1,7 +1,7 @@ // Header // // INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic()) -// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version()) +// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version) // INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL)) // INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize) // INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize) diff --git a/offload/plugins-nextgen/common/include/GlobalHandler.h b/offload/plugins-nextgen/common/include/GlobalHandler.h index 1d7b9f80f9dfd3..6daa8b1b854131 100644 --- a/offload/plugins-nextgen/common/include/GlobalHandler.h +++ b/offload/plugins-nextgen/common/include/GlobalHandler.h @@ -13,6 +13,7 @@ #ifndef LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H +#include <optional> #include #include "llvm/ADT/DenseMap.h" @@ -64,10 +65,13 @@ struct __llvm_profile_data { }; extern "C" { -extern int __attribute__((weak)) __llvm_write_custom_profile( - const char *Target, const __llvm_profile_data *DataBegin, - const __llvm_profile_data *DataEnd, const char *CountersBegin, - const char *CountersEnd, const char *NamesBegin, const char *NamesEnd); +extern int __attribute__((weak)) +__llvm_write_custom_profile(const char *Target, + const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, + const char *CountersBegin, const char *CountersEnd, + const char *NamesBegin, const char
*NamesEnd, + const uint64_t *VersionOverride); } /// PGO profiling data extracted from a GPU device @@ -76,6 +80,7 @@ struct GPUProfGlobals { SmallVector<__llvm_profile_data> Data; SmallVector<uint8_t> NamesData; Triple TargetTriple; + std::optional<uint64_t> Version; void dump() const; Error write() const; diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index bca66cff6558a2..93abd0a5cea365 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -16,6 +16,7 @@ #include "Shared/Utils.h" +#include "llvm/ProfileData/InstrProfData.inc" #include "llvm/Support/Error.h" #include @@ -214,6 +215,13 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal)) return Err; DeviceProfileData.Data.push_back(std::move(Data)); + } else if (*NameOrErr == INSTR_PROF_QUOTE(INSTR_PROF_RAW_VERSION_VAR)) { + uint64_t RawVersionData; + GlobalTy RawVersionGlobal(NameOrErr->str(), Sym.getSize(), + &RawVersionData); + if (auto Err = readGlobalFromDevice(Device, Image, RawVersionGlobal)) + return Err; + DeviceProfileData.Version = RawVersionData; } } return DeviceProfileData; @@ -267,6 +275,8 @@ Error GPUProfGlobals::write() const { CountsSize = Counts.size() * sizeof(int64_t); __llvm_profile_data *DataBegin, *DataEnd; char *CountersBegin, *CountersEnd, *NamesBegin, *NamesEnd; + const uint64_t *VersionOverride = + Version.has_value() ? &Version.value() : nullptr; // Initialize array of contiguous data.
We need to make sure each section is // contiguous so that the PGO library can compute deltas properly @@ -288,9 +298,9 @@ Error GPUProfGlobals::write() const { memcpy(NamesBegin, NamesData.data(), NamesData.size()); // Invoke compiler-rt entrypoint - int result = __llvm_write_custom_profile(TargetTriple.str().c_str(), - DataBegin, DataEnd, CountersBegin, - CountersEnd, NamesBegin, NamesEnd); + int result = __llvm_write_custom_profile( + TargetTriple.str().c_str(), DataBegin, DataEnd, CountersBegin, + CountersEnd, NamesBegin, NamesEnd, VersionOverride); if (result != 0) return Plugin::error("Error writing GPU PGO data to file"); >From 5bf437618c91c882543c97d34b468d74070218fa Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 12:57:07 -0500 Subject: [PATCH 41/58] Add host/device combination test --- offload/test/offloading/{ => gpupgo}/pgo1.c | 14 ++-- offload/test/offloading/gpupgo/pgo2.c | 73 +++++++++++++++++++++ 2 files changed, 81 insertions(+), 6 deletions(-) rename offload/test/offloading/{ => gpupgo}/pgo1.c (83%) create mode 100644 offload/test/offloading/gpupgo/pgo2.c diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/gpupgo/pgo1.c similarity index 83% rename from offload/test/offloading/pgo1.c rename to offload/test/offloading/gpupgo/pgo1.c index a3a242c703d96a..7c6e55f6546424 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/gpupgo/pgo1.c @@ -1,14 +1,16 @@ // RUN: %libomptarget-compile-generic -fprofile-generate-gpu -// RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1 +// RUN: env LLVM_PROFILE_FILE=%basename_t.llvm.profraw \ +// RUN: %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ -// RUN: %target_triple.llvm.profraw | %fcheck-generic \ -// RUN: --check-prefix="LLVM-PGO" +// RUN: %target_triple.%basename_t.llvm.profraw | \ +// RUN: %fcheck-generic --check-prefix="LLVM-PGO" // RUN: %libomptarget-compile-generic 
-fprofile-instr-generate-gpu -// RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1 +// RUN: env LLVM_PROFILE_FILE=%basename_t.clang.profraw \ +// RUN: %libomptarget-run-generic 2>&1 // RUN: llvm-profdata show --all-functions --counts \ -// RUN: %target_triple.clang.profraw | %fcheck-generic \ -// RUN: --check-prefix="CLANG-PGO" +// RUN: %target_triple.%basename_t.clang.profraw | \ +// RUN: %fcheck-generic --check-prefix="CLANG-PGO" // UNSUPPORTED: x86_64-pc-linux-gnu // UNSUPPORTED: x86_64-pc-linux-gnu-LTO diff --git a/offload/test/offloading/gpupgo/pgo2.c b/offload/test/offloading/gpupgo/pgo2.c new file mode 100644 index 00000000000000..1819573d55024a --- /dev/null +++ b/offload/test/offloading/gpupgo/pgo2.c @@ -0,0 +1,73 @@ +// RUN: %libomptarget-compile-generic -fprofile-generate \ +// RUN: -fprofile-generate-gpu +// RUN: env LLVM_PROFILE_FILE=%basename_t.llvm.profraw \ +// RUN: %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %basename_t.llvm.profraw | %fcheck-generic \ +// RUN: --check-prefix="LLVM-HOST" +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.%basename_t.llvm.profraw \ +// RUN: | %fcheck-generic --check-prefix="LLVM-DEVICE" + +// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ +// RUN: -fprofile-instr-generate-gpu +// RUN: env LLVM_PROFILE_FILE=%basename_t.clang.profraw \ +// RUN: %libomptarget-run-generic 2>&1 +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %basename_t.clang.profraw | %fcheck-generic \ +// RUN: --check-prefix="CLANG-HOST" +// RUN: llvm-profdata show --all-functions --counts \ +// RUN: %target_triple.%basename_t.clang.profraw | \ +// RUN: %fcheck-generic --check-prefix="CLANG-DEV" + +// UNSUPPORTED: x86_64-pc-linux-gnu +// UNSUPPORTED: x86_64-pc-linux-gnu-LTO +// UNSUPPORTED: aarch64-unknown-linux-gnu +// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: pgo + +int main() { + int host_var = 0; + for 
(int i = 0; i < 20; i++) { + host_var += i; + } + + int device_var = 1; +#pragma omp target + for (int i = 0; i < 10; i++) { + device_var *= i; + } +} + +// LLVM-HOST-LABEL: main: +// LLVM-HOST: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-HOST: Counters: 3 +// LLVM-HOST: Block counts: [20, 1, 0] + +// LLVM-HOST-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// LLVM-HOST: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-HOST: Counters: 2 +// LLVM-HOST: Block counts: [0, 0] + +// LLVM-DEVICE-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// LLVM-DEVICE: Hash: {{0[xX][0-9a-fA-F]+}} +// LLVM-DEVICE: Counters: 3 +// LLVM-DEVICE: Block counts: [10, 1, 1] + +// CLANG-HOST-LABEL: main: +// CLANG-HOST: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-HOST: Counters: 2 +// CLANG-HOST: Function count: 1 +// CLANG-HOST: Block counts: [20] + +// CLANG-HOST-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// CLANG-HOST: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-HOST: Counters: 2 +// CLANG-HOST: Function count: 0 +// CLANG-HOST: Block counts: [0] + +// CLANG-DEV-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: +// CLANG-DEV: Hash: {{0[xX][0-9a-fA-F]+}} +// CLANG-DEV: Counters: 2 +// CLANG-DEV: Function count: 0 +// CLANG-DEV: Block counts: [11] >From 253013792cb7137b11893e701497e8f62143123a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 16:59:22 -0500 Subject: [PATCH 42/58] Add PGO dump debug option --- offload/include/Shared/Environment.h | 1 + offload/plugins-nextgen/common/src/PluginInterface.cpp | 4 ++++ 2 files changed, 5 insertions(+) diff --git a/offload/include/Shared/Environment.h b/offload/include/Shared/Environment.h index d141146b6bd5a1..86f6d1c6ea2d36 100644 --- a/offload/include/Shared/Environment.h +++ b/offload/include/Shared/Environment.h @@ -30,6 +30,7 @@ enum class DeviceDebugKind : uint32_t { FunctionTracing = 1U << 1, CommonIssues = 1U << 2, AllocationTracker = 1U << 3, + PGODump = 1U << 4, 
}; struct DeviceEnvironmentTy { diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index a479235e2c36cb..a68875ea7748ea 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp @@ -839,6 +839,10 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { if (!ProfOrErr) return ProfOrErr.takeError(); + if ((OMPX_DebugKind.get() & uint32_t(DeviceDebugKind::PGODump)) == + uint32_t(DeviceDebugKind::PGODump)) + ProfOrErr->dump(); + // Write data to profiling file if (auto Err = ProfOrErr->write()) { consumeError(std::move(Err)); >From 344e357de657f54c068be969dcfc3ea33f2f026e Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 20:29:20 -0500 Subject: [PATCH 43/58] Tighten PGO test requirements Require compiler-rt to be an enabled runtime --- offload/test/CMakeLists.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt index 32df1e47afaeb2..41ab339147791c 100644 --- a/offload/test/CMakeLists.txt +++ b/offload/test/CMakeLists.txt @@ -12,10 +12,10 @@ else() set(LIBOMPTARGET_DEBUG False) endif() -if (OPENMP_STANDALONE_BUILD) - set(LIBOMPTARGET_TEST_GPU_PGO False) -else() +if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES) set(LIBOMPTARGET_TEST_GPU_PGO True) +else() + set(LIBOMPTARGET_TEST_GPU_PGO False) endif() # Replace the space from user's input with ";" in case that CMake add escape >From 2f751420b9ad2ffc7c9fac4a645724b45cdae59a Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 20:29:20 -0500 Subject: [PATCH 44/58] Tighten PGO test requirements Require compiler-rt to be an enabled runtime --- offload/test/CMakeLists.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt index 
32df1e47afaeb2..41ab339147791c 100644 --- a/offload/test/CMakeLists.txt +++ b/offload/test/CMakeLists.txt @@ -12,10 +12,10 @@ else() set(LIBOMPTARGET_DEBUG False) endif() -if (OPENMP_STANDALONE_BUILD) - set(LIBOMPTARGET_TEST_GPU_PGO False) -else() +if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES) set(LIBOMPTARGET_TEST_GPU_PGO True) +else() + set(LIBOMPTARGET_TEST_GPU_PGO False) endif() # Replace the space from user's input with ";" in case that CMake add escape >From 79ceacb6559a3f6ecf3fd7ec1abf768ddeb97d13 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 20:29:20 -0500 Subject: [PATCH 45/58] Tighten PGO test requirements Require compiler-rt to be an enabled runtime --- offload/test/CMakeLists.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/offload/test/CMakeLists.txt b/offload/test/CMakeLists.txt index 32df1e47afaeb2..41ab339147791c 100644 --- a/offload/test/CMakeLists.txt +++ b/offload/test/CMakeLists.txt @@ -12,10 +12,10 @@ else() set(LIBOMPTARGET_DEBUG False) endif() -if (OPENMP_STANDALONE_BUILD) - set(LIBOMPTARGET_TEST_GPU_PGO False) -else() +if (NOT OPENMP_STANDALONE_BUILD AND "compiler-rt" IN_LIST LLVM_ENABLE_RUNTIMES) set(LIBOMPTARGET_TEST_GPU_PGO True) +else() + set(LIBOMPTARGET_TEST_GPU_PGO False) endif() # Replace the space from user's input with ";" in case that CMake add escape >From ff0dd62cf1b236f1373fd3b70ec2875c3719ca04 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 31 May 2024 22:07:20 -0500 Subject: [PATCH 46/58] Add note about PGO debug flag --- openmp/docs/design/Runtimes.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst index f8a8cb87e83e66..7fc697a838e229 100644 --- a/openmp/docs/design/Runtimes.rst +++ b/openmp/docs/design/Runtimes.rst @@ -1493,3 +1493,4 @@ debugging features are supported. * Enable debugging assertions in the device. 
``0x01`` * Enable diagnosing common problems during offloading . ``0x4`` * Enable device malloc statistics (amdgpu only). ``0x8`` + * Dump device PGO counters (only if PGO on GPU is enabled). ``0x10`` >From 0b9cc35d686f03fb8f835b2be2c4e16b630bd426 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 3 Jun 2024 22:15:46 -0500 Subject: [PATCH 47/58] Fix clang format --- .../plugins-nextgen/common/include/GlobalHandler.h | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/offload/plugins-nextgen/common/include/GlobalHandler.h b/offload/plugins-nextgen/common/include/GlobalHandler.h index 6daa8b1b854131..1b38ce65216dde 100644 --- a/offload/plugins-nextgen/common/include/GlobalHandler.h +++ b/offload/plugins-nextgen/common/include/GlobalHandler.h @@ -65,13 +65,11 @@ struct __llvm_profile_data { }; extern "C" { -extern int __attribute__((weak)) -__llvm_write_custom_profile(const char *Target, - const __llvm_profile_data *DataBegin, - const __llvm_profile_data *DataEnd, - const char *CountersBegin, const char *CountersEnd, - const char *NamesBegin, const char *NamesEnd, - const uint64_t *VersionOverride); +extern int __attribute__((weak)) __llvm_write_custom_profile( + const char *Target, const __llvm_profile_data *DataBegin, + const __llvm_profile_data *DataEnd, const char *CountersBegin, + const char *CountersEnd, const char *NamesBegin, const char *NamesEnd, + const uint64_t *VersionOverride); } /// PGO profiling data extracted from a GPU device >From 488cb4a349fdfbd73d0a78ddb2c17522c46145ba Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 26 Jun 2024 18:18:31 -0500 Subject: [PATCH 48/58] Apply requested formatting changes --- clang/lib/CodeGen/CodeGenPGO.cpp | 11 +++++----- llvm/lib/ProfileData/InstrProf.cpp | 4 ++-- .../Instrumentation/InstrProfiling.cpp | 10 ++++----- .../Instrumentation/PGOInstrumentation.cpp | 21 ++++++++++--------- offload/DeviceRTL/src/Profiling.cpp | 6 ++++-- 5 files changed, 28 insertions(+), 
24 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp index a7ce0b8f6a35f3..3edfbdd679c61d 100644 --- a/clang/lib/CodeGen/CodeGenPGO.cpp +++ b/clang/lib/CodeGen/CodeGenPGO.cpp @@ -1199,12 +1199,13 @@ void CodeGenPGO::emitCounterSetOrIncrement(CGBuilderTy &Builder, const Stmt *S, // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); + auto *NormalizedFuncNameVarPtr = + llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, llvm::PointerType::get(CGM.getLLVMContext(), 0)); - llvm::Value *Args[] = {NormalizedPtr, Builder.getInt64(FunctionHash), - Builder.getInt32(NumRegionCounters), - Builder.getInt32(Counter), StepV}; + llvm::Value *Args[] = { + NormalizedFuncNameVarPtr, Builder.getInt64(FunctionHash), + Builder.getInt32(NumRegionCounters), Builder.getInt32(Counter), StepV}; if (llvm::EnableSingleByteCoverage) Builder.CreateCall(CGM.getIntrinsic(llvm::Intrinsic::instrprof_cover), diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp index 1284efd4b5f4da..6742435c9d065e 100644 --- a/llvm/lib/ProfileData/InstrProf.cpp +++ b/llvm/lib/ProfileData/InstrProf.cpp @@ -433,8 +433,8 @@ std::string getPGOFuncNameVarName(StringRef FuncName, } bool isGPUProfTarget(const Module &M) { - const auto &Triple = llvm::Triple(M.getTargetTriple()); - return Triple.isAMDGPU() || Triple.isNVPTX(); + const auto &T = Triple(M.getTargetTriple()); + return T.isAMDGPU() || T.isNVPTX(); } void setPGOFuncVisibility(Module &M, GlobalVariable *FuncNameVar) { diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp index dd8c027c4bbf62..05cef1236f0879 100644 --- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp +++ 
b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp @@ -869,8 +869,8 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { llvm::InstrProfValueKind::IPVK_MemOPSize); CallInst *Call = nullptr; auto *TLI = &GetTLI(*Ind->getFunction()); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - DataVar, PointerType::getUnqual(M.getContext())); + auto *NormalizedDataVarPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + DataVar, PointerType::get(M.getContext(), 0)); // To support value profiling calls within Windows exception handlers, funclet // information contained within operand bundles needs to be copied over to @@ -879,12 +879,12 @@ void InstrLowerer::lowerValueProfileInst(InstrProfValueProfileInst *Ind) { SmallVector OpBundles; Ind->getOperandBundlesAsDefs(OpBundles); if (!IsMemOpSize) { - Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Value *Args[3] = {Ind->getTargetValue(), NormalizedDataVarPtr, Builder.getInt32(Index)}; Call = Builder.CreateCall(getOrInsertValueProfilingCall(M, *TLI), Args, OpBundles); } else { - Value *Args[3] = {Ind->getTargetValue(), NormalizedPtr, + Value *Args[3] = {Ind->getTargetValue(), NormalizedDataVarPtr, Builder.getInt32(Index)}; Call = Builder.CreateCall( getOrInsertValueProfilingCall(M, *TLI, ValueProfilingCallType::MemOp), @@ -1580,7 +1580,7 @@ void InstrLowerer::createDataVariable(InstrProfCntrInstBase *Inc) { ValuesVar->setAlignment(Align(8)); maybeSetComdat(ValuesVar, Fn, CntsVarName); ValuesPtrExpr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - ValuesVar, PointerType::getUnqual(Fn->getContext())); + ValuesVar, PointerType::get(Fn->getContext(), 0)); } uint64_t NumCounters = Inc->getNumCounters()->getZExtValue(); diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp index ee1657ba8400ee..f8f34ea25597f3 100644 --- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp +++ 
b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp @@ -884,7 +884,7 @@ static void instrumentOneFunc( FuncInfo.FunctionHash); // Make sure that pointer to global is passed in with zero addrspace // This is relevant during GPU profiling - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedNamePtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); if (PGOFunctionEntryCoverage) { auto &EntryBB = F.getEntryBlock(); @@ -893,7 +893,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_cover), - {NormalizedPtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); + {NormalizedNamePtr, CFGHash, Builder.getInt32(1), Builder.getInt32(0)}); return; } @@ -948,7 +948,7 @@ static void instrumentOneFunc( // i32 ) Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_timestamp), - {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + {NormalizedNamePtr, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I)}); I += PGOBlockCoverage ? 8 : 1; } @@ -963,7 +963,7 @@ static void instrumentOneFunc( Intrinsic::getDeclaration(M, PGOBlockCoverage ? 
Intrinsic::instrprof_cover : Intrinsic::instrprof_increment), - {NormalizedPtr, CFGHash, Builder.getInt32(NumCounters), + {NormalizedNamePtr, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I++)}); } @@ -1007,15 +1007,15 @@ static void instrumentOneFunc( ToProfile = Builder.CreatePtrToInt(Cand.V, Builder.getInt64Ty()); assert(ToProfile && "value profiling Value is of unexpected type"); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( + auto *NormalizedNamePtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( Name, PointerType::get(M->getContext(), 0)); SmallVector OpBundles; populateEHOperandBundle(Cand, BlockColors, OpBundles); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_value_profile), - {NormalizedPtr, Builder.getInt64(FuncInfo.FunctionHash), ToProfile, - Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, + {NormalizedNamePtr, Builder.getInt64(FuncInfo.FunctionHash), + ToProfile, Builder.getInt32(Kind), Builder.getInt32(SiteIndex++)}, OpBundles); } } // IPVK_First <= Kind <= IPVK_Last @@ -1688,11 +1688,12 @@ void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) { IRBuilder<> Builder(&SI); Type *Int64Ty = Builder.getInt64Ty(); auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty); - auto *NormalizedPtr = ConstantExpr::getPointerBitCastOrAddrSpaceCast( - FuncNameVar, PointerType::get(M->getContext(), 0)); + auto *NormalizedFuncNameVarPtr = + ConstantExpr::getPointerBitCastOrAddrSpaceCast( + FuncNameVar, PointerType::get(M->getContext(), 0)); Builder.CreateCall( Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step), - {NormalizedPtr, Builder.getInt64(FuncHash), + {NormalizedFuncNameVarPtr, Builder.getInt64(FuncHash), Builder.getInt32(TotalNumCtrs), Builder.getInt32(*CurCtrIdx), Step}); ++(*CurCtrIdx); } diff --git a/offload/DeviceRTL/src/Profiling.cpp b/offload/DeviceRTL/src/Profiling.cpp index 799477f5e47d27..639c62ceff7a69 100644 --- a/offload/DeviceRTL/src/Profiling.cpp 
+++ b/offload/DeviceRTL/src/Profiling.cpp @@ -12,8 +12,10 @@ extern "C" { -void __llvm_profile_register_function(void *ptr) {} -void __llvm_profile_register_names_function(void *ptr, long int i) {} +// Provides empty implementations for certain functions in compiler-rt +// that are emitted by the PGO instrumentation. +void __llvm_profile_register_function(void *Ptr) {} +void __llvm_profile_register_names_function(void *Ptr, long int I) {} } #pragma omp end declare target >From b90c01583f1893802aba0180b07a448584585365 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Wed, 26 Jun 2024 18:29:59 -0500 Subject: [PATCH 49/58] Add memop function shim to DeviceRTL This comes up sometimes when using LLVM IR level instrumentation. --- offload/DeviceRTL/include/Profiling.h | 1 + offload/DeviceRTL/src/Profiling.cpp | 1 + 2 files changed, 2 insertions(+) diff --git a/offload/DeviceRTL/include/Profiling.h b/offload/DeviceRTL/include/Profiling.h index 9efc1554c176bc..d9947522541219 100644 --- a/offload/DeviceRTL/include/Profiling.h +++ b/offload/DeviceRTL/include/Profiling.h @@ -15,6 +15,7 @@ extern "C" { void __llvm_profile_register_function(void *Ptr); void __llvm_profile_register_names_function(void *Ptr, long int I); +void __llvm_profile_instrument_memop(long int I, void *Ptr, int I2); } #endif diff --git a/offload/DeviceRTL/src/Profiling.cpp b/offload/DeviceRTL/src/Profiling.cpp index 639c62ceff7a69..bb3caaadcc03dd 100644 --- a/offload/DeviceRTL/src/Profiling.cpp +++ b/offload/DeviceRTL/src/Profiling.cpp @@ -16,6 +16,7 @@ extern "C" { // that are emitted by the PGO instrumentation. 
void __llvm_profile_register_function(void *Ptr) {} void __llvm_profile_register_names_function(void *Ptr, long int I) {} +void __llvm_profile_instrument_memop(long int I, void *Ptr, int I2) {} } #pragma omp end declare target >From c68c6e2fa98a1fe608b88ed38f7db68eae804c5b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 27 Jun 2024 02:04:27 -0500 Subject: [PATCH 50/58] Make requested changes --- compiler-rt/lib/profile/InstrProfiling.h | 2 +- compiler-rt/lib/profile/InstrProfilingFile.c | 1 - offload/plugins-nextgen/common/src/PluginInterface.cpp | 5 ++--- 3 files changed, 3 insertions(+), 5 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h index ef1292a45bf01d..eda3e9a673c1af 100644 --- a/compiler-rt/lib/profile/InstrProfiling.h +++ b/compiler-rt/lib/profile/InstrProfiling.h @@ -298,7 +298,7 @@ void __llvm_profile_set_dumped(); /*! * \brief Write custom target-specific profiling data to a seperate file. - * Used by libomptarget for GPU PGO. + * Used by offload PGO. */ int __llvm_write_custom_profile(const char *Target, const __llvm_profile_data *DataBegin, diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 784cb9af6169d8..93436ecbabb40d 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1321,7 +1321,6 @@ int __llvm_write_custom_profile(const char *Target, /* Prepend "TARGET." 
to current filename */ memcpy(TargetFilename, Target, TargetLength); TargetFilename[TargetLength] = '.'; - memcpy(TargetFilename, Target, TargetLength); memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); TargetFilename[FilenameLength + 1 + TargetLength] = 0; diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index c4e1e63777de8a..445f4ad942bd4d 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp @@ -843,9 +843,8 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { ProfOrErr->dump(); // Write data to profiling file - if (auto Err = ProfOrErr->write()) { - consumeError(std::move(Err)); - } + if (auto Err = ProfOrErr->write()) + return Err; } // Delete the memory manager before deinitializing the device. Otherwise, >From ca52c58c7fde412897cf6b10b9bbb321812f193d Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Thu, 27 Jun 2024 02:26:20 -0500 Subject: [PATCH 51/58] Only dump counters if PGODump flag is set --- offload/include/Shared/Environment.h | 1 + offload/plugins-nextgen/common/src/PluginInterface.cpp | 4 +++- openmp/docs/design/Runtimes.rst | 1 + 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/offload/include/Shared/Environment.h b/offload/include/Shared/Environment.h index d141146b6bd5a1..86f6d1c6ea2d36 100644 --- a/offload/include/Shared/Environment.h +++ b/offload/include/Shared/Environment.h @@ -30,6 +30,7 @@ enum class DeviceDebugKind : uint32_t { FunctionTracing = 1U << 1, CommonIssues = 1U << 2, AllocationTracker = 1U << 3, + PGODump = 1U << 4, }; struct DeviceEnvironmentTy { diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index 445f4ad942bd4d..35fb04863d8741 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp 
@@ -840,7 +840,9 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { return ProfOrErr.takeError(); // Dump out profdata - ProfOrErr->dump(); + if ((OMPX_DebugKind.get() & uint32_t(DeviceDebugKind::PGODump)) == + uint32_t(DeviceDebugKind::PGODump)) + ProfOrErr->dump(); // Write data to profiling file if (auto Err = ProfOrErr->write()) diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst index f8a8cb87e83e66..7fc697a838e229 100644 --- a/openmp/docs/design/Runtimes.rst +++ b/openmp/docs/design/Runtimes.rst @@ -1493,3 +1493,4 @@ debugging features are supported. * Enable debugging assertions in the device. ``0x01`` * Enable diagnosing common problems during offloading . ``0x4`` * Enable device malloc statistics (amdgpu only). ``0x8`` + * Dump device PGO counters (only if PGO on GPU is enabled). ``0x10`` >From ee4431a1b57469c7679f54f124ca5f3dd7f0433b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 9 Aug 2024 20:21:38 -0500 Subject: [PATCH 52/58] Update requirements --- offload/test/offloading/pgo1.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index d6747113265803..fbf6337374a997 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -12,10 +12,7 @@ // RUN: %target_triple.clang.profraw | %fcheck-generic \ // RUN: --check-prefix="CLANG-PGO" -// UNSUPPORTED: x86_64-pc-linux-gnu -// UNSUPPORTED: x86_64-pc-linux-gnu-LTO -// UNSUPPORTED: aarch64-unknown-linux-gnu -// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: gpu // REQUIRES: pgo int test1(int a) { return a / 2; } >From f9a24e35dfce2b18d0c4acefdaa0e71561bb875d Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 9 Aug 2024 20:30:58 -0500 Subject: [PATCH 53/58] Update test requirements --- offload/test/offloading/gpupgo/pgo1.c | 5 +---- offload/test/offloading/gpupgo/pgo2.c | 5 +---- 2 files changed, 2 insertions(+), 8 deletions(-) diff 
--git a/offload/test/offloading/gpupgo/pgo1.c b/offload/test/offloading/gpupgo/pgo1.c index 7c6e55f6546424..f5d8aee7908be0 100644 --- a/offload/test/offloading/gpupgo/pgo1.c +++ b/offload/test/offloading/gpupgo/pgo1.c @@ -12,10 +12,7 @@ // RUN: %target_triple.%basename_t.clang.profraw | \ // RUN: %fcheck-generic --check-prefix="CLANG-PGO" -// UNSUPPORTED: x86_64-pc-linux-gnu -// UNSUPPORTED: x86_64-pc-linux-gnu-LTO -// UNSUPPORTED: aarch64-unknown-linux-gnu -// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: gpu // REQUIRES: pgo int test1(int a) { return a / 2; } diff --git a/offload/test/offloading/gpupgo/pgo2.c b/offload/test/offloading/gpupgo/pgo2.c index 1819573d55024a..b5d0f2120754af 100644 --- a/offload/test/offloading/gpupgo/pgo2.c +++ b/offload/test/offloading/gpupgo/pgo2.c @@ -20,10 +20,7 @@ // RUN: %target_triple.%basename_t.clang.profraw | \ // RUN: %fcheck-generic --check-prefix="CLANG-DEV" -// UNSUPPORTED: x86_64-pc-linux-gnu -// UNSUPPORTED: x86_64-pc-linux-gnu-LTO -// UNSUPPORTED: aarch64-unknown-linux-gnu -// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// REQUIRES: gpu // REQUIRES: pgo int main() { >From fb699b6bca72d42359a304bcbba88f3564ae9ac9 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Sat, 10 Aug 2024 00:54:36 -0500 Subject: [PATCH 54/58] Merge changes --- offload/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +- offload/test/offloading/pgo1.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index bca66cff6558a2..d7bfbba01c8efc 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -193,7 +193,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, // Check if given current global is a profiling global based // on name - if (NameOrErr->equals(getInstrProfNamesVarName())) { + if (*NameOrErr == 
getInstrProfNamesVarName()) { // Read in profiled function names DeviceProfileData.NamesData = SmallVector(Sym.getSize(), 0); GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(), diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c index fbf6337374a997..3270ce8f15e7dc 100644 --- a/offload/test/offloading/pgo1.c +++ b/offload/test/offloading/pgo1.c @@ -32,17 +32,17 @@ int main() { // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 4 -// LLVM-PGO: Block counts: [20, 10, 20, 10] +// LLVM-PGO: Block counts: [20, 10, 2, 1] // LLVM-PGO-LABEL: test1: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [10] // LLVM-PGO-LABEL: test2: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [20] // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} >From 5727284c17e1a0eadfbcbc544d06e0dca0a4384b Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Mon, 12 Aug 2024 16:50:43 -0500 Subject: [PATCH 55/58] Merge changes --- offload/plugins-nextgen/common/src/GlobalHandler.cpp | 2 +- offload/test/offloading/gpupgo/pgo1.c | 8 ++++---- offload/test/offloading/gpupgo/pgo2.c | 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp index 93abd0a5cea365..0627b7f4a7f5b7 100644 --- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp +++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp @@ -194,7 +194,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device, // Check if given current global is a profiling global based // on name - if (NameOrErr->equals(getInstrProfNamesVarName())) { + if (*NameOrErr == 
getInstrProfNamesVarName()) { // Read in profiled function names DeviceProfileData.NamesData = SmallVector(Sym.getSize(), 0); GlobalTy NamesGlobal(NameOrErr->str(), Sym.getSize(), diff --git a/offload/test/offloading/gpupgo/pgo1.c b/offload/test/offloading/gpupgo/pgo1.c index f5d8aee7908be0..7196663fcfc908 100644 --- a/offload/test/offloading/gpupgo/pgo1.c +++ b/offload/test/offloading/gpupgo/pgo1.c @@ -32,17 +32,17 @@ int main() { // LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 4 -// LLVM-PGO: Block counts: [20, 10, 20, 10] +// LLVM-PGO: Block counts: [20, 10, 2, 1] // LLVM-PGO-LABEL: test1: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [10] // LLVM-PGO-LABEL: test2: // LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-PGO: Counters: 1 -// LLVM-PGO: Block counts: [1] +// LLVM-PGO: Block counts: [20] // LLVM-PGO-LABEL: Instrumentation level: // LLVM-PGO-SAME: IR @@ -52,7 +52,7 @@ int main() { // LLVM-PGO-LABEL: Maximum function count: // LLVM-PGO-SAME: 20 // LLVM-PGO-LABEL: Maximum internal block count: -// LLVM-PGO-SAME: 20 +// LLVM-PGO-SAME: 10 // CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}} diff --git a/offload/test/offloading/gpupgo/pgo2.c b/offload/test/offloading/gpupgo/pgo2.c index b5d0f2120754af..7f5c9ab7449074 100644 --- a/offload/test/offloading/gpupgo/pgo2.c +++ b/offload/test/offloading/gpupgo/pgo2.c @@ -49,7 +49,7 @@ int main() { // LLVM-DEVICE-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}: // LLVM-DEVICE: Hash: {{0[xX][0-9a-fA-F]+}} // LLVM-DEVICE: Counters: 3 -// LLVM-DEVICE: Block counts: [10, 1, 1] +// LLVM-DEVICE: Block counts: [10, 2, 1] // CLANG-HOST-LABEL: main: // CLANG-HOST: Hash: {{0[xX][0-9a-fA-F]+}} >From f5d9f550cd711bd86c0d7b7bf70b7f5ecd9772e0 Mon Sep 17 00:00:00 2001 From: Ethan 
Luis McDonough Date: Fri, 27 Sep 2024 19:37:45 -0500 Subject: [PATCH 56/58] Rebase updates --- compiler-rt/lib/profile/InstrProfilingFile.c | 1 - .../common/src/PluginInterface.cpp | 5 +- offload/test/offloading/pgo1.c | 74 ------------------- 3 files changed, 2 insertions(+), 78 deletions(-) delete mode 100644 offload/test/offloading/pgo1.c diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index a912de79da03a6..c40942e37cb072 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -1359,7 +1359,6 @@ int __llvm_write_custom_profile(const char *Target, /* Prepend "TARGET." to current filename */ memcpy(TargetFilename, Target, TargetLength); TargetFilename[TargetLength] = '.'; - memcpy(TargetFilename, Target, TargetLength); memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength); TargetFilename[FilenameLength + 1 + TargetLength] = 0; diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp index 0153da6cbf6773..dc6f50cab9d587 100644 --- a/offload/plugins-nextgen/common/src/PluginInterface.cpp +++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp @@ -859,9 +859,8 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) { ProfOrErr->dump(); // Write data to profiling file - if (auto Err = ProfOrErr->write()) { - consumeError(std::move(Err)); - } + if (auto Err = ProfOrErr->write()) + return Err; } // Delete the memory manager before deinitializing the device. 
Otherwise, diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c deleted file mode 100644 index c0d698323adf06..00000000000000 --- a/offload/test/offloading/pgo1.c +++ /dev/null @@ -1,74 +0,0 @@ -// RUN: %libomptarget-compile-generic -fprofile-instr-generate \ -// RUN: -Xclang "-fprofile-instrument=clang" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ -// RUN: --check-prefix="CLANG-PGO" -// RUN: %libomptarget-compile-generic -fprofile-generate \ -// RUN: -Xclang "-fprofile-instrument=llvm" -// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \ -// RUN: --check-prefix="LLVM-PGO" - -// REQUIRES: gpu -// REQUIRES: pgo - -#ifdef _OPENMP -#include <omp.h> -#endif - -int test1(int a) { return a / 2; } -int test2(int a) { return a * 2; } - -int main() { - int m = 2; -#pragma omp target - for (int i = 0; i < 10; i++) { - m = test1(m); - for (int j = 0; j < 2; j++) { - m = test2(m); - } - } -} - -// CLANG-PGO: ======== Counters ========= -// CLANG-PGO-NEXT: [ 0 11 20 ] -// CLANG-PGO-NEXT: [ 10 ] -// CLANG-PGO-NEXT: [ 20 ] -// CLANG-PGO-NEXT: ========== Data =========== -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// CLANG-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// CLANG-PGO-NEXT: ======== Functions ======== -// CLANG-PGO-NEXT: pgo1.c: -// CLANG-PGO-SAME: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// CLANG-PGO-NEXT: test1 -// CLANG-PGO-NEXT: test2 - -// LLVM-PGO: ======== Counters
========= -// LLVM-PGO-NEXT: [ 20 10 2 1 ] -// LLVM-PGO-NEXT: [ 10 ] -// LLVM-PGO-NEXT: [ 20 ] -// LLVM-PGO-NEXT: ========== Data =========== -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: { {{[0-9]*}} {{[0-9]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{0x[0-9a-fA-F]*}} {{0x[0-9a-fA-F]*}} -// LLVM-PGO-SAME: {{[0-9]*}} {{[0-9]*}} {{[0-9]*}} } -// LLVM-PGO-NEXT: ======== Functions ======== -// LLVM-PGO-NEXT: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}} -// LLVM-PGO-NEXT: test1 -// LLVM-PGO-NEXT: test2 >From 1d0a961aabe488e6d09b96a80329498b8f586923 Mon Sep 17 00:00:00 2001 From: Ethan Luis McDonough Date: Fri, 25 Oct 2024 13:42:19 -0500 Subject: [PATCH 57/58] Add llvm-profdata substitution to offload tests --- offload/test/lit.cfg | 2 ++ offload/test/lit.site.cfg.in | 2 +- offload/test/offloading/pgo1.c | 4 ++-- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/offload/test/lit.cfg b/offload/test/lit.cfg index 7994a08ba063fb..cfd1ad6c3c1eb5 100644 --- a/offload/test/lit.cfg +++ b/offload/test/lit.cfg @@ -112,8 +112,10 @@ config.available_features.add(config.libomptarget_current_target) if config.libomptarget_has_libc: config.available_features.add('libc') +profdata_path = os.path.join(config.bin_llvm_tools_dir, "llvm-profdata") if config.libomptarget_test_pgo: config.available_features.add('pgo') + config.substitutions.append(("%profdata", profdata_path)) # Determine whether the test system supports unified memory. # For CUDA, this is the case with compute capability 70 (Volta) or higher. 
diff --git a/offload/test/lit.site.cfg.in b/offload/test/lit.site.cfg.in
index a1cb5acc38a405..d998fb0c839700 100644
--- a/offload/test/lit.site.cfg.in
+++ b/offload/test/lit.site.cfg.in
@@ -1,6 +1,6 @@
 @AUTO_GEN_COMMENT@
 
-config.bin_llvm_tools_dir = "@CMAKE_BINARY_DIR@/bin"
+config.bin_llvm_tools_dir = "@LLVM_RUNTIME_OUTPUT_INTDIR@"
 config.test_c_compiler = "@OPENMP_TEST_C_COMPILER@"
 config.test_cxx_compiler = "@OPENMP_TEST_CXX_COMPILER@"
 config.test_fortran_compiler="@OPENMP_TEST_Fortran_COMPILER@"
diff --git a/offload/test/offloading/pgo1.c b/offload/test/offloading/pgo1.c
index 1ef540e430a27a..51671afa62b0db 100644
--- a/offload/test/offloading/pgo1.c
+++ b/offload/test/offloading/pgo1.c
@@ -1,14 +1,14 @@
 // RUN: %libomptarget-compile-generic -fprofile-generate \
 // RUN: -Xclang "-fprofile-instrument=llvm"
 // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1
-// RUN: llvm-profdata show --all-functions --counts \
+// RUN: %profdata show --all-functions --counts \
 // RUN: %target_triple.llvm.profraw | %fcheck-generic \
 // RUN: --check-prefix="LLVM-PGO"
 
 // RUN: %libomptarget-compile-generic -fprofile-instr-generate \
 // RUN: -Xclang "-fprofile-instrument=clang"
 // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1
-// RUN: llvm-profdata show --all-functions --counts \
+// RUN: %profdata show --all-functions --counts \
 // RUN: %target_triple.clang.profraw | %fcheck-generic \
 // RUN: --check-prefix="CLANG-PGO"
 

>From c6b34ad7a676a462955b2e7b534b12264363b430 Mon Sep 17 00:00:00 2001
From: Ethan Luis McDonough
Date: Mon, 28 Oct 2024 18:45:37 -0500
Subject: [PATCH 58/58] Prepend target prefix to basename

---
 compiler-rt/lib/profile/InstrProfilingFile.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c
index decafbcb1a5352..6b6f47e239714c 100644
--- a/compiler-rt/lib/profile/InstrProfilingFile.c
+++ b/compiler-rt/lib/profile/InstrProfilingFile.c
@@ -1355,10 +1355,21 @@ int __llvm_write_custom_profile(const char *Target,
   TargetFilename =
       (char *)COMPILER_RT_ALLOCA(FilenameLength + TargetLength + 2);
 
+  /* Find file basename and path sizes */
+  int32_t DirEnd = FilenameLength - 1;
+  while (DirEnd >= 0 && !IS_DIR_SEPARATOR(Filename[DirEnd])) {
+    DirEnd--;
+  }
+  uint32_t DirSize = DirEnd + 1, BaseSize = FilenameLength - DirSize;
+
   /* Prepend "TARGET." to current filename */
-  memcpy(TargetFilename, Target, TargetLength);
-  TargetFilename[TargetLength] = '.';
-  memcpy(TargetFilename + 1 + TargetLength, Filename, FilenameLength);
+  if (DirSize > 0) {
+    memcpy(TargetFilename, Filename, DirSize);
+  }
+  memcpy(TargetFilename + DirSize, Target, TargetLength);
+  TargetFilename[TargetLength + DirSize] = '.';
+  memcpy(TargetFilename + DirSize + 1 + TargetLength, Filename + DirSize,
+         BaseSize);
   TargetFilename[FilenameLength + 1 + TargetLength] = 0;
 
   /* Check if there is llvm/runtime version mismatch. */
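Before patch 58, "TARGET." was prepended to the whole path, so a profile path such as `out/llvm.profraw` became `TARGET.out/llvm.profraw`, which names a different directory. The patch instead scans backwards for the last directory separator and inserts the prefix in front of the basename only. The standalone sketch below illustrates that splitting logic; it is not the compiler-rt code. It uses `malloc` rather than `COMPILER_RT_ALLOCA`, and the helper `prependTargetToBasename` and macro `SKETCH_IS_DIR_SEPARATOR` are hypothetical names, the macro being an assumed approximation of compiler-rt's `IS_DIR_SEPARATOR`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for compiler-rt's IS_DIR_SEPARATOR (assumption):
 * '/' everywhere, plus '\\' on Windows. */
#ifdef _WIN32
#define SKETCH_IS_DIR_SEPARATOR(C) ((C) == '/' || (C) == '\\')
#else
#define SKETCH_IS_DIR_SEPARATOR(C) ((C) == '/')
#endif

/* Hypothetical helper: returns a malloc'd string in which "Target." is
 * inserted before the basename of Filename, e.g. "out/llvm.profraw" ->
 * "out/<Target>.llvm.profraw". Returns NULL if allocation fails; the caller
 * frees the result. */
char *prependTargetToBasename(const char *Target, const char *Filename) {
  assert(Target != NULL && Filename != NULL);
  size_t TargetLength = strlen(Target);
  size_t FilenameLength = strlen(Filename);

  /* Walk backwards to just past the last separator; DirSize is the length
   * of the directory part including its trailing separator (0 if none). */
  size_t DirSize = FilenameLength;
  while (DirSize > 0 && !SKETCH_IS_DIR_SEPARATOR(Filename[DirSize - 1]))
    DirSize--;
  size_t BaseSize = FilenameLength - DirSize;

  /* Directory + target + '.' + basename + terminating NUL. */
  char *Out = malloc(FilenameLength + TargetLength + 2);
  if (Out == NULL)
    return NULL;

  memcpy(Out, Filename, DirSize);              /* directory part, may be empty */
  memcpy(Out + DirSize, Target, TargetLength); /* target prefix */
  Out[DirSize + TargetLength] = '.';           /* separator dot */
  memcpy(Out + DirSize + TargetLength + 1, Filename + DirSize, BaseSize);
  Out[FilenameLength + TargetLength + 1] = '\0';
  return Out;
}
```

With a target of `amdgcn-amd-amdhsa`, this maps `out/llvm.profraw` to `out/amdgcn-amd-amdhsa.llvm.profraw` and a bare `llvm.profraw` to `amdgcn-amd-amdhsa.llvm.profraw`, so the prefixed file stays in the directory the user asked for.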