[all-commits] [llvm/llvm-project] 37c1a5: [Libomptarget] Fix GPU Dtors referencing possibly ...
Joseph Huber via All-commits
all-commits at lists.llvm.org
Thu Jan 11 13:01:05 PST 2024
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 37c1a5e3f56a287703426da6c2c8cb998e28ca7c
https://github.com/llvm/llvm-project/commit/37c1a5e3f56a287703426da6c2c8cb998e28ca7c
Author: Joseph Huber <huberjn at outlook.com>
Date: 2024-01-11 (Thu, 11 Jan 2024)
Changed paths:
M openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
M openmp/libomptarget/plugins-nextgen/common/include/PluginInterface.h
M openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
Log Message:
-----------
[Libomptarget] Fix GPU Dtors referencing possibly deallocated image (#77828)
Summary:
The constructors and destructors look up a symbol in the ELF quickly to
determine if they need to be run on the GPU. This allows us to avoid the
very slow actions required to do the slower lookup using the vendor API.
One problem occurs with how we handle the lifetime of these images.
Right now there is no invariant to specify the lifetime of the
underlying binary image that is loaded. In the typical case, this comes
from the binary itself in the `.llvm.offloading` section, meaning that
the lifetime of the binary should match the executable itself. This
would work fine, if it weren't for the fact that the plugin is loaded
via `dlopen` and can have a teardown order out of sync with the main
executable.
This was likely what was occuring when this failed on some systems but
not others. A potential solution would be to simply copy images into
memory so the runtime does not rely on external references. Another
would be to manually zero these out after initialization as to prevent
this mistake from happening accidentally. The former has the benefit of
making some checks easier, and allowing for constant initialization be
done on the ELF itself (normally we can't do this because writing to a
constant section, e.g. .llvm.offloading is a segfault.). The downside
would be the extra time required to copy the image in bulk (Although we
are likely doing this in the vendor runtimes as well).
This patch went with a quick solution to simply set a boolean value at
initialization time if we need to call destructors.
Fixes: https://github.com/llvm/llvm-project/issues/77798
More information about the All-commits
mailing list