[Openmp-commits] [PATCH] D134396: [OpenMP][libomptarget] New plugin infrastructure and new CUDA plugin

Johannes Doerfert via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Tue Sep 27 20:37:09 PDT 2022


jdoerfert added a comment.

Drive by comments, @jhuber6 and @tianshilei1992 should give more input. I want this to eventually get in and I'll look over at some point. Before let's get as much of it right as possible. We'll have time to adjust later too.



================
Comment at: openmp/libomptarget/plugins-nextgen/common/PluginInterface/GlobalHandler.cpp:37-39
+  if (!ElfOrErr) {
+    return nullptr;
+  }
----------------



================
Comment at: openmp/libomptarget/plugins-nextgen/common/PluginInterface/GlobalHandler.cpp:44
+  assert(Result.second);
+  assert(Result.first != ELFObjectFiles.end());
+
----------------
Add messages for each assert.


================
Comment at: openmp/libomptarget/plugins-nextgen/common/PluginInterface/GlobalHandler.h:41
+  // holding a private copy of the name as a std::string
+  std::string Name;
+  uint32_t Size;
----------------
kevinsala wrote:
> jhuber6 wrote:
> > kevinsala wrote:
> > > jhuber6 wrote:
> > > > This should be able to be a `StringRef` as well.
> > > Do you want to change it to `StringRef` for performance reasons (i.e., avoid the std::string's dynamic memory)?
> > > 
> > > Using `StringRef` here could be dangerous since we have no control on the string memory lifetime. This can be dangerous if a developer constructs a string on the stack, creates a `GlobalTy` variable, and passes that string as the name (e.g., as we do for the `exec_mode`). Thus, we would be adding an extra restriction on `GlobalTy`, where the user must guarantee that the string name has a longer lifetime than the `GlobalTy` object.
> > > 
> > > I would keep it as it is now, and if we see there is a performance penalty on using `std::string` here, we can change it.
> > This is a huge patch so I can't really get a good view of the usage, but I figured that this will always point to a global inside the `omp_offloading_entires` section, which will guarantee we always have access to this memory as it's just a string constant in the binary itself. Does the new plugin interface add new constants not contained in the entry list?
> There is an example of the usage in line 326 in `openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp`. In that case, the global name is constructed from the kernel name + "_exec_mode", it's just a temporary string, and then it's safely copied to the `GlobalTy`'s `Name` member. We could move the string construction outside the `GlobalTy` constructor, and pass its `StringRef` to the ctor, but it seems dangerous from my point of view. It wouldn't be difficult to break this code at the slightest distraction.
I doubt this is helpful wrt performance. I also doubt we ever would use "stack" (or short lifetime) strings. Both of this said, (and thinking I might have place the "NOTE" there in the first place), I'd keep it as is, this will never hit our profiler either way, but this is the "conservative" version" after all.


================
Comment at: openmp/libomptarget/plugins-nextgen/common/PluginInterface/GlobalHandler.h:105
+  /// Map to store the ELF object files that have been loaded
+  std::unordered_map<int32_t, ELF64LEObjectFile> ELFObjectFiles;
+  std::mutex ELFObjectFilesMutex;
----------------
jhuber6 wrote:
> kevinsala wrote:
> > jhuber6 wrote:
> > > 
> > Does `llvm::DenseMap` keep the references to the previously inserted elements valid when someone else is inserting a new element? I understood it invalidates them, but maybe I'm wrong. In case they are invalidated, we couldn't switch to `llvm::DenseMap` directly. I implemented this section thinking on multiple threads checking ELF object files, where some threads may be consulting an already inserted ELFObjectFile (without holding the map lock) while another potential thread may be inserting a new one (with the lock acquired).
> You're right on the iterator invalidation. That guarantee is pat of what makes `std::unordered_map` slower than `llvm::DenseMap` in general. I'll leave it up to your discretion if you think that we need to be extra careful about invalidating iterators here. I was just guessing that this was part of the initialization code, which is generally thread safe since it should be called from the executable's constructors.
In general I'm with @jhuber6 here. If we don't really think we need persistent iterators we should not add expectations. 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134396/new/

https://reviews.llvm.org/D134396



More information about the Openmp-commits mailing list