[Openmp-commits] [PATCH] D14253: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget plugins.
Jonas Hahnfeld via Openmp-commits
openmp-commits at lists.llvm.org
Mon Feb 1 02:03:12 PST 2016
Hahnfeld added inline comments.
================
Comment at: libomptarget/plugins/cuda/src/rtl.cpp:14-25
@@ +13,14 @@
+
+#include <algorithm>
+#include <assert.h>
+#include <cstdio>
+#include <cuda.h>
+#include <cuda_runtime_api.h>
+#include <libelf.h>
+#include <list>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string>
+#include <string.h>
+#include <vector>
+
----------------
Some of the C header files were not needed on my system (`stdio.h`, `stdlib.h`, `string.h`)
================
Comment at: libomptarget/plugins/cuda/src/rtl.cpp:110-113
@@ +109,6 @@
+
+ for (unsigned i = 0; i < E.Entries.size(); ++i) {
+ if (E.Entries[i].addr == addr)
+ return true;
+ }
+
----------------
You could use a range-based loop
================
Comment at: libomptarget/plugins/cuda/src/rtl.cpp:427-430
@@ +426,6 @@
+ // obtain cuda pointer to global tracking thread limit.
+ const char SuffixTL[] = "_thread_limit";
+ char *ThreadLimitName =
+ (char *)malloc((strlen(e->name) + strlen(SuffixTL)) * sizeof(char));
+ sprintf(ThreadLimitName, "%s%s", e->name, SuffixTL);
+
----------------
This memory is leaked - can this be done with some C++-constructs (`stringstream` for example)?
================
Comment at: libomptarget/plugins/cuda/src/rtl.cpp:566-573
@@ +565,10 @@
+
+ // update thread limit content in gpu memory if un-initialized or changed.
+ if (KernelInfo->ThreadLimit == 0 || KernelInfo->ThreadLimit != thread_limit) {
+ // always capped by maximum number of threads in a block: even if 1 OMP
+ // thread is 1 independent CUDA thread, we may have up to max block size OMP
+ // threads if the user request thread_limit(tl) with tl > max block size, we
+ // only start max block size CUDA threads.
+ if (thread_limit > DeviceInfo.ThreadsPerBlock[device_id])
+ thread_limit = DeviceInfo.ThreadsPerBlock[device_id];
+
----------------
Doesn't this mean that a default `thread_limit = 0` or a too high one will be updated for every kernel? (The variable is signed, is a negative value possible and what does it mean?)
================
Comment at: libomptarget/plugins/cuda/src/rtl.cpp:593-600
@@ +592,10 @@
+
+ err = cuLaunchKernel(KernelInfo->Func, blocksPerGrid, 1, 1,
+ cudaThreadsPerBlock, 1, 1, nshared, 0, &args[0], 0);
+ if (err != CUDA_SUCCESS) {
+ DP("Device kernel launching failed!\n");
+ CUDA_ERR_STRING(err);
+ assert(err == CUDA_SUCCESS && "Unable to launch target execution!");
+ return OFFLOAD_FAIL;
+ }
+
----------------
So this launch is asynchronous and it gets synced when fetching data from the device.
But what would happen if there is a `target` region immediately following another one that is not yet synced? And can you later on figure out which kernel to wait for?
================
Comment at: libomptarget/plugins/generic-elf-64bit/src/rtl.cpp:14-26
@@ +13,15 @@
+
+#include <algorithm>
+#include <assert.h>
+#include <cstdio>
+#include <dlfcn.h>
+#include <elf.h>
+#include <ffi.h>
+#include <gelf.h>
+#include <list>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <vector>
+#include <link.h>
+
----------------
See other comment about header files
================
Comment at: libomptarget/plugins/generic-elf-64bit/src/rtl.cpp:110-112
@@ +109,5 @@
+ // Close dynamic libraries
+ for (std::list<DynLibTy>::iterator ii = DynLibs.begin(),
+ ie = DynLibs.begin();
+ ii != ie; ++ii)
+ if (ii->Handle) {
----------------
1. `DynLibs` isn't used anywhere else?
2. If it is needed: I think this doesn't work as expected (from `begin` to `begin`) - then maybe rewrite as range-based loop?
================
Comment at: libomptarget/plugins/generic-elf-64bit/src/rtl.cpp:243-245
@@ +242,5 @@
+
+ // load dynamic library and get the entry points. We use the dl library
+ // to do the loading of the library, but we could do it directly to avoid the
+ // dump to the temporary file.
+ //
----------------
So if it's possible, why not do it?
http://reviews.llvm.org/D14253
More information about the Openmp-commits
mailing list