[Openmp-commits] [PATCH] D70971: [libomptarget] Build a minimal deviceRTL for amdgcn

Tue Dec 3 16:37:49 PST 2019

JonChesterfield added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/amdgcn/src/device_environment.h:18-26
+struct omptarget_device_environmentTy {
+  int32_t debug_level;   // gets value of envvar LIBOMPTARGET_DEVICE_RTL_DEBUG
+                         // only useful for Debug build of deviceRTLs 
+  int32_t num_devices;   // gets number of active offload devices 
+  int32_t device_num;    // gets a value 0 to num_devices-1
+};
+
----------------
ABataev wrote:
> JonChesterfield wrote:
> > ABataev wrote:
> > > JonChesterfield wrote:
> > > > ABataev wrote:
> > > > > JonChesterfield wrote:
> > > > > > ABataev wrote:
> > > > > > > JonChesterfield wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > JonChesterfield wrote:
> > > > > > > > > > ABataev wrote:
> > > > > > > > > > > JonChesterfield wrote:
> > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > JonChesterfield wrote:
> > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > JonChesterfield wrote:
> > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > Agan this thing. Can we make it common if it is really required? Again, why do we need these new fields? Could you remove it from the current patch?
> > > > > > > > > > > > > > > > It's different on nvptx and amdgcn, so not a great candidate for common. The extra fields are used to implement a couple of functions that are stubs in trunk which you weren't sure are necessary for nvptx.
> > > > > > > > > > > > > > > Maybe, it is better to exclude this from this patch? Maybe it would be good to have these fields for nvptx too or we don't need them to for amdgcn. Could you remind where we discussed this structure?
> > > > > > > > > > > > > > Discussion was in D69424. I could drop the file from this patch, but it's included by debug.h so that'll break the amdgcn build
> > > > > > > > > > > > > Ah, found it. Again, the same question. These fields are required to add 2 new `omp_...` functions. The question is the same. Are they required by the standard? If yes, we must have the same fields for nvptx. Otherwise, I don't think we need these fields at all.
> > > > > > > > > > > > "When called from a target region the effect of this routing is unspecified"
> > > > > > > > > > > > 
> > > > > > > > > > > > So nvptx returning zero is fine, and andgcn returning a meaningful value is also fine, as far as the standard is concerned.
> > > > > > > > > > > Why do we need to store these values? And if we want these functions to return some meaningful values, better to implement the same behavior for both nvptx and amdgcn.
> > > > > > > > > > The plugin knows how many devices are available, so it writes the values to memory on the device, such that the device can answer the query accurately. I want the function to return a meaningful value on amdgcn as a quality of implementation point - it doesn't directly affect me whether nvptx supports the same. It'll cost a few bytes of memory, in difficult to optimise out fashion.
> > > > > > > > > WE need to support the same behavior on all platforms.
> > > > > > > > Who is we in this context, and do you consider unconditionally returning zero everywhere to be better than returning meaningful values on some targets and zero on others?
> > > > > > > The behavior of the runtime library must be the same on all platforms.
> > > > > > I don't think that's the right requirement. The runtime library is correct on a given target if it meets the openmp standard. If one target fails to meet a part of the standard, or chooses to exceed it, that should neither hobble nor demand a comparable extension from all other targets.
> > > > > > 
> > > > > > 
> > > > > No, sorry, I can't agree with you here. The same library must behave the same way everywhere, on each target, on each platform. Supporting different behavior of the same library on different platforms is the bad design and leads to a lot of pain with support.
> > > > It's easier for the user if changing platform causes no change in behaviour, sure. But that also means the user can only use a feature that is supported on all systems.
> > > > 
> > > > This is the advantage of implementation defined behaviour. People can write portable code, winning easy porting at the cost of performance. Or they can depend on features that are only available on some systems, knowing roughly the price paid for that.
> > > > 
> > > > Both are reasonable positions for developers to take. I don't think we should mandate portable application code, especially given the popularity of openmp on HPC where code may only ever run on one machine.
> > > No, not reasonable at all. I insist that the target dependent code must contain only really target dependent code, the functiinality must be the same on all platforms.
> > In which direction would you like this one resolved? Force all targets to return zero or enhance nvptx and future targets beyond the requirements of the standard?
> > 
> > On the other hand, take the openmp locking code. On trunk, as far as I can tell, this is functionally correct on volta and deadlocks on earlier nvidia hardware. Imagine for a moment that we are unable to make it work on non-volta cards, would the fix be to deadlock everywhere or to drop support for non-volta cards? Or to make the locks a nop, a serious QOI degradation on volta.
> > 
> > That one is of particular interest as it is unclear whether amdgcn is able to support unstructured locks.
> Does not matter. I would impelement it to have better performance but you can reuse the original amdgcn solution.
> 
> As to locks, they works on all gpus, like pascal, volta, etc. And this code is platform-dependent. If the target does not support it, it is hardware limitation. In your case the code does not have any platform specific dependencies.
I've deleted the extra fields from this patch. Given amdgcn doesn't have a libcall implementation yet the discussion about implementation divergence is arguably premature.

I don't understand your requirements here. Are hardware limitations acceptable as a reason for different behaviour between systems? E.g. floating point arithmetic will inevitably return different values on different architectures when compiled with fast-math.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70971/new/

https://reviews.llvm.org/D70971