[Openmp-dev] [RFC] Clarify the absence of API stability for the *device runtime* (aka. libomptarget-nvptx-sm_XX.bc)

Johannes Doerfert via Openmp-dev openmp-dev at lists.llvm.org
Thu Jul 9 22:19:45 PDT 2020


On 7/9/20 8:25 PM, Hal Finkel wrote:
>
> On 7/9/20 6:46 PM, Johannes Doerfert wrote:
>>
>> On 7/9/20 6:22 PM, Hal Finkel wrote:
>>>
>>> On 7/9/20 5:33 PM, Johannes Doerfert wrote:
>>>>
>>>>
>>>> On 7/9/20 5:21 PM, Hal Finkel via Openmp-dev wrote:
>>>>>
>>>>> On 7/9/20 5:17 PM, Michael Kruse wrote:
>>>>>> Correct.
>>>>>>
>>>>>> I inferred from your response that we have no such guarantee yet; 
>>>>>> we just haven't broken it so far.
>>>>>>
>>>>>> However, I think that change will break ABI of libomptarget-nvptx.a
>>>>>
>>>>>
>>>>> I think that this is an important point. I interpreted this thread 
>>>>> in the context of the API between the runtime and the device-side 
>>>>> application code. Not the ABI between the plugin and the 
>>>>> device-side runtime. I suspect these two are separable, but we 
>>>>> should definitely clarify this.
>>>>>
>>>>
>>>> Neither should be considered stable at this point. We kept 
>>>> libomptarget stable while we recently added functions but I would 
>>>> not assume it will stay that way. As I mentioned before, for the 
>>>> device runtime there is basically no supported way today in which 
>>>> we could even guarantee stability.
>>>>
>>>
>>> I think that we should separate these concerns.
>>>
>>> The device-side runtime library is a static library (in IR form), 
>>> and multiple different versions can co-exist within one application 
>>> (in device code contained in different shared libraries). It seems 
>>> completely reasonable to consider that version-locked to the 
>>> compiler that compiled the code. It's an internal interface between 
>>> two parts of a translation unit compiled by the same compiler.
>>>
>>> libomptarget is a shared library. libomptarget.rtl.cuda.so is a 
>>> shared library. Unless we take special care in the naming, linking, 
>>> management of global state, etc., I think that we need to 
>>> consider these to have some kind of ABI stability (because you only 
>>> get to have one in each process). That doesn't mean that we can't 
>>> ever decide to break that ABI, but we would likely decide not to do 
>>> so silently. This implies that the device-side runtime should 
>>> maintain ABI compatibility with libomptarget.rtl.cuda.so unless we 
>>> make a non-silent breaking change. Just having kernels in a shared 
>>> library suddenly stop working correctly when a newer version of 
>>> libomptarget.rtl.cuda.so is loaded is probably not something we can 
>>> considerately do at this point.
>>>
>>
>> I think this would be all nice and well if we had a stable and 
>> complete setup. I pretty much doubt we are there yet, and pretending 
>> we are is hurting us and the user alike.
>>
>> I think this goes in the same direction as Ye's comment. Why do we 
>> want to guarantee stability if we don't even know whether all the 
>> puzzle pieces are in place?
>>
>> Interestingly, the OpenMP standard has a way out of this, as Ravi 
>> hinted towards in another email. libomptarget is loaded on demand. If 
>> the version is not a match we can just not load it (or skip it).
>>
>> At the end of the day it is not the only library that you cannot just 
>> update and expect it to work. Nor the only one that will not work 
>> with a program compiled for a newer version.
>>
>>
>> Long story short: I would strongly suggest not putting false hopes out 
>> there that will come back and haunt us. All (openmp) target libraries 
>> are bound to the compiler until further notice. There is no stability 
>> guarantee until further notice. Update (+recompile) everything or 
>> nothing for the time being.
>>
>>
>> ~ Johannes
>
>
> I understand what you're saying, but I don't think it's that simple. 
> We understand very well that the current implementation has all 
> sorts of usability issues and suboptimalities of various kinds. By 
> many measures it's barely usable. Moreover, we're improving all of 
> these things, in part, due to feedback from users like Ye. We have a 
> lot of applications that want to use OpenMP offload support and can't 
> yet. However, the current implementation is not completely unusable, 
> and in fact, I think that we must assume that it has users who depend 
> on it. I don't think it has very many compared to the number of users 
> we'll have after things stabilize a bit. As a result, I think that we 
> can prioritize future users over any current ones. However, we should 
> still be kind to our current users and communicate with them clearly. 
> In my opinion, however, we have limited options for effectively 
> communicating with our users, and a mailing-list thread isn't one of 
> them. Release notes aren't really either, unfortunately. All of the 
> means we have are technical. We can name options with 'experimental' 
> in the name (although the ship has sailed already on that one, and 
> probably would not have been appropriate anyway). We can bump versions 
> of things (symbols, library names, etc.) to prevent linking things 
> together that will be broken. We can use dynamic, versioned 
> registration checks (i.e., the expected version is embedded into the 
> initialization call, and the library aborts or prints a warning if 
> provided an unexpected version), but we need to do something.
>
> In short, while I agree with you that pretending we have a significant 
> existing user base for which we need to prioritize stability would be 
> a mistake, as we make things better, our number of users will grow. 
> We'll have a significant number well before we consider the 
> functionality to be stable. In addition, we depend on these users 
> submitting bug reports and other feedback in order to improve things. 
> Thus, we should use technical measures to make it clear what mixing 
> will work and what won't work.
>
> I don't understand, however, whether this is an issue of present 
> concern, or only a matter of general policy. My impression, Johannes, 
> was that the patch that motivated this RFC does not break the 
> libomptarget <-> plugin <-> kernel interface at all. It changes only 
> the inward-facing IR-level API of the device runtime. Is that correct?

Yes, this RFC was not about shared libraries (which are the interfaces 
you mentioned). I don't really know why we now have a huge discussion 
about something else that is theoretical in nature anyway. We always 
kept the interface stable, and then we added a feature or fixed a bug 
and told everyone to recompile everything anyway.


~ Johannes
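
For illustration only: the "dynamic, versioned registration check" mentioned
in the quoted message (the expected version is embedded into the
initialization call, and the library warns or refuses on a mismatch) could
look roughly like the C sketch below. Every name in it
(EXAMPLE_RTL_INTERFACE_VERSION, example_rtl_status, example_rtl_register) is
hypothetical and is not part of libomptarget or its plugins; it only shows
the shape of such a check, not how the runtime implements anything today.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: the interface version this build of the library implements. */
#define EXAMPLE_RTL_INTERFACE_VERSION 2u

typedef enum {
  EXAMPLE_RTL_OK = 0,
  EXAMPLE_RTL_VERSION_MISMATCH = 1
} example_rtl_status;

/*
 * Hypothetical registration entry point. The assumption is that the compiler
 * embeds the interface version it was built against as `expected_version`
 * in the initialization call it generates.
 */
example_rtl_status example_rtl_register(uint32_t expected_version) {
  if (expected_version != EXAMPLE_RTL_INTERFACE_VERSION) {
    /* Make the mismatch visible instead of failing silently later. */
    fprintf(stderr,
            "example_rtl: image expects interface version %u, "
            "library provides %u; refusing to register\n",
            (unsigned)expected_version,
            (unsigned)EXAMPLE_RTL_INTERFACE_VERSION);
    return EXAMPLE_RTL_VERSION_MISMATCH;
  }
  return EXAMPLE_RTL_OK;
}
```

A caller that gets EXAMPLE_RTL_VERSION_MISMATCH back could abort, or skip
the library and not offload, which is in the spirit of the "just not load
it (or skip it)" option raised earlier in the thread.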


>
> Thanks again,
>
> Hal
>
>
>>
>>
>>
>>>  -Hal
>>>
>>>
>>>>
>>>>>  -Hal
>>>>>
>>>>>
>>>>>>
>>>>>> Just want to clarify what ABI stability guarantees we have.
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 9, 2020 at 17:08, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>>
>>>>>>
>>>>>>     On 7/9/20 3:39 PM, Michael Kruse wrote:
>>>>>>     > On Thu, Jul 9, 2020 at 14:08, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>>     >> Also, if we wanted to change clang so that it linked
>>>>>>     >> version-locked versions of these libraries, -lomptarget-11 or
>>>>>>     >> whatever, that, in my opinion, would also be a reasonable
>>>>>>     >> choice to discuss.
>>>>>>     >>
>>>>>>     >> One thing worth capturing is the extent to which these things
>>>>>>     >> are connected. There is a relationship between libomptarget
>>>>>>     >> and its plugins, and the plugins and the device-side runtimes.
>>>>>>     >> There is an ABI boundary there somewhere. If we change nothing
>>>>>>     >> else, we might need to consider ABI stability of this part of
>>>>>>     >> the device-side interface.
>>>>>>     > The official distribution apt.llvm.org contains libomptarget.so
>>>>>>     > for LLVM 7 to 10, each in a separate directory as
>>>>>>     > /usr/lib/llvm-<version>/libomptarget.so. The prebuilt binaries
>>>>>>     > under https://releases.llvm.org/download.html put it directly
>>>>>>     > under <prefix>/libomptarget.so. If libomptarget is
>>>>>>     > version-locked, users need to be careful to point
>>>>>>     > LD_LIBRARY_PATH at the right directory. However, I could not
>>>>>>     > find target device plugins in the distributions (such as
>>>>>>     > lib/libomptarget-nvptx-sm_60.bc when built on a machine with
>>>>>>     > CUDA). The official Ubuntu repository doesn't contain
>>>>>>     > libomptarget at all. Arch Linux contains at least the x86_64 rtl
>>>>>>     > (https://www.archlinux.org/packages/extra/x86_64/openmp/files/)
>>>>>>     > without any versioning resolution.
>>>>>>     >
>>>>>>     > Should we make explicit what the compatibility guarantees for
>>>>>>     > libomptarget are, and maybe even discourage OS distributions
>>>>>>     > from pre-packaging libomptarget into ldconfig default paths?
>>>>>>     > At least on Arch Linux, updating the openmp package will break
>>>>>>     > binaries previously compiled with offloading.
>>>>>>
>>>>>>
>>>>>>     You mean that it will break them *if* we make an ABI-breaking
>>>>>>     change in libomptarget. Changing the device-side runtime doesn't
>>>>>>     necessarily imply that. Nevertheless, certainly good to know.
>>>>>>
>>>>>>       -Hal
>>>>>>
>>>>>>
>>>>>>     >
>>>>>>     > Michael
>>>>>>
>>>>>>     --     Hal Finkel
>>>>>>     Lead, Compiler Technology and Programming Languages
>>>>>>     Leadership Computing Facility
>>>>>>     Argonne National Laboratory
>>>>>>
>>>>>>
>>>

