[Openmp-dev] [RFC] Clarify the absence of API stability for the *device runtime* (aka. libomptarget-nvptx-sm_XX.bc)

Hal Finkel via Openmp-dev openmp-dev at lists.llvm.org
Fri Jul 10 07:00:35 PDT 2020


On 7/10/20 12:19 AM, Johannes Doerfert wrote:
>
> On 7/9/20 8:25 PM, Hal Finkel wrote:
>>
>> On 7/9/20 6:46 PM, Johannes Doerfert wrote:
>>>
>>> On 7/9/20 6:22 PM, Hal Finkel wrote:
>>>>
>>>> On 7/9/20 5:33 PM, Johannes Doerfert wrote:
>>>>>
>>>>>
>>>>> On 7/9/20 5:21 PM, Hal Finkel via Openmp-dev wrote:
>>>>>>
>>>>>> On 7/9/20 5:17 PM, Michael Kruse wrote:
>>>>>>> Correct.
>>>>>>>
>>>>>>> I inferred from your response that we have no such guarantee 
>>>>>>> yet, we just haven't broken it yet.
>>>>>>>
>>>>>>> However, I think that change will break the ABI of libomptarget-nvptx.a.
>>>>>>
>>>>>>
>>>>>> I think that this is an important point. I interpreted this 
>>>>>> thread in the context of the API between the runtime and the 
>>>>>> device-side application code, not the ABI between the plugin and 
>>>>>> the device-side runtime. I suspect these two are separable, but 
>>>>>> we should definitely clarify this.
>>>>>>
>>>>>
>>>>> Neither should be considered stable at this point. We kept 
>>>>> libomptarget stable while we recently added functions but I would 
>>>>> not assume it will stay that way. As I mentioned before, for the 
>>>>> device runtime there is basically no supported way today in which 
>>>>> we could even guarantee stability.
>>>>>
>>>>
>>>> I think that we should separate these concerns.
>>>>
>>>> The device-side runtime library is a static library (in IR form), 
>>>> and multiple different versions can co-exist within one application 
>>>> (in device code contained in different shared libraries). It seems 
>>>> completely reasonable to consider that version-locked to the 
>>>> compiler that compiled the code. It's an internal interface between 
>>>> two parts of a translation unit compiled by the same compiler.
>>>>
>>>> libomptarget is a shared library. libomptarget.rtl.cuda.so is a 
>>>> shared library. Unless we take special care in the naming and 
>>>> linking, managing of global state, etc. I think that we need to 
>>>> consider these to have some kind of ABI stability (because you only 
>>>> get to have one in each process). That doesn't mean that we can't 
>>>> ever decide to break that ABI, but we would likely decide not to do 
>>>> so silently. This implies that the device-side runtime should 
>>>> maintain ABI compatibility with libomptarget.rtl.cuda.so unless we 
>>>> make a non-silent breaking change. Just having kernels in a shared 
>>>> library suddenly stop working correctly when a newer version of 
>>>> libomptarget.rtl.cuda.so is loaded is probably not something we can 
>>>> considerately do at this point.
>>>>
>>>
>>> I think this would all be well and good if we had a stable and 
>>> complete setup. I very much doubt we are there yet, and pretending 
>>> we are hurts us and the users alike.
>>>
>>> I think this goes in the same direction as Ye's comment. Why do we 
>>> want to guarantee stability if we don't even know whether all the 
>>> puzzle pieces are in place?
>>>
>>> Interestingly, the OpenMP standard has a way out of this, as Ravi 
>>> hinted towards in another email. libomptarget is loaded on demand. 
>>> If the version is not a match we can just not load it (or skip it).
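
The "just not load it (or skip it)" idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Candidate struct, its interface_version field, and select_loadable are hypothetical names, not anything libomptarget provides today. A real loader would have to obtain the version from the library itself (for example from a symbol it exports) before deciding whether to use it.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of an offload library found on disk.
// Neither the struct nor the version field exists in libomptarget.
struct Candidate {
  std::string path;
  std::uint32_t interface_version;
};

// Keep only the candidates whose interface version matches the one the
// program was compiled against. Mismatching libraries are skipped
// instead of being loaded and failing later in some obscure way.
std::vector<Candidate> select_loadable(const std::vector<Candidate> &found,
                                       std::uint32_t expected_version) {
  std::vector<Candidate> loadable;
  for (const Candidate &c : found) {
    if (c.interface_version == expected_version)
      loadable.push_back(c);
  }
  return loadable;
}
```

If nothing matches, the list comes back empty and the program would have no compatible offload library to use.
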
>>>
>>> At the end of the day it is not the only library that you cannot 
>>> just update and expect it to work. Nor the only one that will not 
>>> work with a program compiled for a newer version.
>>>
>>>
>>> Long story short: I would strongly suggest that we not put false 
>>> hopes out there that will come back to haunt us. All (openmp) target 
>>> libraries are bound to the compiler until further notice. There is 
>>> no stability guarantee until further notice. Update (+recompile) 
>>> everything or nothing for the time being.
>>>
>>>
>>> ~ Johannes
>>
>>
>> I understand what you're saying, but I don't think it's that simple. 
>> We understand very well that the current implementation has all 
>> sorts of usability issues and suboptimalities of various kinds. By 
>> many measures it's barely usable. Moreover, we're improving all of 
>> these things, in part, due to feedback from users like Ye. We have a 
>> lot of applications that want to use OpenMP offload support and can't 
>> yet. However, the current implementation is not completely unusable, 
>> and in fact, I think that we must assume that it has users who depend 
>> on it. I don't think it has very many compared to the number of users 
>> we'll have after things stabilize a bit. As a result, I think that we 
>> can prioritize future users over any current ones. However, we should 
>> still be kind to our current users and communicate with them clearly. 
>> In my opinion, however, we have limited options for effectively 
>> communicating with our users, and a mailing-list thread isn't one of 
>> them. Release notes aren't really either, unfortunately. All of the 
>> means we have are technical. We can name options with 'experimental' 
>> in the name (although the ship has sailed already on that one, and 
>> probably would not have been appropriate anyway). We can bump 
>> versions of things (symbols, library names, etc.) to prevent linking 
>> things together that will be broken. We can use dynamic, versioned 
>> registration checks (i.e., the expected version is embedded into the 
>> initialization call, and the library aborts or prints a warning if 
>> provided an unexpected version), but we need to do something.
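
To make the versioned registration check above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: register_device_image, kRuntimeInterfaceVersion, and RegisterResult are hypothetical names, and the current libomptarget entry points do not take a version argument like this. The idea is only that the compiler embeds the version it was built against as a constant argument, and the library warns and refuses on a mismatch.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical interface version baked into the runtime library at
// build time. The value 11 is arbitrary.
constexpr std::uint32_t kRuntimeInterfaceVersion = 11;

enum class RegisterResult { Ok, VersionMismatch };

// Hypothetical initialization entry point. The compiler would emit a
// call to it with the expected version as a constant argument.
RegisterResult register_device_image(std::uint32_t expected_version) {
  if (expected_version != kRuntimeInterfaceVersion) {
    // Print a warning and refuse, rather than failing silently later.
    std::fprintf(stderr,
                 "warning: device image expects interface version %u, "
                 "runtime provides %u; not registering\n",
                 static_cast<unsigned>(expected_version),
                 static_cast<unsigned>(kRuntimeInterfaceVersion));
    return RegisterResult::VersionMismatch;
  }
  return RegisterResult::Ok;
}
```

Whether a mismatch should warn, fall back to the host, or abort is a policy question; the sketch returns a status so the caller can decide.
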
>>
>> In short, while I agree with you that pretending we have a 
>> significant existing user base for which we need to prioritize 
>> stability would be a mistake, as we make things better, our number of 
>> users will grow. We'll have a significant number well before we 
>> consider the functionality to be stable. In addition, we depend on 
>> these users submitting bug reports and other feedback in order to 
>> improve things. Thus, we should use technical measures to make it 
>> clear what mixing will work and what won't work.
>>
>> I don't understand, however, whether this is an issue of present 
>> concern, or only a matter of general policy. My impression, Johannes, 
>> was that the patch that motivated this RFC does not break the 
>> libomptarget <-> plugin <-> kernel interface at all. It changes only 
>> the inward-facing IR-level API of the device runtime. Is that correct?
>
> Yes, this RFC was not about shared libraries (which are the interfaces 
> you mentioned). I don't really know why we now have a huge discussion 
> about something else that is theoretical in nature anyway.


Well, because Michael pointed out that Ye and I had (perhaps 
unintentionally) introduced some ambiguity into the thread, and that 
we needed to clarify exactly which interfaces we were referring to.

In any case, I think that there's consensus to proceed with treating 
the device-runtime <-> device-code interface as internal (i.e., not 
having a stable API across releases) - and that's what we most needed.

  -Hal


> We always kept the interface stable, and then we added a feature or 
> fixed a bug and told everyone to recompile everything anyway.
>
>
> ~ Johannes
>
>
>>
>> Thanks again,
>>
>> Hal
>>
>>
>>>
>>>
>>>
>>>>  -Hal
>>>>
>>>>
>>>>>
>>>>>>  -Hal
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Just want to clarify what ABI stability guarantees we have.
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 9, 2020 at 17:08, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>>>
>>>>>>>
>>>>>>>     On 7/9/20 3:39 PM, Michael Kruse wrote:
>>>>>>>     > On Thu, Jul 9, 2020 at 14:08, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>>>     >> Also, if we wanted to change clang so that it linked
>>>>>>>     >> version-locked versions of these libraries, -lomptarget-11
>>>>>>>     >> or whatever, that, in my opinion, would also be a
>>>>>>>     >> reasonable choice to discuss.
>>>>>>>     >>
>>>>>>>     >> One thing worth capturing is the extent to which these
>>>>>>>     >> things are connected. There is a relationship between
>>>>>>>     >> libomptarget and its plugins, and the plugins and the
>>>>>>>     >> device-side runtimes. There is an ABI boundary there
>>>>>>>     >> somewhere. If we change nothing else, we might need to
>>>>>>>     >> consider ABI stability of this part of the device-side
>>>>>>>     >> interface.
>>>>>>>     > The official distribution apt.llvm.org contains
>>>>>>>     > libomptarget.so for LLVM 7 to 10, but each in a separate
>>>>>>>     > directory, as /usr/lib/llvm-<version>/libomptarget.so.
>>>>>>>     > The prebuilt binaries under
>>>>>>>     > https://releases.llvm.org/download.html put it directly
>>>>>>>     > under <prefix>/libomptarget.so. If libomptarget is
>>>>>>>     > version-locked, users need to be careful about pointing
>>>>>>>     > LD_LIBRARY_PATH at the right directory. However, I could
>>>>>>>     > not find target device plugins in the distributions (such
>>>>>>>     > as lib/libomptarget-nvptx-sm_60.bc when built on a machine
>>>>>>>     > with CUDA). The official Ubuntu repository doesn't contain
>>>>>>>     > libomptarget at all. Arch Linux contains at least the
>>>>>>>     > x86_64 rtl
>>>>>>>     > (https://www.archlinux.org/packages/extra/x86_64/openmp/files/)
>>>>>>>     > without any versioning resolution.
>>>>>>>     >
>>>>>>>     > Should we make it explicit what the compatibility
>>>>>>>     > guarantees for libomptarget are, maybe even discourage OS
>>>>>>>     > distributions from pre-packaging libomptarget into ldconfig
>>>>>>>     > default paths? At least on Arch Linux, updating the openmp
>>>>>>>     > package will break previously compiled-with-offloading
>>>>>>>     > binaries.
>>>>>>>
>>>>>>>
>>>>>>>     You mean that it will break them *if* we make an
>>>>>>>     ABI-breaking change in libomptarget. Changing the
>>>>>>>     device-side runtime doesn't necessarily imply that.
>>>>>>>     Nevertheless, certainly good to know.
>>>>>>>
>>>>>>>       -Hal
>>>>>>>
>>>>>>>
>>>>>>>     >
>>>>>>>     > Michael
>>>>>>>
>>>>>>>     --
>>>>>>>     Hal Finkel
>>>>>>>     Lead, Compiler Technology and Programming Languages
>>>>>>>     Leadership Computing Facility
>>>>>>>     Argonne National Laboratory
>>>>>>>
>>>>>>>
>>>>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


