[Openmp-dev] [RFC] Clarify the absence of API stability for the *device runtime* (aka. libomptarget-nvptx-sm_XX.bc)
Hal Finkel via Openmp-dev
openmp-dev at lists.llvm.org
Fri Jul 10 07:00:35 PDT 2020
On 7/10/20 12:19 AM, Johannes Doerfert wrote:
>
> On 7/9/20 8:25 PM, Hal Finkel wrote:
>>
>> On 7/9/20 6:46 PM, Johannes Doerfert wrote:
>>>
>>> On 7/9/20 6:22 PM, Hal Finkel wrote:
>>>>
>>>> On 7/9/20 5:33 PM, Johannes Doerfert wrote:
>>>>>
>>>>>
>>>>> On 7/9/20 5:21 PM, Hal Finkel via Openmp-dev wrote:
>>>>>>
>>>>>> On 7/9/20 5:17 PM, Michael Kruse wrote:
>>>>>>> Correct.
>>>>>>>
>>>>>>> I inferred from your response that we have no such guarantee
>>>>>>> yet, we just haven't broken it yet.
>>>>>>>
>>>>>>> However, I think that change will break ABI of libomptarget-nvptx.a
>>>>>>
>>>>>>
>>>>>> I think that this is an important point. I interpreted this
>>>>>> thread in the context of the API between the runtime and the
>>>>>> device-side application code. Not the ABI between the plugin and
>>>>>> the device-side runtime. I suspect these two are separable, but
>>>>>> we should definitely clarify this.
>>>>>>
>>>>>
>>>>> Neither should be considered stable at this point. We kept
>>>>> libomptarget stable while we recently added functions but I would
>>>>> not assume it will stay that way. As I mentioned before, for the
>>>>> device runtime there is basically no supported way today in which
>>>>> we could even guarantee stability.
>>>>>
>>>>
>>>> I think that we should separate these concerns.
>>>>
>>>> The device-side runtime library is a static library (in IR form),
>>>> and multiple different versions can co-exist within one application
>>>> (in device code contained in different shared libraries). It seems
>>>> completely reasonable to consider that version-locked to the
>>>> compiler that compiled the code. It's an internal interface between
>>>> two parts of a translation unit compiled by the same compiler.
>>>>
>>>> libomptarget is a shared library. libomptarget.rtl.cuda.so is a
>>>> shared library. Unless we take special care in the naming and
>>>> linking, managing of global state, etc. I think that we need to
>>>> consider these to have some kind of ABI stability (because you only
>>>> get to have one in each process). That doesn't mean that we can't
>>>> ever decide to break that ABI, but we would likely decide not to do
>>>> so silently. This implies that the device-side runtime should
>>>> maintain ABI compatibility with libomptarget.rtl.cuda.so unless we
>>>> make a non-silent breaking change. Just having kernels in a shared
>>>> library suddenly stop working correctly when a newer version of
>>>> libomptarget.rtl.cuda.so is loaded is probably not something we can
>>>> considerately do at this point.
>>>>
>>>
>>> I think this is all nice and well if we had a stable and complete
>>> setup. I pretty much doubt we are there yet, and pretending we are
>>> is hurting us and the user alike.
>>>
>>> I think this goes in the same direction as Ye's comment. Why do we
>>> want to guarantee stability if we don't even know whether all the
>>> puzzle pieces are in place?
>>>
>>> Interestingly, the OpenMP standard has a way out of this, as Ravi
>>> hinted towards in another email. libomptarget is loaded on demand.
>>> If the version is not a match we can just not load it (or skip it).
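
A minimal sketch of the "skip it if the version is not a match" idea, in C++. It only models the selection policy, not the actual loading. `RuntimeCandidate`, `pick_runtime`, and the notion of a single integer interface version are assumptions made for illustration; they are not existing libomptarget names.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of a runtime library found on the search path,
// together with the interface version it reports.
struct RuntimeCandidate {
  std::string path;
  std::uint32_t interface_version;
};

// Return the path of the first candidate whose interface version matches the
// one the program was compiled against. Mismatching candidates are skipped.
// An empty result means no usable runtime was found, in which case a caller
// could fall back to host execution instead of loading an incompatible one.
std::optional<std::string>
pick_runtime(const std::vector<RuntimeCandidate> &found,
             std::uint32_t wanted_version) {
  for (const RuntimeCandidate &candidate : found) {
    if (candidate.interface_version == wanted_version)
      return candidate.path;
  }
  return std::nullopt;
}
```

With candidates for versions 10 and 11 on the search path, a program built against version 11 would get the second one, and a program built against version 12 would get nothing and would not offload.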
>>>
>>> At the end of the day it is not the only library that you cannot
>>> just update and expect it to work. Nor the only one that will not
>>> work with a program compiled for a newer version.
>>>
>>>
>>> Long story short: I would strongly suggest not putting false hopes
>>> out there that will come back and haunt us. All (openmp) target
>>> libraries are bound to the compiler until further notice. There is
>>> no stability guarantee until further notice. Update (+recompile)
>>> everything or nothing for the time being.
>>>
>>>
>>> ~ Johannes
>>
>>
>> I understand what you're saying, but I don't think it's that simple.
>> We understand very well that the current implementation has all
>> sorts of usability issues and suboptimalities of various kinds. By
>> many measures it's barely usable. Moreover, we're improving all of
>> these things, in part, due to feedback from users like Ye. We have a
>> lot of applications that want to use OpenMP offload support and can't
>> yet. However, the current implementation is not completely unusable,
>> and in fact, I think that we must assume that it has users who depend
>> on it. I don't think it has very many compared to the number of users
>> we'll have after things stabilize a bit. As a result, I think that we
>> can prioritize future users over any current ones. However, we should
>> still be kind to our current users and communicate with them clearly.
>> In my opinion, however, we have limited options for effectively
>> communicating with our users, and a mailing-list thread isn't one of
>> them. Release notes aren't really either, unfortunately. All of the
>> means we have are technical. We can name options with 'experimental'
>> in the name (although the ship has sailed already on that one, and
>> probably would not have been appropriate anyway). We can bump
>> versions of things (symbols, library names, etc.) to prevent linking
>> things together that will be broken. We can use dynamic, versioned
>> registration checks (i.e., the expected version is embedded into the
>> initialization call, and the library aborts or prints a warning if
>> provided an unexpected version), but we need to do something.
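
A minimal sketch of the dynamic, versioned registration check mentioned above, in C++. All names (`register_device_image`, `kRuntimeInterfaceVersion`, `RegisterResult`) are hypothetical and are not existing libomptarget symbols, and the version number is arbitrary. The assumption is that the compiler bakes the interface version it was built against into the registration call as a constant argument.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical: the interface version this build of the runtime implements.
constexpr std::uint32_t kRuntimeInterfaceVersion = 3;

enum class RegisterResult { Ok, VersionMismatch };

// Hypothetical registration entry point. Compiled code passes the interface
// version it expects; the runtime refuses to register on a mismatch and
// prints a warning rather than silently running with a mixed setup.
RegisterResult register_device_image(std::uint32_t expected_version) {
  if (expected_version != kRuntimeInterfaceVersion) {
    std::fprintf(stderr,
                 "warning: device image expects runtime interface v%u, "
                 "but v%u is loaded; refusing to register\n",
                 static_cast<unsigned>(expected_version),
                 static_cast<unsigned>(kRuntimeInterfaceVersion));
    return RegisterResult::VersionMismatch;
  }
  return RegisterResult::Ok;
}
```

Whether a mismatch should warn, abort, or fall back to the host is a policy choice; the sketch only warns and reports the mismatch to the caller.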
>>
>> In short, while I agree with you that pretending we have a
>> significant existing user base for which we need to prioritize
>> stability would be a mistake, as we make things better, our number of
>> users will grow. We'll have a significant number well before we
>> consider the functionality to be stable. In addition, we depend on
>> these users submitting bug reports and other feedback in order to
>> improve things. Thus, we should use technical measures to make it
>> clear what mixing will work and what won't work.
>>
>> I don't understand, however, whether this is an issue of present
>> concern, or only a matter of general policy. My impression, Johannes,
>> was that the patch that motivated this RFC does not break the
>> libomptarget <-> plugin <-> kernel interface at all. It changes only
>> the inward-facing IR-level API of the device runtime. Is that correct?
>
> Yes, this RFC was not about shared libraries (which are the interfaces
> you mentioned). I don't really know why we now have a huge discussion
> about something else that is theoretical in nature anyway.

Well, because Michael pointed out that Ye and I had (perhaps
unintentionally) introduced some ambiguity into the thread, and also
that we needed to clarify exactly which interfaces we were referring to.

In any case, I think that there's consensus to proceed with treating
the device-runtime <-> device-code interface as internal (i.e., not
having a stable API across releases), and that's what we most needed.

 -Hal

> We always kept the interface stable, and then we added a feature or
> fixed a bug and told everyone to recompile everything anyway.
>
>
> ~ Johannes
>
>
>>
>> Thanks again,
>>
>> Hal
>>
>>
>>>
>>>
>>>
>>>> -Hal
>>>>
>>>>
>>>>>
>>>>>> -Hal
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Just want to clarify what ABI stability guarantees we have.
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 9, 2020 at 17:08, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 7/9/20 3:39 PM, Michael Kruse wrote:
>>>>>>> > On Thu, Jul 9, 2020 at 14:08, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>>> >> Also, if we wanted to change clang so that it linked version-locked
>>>>>>> >> versions of these libraries, -lomptarget-11 or whatever, that, in my
>>>>>>> >> opinion, would also be a reasonable choice to discuss.
>>>>>>> >>
>>>>>>> >> One thing worth capturing is the extent to which these things are
>>>>>>> >> connected. There is a relationship between libomptarget and its
>>>>>>> >> plugins, and the plugins and the device-side runtimes. There is an
>>>>>>> >> ABI boundary there somewhere. If we change nothing else, we might
>>>>>>> >> need to consider ABI stability of this part of the device-side
>>>>>>> >> interface.
>>>>>>> > The official distribution apt.llvm.org contains libomptarget.so for
>>>>>>> > LLVM 7 to 10, but each in a separate directory under
>>>>>>> > /usr/lib/llvm-<version>/libomptarget.so. The prebuilt binaries under
>>>>>>> > https://releases.llvm.org/download.html put it directly under
>>>>>>> > <prefix>/libomptarget.so. If libomptarget is version-locked, users
>>>>>>> > need to be careful about pointing to the right LD_LIBRARY_PATH.
>>>>>>> > However, I could not find target device plugins in the distributions
>>>>>>> > (such as lib/libomptarget-nvptx-sm_60.bc when built on a machine with
>>>>>>> > CUDA). The official ubuntu repository doesn't contain libomptarget at
>>>>>>> > all. Arch Linux contains at least the x86_64 rtl
>>>>>>> > (https://www.archlinux.org/packages/extra/x86_64/openmp/files/)
>>>>>>> > without any versioning resolution.
>>>>>>> >
>>>>>>> > Should we make it explicit what the compatibility guarantees for
>>>>>>> > libomptarget are, maybe even discourage OS distributions from
>>>>>>> > pre-packaging libomptarget into ldconfig default paths? At least on
>>>>>>> > Arch Linux, updating the openmp package will break previously
>>>>>>> > compiled-with-offloading binaries.
>>>>>>>
>>>>>>>
>>>>>>> You mean that it will break them *if* we make an ABI-breaking change in
>>>>>>> libomptarget. Changing the device-side runtime doesn't necessarily imply
>>>>>>> that. Nevertheless, certainly good to know.
>>>>>>>
>>>>>>> -Hal
>>>>>>>
>>>>>>>
>>>>>>> >
>>>>>>> > Michael
>>>>>>>
>>>>>>> -- Hal Finkel
>>>>>>> Lead, Compiler Technology and Programming Languages
>>>>>>> Leadership Computing Facility
>>>>>>> Argonne National Laboratory
>>>>>>>
>>>>>>>
>>>>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory