[Openmp-dev] [RFC] Clarify the absence of API stability for the *device runtime* (aka. libomptarget-nvptx-sm_XX.bc)

Hal Finkel via Openmp-dev openmp-dev at lists.llvm.org
Thu Jul 9 18:25:58 PDT 2020


On 7/9/20 6:46 PM, Johannes Doerfert wrote:
>
> On 7/9/20 6:22 PM, Hal Finkel wrote:
>>
>> On 7/9/20 5:33 PM, Johannes Doerfert wrote:
>>>
>>>
>>> On 7/9/20 5:21 PM, Hal Finkel via Openmp-dev wrote:
>>>>
>>>> On 7/9/20 5:17 PM, Michael Kruse wrote:
>>>>> Correct.
>>>>>
>>>>> I inferred from your response that we have no such guarantee yet, 
>>>>> we just haven't broken it yet.
>>>>>
>>>>> However, I think that change will break ABI of libomptarget-nvptx.a
>>>>
>>>>
>>>> I think that this is an important point. I interpreted this thread 
>>>> in the context of the API between the runtime and the device-side 
>>>> application code. Not the ABI between the plugin and the 
>>>> device-side runtime. I suspect these two are separable, but we 
>>>> should definitely clarify this.
>>>>
>>>
>>> Neither should be considered stable at this point. We kept 
>>> libomptarget stable while we recently added functions but I would 
>>> not assume it will stay that way. As I mentioned before, for the 
>>> device runtime there is basically no supported way today in which we 
>>> could even guarantee stability.
>>>
>>
>> I think that we should separate these concerns.
>>
>> The device-side runtime library is a static library (in IR form), and 
>> multiple different versions can co-exist within one application (in 
>> device code contained in different shared libraries). It seems 
>> completely reasonable to consider that version-locked to the compiler 
>> that compiled the code. It's an internal interface between two parts 
>> of a translation unit compiled by the same compiler.
>>
>> libomptarget is a shared library. libomptarget.rtl.cuda.so is a 
>> shared library. Unless we take special care in the naming and 
>> linking, managing of global state, etc. I think that we need to 
>> consider these to have some kind of ABI stability (because you only 
>> get to have one in each process). That doesn't mean that we can't 
>> ever decide to break that ABI, but we would likely decide not to do 
>> so silently. This implies that the device-side runtime should 
>> maintain ABI compatibility with libomptarget.rtl.cuda.so unless we 
>> make a non-silent breaking change. Just having kernels in a shared 
>> library suddenly stop working correctly when a newer version of 
>> libomptarget.rtl.cuda.so is loaded is probably not something we can 
>> considerately do at this point.
>>
>
> I think this is all nice and well if we would have a stable and 
> complete setup. I pretty much doubt we are there yet and pretending we 
> are is hurting us and the user alike.
>
> I think this goes in the same direction as Ye's comment. Why do we 
> want to guarantee stability if we don't even know if all the puzzle 
> pieces are in place.
>
> Interestingly, the OpenMP standard has a way out of this, as Ravi 
> hinted towards in another email. libomptarget is loaded on demand. If 
> the version is not a match we can just not load it (or skip it).
>
> At the end of the day it is not the only library that you cannot just 
> update and expect it to work. Nor the only one that will not work with 
> a program compiled for a newer version.
>
>
> Long story short. I would strongly suggest to not put false hopes out 
> there that will come back and haunt us. All (openmp) target libraries 
> are bound to the compiler until further notice. There is no stability 
> guarantee until further notice. Update (+recompile) everything or 
> nothing for the time being.
>
>
> ~ Johannes


I understand what you're saying, but I don't think it's that simple. We 
understand very well that the current implementation has all sorts of 
usability issues and suboptimalities of various kinds. By many measures 
it's barely usable. Moreover, we're improving all of these things, in 
part, due to feedback from users like Ye. We have a lot of applications 
that want to use OpenMP offload support and can't yet. However, the 
current implementation is not completely unusable, and in fact, I think 
that we must assume that it has users who depend on it. I don't think it 
has very many compared to the number of users we'll have after things 
stabilize a bit. As a result, I think that we can prioritize future 
users over any current ones. However, we should still be kind to our 
current users and communicate with them clearly. In my opinion, though, 
we have limited options for effectively communicating with our users, 
and a mailing-list thread isn't one of them. Release notes aren't really 
either, unfortunately. All of the means we have are technical. We can 
name options with 'experimental' in the name (although the ship has 
sailed already on that one, and probably would not have been appropriate 
anyway). We can bump versions of things (symbols, library names, etc.) 
to prevent linking things together that will be broken. We can use 
dynamic, versioned registration checks (i.e., the expected version is 
embedded into the initialization call, and the library aborts or prints 
a warning if provided an unexpected version), but we need to do something.
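
To make the versioned registration check concrete, here is a minimal 
sketch in C++. Every name in it (example_rt_init, kRuntimeAbiVersion, 
InitStatus) is hypothetical and is not the actual libomptarget or plugin 
interface; it only illustrates embedding the expected version in the 
initialization call and refusing to proceed on a mismatch.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical: the ABI version that this build of the runtime implements.
constexpr std::uint32_t kRuntimeAbiVersion = 2;

enum class InitStatus { Ok, VersionMismatch };

// Hypothetical initialization entry point. The caller (for example,
// compiler-generated registration code) passes the ABI version it was
// built against; the runtime warns and refuses on a mismatch.
InitStatus example_rt_init(std::uint32_t expected_version) {
  if (expected_version != kRuntimeAbiVersion) {
    std::fprintf(stderr,
                 "example_rt: caller expects ABI v%u, runtime implements v%u\n",
                 static_cast<unsigned>(expected_version),
                 static_cast<unsigned>(kRuntimeAbiVersion));
    return InitStatus::VersionMismatch;
  }
  return InitStatus::Ok;
}
```

A caller would check the returned status and either skip offloading or 
abort. Whether a mismatch should be a warning or a hard error is a policy 
choice that this sketch leaves open.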

In short, while I agree with you that pretending we have a significant 
existing user base for which we need to prioritize stability would be a 
mistake, as we make things better, our number of users will grow. We'll 
have a significant number well before we consider the functionality to 
be stable. In addition, we depend on these users submitting bug reports 
and other feedback in order to improve things. Thus, we should use 
technical measures to make it clear what mixing will work and what won't 
work.
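
As one illustration of bumping symbol versions so that incompatible 
pieces fail to link rather than misbehave at run time, a C++ runtime can 
place its entry points in a versioned inline namespace. This is a sketch 
under assumptions: the names example_rt, abi_v2, and abi_version are 
made up for illustration, and nothing here claims that libomptarget or 
its plugins do this today.

```cpp
namespace example_rt {
// The inline namespace name becomes part of every mangled symbol name.
// Code compiled against headers that declare abi_v1 would reference
// abi_v1 symbols, which a library exporting only abi_v2 does not
// provide, so the mismatch surfaces as an unresolved symbol at link or
// load time.
inline namespace abi_v2 {
int abi_version() { return 2; }
} // namespace abi_v2
} // namespace example_rt
```

Callers still write example_rt::abi_version(); only the mangled name 
changes when the namespace is renamed for a breaking release.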

I don't understand, however, whether this is an issue of present 
concern, or only a matter of general policy. My impression, Johannes, 
was that the patch that motivated this RFC does not break the 
libomptarget <-> plugin <-> kernel interface at all. It changes only the 
inward-facing IR-level API of the device runtime. Is that correct?

Thanks again,

Hal


>
>
>
>>  -Hal
>>
>>
>>>
>>>>  -Hal
>>>>
>>>>
>>>>>
>>>>> Just want to clarify what ABI stability guarantees we have.
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>> On Thu, Jul 9, 2020 at 17:08, Hal Finkel <hfinkel at anl.gov> 
>>>>> wrote:
>>>>>
>>>>>
>>>>>     On 7/9/20 3:39 PM, Michael Kruse wrote:
>>>>>     > On Thu, Jul 9, 2020 at 14:08, Hal Finkel
>>>>>     > <hfinkel at anl.gov> wrote:
>>>>>     >> Also, if we wanted to change clang so that it linked
>>>>>     >> version-locked versions of these libraries, -lomptarget-11 or
>>>>>     >> whatever, that, in my opinion, would also be a reasonable
>>>>>     >> choice to discuss.
>>>>>     >>
>>>>>     >> One thing worth capturing is the extent to which these things
>>>>>     >> are connected. There is a relationship between libomptarget
>>>>>     >> and its plugins, and the plugins and the device-side runtimes.
>>>>>     >> There is an ABI boundary there somewhere. If we change nothing
>>>>>     >> else, we might need to consider ABI stability of this part of
>>>>>     >> the device-side interface.
>>>>>     > The official distribution apt.llvm.org contains libomptarget.so
>>>>>     > for LLVM 7 to 10, but each into separate directories under
>>>>>     > /usr/lib/llvm-<version>/libomptarget.so. The prebuilt binaries
>>>>>     > under https://releases.llvm.org/download.html puts it directly
>>>>>     > under <prefix>/libomptarget.so. If libomptarget is
>>>>>     > version-locked, users need to be careful about pointing to the
>>>>>     > right LD_LIBRARY_PATH. However, I could not find target device
>>>>>     > plugins in the distributions (such as
>>>>>     > lib/libomptarget-nvptx-sm_60.bc when built on a machine with
>>>>>     > CUDA). The official ubuntu repository doesn't contain
>>>>>     > libomptarget at all. Arch Linux contains at least the x86_64
>>>>>     > rtl
>>>>>     > (https://www.archlinux.org/packages/extra/x86_64/openmp/files/)
>>>>>     > without any versioning resolution.
>>>>>     >
>>>>>     > Should make it explicit what the compatibility guarantees for
>>>>>     > libomptarget are, maybe even discourage OS distributions to
>>>>>     > pre-package libomptarget into ldconfig default paths? At least
>>>>>     > on Arch Linux updating the openmp package will break previously
>>>>>     > compiled-with-offloading binaries.
>>>>>
>>>>>
>>>>>     You mean that it will break them *if* we make an ABI-breaking
>>>>>     change in libomptarget. Changing the device-side runtime doesn't
>>>>>     necessarily imply that. Nevertheless, certainly good to know.
>>>>>
>>>>>       -Hal
>>>>>
>>>>>
>>>>>     >
>>>>>     > Michael
>>>>>
>>>>>     --
>>>>>     Hal Finkel
>>>>>     Lead, Compiler Technology and Programming Languages
>>>>>     Leadership Computing Facility
>>>>>     Argonne National Laboratory
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Openmp-dev mailing list
>>>>> Openmp-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


