[llvm-dev] [RFC] The `implements` attribute, or how to swap functions statically but late

Johannes Doerfert via llvm-dev llvm-dev at lists.llvm.org
Fri Mar 19 17:10:36 PDT 2021

The basic idea is to provide a way to replace a callee statically *inside* the
LLVM-core pipeline [0]. The initial use case was the missing translation of
certain (math) intrinsics by the NVPTX backend; however, we believe there are
other use cases that would benefit from a generic model. One is described below
as well as the benefits over


Use cases

1a) NVPTX + fast-math + math functions

The original use case was discussed here [1] and the problem is shown here [2].
In a nutshell, `llvm.sin` (and friends) are not supported by the backend and it
will yell at you. This is unfortunate but could arguably be resolved in a lot of
different ways (below). Note that there is not necessarily a 1:1 mapping for all
intrinsics, so we might need to write a wrapper, which would probably live in
clang/lib/Headers with the other CUDA wrapper code.
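
To make the "no 1:1 mapping" point concrete, here is a minimal sketch of such a
wrapper in plain C. Everything named here is an assumption for illustration:
`IMPLEMENTS` is a placeholder macro that expands to nothing (so the sketch
compiles with today's compilers), `muladd_f64` stands for some intrinsic-like
interface, and `device_mul`/`device_add` are hypothetical library routines, not
real libdevice entry points.

```c
#include <assert.h>

/* Placeholder for the proposed attribute; expands to nothing so that the
   sketch compiles with compilers that do not know the attribute. */
#define IMPLEMENTS(spec)

/* Hypothetical library routines: the library is assumed to offer only
   separate multiply and add entry points. */
static double device_mul(double a, double b) { return a * b; }
static double device_add(double a, double b) { return a + b; }

/* Wrapper bridging the mismatch: one intrinsic-like interface is
   implemented by composing two library calls. */
IMPLEMENTS(muladd_f64)
double muladd_f64_impl(double a, double b, double c) {
  return device_add(device_mul(a, b), c);
}

/* Usage: the wrapper behaves like a single multiply-add call. */
void muladd_selfcheck(void) {
  assert(muladd_f64_impl(2.0, 3.0, 4.0) == 10.0);
}
```

With the proposed attribute in place of the macro, calls to the intrinsic-like
interface would be redirected to the wrapper late in the pipeline rather than
by hand.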

Considered solutions we did not pick:
a) Teach the backend explicitly about the mapping `llvm.sin` -> `__nv_sin`.
b) Teach clang not to emit the intrinsics in the first place.
c) Use an existing overload "hack", e.g., __asm__.

Let's discuss those first.
a) Works, is very limited to this use case, the mapping is not with the 
    and definitions which makes the maintenance less appealing.
b) Seems easy enough, but we lose all the benefits of intrinsics, more on that
c) Right now, IR isn't happy with defining intrinsics so there is little we
    could do better than b) here I think, short of allowing intrinsic
    definitions (which is an

1b) Users can override the implementation of any intrinsic, regardless of
     whether the backend supports it or not. They still get the intrinsic
     benefits during the compilation. This is not limited to NVPTX and math
     intrinsics but also opens the door for cross-platform tooling, e.g., Tool
     XYZ will emit intrinsics our backend doesn't support but we can map them
     to implementations for our architecture.

1c) You can prototype/test different lowerings for functions and intrinsics
     easily by linking in a bitcode file (which can be compiled from C*).

2) Function specifications and implementations

I would like to embed a function specification with the implementation, e.g.

void vec_add(double *A, double *B, int N) {
   for (int i = 0; i < N; ++i)
     A[i] += B[i];
}

__attribute__((implements(vec_add)))
void vec_add_impl(double *A, double *B, int N) {
   #pragma omp parallel for
   for (int i = 0; i < N; ++i)
     A[i] += B[i];
}

I think that would open up cool possibilities:
A) Replace only the first N specifications with implementations, which helps
    with debugging.
B) Use the specification for the first part of the optimization pipeline so you
    can derive facts the implementation "hides". E.g., `llvm.sin` calls could be
    annotated as `readnone` while the `__nv_sin` call contains inline assembly
    and acts as an optimization barrier. Or, in the above example, it is obvious
    from the "specification" that `B` is readonly but the implementation could
    arbitrarily hide this, e.g., the implementation might be a runtime call.
C) Program verification, test synthesis, etc. would be possible on a new level
    in a defined way. By keeping implementation and specification close together
    and written in the same language we might get somewhere in this area.
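
As a rough illustration of point C), the sketch below runs a specification and
an implementation on the same input and compares the results. It is only a
sketch under assumptions: `IMPLEMENTS` is a placeholder macro that expands to
nothing, the hand-unrolled loop stands in for a real (e.g., parallel)
implementation, and `spec_matches_impl` and `MAX_N` are hypothetical names that
are not part of the proposal.

```c
#include <assert.h>

/* Placeholder for the proposed attribute; expands to nothing here. */
#define IMPLEMENTS(spec)

/* Specification: the obvious sequential loop. */
void vec_add(double *A, double *B, int N) {
  for (int i = 0; i < N; ++i)
    A[i] += B[i];
}

/* Implementation: hand-unrolled by two, standing in for an optimized or
   parallel version that must compute the same result. */
IMPLEMENTS(vec_add)
void vec_add_impl(double *A, double *B, int N) {
  int i = 0;
  for (; i + 1 < N; i += 2) {
    A[i] += B[i];
    A[i + 1] += B[i + 1];
  }
  if (i < N) /* leftover element when N is odd */
    A[i] += B[i];
}

enum { MAX_N = 64 };

/* Returns 1 if specification and implementation produce identical output
   on one deterministic input of length N, and 0 otherwise. */
int spec_matches_impl(int N) {
  assert(N >= 0 && N <= MAX_N);
  double a_spec[MAX_N], a_impl[MAX_N], b[MAX_N];
  for (int i = 0; i < N; ++i) {
    a_spec[i] = a_impl[i] = 0.5 * i;
    b[i] = 1.0 + i;
  }
  vec_add(a_spec, b, N);
  vec_add_impl(a_impl, b, N);
  for (int i = 0; i < N; ++i)
    if (a_spec[i] != a_impl[i])
      return 0;
  return 1;
}
```

A tool that knows which function implements which specification could generate
checks like this automatically instead of relying on naming conventions.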



I initially thought about the reverse attribute, so `vec_add` in the above
example would have `__attribute__((specifies(vec_add_impl)))` for the same
reasons (A-C). The problem is that I'm not sure how to attach anything to an
intrinsic. Some `__asm__` trickery might help but it's unclear that this is
better than the `implements` version. Maybe we want both for convenience, with
the `implements` version being able to handle intrinsics.

~ Johannes

[0] https://reviews.llvm.org/D98516
[2] https://godbolt.org/z/PxsEWs
