[libc-dev] __builtin_* vs llvm-libc provided ones?

Guillaume Chatelet via libc-dev libc-dev at lists.llvm.org
Wed Feb 3 03:01:24 PST 2021


re: memcpy & builtins

> Now given that, not as a macro but is it possible to use the builtins
> inside llvm-libc implementation maybe so llvm-libc won't have to implement
> them again? I mean do you see it possible for llvm-libc to get its
> implementation shared with compiler one somehow behind the scene?

Sharing implementations is always a noble goal to pursue.
Unfortunately, a number of considerations make this hard for memory
functions in general.
I'll try to give an overview of the challenges here. I'll start with the
basics - my apologies if you already know most of it.

Most compilers use an internal representation that is well suited for
abstract, relatively high level depiction of operations.
1. When compiling C, C++, Rust - you name it - the source language is first
transformed by the front end into a common representation (the so-called
IR).
2. This IR can be transformed by general - CPU agnostic - passes and
progressively refined (lowered) to be closer and closer to the real
underlying hardware.
3. Finally code generation occurs (SelectionDAG legalization and
optimizations, register allocation, code generation)


During 1, we can convey memcpy semantics to the IR in different ways:
 - by using the __builtin_memcpy builtin you are mentioning
https://godbolt.org/z/Pq3Exd
 - by using language constructs that require the use of the memcpy semantic
https://godbolt.org/z/bqY31h
 - by calling the standard library https://godbolt.org/z/f9dvda
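
As a rough sketch of these three forms side by side (this is not the code behind the godbolt links; `struct packet` and all function names are made up for illustration), each function below copies one object, and a compiler may represent all three with the same memcpy operation internally:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type used only for this illustration. */
struct packet {
    char payload[64];
    int length;
};

/* 1. Explicit compiler builtin (accepted by both GCC and Clang). */
static void copy_with_builtin(struct packet *dst, const struct packet *src) {
    __builtin_memcpy(dst, src, sizeof *dst);
}

/* 2. Language construct: struct assignment copies the whole object. */
static void copy_with_assignment(struct packet *dst, const struct packet *src) {
    *dst = *src;
}

/* 3. Call to the standard library function. */
static void copy_with_libc(struct packet *dst, const struct packet *src) {
    memcpy(dst, src, sizeof *dst);
}

/* Usage: all three produce an identical copy of the source object. */
void check_three_copies(void) {
    const struct packet src = { "hello", 5 };
    struct packet a, b, c;

    copy_with_builtin(&a, &src);
    copy_with_assignment(&b, &src);
    copy_with_libc(&c, &src);

    assert(strcmp(a.payload, "hello") == 0 && a.length == 5);
    assert(strcmp(b.payload, "hello") == 0 && b.length == 5);
    assert(strcmp(c.payload, "hello") == 0 && c.length == 5);
}
```

Whether the three really end up identical depends on the compiler, the optimization level and the flags in use.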

During 2, a bunch of smart optimizations may recognize IR patterns and turn
them into the IR memcpy intrinsic
 - loop without optimization https://godbolt.org/z/cfzTas
 - loop with optimization https://godbolt.org/z/1E55rv

This behavior can be disabled with the misleadingly named
"-fno-builtin-memcpy" flag https://godbolt.org/z/7GoxPT
In addition, this flag prevents the frontend from recognizing the libc
memcpy function https://godbolt.org/z/dsrTrc
I know this is confusing :-/
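
For illustration, a loop of the following shape is the kind of pattern an optimizer may recognize as a copy. This is a minimal sketch with a hypothetical helper name (`copy_bytes`), not the code from the links above. Whether the loop is actually rewritten depends on the compiler, the optimization level, and flags such as "-fno-builtin-memcpy":

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: a plain byte-by-byte copy loop.
 * 'restrict' promises that the buffers do not overlap, which matches
 * the contract of memcpy. */
static void copy_bytes(char *restrict dst, const char *restrict src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Usage: the observable behavior is the same whether or not the compiler
 * replaces the loop with a memcpy operation. */
void check_copy_bytes(void) {
    const char src[] = "memcpy idiom";
    char dst[sizeof src];

    copy_bytes(dst, src, sizeof src);
    for (size_t i = 0; i < sizeof src; ++i)
        assert(dst[i] == src[i]);
}
```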

The good thing about the compiler understanding memcpy semantics is
that it can produce excellent code based on the context:
 - If size is constant and small, the IR optimization passes turn the
memcpy intrinsic into loads and stores https://godbolt.org/z/b81z84
 - But if size is too big, the compiler may delegate to libc
https://godbolt.org/z/jhs7Pj
 - Under some circumstances it can also choose to emit a loop
https://godbolt.org/z/Wq3n99
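
The first two cases can be sketched as follows. The names (`load_u32`, `copy_big`, `BIG_SIZE`) are hypothetical and the size threshold at which a compiler switches strategy is target- and version-dependent, so treat the comments as typical outcomes rather than guarantees:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Small constant size: optimizing compilers commonly turn this memcpy
 * into a plain load and store rather than a function call. */
static uint32_t load_u32(const unsigned char *p) {
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Large constant size (hypothetical value): here the compiler may
 * decide to call the libc memcpy instead of expanding the copy inline. */
enum { BIG_SIZE = 1 << 16 };

static void copy_big(unsigned char *dst, const unsigned char *src) {
    memcpy(dst, src, BIG_SIZE);
}

/* Usage: both helpers behave like ordinary copies. */
void check_constant_size_copies(void) {
    static unsigned char big_src[BIG_SIZE];
    static unsigned char big_dst[BIG_SIZE];
    unsigned char raw[sizeof(uint32_t)];
    const uint32_t original = 0x12345678u;

    /* Round trip through bytes, so the check is endianness-independent. */
    memcpy(raw, &original, sizeof raw);
    assert(load_u32(raw) == original);

    big_src[0] = 1;
    big_src[BIG_SIZE - 1] = 2;
    copy_big(big_dst, big_src);
    assert(big_dst[0] == 1 && big_dst[BIG_SIZE - 1] == 2);
}
```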

To sum it up, many constructs can end up being interpreted by LLVM as
having memcpy semantics, and depending on the context the resulting code
may differ widely.

Now it is desirable to have a C/C++ implementation of memcpy to be able to
leverage optimization techniques like Profile Guided Optimization. For
instance, when the compiler _sees_ the code, it can reason about it and make
inlining decisions, reorder branches, etc.
The complex interactions I described earlier turn this into a
chicken-and-egg problem where the code may end up calling itself indefinitely
https://godbolt.org/z/eg0p_E

This is why __builtin_memcpy_inline was designed in the first place
(see the original thread about it
https://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html).
Its contract is simpler and makes it useful as a building block for
creating memcpy functions in pure C/C++.
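
To make the "building block" idea concrete, here is a minimal sketch of a memcpy assembled from fixed-size blocks. It is not llvm-libc's actual implementation: `sketch_memcpy` and `COPY_BLOCK` are hypothetical names, and the fallback branch only exists so the example also compiles with compilers that lack the builtin. Every size passed to the builtin is a compile-time constant:

```c
#include <assert.h>
#include <stddef.h>

#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_memcpy_inline)
/* The size argument is a compile-time constant at every use below. */
#define COPY_BLOCK(dst, src, size) __builtin_memcpy_inline((dst), (src), (size))
#else
/* Fallback (assumption for portability): a fixed-count byte loop. Unlike
 * the builtin, nothing stops an optimizer from turning this loop back
 * into a call to memcpy. */
#define COPY_BLOCK(dst, src, size)                               \
    do {                                                         \
        for (size_t i_ = 0; i_ < (size); ++i_)                   \
            ((char *)(dst))[i_] = ((const char *)(src))[i_];     \
    } while (0)
#endif

/* Hypothetical memcpy-like function built only from constant-size blocks. */
static void *sketch_memcpy(void *restrict dst, const void *restrict src,
                           size_t n) {
    char *d = dst;
    const char *s = src;

    /* Bulk of the copy: 16 bytes at a time. */
    while (n >= 16) {
        COPY_BLOCK(d, s, 16);
        d += 16;
        s += 16;
        n -= 16;
    }
    /* Tail: n < 16 here, so its bits select blocks of 8, 4, 2 and 1. */
    if (n & 8) { COPY_BLOCK(d, s, 8); d += 8; s += 8; }
    if (n & 4) { COPY_BLOCK(d, s, 4); d += 4; s += 4; }
    if (n & 2) { COPY_BLOCK(d, s, 2); d += 2; s += 2; }
    if (n & 1) { COPY_BLOCK(d, s, 1); }
    return dst;
}

/* Usage: copy 37 bytes into the middle of a buffer and check that the
 * guard bytes on both sides are left untouched. */
void check_sketch_memcpy(void) {
    unsigned char src[37];
    unsigned char dst[39];

    for (size_t i = 0; i < sizeof src; ++i)
        src[i] = (unsigned char)(i * 7 + 1);
    for (size_t i = 0; i < sizeof dst; ++i)
        dst[i] = 0x55;

    sketch_memcpy(dst + 1, src, sizeof src);

    assert(dst[0] == 0x55 && dst[38] == 0x55);
    for (size_t i = 0; i < sizeof src; ++i)
        assert(dst[i + 1] == src[i]);
}
```

This sketch alone does not guarantee that the compiler never emits a call to memcpy; a real implementation would also have to control the build flags discussed above.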

> Maybe through some directory inside somewhere that both the compiler and
> llvm-libc can share their implementations of those?

It may not be self-evident from what I described earlier, but the way memcpy
is implemented in LLVM really spans a lot of different parts, and I'm not
sure it is possible to gather it in a single place as regular code without
adding ways to communicate intent to the compiler (aka more builtins).
For instance, loop creation has to take place at the IR level (Phi nodes and
the loop condition), but this may be in tension with the availability of
accelerators that are particular to backend implementations (think Enhanced
REP MOVSB/STOSB for x86 processors).

I'm aware that this answer is probably really confusing, but I hope it
still helps.

On Tue, Feb 2, 2021 at 9:03 PM Ebrahim Byagowi via libc-dev <
libc-dev at lists.llvm.org> wrote:

> Thank you so much.
>
> With your explanation I think I now understand the purpose of builtins
> better. I was also wrong about __builtin_memcpy_inline, as it isn't as
> flexible as a real libc memcpy and needs its third argument to be a
> constant: "error: argument to '__builtin_memcpy_inline' must be a constant
> int…" (which I wish weren't the case, but it is understandable why it is).
> I now also see that __builtin_memcpy is a proxy to libc memcpy, which I
> guess is there just to make compiler code analysis easier.
>
> Now given that, not as a macro but is it possible to use the builtins
> inside llvm-libc implementation maybe so llvm-libc won't have to implement
> them again? I mean do you see it possible for llvm-libc to get its
> implementation shared with compiler one somehow behind the scene? Maybe
> through some directory inside somewhere that both the compiler and
> llvm-libc can share their implementations of those?
>
> The reason I'm asking is that I hope to see compiler builtins become more
> capable some day. I understand I shouldn't be too hopeful about that, but
> I think the questions are worth thinking about regardless.
>
> Thanks!
>
> On Tue, Feb 2, 2021 at 10:38 PM Siva Chandra <sivachandra at google.com>
> wrote:
>
>> On Mon, Feb 1, 2021 at 2:58 AM Ebrahim Byagowi <ebraminio at gmail.com>
>> wrote:
>>
>>> To describe it in a maybe better way: for example, this is all I need to
>>> get ceilf, floorf, etc. in a .wasm module built with -nostdlib -nostdinc
>>>
>>> #define ceilf __builtin_ceilf
>>> #define floorf __builtin_floorf
>>> #define abs __builtin_abs
>>> #define fabs __builtin_fabs
>>>
>>> so I wondered if I could get more of libc this way (the parts that make
>>> sense, of course), or at least learn what the relation will be between
>>> those builtin implementations and the upcoming libc.
>>>
>>
>> IIUC, there are two parts to your question:
>> 1. Can we implement a libc function as a macro resolving to a builtin:
>> Not if the standard requires the function to be a real addressable
>> function. One can choose to also provide a macro, but an addressable
>> function declaration should be available. See section 7.1.4 of the C11
>> standard for more information.
>> 2. What is the difference between builtins and the libc flavors of the
>> functions: Typically, builtins resolve to the hardware instruction
>> implementing the operation. If a hardware implementation is not available,
>> the compiler builtin calls into the libc itself. With respect to math
>> functions, you will notice this with the `long double` flavors. That said,
>> we have implemented the math functions from first principles (as in, the
>> implementations do not assume any special hardware support) in LLVM libc.
>> However, we are just starting to add machine specific implementations
>> (https://reviews.llvm.org/D95850). This should make the libc functions
>> equivalent to the compiler builtins.
>>