[libc-dev] __builtin_* vs llvm-libc provided ones?

Ebrahim Byagowi via libc-dev libc-dev at lists.llvm.org
Wed Feb 3 03:53:00 PST 2021


Thank you so much for the explanations; the way things are organized makes
more sense to me now!

Thanks!

On Wed, Feb 3, 2021 at 2:31 PM Guillaume Chatelet <gchatelet at google.com>
wrote:

> re: memcpy & builtins
>
> > Now, given that, is it possible (not as a macro) to use the builtins
> > inside the llvm-libc implementation, so llvm-libc won't have to
> > implement them again? I mean, do you see it as possible for llvm-libc
> > to share its implementation with the compiler's somehow, behind the
> > scenes?
>
> Sharing implementations is certainly a noble goal to pursue.
> Unfortunately, there are a number of things to consider that make it hard
> for memory functions in general.
> I'll try to give an overview of the challenges here. I'll start with the
> basics - my apologies if you already know most of it.
>
> Most compilers use an internal representation that is well suited for
> abstract, relatively high-level descriptions of operations.
> 1. When compiling C, C++, Rust - you name it - the source language is
> first transformed by the frontend into a common representation (the
> so-called IR).
> 2. This IR can be transformed by general - CPU-agnostic - passes and
> progressively refined (lowered) to get closer and closer to the real
> underlying hardware.
> 3. Finally, code generation occurs (SelectionDAG legalization and
> optimization, register allocation, machine code emission).
>
>
> During step 1, we can convey the memcpy semantics to the IR in different
> ways (roughly sketched in code after this list):
>  - by using the __builtin_memcpy builtin you are mentioning
> https://godbolt.org/z/Pq3Exd
>  - by using language constructs that require memcpy semantics
> https://godbolt.org/z/bqY31h
>  - by calling the standard library https://godbolt.org/z/f9dvda
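>
> Here is a rough sketch of those three forms in C (the function names are
> made up for illustration):
>
>   #include <string.h>
>
>   struct blob { char bytes[16]; };
>
>   /* 1. explicit builtin */
>   void copy_via_builtin(char *dst, const char *src, size_t n) {
>     __builtin_memcpy(dst, src, n);
>   }
>
>   /* 2. language construct: assigning a struct lowers to a memcpy
>      intrinsic */
>   void copy_via_language(struct blob *dst, const struct blob *src) {
>     *dst = *src;
>   }
>
>   /* 3. direct libc call, recognized by the frontend */
>   void copy_via_libc(char *dst, const char *src, size_t n) {
>     memcpy(dst, src, n);
>   }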
>
> During step 2, a number of optimization passes may recognize IR patterns
> and turn them into the IR memcpy intrinsic (see the sketch after this
> list):
>  - loop without optimization https://godbolt.org/z/cfzTas
>  - loop with optimization https://godbolt.org/z/1E55rv
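>
> For instance (an illustrative sketch), a plain byte-copy loop like the
> one below can be recognized by the loop-idiom pass at -O2 and replaced
> with the memcpy intrinsic:
>
>   #include <stddef.h>
>
>   void byte_copy(char *dst, const char *src, size_t n) {
>     for (size_t i = 0; i < n; ++i)
>       dst[i] = src[i];
>   }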
>
> This behavior can be disabled with the (somewhat misleadingly named)
> "-fno-builtin-memcpy" flag https://godbolt.org/z/7GoxPT
> In addition, this flag also prevents the frontend from recognizing the
> libc memcpy function https://godbolt.org/z/dsrTrc
> I know this is confusing :-/
>
> Now, the good thing about having the compiler understand memcpy semantics
> is that it can produce excellent code based on the context (roughly
> sketched after this list):
>  - If the size is constant and small, the IR optimization passes turn the
> memcpy intrinsic into loads and stores https://godbolt.org/z/b81z84
>  - But if the size is too big, the compiler may delegate to the libc
> https://godbolt.org/z/jhs7Pj
>  - Under some circumstances it can also choose to emit a loop
> https://godbolt.org/z/Wq3n99
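>
> Roughly (illustrative only; the exact thresholds depend on the target and
> the options):
>
>   void small_copy(char *dst, const char *src) {
>     __builtin_memcpy(dst, src, 8);       /* typically a single load/store
>                                             pair */
>   }
>
>   void big_copy(char *dst, const char *src) {
>     __builtin_memcpy(dst, src, 1 << 20); /* typically lowered to a call to
>                                             the libc memcpy */
>   }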
>
> To sum it up, many constructs can end up being interpreted by LLVM as
> having memcpy semantics, and depending on the context the resulting code
> may differ widely.
>
> Now, it is desirable to have a C/C++ implementation of memcpy in order to
> leverage optimization techniques like Profile Guided Optimization: when
> the compiler _sees_ the code, it can reason about it, make inlining
> decisions, reorder branches, etc.
> The complex interactions I described earlier turn this into a
> chicken-and-egg problem where the code may end up calling itself
> indefinitely https://godbolt.org/z/eg0p_E
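>
> A naive illustration of the problem (a sketch, not real llvm-libc code):
>
>   #include <stddef.h>
>
>   void *memcpy(void *dst, const void *src, size_t n) {
>     char *d = dst;
>     const char *s = src;
>     for (size_t i = 0; i < n; ++i)
>       d[i] = s[i];
>     return dst;
>   }
>   /* The optimizer may pattern-match the loop above back into a call to
>      memcpy, i.e. into a call to this very function, unless something like
>      -fno-builtin-memcpy is used. */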
>
> This is why __builtin_memcpy_inline was designed in the first place (see
> the original thread about it
> https://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html).
> Its contract is simpler, which makes it useful as a building block for
> creating memcpy functions in pure C/C++.
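>
> As a rough sketch (heavily simplified, not the actual llvm-libc code), a
> memcpy built on top of it only ever passes compile-time-constant sizes to
> the builtin and dispatches on the runtime size around it:
>
>   #include <stddef.h>
>
>   static void copy8(char *dst, const char *src) {
>     /* The size is a constant: always expanded inline, never turned back
>        into a libc call. */
>     __builtin_memcpy_inline(dst, src, 8);
>   }
>
>   void *my_memcpy(void *dst, const void *src, size_t n) {
>     if (n == 8)
>       copy8(dst, src);
>     /* ... other fixed sizes handled similarly, plus a strategy for the
>        general case ... */
>     return dst;
>   }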
>
> > Maybe through some shared directory where both the compiler and
> > llvm-libc can keep their implementations of those?
>
> It may not be self-evident from what I described earlier, but the way
> memcpy is handled in LLVM really spans a lot of different parts, and I'm
> not sure it is possible to gather it all in a single place as regular code
> without adding ways to communicate intent to the compiler (i.e. more
> builtins).
> For instance, loop creation has to take place at the IR level (phi nodes
> and the loop condition), but it may be in tension with the availability
> of accelerators that are particular to backend implementations (think
> Enhanced REP MOVSB/STOSB on x86 processors).
>
> I'm aware that this answer is probably still confusing, but I hope it
> helps nonetheless.
>
> On Tue, Feb 2, 2021 at 9:03 PM Ebrahim Byagowi via libc-dev <
> libc-dev at lists.llvm.org> wrote:
>
>> Thank you so much.
>>
>> With your explanation I think I now understand the purpose of the
>> builtins better. I was also wrong about __builtin_memcpy_inline: it isn't
>> as flexible as a real libc memcpy and needs its third argument to be a
>> constant, "error: argument to '__builtin_memcpy_inline' must be a constant
>> int…" (I wish that weren't the case, but it is understandable why it is).
>> I also see now that __builtin_memcpy is essentially a proxy to the libc
>> memcpy, which I guess exists just to make the compiler's code analysis
>> easier.
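>>
>> Just to illustrate what I ran into (a sketch):
>>
>>   void ok(char *dst, const char *src) {
>>     __builtin_memcpy_inline(dst, src, 16);  /* accepted: constant size */
>>   }
>>
>>   void not_ok(char *dst, const char *src, unsigned long n) {
>>     __builtin_memcpy_inline(dst, src, n);   /* rejected: the size must be
>>                                                a compile-time constant */
>>   }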
>>
>> Now, given that, is it possible (not as a macro) to use the builtins
>> inside the llvm-libc implementation, so llvm-libc won't have to implement
>> them again? I mean, do you see it as possible for llvm-libc to share its
>> implementation with the compiler's somehow, behind the scenes? Maybe
>> through some shared directory where both the compiler and llvm-libc can
>> keep their implementations of those?
>>
>> The reason I'm asking is that I hope to someday see the compiler builtins
>> become more capable. I understand I shouldn't be too hopeful about that,
>> but I think the questions are worth considering regardless.
>>
>> Thanks!
>>
>> On Tue, Feb 2, 2021 at 10:38 PM Siva Chandra <sivachandra at google.com>
>> wrote:
>>
>>> On Mon, Feb 1, 2021 at 2:58 AM Ebrahim Byagowi <ebraminio at gmail.com>
>>> wrote:
>>>
>>>> To describe it maybe a bit better: for example, this is all I need to
>>>> get ceilf, floorf, etc. in a .wasm module built with -nostdlib -nostdinc
>>>>
>>>> #define ceilf __builtin_ceilf
>>>> #define floorf __builtin_floorf
>>>> #define abs __builtin_abs
>>>> #define fabs __builtin_fabs
>>>>
>>>> so I wondered if I could get more of the libc this way (for the parts
>>>> where it makes sense, of course), or at least understand what the
>>>> relation will be between those builtin implementations and the upcoming
>>>> libc.
>>>>
>>>
>>> IIUC, there are two parts to your question:
>>> 1. Can we implement a libc function as a macro resolving to a builtin:
>>> Not if the standard requires the function to be a real addressable
>>> function. One can choose to also provide a macro, but an addressable
>>> function declaration should be available. See section 7.1.4 of the C11
>>> standard for more information.
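>>> As a rough sketch of what that allows (illustrative only, not
>>> llvm-libc's actual headers):
>>>
>>>   /* math.h (sketch) */
>>>   float ceilf(float x);               /* the addressable function must
>>>                                          exist */
>>>   #define ceilf(x) __builtin_ceilf(x) /* an additional function-like
>>>                                          macro is allowed */
>>>
>>>   /* A user can still get at the real function by suppressing the macro
>>>      with parentheses: */
>>>   float (*fp)(float) = (ceilf);
>>>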
>>> 2. What is the difference between builtins and the libc flavors of the
>>> functions: Typically, builtins resolve to the hardware instruction
>>> implementing the operation. If a hardware implementation is not available,
>>> the compiler builtin calls into the libc itself. With respect to math
>>> functions, you will notice this with the `long double` flavors. That said,
>>> we have implemented the math functions from first principles (as in, the
>>> implementations do not assume any special hardware support) in LLVM libc.
>>> However, we are just about starting to add machine-specific implementations
>>> (https://reviews.llvm.org/D95850). This should make the libc functions
>>> equivalent to the compiler builtins.
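>>>
>>> To illustrate that distinction (a sketch; the exact lowering depends on
>>> the target):
>>>
>>>   float f(float x) {
>>>     /* On targets with a suitable rounding instruction (e.g. SSE4.1's
>>>        roundss on x86-64), this compiles to that instruction; otherwise
>>>        it falls back to a call to the libc's ceilf. */
>>>     return __builtin_ceilf(x);
>>>   }
>>>
>>>   long double g(long double x) {
>>>     /* The long double flavor commonly ends up as a call to the libc's
>>>        ceill. */
>>>     return __builtin_ceill(x);
>>>   }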
>>>
>>
>