[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

Fri May 10 05:57:05 PDT 2019

gchatelet added a comment.

In D61634#1494518 <https://reviews.llvm.org/D61634#1494518>, @alexandre.isoard wrote:

> I'm not convinced by the approach.
>
> Can it still recognize the loop idiom into memcpy implementation but use `memmove` (as only `memcpy` has been blacklisted)?

Yes it can and it's fine, passing `src` and `dst` arguments as `__restrict` will ensure that `memcpy` is the function to choose in this case.
This attribute is not to be used widely but is directed towards library maintainer that already know what the compiler is allowed to do.

> Can it do the same for `memmove` (for the case without overlap), and end-up with infinite recursion?

I fail to see how this would happen. Could you craft some pseudo code that would exhibit the infinite recursion?
My take on it is that if the compiler can't generate the code it's totally OK to fail with an error message.

The purpose of this RFC it to get some help from the compiler to generate the best code for some building blocks (say `copy 4 bytes`, `copy 8 bytes`) and assemble them in a way that maximizes something (code size, runtime for certain parameter distributions).
It would be useful only to libc / libm / compiler-rt implementers to build these functions on top of smaller functions that we know the compiler can produce at compile time.

I don't think we want to use this attribute to generate fully the `memcpy`. I added the expansion into a `rep;movsb` for variable sized memcpy only because it's feasible on x86. It's preferable in this case since it has the correct semantic and prevents a compilation error but I would be fine with a compilation error here.

> I have a feeling we should pick a stance:
> 
> - either not allow the compiler to lower a builtin to a call to the library; (my preferred choice, but I'm biased)

>From the point of view of LLVM `@llvm.memcpy` intrinsics has the semantic of `memcpy` it does not know if it comes from a lowering in the frontend (e.g. passing structures by value) or from a built in. 
I believe this is feasible although I fail to see where my proposal falls short. Can you show a code example where the current proposal is problematic?

> - or not allow the library to use compiler builtins (but LTO flow with the runtime library *already* linked smells like trouble if we go this way).

This is currently generating very poor code because `-fno-builtin` prevents LLVM from understanding the copy semantic.

> The reason for my bias is that I have a multi-memcpy codegen in the compiler to generate those two calls:
> 
>   memcpy(left,  in, n);
>   memcpy(right, in, n);
> 
> 
> with a single loop.

I don't quite understand how this is linked to the issue at hand. Can you provide more context? Pointers to code?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634