[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

Fri May 10 05:07:17 PDT 2019

gchatelet added a comment.

In D61634#1493927 <https://reviews.llvm.org/D61634#1493927>, @efriedma wrote:

> I would be careful about trying to over-generalize here.  There are a few different related bits of functionality which seem to be interesting, given the discussion in the llvm-dev thread, here, and in related patches:

Thanks for the feedback, @efriedma. I don't fully understand what you're suggesting here, so I will try to reply inline.

> 1. The ability to specify -fno-builtin* on a per-function level, using function attributes.

`-fno-builtin*` is about preventing clang/llvm from recognizing that a piece of code has the same semantics as a particular IR intrinsic; it has nothing to do with preventing the compiler from generating runtime calls.

- `-fno-builtin` is about the transformation from code to IR (frontend)
- The RFC is about the transformation from IR to runtime calls (backend)

> 2. Improved optimization when -fno-builtin-memcpy is specified.

I don't see this happening because if `-fno-builtin-memcpy` is used, clang (frontend) might already have unrolled and vectorized the loop. It is then very hard, by simply looking at the IR, to recognize that it's a `memcpy` and generate good code (e.g. https://godbolt.org/z/JZ-mR0).
Here we really want the compiler to understand that we are copying memory (i.e. this really has `@llvm.memcpy` semantics) but we want to prevent it from calling the runtime.
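
To make this concrete, here is a minimal sketch of the kind of loop in question. The name `naive_copy` is hypothetical and not part of the patch. With builtins enabled, the optimizer may recognize the loop as a memory copy and replace it with `@llvm.memcpy`; with `-fno-builtin-memcpy` that recognition is disabled, so the loop may instead end up unrolled and vectorized. The exact output depends on compiler version, target, and optimization level.

```c
#include <stddef.h>

/* Hypothetical helper: copies n bytes from src to dst, one byte at a time.
   The restrict qualifiers promise that the two buffers do not overlap. */
void naive_copy(char *restrict dst, const char *restrict src, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}
```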

> 3. The ability to avoid calls to memcpy for certain C constructs which would naturally be lowered to a memcpy call, like struct assignment of large structs, or explicit calls to __builtin_memcpy().  Maybe also some generalization of this involving other libc/libm/compiler-rt calls.

I believe very few people will use the attribute described in the RFC. It will most probably be library maintainers who already know a good deal about how the compiler is allowed to transform the code.

> 4. The ability to force the compiler to generate "rep; movs" on x86 without inline asm.

This is not strictly required; at least, it is not very useful for the purpose of building memcpy functions (more on this a few lines below).

> It's not clear to me that all of this should be tied together.  In particular, I'm not sure -fno-builtin-memcpy should imply the compiler never generates a call to memcpy().

As a matter of fact, those are not tied together. There are different use cases with different solutions; the one I'm focusing on here is preventing the compiler from synthesizing runtime calls, because we want to be able to implement them directly in C / C++.
It is orthogonal to having the compiler recognize a piece of code as an IR intrinsic.
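
As an illustration of a synthesized runtime call, consider the sketch below. The type `struct big`, its 4096-byte size, and the function `copy_big` are all hypothetical choices made for the example. No `memcpy` appears in the source, yet the compiler is free to lower the assignment to a call to the runtime's `memcpy`; whether it actually does depends on the target and optimization settings.

```c
/* Hypothetical large aggregate; the 4096-byte size is arbitrary. */
struct big {
    unsigned char bytes[4096];
};

/* A plain struct assignment. The backend may lower this to a call to
   memcpy even though the source never mentions it. */
void copy_big(struct big *dst, const struct big *src) {
    *dst = *src;
}
```

If such code lived inside an implementation of `memcpy` itself, a synthesized call could recurse into the function being defined, which is the situation the RFC tries to avoid.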

> On recent x86 chips, you might be able to get away with unconditionally using "rep movs", but generally an efficient memcpy for more than a few bytes is a lot longer than one instruction, and is not something reasonable for the compiler to synthesize inline.

Well, it depends. On Haswell and particularly Skylake it's hard to beat `rep;movsb` for anything bigger than 1k, be it aligned or not.
On other architectures, and especially on those without ERMSB, you have different strategies. Actually this is the very goal of this RFC: if you can inline or use PGO you can do a much better job for small sizes than calling libc's memcpy or inserting `rep;movsb`.
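
A rough sketch of what such a small-size strategy could look like is given below. It is an assumption for illustration, not code from the patch: the names `copy_upto_32` and `sketch_memcpy` and the 32-byte threshold are hypothetical. It relies on `__builtin_memcpy` (available in clang and gcc) with constant sizes, which the compiler usually expands to plain loads and stores, although nothing guarantees that today. The byte loop for larger sizes is only a placeholder for whatever strategy suits the target (`rep;movsb`, a vector loop, and so on).

```c
#include <stddef.h>

/* Hypothetical sketch: copy n bytes (n <= 32) between non-overlapping
   buffers using at most two fixed-size block copies. The two blocks may
   overlap each other, which is harmless because both write the same data. */
static inline void copy_upto_32(char *dst, const char *src, size_t n) {
    if (n >= 16) {
        __builtin_memcpy(dst, src, 16);
        __builtin_memcpy(dst + n - 16, src + n - 16, 16);
    } else if (n >= 8) {
        __builtin_memcpy(dst, src, 8);
        __builtin_memcpy(dst + n - 8, src + n - 8, 8);
    } else if (n >= 4) {
        __builtin_memcpy(dst, src, 4);
        __builtin_memcpy(dst + n - 4, src + n - 4, 4);
    } else if (n >= 2) {
        __builtin_memcpy(dst, src, 2);
        __builtin_memcpy(dst + n - 2, src + n - 2, 2);
    } else if (n == 1) {
        dst[0] = src[0];
    }
}

/* Dispatch on size: small copies are handled inline, larger ones fall
   through to a placeholder byte loop. */
void sketch_memcpy(char *dst, const char *src, size_t n) {
    if (n <= 32) {
        copy_upto_32(dst, src, n);
        return;
    }
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}
```

When the size is known at the call site and the function is inlined, the branch chain can fold away and leave only a couple of loads and stores.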

> If we're adding new IR attributes here, we should also consider the interaction with LTO.

Yes, this is a very different story, which is why I'm not exploring this route. It would quite possibly come with a high maintenance cost as well.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634