[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

Tue May 14 04:42:24 PDT 2019

gchatelet added a subscriber: tejohnson.
gchatelet added a comment.

In D61634#1500453 <https://reviews.llvm.org/D61634#1500453>, @efriedma wrote:

> My main blocker is that I want to make sure we're moving in the right direction: towards LLVM IR with clear semantics, towards straightforward rules for writing freestanding C code, and towards solutions which behave appropriately for all targets.  There's clearly a problem here that's worth solving, but I want to make sure we're adding small features that interact with existing features in an obvious way.  The patch as written is adding a new attribute that changes the semantics of C and LLVM IR in a subtle way that interacts with existing optimizations and lowering in ways that are complex and hard to understand.

This makes a lot of sense, I'm totally on board to reduce entropy and confusion along the way.

> I don't want to mix general restrictions on calling C library functions, with restrictions that apply specifically to memcpy: clang generally works on the assumption that a "memcpy" symbol is available in freestanding environments, and we don't really want to change that.
> 
> With -fno-builtin, specifically, we currently apply the restriction that optimizations will not introduce memcpy calls that would not exist with optimization disabled.  This is sort of artificial, and and maybe a bit confusing, but it seems to work well enough in practice.  gcc does something similar.
> 
> I don't really want to add an attribute that has a different meaning from -fno-builtin.  An attribute that has the same meaning is a lot less problematic: it reduces potential confusion, and solves a related problem that -fno-builtin currently doesn't really work with LTO, because we don't encode it into the IR.

Adding @tejohnson to the conversation.

IIUC you're offering to introduce something like `__attribute__((no-builtin-memcpy))` or `__attribute__((no-builtin("memcpy")))`.
As attributes they would compose nicely with (Thin)LTO.

I believe we still want to turn `-fno-builtin` flags into attributes so there's no impedance mismatch between the flag and the attribute right?

> Your current patch is using the "AlwaysInline" flag for SelectionDAG::getMemcpy, which forces a memcpy to be lowered to an inline implementation.  In general, this flag is only supported for constant-size memcpy calls; otherwise, on targets where EmitTargetCodeForMemcpy does not have some special lowering, like the x86 "rep;movs", you'll hit an assertion failure.  And we don't really want to add an implementation of variable-length memcpy for a lot of targets; it's very inefficient on targets which don't have unaligned memory access.  You could try to narrowly fix this to only apply "AlwaysInline" if the size is a constant integer, but there's a related problem: even if a memcpy is constant size in C, optimizations don't promise to preserve that, in general.  We could theoretically add such a promise, I guess, but that's awkward: it would conditionally apply to llvm.memcpy where the parameter is already const, so it's not something we can easily verify.

Fair enough. This patch was really to get the conversation started : I was myself not especially convinced about the approach. Hijacking the `AlwaysInline` parameter was a shortcut that would not work for other mem function anyways.

> If we added a new builtin `llvm.memcpy.inline`, or something like that, the end result ends up being simpler in many ways. It has obvious rules for both generating it and lowering it, which don't depend on attributes in the parent function.  We could mark the size as "immarg", so we don't have to add special optimization rules for it.   And if we have it, we have a "complete" solution for writing memcpy implementations in C: we make `__builtin_memcpy`, or a new `__builtin_inline_memcpy`, produce it, and it can be combined with -fno-builtin to ensure we don't produce any calls to the "real" memcpy.  The downside of a new builtin, of course, is that we now have two functions with similar semantics, and potentially have to fix a bunch of places to check for both of them.

This was one of the approaches I envisioned, it's much cleaner and it's also a lot more work : )
I'm willing to go that route knowing that it would also work for @theraven's use case.

> MemCpyOpt has been mentioned (the pass which transforms memcpy-like loops into llvm.memcpy).  If we want it to perform some transform in circumstances where the call "memcpy" isn't available, we have to make sure to restrict it based on the target: in the worst case, on some targets without unaligned memory access, it could just act as a low-quality loop unroller.  This applies no matter how the result is actually represented in IR.

Yes if we have to generate loops it need to happen before SelectionDAG.

----

In this framework `-ffreestanding` stays as it is - namely it implies `-fno-builtin`. I still think that the semantic is somewhat surprising and unclear to the newcomer but I guess we can't do anything about it at this point - apart adding more documentation.

Lastly, if we are to introduce new IR intrinsics, how about adding some for `memcmp` and `bcmp`? It does not have to be part of this patch but I think it's worth considering for consistency.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634