[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

Mon May 13 13:14:14 PDT 2019

efriedma added a comment.

My main blocker is that I want to make sure we're moving in the right direction: towards LLVM IR with clear semantics, towards straightforward rules for writing freestanding C code, and towards solutions which behave appropriately for all targets.  There's clearly a problem here that's worth solving, but I want to make sure we're adding small features that interact with existing features in an obvious way.  The patch as written is adding a new attribute that changes the semantics of C and LLVM IR in a subtle way that interacts with existing optimizations and lowering in ways that are complex and hard to understand.

I don't want to mix general restrictions on calling C library functions, with restrictions that apply specifically to memcpy: clang generally works on the assumption that a "memcpy" symbol is available in freestanding environments, and we don't really want to change that.

With -fno-builtin, specifically, we currently apply the restriction that optimizations will not introduce memcpy calls that would not exist with optimization disabled.  This is sort of artificial, and and maybe a bit confusing, but it seems to work well enough in practice.  gcc does something similar.

I don't really want to add an attribute that has a different meaning from -fno-builtin.  An attribute that has the same meaning is a lot less problematic: it reduces potential confusion, and solves a related problem that -fno-builtin currently doesn't really work with LTO, because we don't encode it into the IR.

Your current patch is using the "AlwaysInline" flag for SelectionDAG::getMemcpy, which forces a memcpy to be lowered to an inline implementation.  In general, this flag is only supported for constant-size memcpy calls; otherwise, on targets where EmitTargetCodeForMemcpy does not have some special lowering, like the x86 "rep;movs", you'll hit an assertion failure.  And we don't really want to add an implementation of variable-length memcpy for a lot of targets; it's very inefficient on targets which don't have unaligned memory access.  You could try to narrowly fix this to only apply "AlwaysInline" if the size is a constant integer, but there's a related problem: even if a memcpy is constant size in C, optimizations don't promise to preserve that, in general.  We could theoretically add such a promise, I guess, but that's awkward: it would conditionally apply to llvm.memcpy where the parameter is already const, so it's not something we can easily verify.

If we added a new builtin `llvm.memcpy.inline`, or something like that, the end result ends up being simpler in many ways. It has obvious rules for both generating it and lowering it, which don't depend on attributes in the parent function.  We could mark the size as "immarg", so we don't have to add special optimization rules for it.   And if we have it, we have a "complete" solution for writing memcpy implementations in C: we make `__builtin_memcpy`, or a new `__builtin_inline_memcpy`, produce it, and it can be combined with -fno-builtin to ensure we don't produce any calls to the "real" memcpy.  The downside of a new builtin, of course, is that we now have two functions with similar semantics, and potentially have to fix a bunch of places to check for both of them.

MemCpyOpt has been mentioned (the pass which transforms memcpy-like loops into llvm.memcpy).  If we want it to perform some transform in circumstances where the call "memcpy" isn't available, we have to make sure to restrict it based on the target: in the worst case, on some targets without unaligned memory access, it could just act as a low-quality loop unroller.  This applies no matter how the result is actually represented in IR.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634