[PATCH] D67867: [libc] Add few docs and implementation of strcpy and strcat.

Mon Sep 30 09:11:52 PDT 2019

jyknight added a comment.

In D67867#1687128 <https://reviews.llvm.org/D67867#1687128>, @theraven wrote:

> In D67867#1686112 <https://reviews.llvm.org/D67867#1686112>, @jyknight wrote:
>
> > In D67867#1686056 <https://reviews.llvm.org/D67867#1686056>, @hfinkel wrote:
> >
> > > Maybe everything is fine, but given this setup, does anyone see any potential problems with compiling these functions for nvptx? I'd like to eventually see a mode where we compile an appropriate subset of these functions for GPU targets -- either in bitcode form for linking with our device-side OpenMP runtime library or as a native object -- to provide a more feature-complete offloading environment.
> > >
> > > The one thing that caught my eye was the use of the section attribute, and I was curious what nvptx does with that. As far as I can tell, perhaps the answer is nothing.
> >
> >
> > Then I think this scheme won't work, since the point of the sections is to enable the creation of the global symbols post-build.
> >
> > E.g., I think the idea is that the main implementation defines the function with C++ name `__llvm_libc::strcpy(char *, const char *)`, and places the code in the `.llvm.libc.entrypoint.strcpy` section. And then another tool comes along and iterates the llvm.libc.entrypoint sections, and adds global symbol aliases for each one.
> >
> > That scheme feels over-complex to me, but I don't have a concrete counter-proposal in mind.
>
>
> For what it's worth, FreeBSD libc does a similar namespacing trick in C.  The internal symbols are underscore prefixed and they're exported as aliases (typically as weak aliases, to allow them to be preempted by other implementations, and to explicitly give names to callers for the preemptible and non-preemptible versions).  Making symbols preemptible isn't really possible with PE/COFF, because the linkage model has a stronger concept of where definitions come from than ELF (at least, in the absence of symbol versions in ELF).  On ELF platforms, we should support symbol versions as early as possible, because adding them later is an ABI break, even if we change no code.

Actually, now that I think I understand the existing proposal better, I believe it's broken as well as confusing. It achieves the same effect as `__attribute__((alias))`, except that it is harder to understand. But it's not OK for a single object file to expose both a strong public alias and an internal alias for any function that's not in the baseline ISO C standard. It would be OK if the aliases were weak, or if they were strong but exposed by a separate .o file. (In any case, I'd like to suggest not using an external objcopy invocation to achieve this.)

As an example of why this is wrong, consider a libc with an 'open.o' object file that defines __llvm_libc::open and has also had the alias 'open' added to it with objcopy. Internally, if libc needs to call open, it calls __llvm_libc::open, which pulls in open.o, which in turn also defines the global 'open' function. There would then be a duplicate symbol error for any Standard C (i.e. non-POSIX) program which defines its own open function.
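
To make the weak-alias alternative concrete, here is a minimal sketch, assuming an ELF target and a GCC- or Clang-compatible compiler. The names `demo_internal_stpcpy`, `demo_stpcpy` and `demo_alias_works` are hypothetical stand-ins chosen for illustration (stpcpy being a POSIX, non-ISO-C function); they are not the names used in the patch.

```c
#include <assert.h>

/* Internal name: a hypothetical stand-in for a namespaced symbol such as
   __llvm_libc::stpcpy. Calls made from inside the library would use it. */
char *demo_internal_stpcpy(char *dest, const char *src) {
  while ((*dest = *src) != '\0') {
    ++dest;
    ++src;
  }
  return dest; /* points at the terminating NUL, as POSIX stpcpy does */
}

/* Public name, emitted in the same object file as a WEAK alias, so a
   program that defines its own strong demo_stpcpy does not collide. */
char *demo_stpcpy(char *dest, const char *src)
    __attribute__((weak, alias("demo_internal_stpcpy")));

/* Self-check: both names reach the same code. */
int demo_alias_works(void) {
  char buf[4];
  char *end = demo_stpcpy(buf, "abc");
  assert(end == buf + 3 && *end == '\0');
  return demo_stpcpy == demo_internal_stpcpy;
}
```

If the alias were strong instead, a program providing its own `demo_stpcpy` would hit the duplicate-definition problem described above as soon as this object file was pulled into the link.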

================
Comment at: libc/docs/header_generation.md:21
+2. Replace the line on which a command occurs with some other text as directed
+by the command. The replacement text can span multiple lines.
+
----------------
theraven wrote:
> sivachandra wrote:
> > theraven wrote:
> > > This sounds like you will end up with only one set of headers per configuration, so you lose the ability to have different projects using the same generated headers but enforcing different sets of standards compliance in their use of the interfaces.
> > Yes, that is the general direction in which this is going. We are making the headers for a configuration much simpler to navigate at the cost of having multiple sets of headers. In this day and age, I do not think forcing multiple sets of header files is a bad thing. Note also that users' build systems already have the knowledge and capability to handle multiple configurations. Hence, we are not making the build systems any more complicated.
> > 
> > This is not what traditional libcs have done. So, yes we are introducing a "third mechanism". At the same time, one can also argue that we are doing away with such mechanisms as we require that each configuration have its own set of header files.
> I think that's fine when you consider building libc and shipping a single configuration, but a lot of projects that I've seen have different feature macros defined for different components.  Are they now expected to rebuild the libc headers multiple times for each module?  Do they need to drive that from their own build system (which is often not CMake)?  It's even more complex when a project contains C89, C11, and C++11 files - these all have subtly different sets of requirements for the functions exposed in libc headers: do we require that they build a set for each?  Or do you imagine that anyone shipping C11 will ship a powerset of headers?  
> 
> The reason that we don't do the separate header thing in libcs today is that we end up with a huge explosion of the set of things that are supported.  For example, in FreeBSD we support 3 versions of the C standard, 3 or 4 versions of POSIX, GNU and BSD extensions.  Almost any combination of these is allowed, so we're looking at 20-30 possible sets of header files, before we start considering restricted subsets for sandboxed applications, custom configurations for sanitisers, and so on.
IMO, it makes sense not to bother making C99/C11-only functions conditionally available. The libc headers still ought to be compatible in C89 mode, but I don't see that there's really much point to excluding declarations for new functions like 'strtof', 'aligned_alloc', etc, when building in older standards modes.

The same most likely applies to older POSIX versions.

However, I do think it is quite likely to be necessary to preserve the ability to conditionally disable the various standards "layers". That is -- for all the headers specified in ISO C, you should be able to disable the declarations added by POSIX (and extensions) with a define. And for all the headers specified in POSIX, you should be able to disable the declarations added by the GNU/BSD/etc extensions with a define.
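
As a rough sketch of what such layering could look like inside one ISO C header: the macro names `DEMO_LIBC_NO_POSIX` and `DEMO_LIBC_NO_EXTENSIONS` and all `demo_` functions below are assumptions invented for this illustration, not anything proposed in the patch.

```c
#include <assert.h>

/* User side: hypothetical opt-out, defined before including the header. */
#define DEMO_LIBC_NO_POSIX

/* ---- fragment of a hypothetical <string.h>-like header ---- */
#if defined(DEMO_LIBC_NO_POSIX)
#define DEMO_POSIX_VISIBLE 0
#else
#define DEMO_POSIX_VISIBLE 1
#endif

/* Extensions sit on top of POSIX, so hiding POSIX hides them too. */
#if DEMO_POSIX_VISIBLE && !defined(DEMO_LIBC_NO_EXTENSIONS)
#define DEMO_EXT_VISIBLE 1
#else
#define DEMO_EXT_VISIBLE 0
#endif

unsigned long demo_strlen(const char *s); /* ISO C layer: always declared */

#if DEMO_POSIX_VISIBLE
char *demo_strdup(const char *s); /* POSIX layer */
#endif

#if DEMO_EXT_VISIBLE
void *demo_mempcpy(void *dest, const void *src, unsigned long n); /* extension */
#endif
/* ---- end of header fragment ---- */

/* User code: with the POSIX layer hidden, the program may reuse the name
   demo_strdup with an unrelated signature without a conflicting declaration. */
static int demo_strdup(int x) { return 2 * x; }

int demo_user_code(void) {
  assert(DEMO_POSIX_VISIBLE == 0);
  return demo_strdup(21);
}
```

Removing the `DEMO_LIBC_NO_POSIX` define makes the header declare `demo_strdup`, and the user's definition then fails to compile, which is the namespace clash that the layer switches are meant to avoid.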

================
Comment at: libc/include/string.h:16
+#define __need_NULL // To get only NULL from stddef.h
+#include <stddef.h>
+
----------------
theraven wrote:
> Does this work correctly with the inclusion guards?  I don't see the `stddef.h` implementation here, so I don't know what those macros do (they don't do anything in FreeBSD libc's `stddef.h`; they appear to do something in libc++'s `cstddef`, though I'm not entirely sure what).
> 
> The FreeBSD solution to this is to define types like `__size_t`, use these in headers that are supposed to use `size_t` in function prototypes but not provide a definition of `size_t` (yes, there are several in the C spec, it's annoying but that's what the standard says), and then add a guarded typedef to turn that into `size_t` in this header, `stddef.h` and a couple of other places.
Both Clang and GCC ship a stddef.h which supports these defines -- and it's expected to be first in the include path, before libc's headers.

For some reason, FreeBSD and some other platforms remove these compiler-shipped files and replace them with their own for their libc.

Exactly what the contract should be between the compiler headers and the libc headers could be a larger discussion, but for now, I'm strongly in favor of assuming that we're using the existing stddef.h from clang/gcc -- in which case this code will work correctly.
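
For illustration, a minimal sketch of the pattern under that assumption, i.e. that the `stddef.h` found first on the include path is the one shipped by Clang or GCC. The function `demo_first_or_null` is a hypothetical name invented for this example.

```c
#define __need_NULL /* ask the compiler's stddef.h for NULL only */
#include <stddef.h>

#include <assert.h>

/* Hypothetical helper: returns s if it is a non-empty string, else NULL.
   It only needs NULL, which is all that was requested from stddef.h. */
const char *demo_first_or_null(const char *s) {
  return (s != NULL && *s != '\0') ? s : NULL;
}

/* Self-check for the helper. */
int demo_null_works(void) {
  assert(demo_first_or_null("") == NULL);
  return demo_first_or_null(NULL) == NULL;
}
```

With a `stddef.h` that ignores `__need_NULL`, this still compiles, but the include then exposes the full set of `stddef.h` definitions rather than `NULL` alone.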

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67867/new/

https://reviews.llvm.org/D67867