[PATCH] D67867: [libc] Add few docs and implementation of strcpy and strcat.

Sun Sep 29 00:39:28 PDT 2019

theraven added inline comments.

================
Comment at: libc/docs/header_generation.md:21
+2. Replace the line on which a command occurs with some other text as directed
+by the command. The replacement text can span multiple lines.
+
----------------
sivachandra wrote:
> theraven wrote:
> > This sounds like you will end up with only one set of headers per configuration, so you lose the ability to have different projects using the same generated headers but enforcing different sets of standards compliance in their use of the interfaces.
> Yes, that is the general direction in which this is going. We are making the headers for a configuration much simpler to navigate at the cost of having multiple sets of headers. In this day and age, I do not think forcing multiple sets of header files is a bad thing. Note also that users' build systems already have the knowledge and capability to handle multiple configurations. Hence, we are not making the build systems any more complicated.
> 
> This is not what traditional libcs have done. So, yes we are introducing a "third mechanism". At the same time, one can also argue that we are doing away with such mechanisms as we require that each configuration have its own set of header files.
I think that's fine when you consider building libc and shipping a single configuration, but a lot of projects that I've seen have different feature macros defined for different components. Are they now expected to rebuild the libc headers once for each module? Do they need to drive that from their own build system (which is often not CMake)? It's even more complex when a project contains C89, C11, and C++11 files: these all have subtly different requirements for the functions exposed in libc headers. Do we require that they build a set for each? Or do you imagine that anyone shipping C11 will ship a powerset of headers?

The reason that we don't use separate header sets in libcs today is that they lead to a huge explosion in the number of supported configurations. For example, in FreeBSD we support 3 versions of the C standard, 3 or 4 versions of POSIX, and GNU and BSD extensions. Almost any combination of these is allowed, so we're looking at 20-30 possible sets of header files before we even consider restricted subsets for sandboxed applications, custom configurations for sanitisers, and so on.
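For contrast, here is a minimal sketch of the traditional single-header-set approach described above: one installed header serves every configuration, and each declaration is gated on feature-test macros that the consuming translation unit defines. The internal `HYP_*_VISIBLE` flags (in the spirit of FreeBSD's `<sys/cdefs.h>` visibility macros) and the `hyp_*` functions are hypothetical; `_POSIX_C_SOURCE` and `__STDC_VERSION__` are the standard feature-test and version macros.

```c
/* A component opts in to POSIX 2008 before including any header. */
#define _POSIX_C_SOURCE 200809L

#include <stddef.h>

/* --- Hypothetical <hyp_string.h>: one header for every configuration --- */

/* Derive internal visibility flags from the consumer's macros. */
#if defined(_POSIX_C_SOURCE)
#define HYP_POSIX_VISIBLE _POSIX_C_SOURCE
#else
#define HYP_POSIX_VISIBLE 0
#endif

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define HYP_ISO_C_VISIBLE 2011
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define HYP_ISO_C_VISIBLE 1999
#else
#define HYP_ISO_C_VISIBLE 1990
#endif

/* Baseline ISO C declaration: visible in every configuration. */
size_t hyp_strlen(const char *s);

/* Visible only to C11 (or later) consumers. */
#if HYP_ISO_C_VISIBLE >= 2011
void *hyp_aligned_alloc(size_t alignment, size_t size);
#endif

/* Visible only to POSIX 2008 consumers; a strict C89 program remains
   free to define its own function with this name. */
#if HYP_POSIX_VISIBLE >= 200809L
char *hyp_strdup(const char *s);
#endif
```

Each component sets its own macros (or `-std=` flag) and includes the same installed headers, so the combinations are resolved by the preprocessor at each use site rather than by building one header set per combination.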

================
Comment at: libc/docs/implementation_standard.md:79
+a post-build step. For example, for the `round` function, one can use `objcopy`
+to add an alias symbol as follows:
+
----------------
sivachandra wrote:
> theraven wrote:
> > There are a few things that are unclear to me in this description:
> > 
> > 1. How do we express the standards to which an entrypoint conforms?  For example, a function defined in C11 or POSIX2008?
> > 2. How do we differentiate between things that we want to be preemptible versus things that we don't?  If we want to call the preemptible version of a symbol in other libc code, will we have the `::foo` symbol visible at library build time?
> > 3. How are we exposing information for building subsets of the implementation that avoid dependencies on certain platform features?  For example, a CloudABI-compatible mode that does not provide (or consume) any functions that touch the global namespace.
> Answers as per numbers in the comment.
> 
> 1. We can do it in the public header file, in the implementation .cpp files, or in both places. Did I understand this question correctly? Is this question related or similar to #3 below? That is, are you asking how we will add a new function without breaking the old standard? If yes, then the %%include mechanism is present to accommodate such scenarios: we start with a baseline standard and %%include new standards until they become baseline.
> 2. I frankly do not have a good answer and would prefer someone who cares about this use case to contribute. Maybe a real-world example can help me think about this more clearly.
> 3. For the header files, we have the %%include_file mechanism. For the library files, we pick and choose to compose a suitable library target; see, for example, the one in lib/CMakeLists.txt of this patch.
> 
> At some level, my answers are only guessing about how things would evolve. So, I wouldn't be surprised if my answers here aren't valid or relevant even in say 3 months from now.
Compliance with overlapping standards is one of the core reasons that we make symbols preemptible in existing libc implementations. C reserves a set of identifiers for the C standard, and a C89 program is free to use any other identifier. If POSIX2008 or C11 uses one of those other identifiers, then libc's internal code should not call it through the public name (which may resolve to the program's own definition), while the program's own definition should still be allowed to take precedence for the program's calls. Pthreads is subtly different: users are allowed to bring their own pthreads implementation, and libc should correctly consume it.
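A minimal sketch of the usual ELF technique for this, assuming GCC or Clang on an ELF target (the same idea underlies glibc's `weak_alias` and FreeBSD's `__weak_reference` macros). The real implementation lives under a reserved, hidden name that internal libc code calls directly; the public name is a weak alias that a program may preempt. A real-world case is `getline`, which POSIX 2008 added after many C programs had already defined a function of that name. The names `hyp_strdup`, `__hyp_strdup`, and `__hyp_copy_locale_name` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Implementation under a name reserved to the implementation (leading
   double underscore), so no conforming program can collide with it.
   Hidden visibility keeps it out of the dynamic symbol table, so it
   cannot be interposed. */
__attribute__((visibility("hidden")))
char *__hyp_strdup(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Public name: a weak alias with default visibility. A program that
   defines its own hyp_strdup preempts it: at static link time a strong
   definition wins over a weak one, and with a shared libc the
   executable's definition interposes on the exported symbol. */
char *hyp_strdup(const char *s) __attribute__((weak, alias("__hyp_strdup")));

/* Internal libc code always calls the reserved name, so a user's
   hyp_strdup can never change libc's own behavior. */
char *__hyp_copy_locale_name(const char *name) {
    return __hyp_strdup(name);
}
```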

The third question is more about layering. For example, I recently tried building libc++ to work in kernel space. This is incredibly hard because a lot of things depend on locale and pull in iostream dependencies, so you end up needing a load of things that have no real meaning in kernel space. The same is true for sandboxed environments, where things like `open` may not exist (though `openat` may). This matters when something like locale support in libc needs to open files: for a sandboxed deployment, we should either support baking those files into the binary or not expose the symbols that depend on them working correctly.
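A minimal sketch of that layering, assuming a hypothetical `HYP_SANDBOXED` build switch and hypothetical `hyp_open_locale*` helpers. The descriptor-relative entry point built on POSIX `openat` is always provided, while the variant that resolves paths in the global filesystem namespace is compiled out of a sandboxed build rather than shipped in a form that fails at run time.

```c
#define _POSIX_C_SOURCE 200809L /* for openat, O_CLOEXEC, AT_FDCWD */
#include <fcntl.h>

/* Capability-style entry point: the caller supplies a directory
   descriptor it already holds, so a relative `name` is resolved without
   any global-namespace lookup. */
int hyp_open_locale_at(int dirfd, const char *name) {
    return openat(dirfd, name, O_RDONLY | O_CLOEXEC);
}

#ifndef HYP_SANDBOXED
/* Global-namespace entry point: resolves `path` from the current
   directory (or the root, if absolute). Not built, and therefore not
   exposed, in a sandboxed configuration. */
int hyp_open_locale(const char *path) {
    return hyp_open_locale_at(AT_FDCWD, path);
}
#endif
```

Building with `-DHYP_SANDBOXED` removes `hyp_open_locale` entirely, so any code that still depends on it fails at link time instead of at run time inside the sandbox.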

If we don't start out with some declarative definitions of things like C89 / C99 / POSIX2008 compliance, then any kind of automatic tooling to generate a pure C11 library (no POSIX) or to ensure that the correct symbols are preemptible will be very hard.  
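To make the idea concrete, here is a minimal sketch of what such declarative metadata could look like if expressed as a table that tooling can query. The struct, flags, and selection rules are hypothetical, not anything in this patch; the point is that "pure C11, no POSIX" or "which symbols must be preemptible" becomes a query over data rather than something reverse-engineered from headers.

```c
#include <stddef.h>

/* Standards that specify an entrypoint (hypothetical bit flags). */
enum hyp_std {
    HYP_C89       = 1 << 0,
    HYP_C99       = 1 << 1,
    HYP_C11       = 1 << 2,
    HYP_POSIX2008 = 1 << 3
};

struct hyp_entrypoint {
    const char *name;
    unsigned standards;   /* bitmask of every standard that specifies it */
    int needs_global_ns;  /* 1 if it resolves paths in the global namespace */
};

static const struct hyp_entrypoint hyp_entrypoints[] = {
    {"strcpy",  HYP_C89 | HYP_C99 | HYP_C11 | HYP_POSIX2008, 0},
    {"strcat",  HYP_C89 | HYP_C99 | HYP_C11 | HYP_POSIX2008, 0},
    {"acosl",   HYP_C99 | HYP_C11 | HYP_POSIX2008, 0},
    {"getline", HYP_POSIX2008, 0},
    {"open",    HYP_POSIX2008, 1},
    {"openat",  HYP_POSIX2008, 0},
};

#define HYP_N_ENTRYPOINTS (sizeof hyp_entrypoints / sizeof hyp_entrypoints[0])

/* Select the entrypoints for one configuration: those specified by at
   least one standard in `stds`, minus global-namespace ones when
   `sandboxed` is set. Writes up to `cap` names to `out` and returns the
   total number selected. */
static size_t hyp_select(unsigned stds, int sandboxed,
                         const char **out, size_t cap) {
    size_t n = 0;
    for (size_t i = 0; i < HYP_N_ENTRYPOINTS; ++i) {
        const struct hyp_entrypoint *e = &hyp_entrypoints[i];
        if ((e->standards & stds) == 0)
            continue;
        if (sandboxed && e->needs_global_ns)
            continue;
        if (n < cap)
            out[n] = e->name;
        ++n;
    }
    return n;
}

/* A symbol must stay preemptible if a program written against `base`
   (e.g. HYP_C89) may legally define its own function with that name,
   i.e. `base` does not specify it. Simplification: this ignores C's
   reserved-identifier rules (such as the future library directions),
   which real tooling would also need to encode. */
static int hyp_must_be_preemptible(const struct hyp_entrypoint *e,
                                   unsigned base) {
    return (e->standards & base) == 0;
}
```

For example, `hyp_select(HYP_C11, 0, names, 8)` selects `strcpy`, `strcat`, and `acosl`, which is the pure-C11 subset of this table.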

================
Comment at: libc/include/math.h:20
+
+long double acosl(long double);
+
----------------
sivachandra wrote:
> theraven wrote:
> > The `long long` functions should be exposed only for C99 or later and a version of C++ that supports the `long long` type.  
> We are C17 and higher already. Should we still have such conditions?
We are building the library as C++17, not C17. I would hope that we're still aiming for the headers to be consumed by C89 programs, because there are a huge number of those in the world.
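A minimal sketch of the guard being requested, so that the same header stays usable from C89 while exposing `long long` interfaces to C99 and C++11 consumers (C++ only gained `long long` in C++11). `HYP_HAS_LONG_LONG`, `hyp_labs`, and `hyp_llabs` are hypothetical; `__STDC_VERSION__` and `__cplusplus` are the standard version macros. The fragment omits the `extern "C"` wrapper a real header would also need for C++ consumers.

```c
/* --- Hypothetical header fragment --- */
#if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L) || \
    (defined(__cplusplus) && __cplusplus >= 201103L)
#define HYP_HAS_LONG_LONG 1
#else
#define HYP_HAS_LONG_LONG 0
#endif

/* Declared in every language mode, including C89. */
long hyp_labs(long x);

#if HYP_HAS_LONG_LONG
/* Only visible where the `long long` type exists. */
long long hyp_llabs(long long x);
#endif

/* --- Implementation, itself built as C99 or later --- */
long hyp_labs(long x) { return x < 0 ? -x : x; }

#if HYP_HAS_LONG_LONG
long long hyp_llabs(long long x) { return x < 0 ? -x : x; }
#endif
```

Because the `long long` declaration sits in a skipped preprocessor group under C89, a strict C89 compiler never has to parse it.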

================
Comment at: libc/include/string.h:16
+#define __need_NULL // To get only NULL from stddef.h
+#include <stddef.h>
+
----------------
Does this work correctly with the inclusion guards? I don't see the `stddef.h` implementation here, so I don't know what those macros do (they don't do anything in FreeBSD libc's `stddef.h`, and they appear to do something in libc++'s `cstddef`, though I'm not entirely sure what).

The FreeBSD solution to this is to define types like `__size_t` and use them in headers that are supposed to use `size_t` in function prototypes but not provide a definition of `size_t` (yes, there are several in the C spec; it's annoying, but that's what the standard says). A guarded typedef then turns `__size_t` into `size_t` in this header, in `stddef.h`, and in a couple of other places.
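A minimal sketch of the FreeBSD-style approach described above, using hypothetical names (`__hyp_size_t`, `_HYP_SIZE_T_DECLARED`, `hyp_count_bytes`) rather than FreeBSD's own, and assuming a GCC- or Clang-compatible compiler for the predefined `__SIZE_TYPE__` macro. Every header uses the reserved spelling internally; only headers that the standard says define `size_t` carry the guarded typedef, and the guard makes repeated inclusion harmless even in C89 and C99, where redefining a typedef is an error.

```c
/* --- Hypothetical internal header (<hyp/_types.h>) ---
   Uses only a reserved name, so including it exposes nothing to users. */
typedef __SIZE_TYPE__ __hyp_size_t; /* GCC/Clang predefined macro */

/* --- A header whose prototypes need size_t but which must not define
   it: spell the type with the reserved name. --- */
__hyp_size_t hyp_count_bytes(const char *s);

/* --- Guarded typedef, repeated verbatim in each header that is
   required to define size_t (string.h, stddef.h, ...). --- */
#ifndef _HYP_SIZE_T_DECLARED
typedef __hyp_size_t size_t;
#define _HYP_SIZE_T_DECLARED
#endif

/* A second header repeating the same block (e.g. stddef.h after
   string.h) is now a no-op instead of a duplicate typedef. */
#ifndef _HYP_SIZE_T_DECLARED
typedef __hyp_size_t size_t;
#define _HYP_SIZE_T_DECLARED
#endif

__hyp_size_t hyp_count_bytes(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}
```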

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67867/new/

https://reviews.llvm.org/D67867