[libc-dev] API generation

Tue Nov 19 01:33:44 PST 2019

Hi Petr,

As I understand it, the WASI interface is now very close to CloudABI, 
which is one of the use cases I was interested in.  There are three 
slightly conflated goals for the header generation that, I think, are 
going to need deconflating in the future:

  - Being able to support different sets of standards (e.g. pure C11, 
POSIX, POSIX + GNU extensions + BSD extensions) so that a compilation 
unit can opt into only the subset that it requires.

  - Being able to support different sets of standards so that an 
implementation can ship a useful standards-compliant subset (e.g. just 
C11 on a non-POSIX platform).

  - Being able to support different subsets for builds with different 
sets of available target abstractions.

For legacy compatibility, the current WASI libc supports libpreopen, but 
it's often easier to support a Capsicum environment with a more 
CloudABI-like interface that disallows all of the explicit global 
namespace operations.  For WASI / Capsicum deployments, I would like to 
be able to build a version of libc that exposes only CloudABI-like 
symbols, so I get linker failures (that I can then fix) when I use 
something that relies on access to the global namespace.

The first and second of these look very similar, but the second and 
third are the ones that share useful tooling.  Existing libc 
implementations support the first with a load of macros to 
conditionally expose things.  These are annoying to maintain 
(particularly if, for example, a BSD extension is later standardised in 
POSIX: you then need to rework the logic in the headers for exposing 
it).  Ideally, we'd just add POSIX20 or whatever to the list of 
standards and let the tool deal with it.  For the first use case, I 
think we will still end up needing conditional exposure via macros, but 
that's easier to machine generate than to write by hand.
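
To make that concrete, here is a minimal sketch of machine-generated 
conditional exposure, with Python standing in for a TableGen back end.  
The macro names, the table layout, and the choice of functions are all 
hypothetical and only there for illustration; none of this is the 
actual LLVM libc tooling.

```python
# Hypothetical mapping from a standard to the macro that exposes it.
# None means "always visible".  These macro names are invented.
GUARD_MACRO = {
    "C11": None,
    "POSIX": "__EXAMPLE_POSIX_VISIBLE",
    "BSD": "__EXAMPLE_BSD_VISIBLE",
    "GNU": "__EXAMPLE_GNU_VISIBLE",
}

# Each declaration lists every standard that provides it (toy data).
DECLS = [
    ("size_t strlen(const char *);", ["C11"]),
    ("char *strdup(const char *);", ["POSIX"]),
    ("size_t strlcpy(char *, const char *, size_t);", ["BSD"]),
]


def emit(prototype, standards):
    """Wrap one prototype in the guard implied by its list of standards."""
    macros = [GUARD_MACRO[s] for s in standards]
    if None in macros:
        # Provided by an always-visible standard: no guard needed.
        return prototype
    condition = " || ".join(macros)
    return f"#if {condition}\n{prototype}\n#endif"


def generate_header(decls):
    """Emit all declarations, one guarded block per declaration."""
    return "\n".join(emit(proto, stds) for proto, stds in decls)


if __name__ == "__main__":
    print(generate_header(DECLS))
```

In this sketch, a BSD extension that is later standardised in POSIX is 
a one-word change to the data (["BSD"] becomes ["BSD", "POSIX"]) and 
the guard is regenerated rather than rewritten by hand.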

For the second and third use cases, the goal is to make subsetting 
easier.  We could later extend this with some static analysis 
plugins that check for isolation (e.g. C11 can't depend on POSIX, 
Capsicum-safe functions can't depend on non-Capsicum-safe functions).
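
A rough sketch of such an isolation check, assuming the tool already 
knows which set each function belongs to and which other functions its 
implementation calls.  The layering table and the dependency data are 
toy values invented for the example, and Python again stands in for a 
real plugin.

```python
# Hypothetical layering rules: the sets that each set may depend on.
ALLOWED = {
    "C11": {"C11"},
    "POSIX": {"C11", "POSIX"},
}

# Toy data: function name -> (set it belongs to, functions it calls).
FUNCS = {
    "strlen": ("C11", []),
    "fopen": ("C11", ["open"]),
    "open": ("POSIX", []),
}


def isolation_violations(funcs, allowed):
    """Return (caller, callee) pairs that cross a forbidden boundary."""
    violations = []
    for name, (layer, deps) in sorted(funcs.items()):
        for dep in deps:
            dep_layer = funcs[dep][0]
            if dep_layer not in allowed[layer]:
                violations.append((name, dep))
    return violations


if __name__ == "__main__":
    for caller, callee in isolation_violations(FUNCS, ALLOWED):
        print(f"{caller} must not depend on {callee}")
```

The same check would cover the Capsicum case by adding a 
"Capsicum-safe" set with its own entry in the table.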

The final benefit that we haven't really explored yet for header 
generation is supporting different compiler annotations for API 
contracts that are not expressible in standard C.  For example, the 
Windows headers use SAL annotations to define in / out parameters, the 
size of buffers, and so on.  There are GNU extensions for some of these, 
but they often go in different places (e.g. as function attributes with 
parameters that index a specific function parameter versus parameter 
attributes).  If we encode the high-level contracts in the TableGen, 
then we should be able to generate MS C and GNU C variants of the same 
set of interfaces.
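
As an illustration, here is a minimal sketch that renders one 
high-level contract ("this pointer parameter must not be null") in 
both styles: a per-parameter SAL `_In_` annotation for the MS variant 
and a function-level `nonnull` attribute with 1-based parameter 
indexes for the GNU variant.  The data layout and the `render` helper 
are hypothetical, and a real generator would need a much richer set 
of contracts.

```python
def render(ret, name, params, dialect):
    """Render a prototype for one dialect.

    params is a list of (declarator, nonnull) pairs, for example
    ("const char *a", True).
    """
    if dialect == "ms":
        # SAL annotates each parameter in place.
        rendered = [("_In_ " + decl) if nonnull else decl
                    for decl, nonnull in params]
        return f"{ret} {name}({', '.join(rendered)});"
    if dialect == "gnu":
        # GNU C uses one function attribute that indexes the parameters.
        indexes = [str(i) for i, (_, nonnull)
                   in enumerate(params, start=1) if nonnull]
        attr = ""
        if indexes:
            attr = f" __attribute__((nonnull({', '.join(indexes)})))"
        plain = ", ".join(decl for decl, _ in params)
        return f"{ret} {name}({plain}){attr};"
    raise ValueError(f"unknown dialect: {dialect}")


if __name__ == "__main__":
    params = [("const char *a", True), ("const char *b", True)]
    print(render("int", "strcmp", params, "ms"))
    print(render("int", "strcmp", params, "gnu"))
```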

The TableGen format lets us put a lot more metadata on the functions and 
definitions than we would necessarily want to end up in any given build 
of the headers.

I agree that we are going to end up with TableGen files that are quite 
complex, but I believe that we should end up with a cleaner separation 
of concerns.  I have worked on a libc that did this manually, and 
refactoring any of the macro code is very painful because it is all very 
order-dependent and changes have non-local effects.  In the TableGen 
world, the back end will parse all of the definitions, build the 
dependency graph, and then generate the macros.  A change that requires 
reworking macros across half a dozen files is not a problem in this 
context.
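
A minimal sketch of that "build the graph, then generate" step, 
assuming each standard simply lists the standards it builds on.  The 
relationships below are illustrative rather than a statement about the 
real standards, and the code needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical data: each standard lists the standards it builds on.
BUILDS_ON = {
    "C11": [],
    "POSIX": ["C11"],
    "GNU": ["POSIX"],
    "BSD": ["POSIX"],
}


def emission_order(builds_on):
    """Order standards so that each one follows everything it builds on."""
    # TopologicalSorter takes node -> predecessors and yields
    # predecessors before the nodes that depend on them.
    return list(TopologicalSorter(builds_on).static_order())


def visible_sets(selected, builds_on):
    """All standards that become visible when one standard is selected."""
    seen, stack = set(), [selected]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(builds_on[current])
    return seen


if __name__ == "__main__":
    print(emission_order(BUILDS_ON))
    print(sorted(visible_sets("GNU", BUILDS_ON)))
```

Because the ordering is recomputed from the data on every run, adding 
or moving a standard does not involve hand-editing order-dependent 
macro logic.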

David

On 18/11/2019 23:57, Petr Penzin via libc-dev wrote:
> Hi,
> 
> I work on WebAssembly, and I was hoping we would eventually use LLVM 
> libc for end-to-end Wasm toolchain. I have some questions about "ground 
> truth" approach to libc API. I am sorry if those have been asked, could 
> not find the answers looking through mailing list messages and code 
> reviews.
> 
> http://lists.llvm.org/pipermail/libc-dev/2019-October/000003.html
> 
> http://lists.llvm.org/pipermail/libc-dev/2019-October/000009.html
> 
> I was wondering what does API generation buy for the developers and 
> users. Maybe the question is how did previous implementations of libc 
> get away without generating headers, but also is API generation a 
> reasonable and foolproof solution.
> 
> Most importantly, the motivation seems to be that there are a few 
> potential standards a libc implementation needs to comply with. But how 
> many substantially different APIs are there realistically? If it is in 
> lower single digits, does this really make it worth the effort?
> 
> Secondly, libc API is not only types and function prototypes, it 
> typically depends on "feature test macros". I am not sure it is 
> possible to gracefully support those in a generated API. Encoding test 
> macros in API "ground truth" rules would make API rules as complex as C 
> macro code they are trying to replace. Leaving test macros up to the C 
> header files would result in a mix of preprocessor and rule logic which 
> would probably be more confusing than going all the way in either 
> (preprocessor or generation) direction.
> 
> Finally, somewhat rhetorical point on precedent and expertise. There is 
> enough precedent for a portable libc API written directly; likewise 
> C/C++ developers can understand and modify C headers without ramp-up - 
> not sure that can be said about tablegen. Writing header files is a 
> relatively simple part of the development process and there is a lot of 
> it happening inside and outside of LLVM.
> 
> 
> Best,
> 
> Petr
> 
> 