[libc-dev] API generation

Petr Penzin via libc-dev libc-dev at lists.llvm.org
Tue Nov 19 18:20:08 PST 2019


Hi David,

Thanks for the answers. I am going to send a separate reply to Siva's 
message about API generation. There was a conversation about upstreaming 
WASI during the Wasm CG meeting, and I thought that with this effort LLVM 
would eventually have all the pieces for an end-to-end Wasm toolchain 
(not only WASI-based, but JS as well).

+ Dan Gohman, in case he has any thoughts on this, as he maintains 
the current WASI implementation.

Best,

Petr

On 11/19/19 1:33 AM, David Chisnall via libc-dev wrote:
> Hi Petr,
>
> As I understand it, the WASI interface is now very close to CloudABI, 
> which is one of the use cases I was interested in. There are three 
> slightly conflated goals for the header generation that, I think, are 
> going to need deconflating in the future:
>
>  - Being able to support different sets of standards (e.g. pure C11, 
> POSIX, POSIX + GNU extensions + BSD extensions) so that a compilation 
> unit can opt into only a subset of the required things.
>
>  - Being able to support different sets of standards so that an 
> implementation can ship a useful standards-compliant subset (e.g. just 
> C11 on a non-POSIX platform).
>
>  - Being able to support different subsets for builds with different 
> sets of available target abstractions.
>
> For legacy compatibility, the current WASI libc supports libpreopen, 
> but it's often easier to support a Capsicum environment with a more 
> CloudABI-like interface that disallows all of the explicit global 
> namespace operations.  For WASI / Capsicum deployments, I would like 
> to be able to build a version of libc that exposes only CloudABI-like 
> symbols, so I get linker failures (that I can then fix) when I use 
> something that relies on access to the global namespace.
>
> The first and second of these look very similar, but the second and 
> third are the ones that share useful tooling.  Existing libc 
> implementations support the first with a load of macros to 
> conditionally expose things.  These are annoying to maintain 
> (particularly if, for example, a BSD extension is later standardised 
> in POSIX: you then need to rework the logic in the headers for 
> exposing them).  Ideally, we'd just add POSIX20 or whatever to the 
> list of standards and let the tool deal with it. For the first use 
> case, I think we will still end up needing conditional exposure via 
> macros, but that's easier to machine generate than to write by hand.
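
A minimal sketch of what machine-generating the conditional exposure 
could look like. This is not the TableGen back end under discussion: 
the table layout, the __EXPOSE_<STD> macro names, and the rule that C11 
entries are always visible are all assumptions made for illustration.

```python
# Hypothetical "ground truth" table: prototype -> standards that expose it.
API = {
    "char *strdup(const char *s);": ["POSIX", "BSD"],
    "size_t strlen(const char *s);": ["C11"],
    "size_t strlcpy(char *dst, const char *src, size_t n);": ["BSD"],
}


def guard(standards):
    """Build a preprocessor condition from a list of standard names."""
    # __EXPOSE_<STD> is a made-up macro, not a real feature test macro.
    return " || ".join(f"defined(__EXPOSE_{s})" for s in sorted(standards))


def generate_header(api):
    """Emit each prototype, wrapped in a guard unless it is baseline C11."""
    lines = []
    for proto, standards in api.items():
        if "C11" in standards:
            # Assumption: C11 is the baseline and is never hidden.
            lines.append(proto)
        else:
            lines.append(f"#if {guard(standards)}")
            lines.append(proto)
            lines.append("#endif")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_header(API))
```

In a scheme like this, a BSD extension that is later standardised only 
needs its list of standards extended; the guard is regenerated rather 
than reworked by hand.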
>
> For the second and third use cases, the goal in both cases is to make 
> subsetting easier.  We could later extend this with some static 
> analysis plugins that check for isolation (e.g. C11 can't depend on 
> POSIX, Capsicum-safe functions can't depend on non-Capsicum-safe 
> functions).
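
The isolation check mentioned above can be sketched as a walk over 
per-function metadata. The metadata shape, the layering table, and the 
deliberately wrong fopen entry are hypothetical and exist only to show 
a violation being reported.

```python
# Hypothetical metadata: function -> (standard it belongs to, functions it calls).
FUNCS = {
    "strlen": ("C11", []),
    "malloc": ("C11", []),
    "strdup": ("POSIX", ["strlen", "malloc"]),
    "open": ("POSIX", []),
    # Deliberately wrong for the demo: a C11 function calling a POSIX one.
    "fopen": ("C11", ["open"]),
}

# Assumed layering: which standards each standard is allowed to depend on.
ALLOWED = {
    "C11": {"C11"},
    "POSIX": {"C11", "POSIX"},
}


def isolation_violations(funcs, allowed):
    """Return (caller, callee) pairs where the caller's standard may not
    depend on the callee's standard."""
    violations = []
    for name, (std, deps) in funcs.items():
        for dep in deps:
            dep_std = funcs[dep][0]
            if dep_std not in allowed[std]:
                violations.append((name, dep))
    return violations


if __name__ == "__main__":
    for caller, callee in isolation_violations(FUNCS, ALLOWED):
        print(f"{caller} must not depend on {callee}")
```

The same table could carry a Capsicum-safe flag instead of a standard 
name, with the allowed set expressing that safe functions may only call 
other safe functions.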
>
> The final benefit that we haven't really explored yet for header 
> generation is supporting different compiler annotations for API 
> contracts that are not expressible in standard C.  For example, the 
> Windows headers use SAL annotations to define in / out parameters, the 
> size of buffers, and so on.  There are GNU extensions for some of 
> these, but they often go in different places (e.g. as function 
> attributes with parameters that index a specific function parameter 
> versus parameter attributes).  If we encode the high-level contracts 
> in the TableGen, then we should be able to generate MS C and GNU C 
> variants of the same set of interfaces.
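
As a rough sketch of generating both variants from one high-level 
contract: the record below says "parameter 0 is an output buffer whose 
size in bytes is parameter 1". The record layout and the fill_buffer 
function are invented for illustration. _Out_writes_bytes_ is an 
existing SAL annotation and access(write_only, ...) is an existing GCC 
function attribute, but treating them as equivalent encodings of this 
contract is an assumption.

```python
# Hypothetical high-level contract for a made-up function.
CONTRACT = {
    "ret": "int",
    "name": "fill_buffer",
    "params": [("void *buf", "buf"), ("size_t len", "len")],
    # Parameter 0 is written to; its size in bytes is parameter 1.
    "out_buffer": {"param": 0, "size_param": 1},
}


def emit_ms(c):
    """MS C variant: the annotation sits on the parameter itself."""
    out = c["out_buffer"]
    size_name = c["params"][out["size_param"]][1]
    decls = []
    for i, (decl, _name) in enumerate(c["params"]):
        if i == out["param"]:
            decl = f"_Out_writes_bytes_({size_name}) {decl}"
        decls.append(decl)
    params = ", ".join(decls)
    return f"{c['ret']} {c['name']}({params});"


def emit_gnu(c):
    """GNU C variant: a function attribute that indexes parameters (1-based)."""
    out = c["out_buffer"]
    buf_index = out["param"] + 1
    size_index = out["size_param"] + 1
    attr = f"__attribute__((access(write_only, {buf_index}, {size_index})))"
    params = ", ".join(decl for decl, _name in c["params"])
    return f"{attr} {c['ret']} {c['name']}({params});"


if __name__ == "__main__":
    print(emit_ms(CONTRACT))
    print(emit_gnu(CONTRACT))
```

The two emitters place the same information in different positions, 
which is the difference between parameter attributes and indexed 
function attributes described above.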
>
> The TableGen format lets us put a lot more metadata on the functions 
> and definitions than we would necessarily want to end up in any given 
> build of the headers.
>
> I agree that we are going to end up with TableGen files that are quite 
> complex, but I believe that we should end up with a cleaner separation 
> of concerns.  I have worked on a libc that did this manually, and 
> refactoring any of the macro code is very painful because it is all 
> very order-dependent and changes have non-local effects.  In the 
> TableGen world, the back end will parse all of the definitions, build 
> the dependency graph, and then generate the macros.  A change that 
> requires reworking macros across half a dozen files is not a problem 
> in this context.
>
> David
>
> On 18/11/2019 23:57, Petr Penzin via libc-dev wrote:
>> Hi,
>>
>> I work on WebAssembly, and I was hoping we would eventually use LLVM 
>> libc for an end-to-end Wasm toolchain. I have some questions about the 
>> "ground truth" approach to the libc API. I am sorry if these have 
>> already been asked; I could not find the answers looking through the 
>> mailing list messages and code reviews.
>>
>> http://lists.llvm.org/pipermail/libc-dev/2019-October/000003.html
>>
>> http://lists.llvm.org/pipermail/libc-dev/2019-October/000009.html
>>
>> I was wondering what API generation buys for developers and users. 
>> Maybe the question is how previous implementations of libc got away 
>> without generating headers, but also whether API generation is a 
>> reasonable and foolproof solution.
>>
>> Most importantly, the motivation seems to be that there are a few 
>> potential standards a libc implementation needs to comply with. But 
>> how many substantially different APIs are there realistically? If it 
>> is in the low single digits, does this really make it worth the effort?
>>
>> Secondly, the libc API is not only types and function prototypes; it 
>> typically also depends on "feature test macros". I am not sure it 
>> is possible to gracefully support those in a generated API. Encoding 
>> test macros in the API "ground truth" rules would make the rules as 
>> complex as the C macro code they are trying to replace. Leaving test 
>> macros up to the C header files would result in a mix of preprocessor 
>> and rule logic, which would probably be more confusing than going all 
>> the way in either (preprocessor or generation) direction.
>>
>> Finally, a somewhat rhetorical point on precedent and expertise. There 
>> is enough precedent for a portable libc API written directly; 
>> likewise, C/C++ developers can understand and modify C headers without 
>> ramp-up, and I am not sure that can be said about TableGen. Writing header 
>> files is a relatively simple part of the development process and 
>> there is a lot of it happening inside and outside of LLVM.
>>
>>
>> Best,
>>
>> Petr
>>
>>
>> _______________________________________________
>> libc-dev mailing list
>> libc-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/libc-dev
>>
