[libc-commits] [libc] [libc] Make hdrgen support macro_header YAML field. (PR #123265)
Roland McGrath via libc-commits
libc-commits at lists.llvm.org
Thu Feb 13 16:03:33 PST 2025
frobtech wrote:
> Ah, so this is more of "add checks to generated headers that if a given macro isn't defined as expected, we will `#error` out."
>
> Coincidentally, I just reviewed a patch to bionic where I learned that they have dedicated header tests (seemingly only for POSIX IIUC): https://android.googlesource.com/platform/bionic/+/main/tests/headers/posix/
>
> So I guess those are two different means of testing the generated headers. The trade-off is that with the approach in this PR, our headers are bigger; we pay the cost of the preprocessor checking, rechecking, rechecking, ... for every compile, regardless of whether or not we just mean to do a sanity check.
That was really an incidental feature. I've removed it now, because I concur that having the public headers do this self-test logic is not really the best way to do it. We can get the same automation in testing by having a new mode for hdrgen to produce test sources to perform these things. But we can make that a later addition.
> Privately, you seemed to indicate that you had more patches that build on top of this PR. Can you describe a little more how you plan to extend this feature?
I do have some additional changes ready, and I'm considering more along these lines. The overall thrust is to make YAML the single source of truth for more of this information, so that it is machine-usable and machine-checkable, and also to reduce the redundant information that has to be maintained by hand in general. To those ends, my next changes are:
* Deduce some of the `types` list from the function signatures, so that maintainers don't need to keep the `types` list up to date with respect to the signatures used, only with respect to the formal API contract: the types a header is specifically meant to define per some standard.
* Generate all `#include` lines from YAML in some fashion, rather than hand-maintaining them in `.h.def` files. One motivation for this is how easy it is to make errors such as `#include "llvm-..."` vs `#include "../llvm-..."`, as well as general typos and unnecessary or omitted includes. The introduction of `macro_header` is the first step in this direction. I'd like to do more.
* An include/merge mechanism for YAML files so that we can have a single source of truth (in only one YAML file) for each function signature, even for cases where the (standard or de facto) API contract has certain functions declared in multiple headers (which aren't supposed to include each other); the current motivating example is the `malloc` suite in `stdlib.h` and `malloc.h`. This could also be addressed by generating more separate "internal" headers that each public header would `#include` for those declarations. But so far it seems easier in various regards (both for us and for users) to just generate the same function declarations in multiple headers.
* With those features, I think we can get rid of many of the `.h.def` files entirely and have only the YAML that has to be maintained manually. More features may be required to eliminate more of them, and probably a few oddball cases like `assert.h` will need to keep custom `.h.def` files because they don't quite fit any of the usual patterns.
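The derivations in the list above are mechanical, which is what makes them good candidates for generation. The following is a minimal sketch in C, written only for illustration; it is not how hdrgen is implemented. `struct func_spec`, `kFuncs`, `generate_header`, and the include layout (one `../` per directory level in front of `llvm-libc-types/<name>.h`) are all hypothetical. Each signature is written once, together with every header that must declare it. The non-builtin types a header needs are deduced from the signatures it declares, and the relative `#include` prefix is computed from the header's path rather than typed by hand.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of YAML function entries (not hdrgen's real
 * schema): each signature appears once, with every header that declares it. */
#define MAX_LIST 4
struct func_spec {
  const char *ret;               /* return type, e.g. "void *" */
  const char *name;              /* function name */
  const char *params[MAX_LIST];  /* parameter types, NULL-terminated */
  const char *headers[MAX_LIST]; /* declaring headers, NULL-terminated */
};

static const struct func_spec kFuncs[] = {
    {"void *", "malloc", {"size_t"}, {"stdlib.h", "malloc.h"}},
    {"void", "free", {"void *"}, {"stdlib.h", "malloc.h"}},
    {"size_t", "malloc_usable_size", {"void *"}, {"malloc.h"}},
};
#define NUM_FUNCS (sizeof kFuncs / sizeof kFuncs[0])

/* Append printf-style text to buf; asserts that it fits. */
static void appendf(char *buf, size_t cap, size_t *used, const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  int n = vsnprintf(buf + *used, cap - *used, fmt, ap);
  va_end(ap);
  assert(n >= 0 && (size_t)n < cap - *used);
  *used += (size_t)n;
}

static int declares_in(const struct func_spec *f, const char *header) {
  for (size_t i = 0; i < MAX_LIST && f->headers[i]; ++i)
    if (strcmp(f->headers[i], header) == 0) return 1;
  return 0;
}

/* Reduce a type to its base name: "const char *" -> "char". */
static void base_type(const char *type, char *out, size_t n) {
  if (strncmp(type, "const ", 6) == 0) type += 6;
  size_t len = strcspn(type, " *");
  if (len >= n) len = n - 1;
  memcpy(out, type, len);
  out[len] = '\0';
}

static int is_builtin(const char *t) {
  static const char *const kBuiltins[] = {"void", "char", "short", "int",
                                          "long", "unsigned", "float", "double"};
  for (size_t i = 0; i < sizeof kBuiltins / sizeof kBuiltins[0]; ++i)
    if (strcmp(t, kBuiltins[i]) == 0) return 1;
  return 0;
}

/* Assumed layout: one "../" per directory level, so
 * "sys/mman.h" -> "../" and "stdlib.h" -> "". */
static void relative_prefix(const char *header, char *out, size_t n) {
  out[0] = '\0';
  for (const char *p = header; *p; ++p)
    if (*p == '/' && strlen(out) + 3 < n) strcat(out, "../");
}

/* Emit the deduced type #includes, then the declarations, for `header`. */
static void generate_header(const char *header, char *buf, size_t cap) {
  char prefix[32], seen[16][32], t[32];
  size_t nseen = 0, used = 0;
  buf[0] = '\0';
  relative_prefix(header, prefix, sizeof prefix);

  for (size_t f = 0; f < NUM_FUNCS; ++f) {
    const struct func_spec *fn = &kFuncs[f];
    if (!declares_in(fn, header)) continue;
    const char *types[MAX_LIST + 1] = {fn->ret}; /* return type, then params */
    for (size_t i = 0; i < MAX_LIST; ++i) types[i + 1] = fn->params[i];
    for (size_t i = 0; i <= MAX_LIST && types[i]; ++i) {
      base_type(types[i], t, sizeof t);
      size_t k = 0;
      while (k < nseen && strcmp(seen[k], t) != 0) ++k;
      if (is_builtin(t) || k < nseen) continue; /* builtin or already included */
      assert(nseen < 16);
      strcpy(seen[nseen++], t);
      appendf(buf, cap, &used, "#include \"%sllvm-libc-types/%s.h\"\n", prefix, t);
    }
  }
  for (size_t f = 0; f < NUM_FUNCS; ++f) {
    const struct func_spec *fn = &kFuncs[f];
    if (!declares_in(fn, header)) continue;
    size_t rlen = strlen(fn->ret);
    /* A pointer return type such as "void *" abuts the function name. */
    appendf(buf, cap, &used, "%s%s%s(", fn->ret,
            (rlen > 0 && fn->ret[rlen - 1] == '*') ? "" : " ", fn->name);
    if (!fn->params[0]) appendf(buf, cap, &used, "void");
    for (size_t i = 0; i < MAX_LIST && fn->params[i]; ++i)
      appendf(buf, cap, &used, "%s%s", i ? ", " : "", fn->params[i]);
    appendf(buf, cap, &used, ");\n");
  }
}
```

With this table, `generate_header("malloc.h", buf, sizeof buf)` produces a single `size_t.h` include followed by the `malloc`, `free`, and `malloc_usable_size` declarations. `stdlib.h` gets the same `malloc` and `free` declarations from the same table entries, so neither header needs its own hand-maintained copy.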
In this vein, I'd like, over time, to get more information maintained solely in YAML and made machine-usable. These aren't on my critical path, which right now is frankly just about producing a sufficient `malloc.h` in a way that doesn't feel icky to maintain. But I want to drive our long-term direction on header (and other) maintenance this way. The documentation is something you've already been interested in folding into this regime. I'd like to go further. For example, we could be more thorough about the `standards` lists, not only for the headers but at the granularity of individual symbols within headers. As well as letting generated documentation note the portability status of specific APIs, this would help us generate standards-conformance tests and similar checks (for standard identifier name-space constraints and the like). If we one day choose to generate headers with fine-grained, feature-test-macro-based conditionals that hide or expose symbols according to different standards or standard editions, then a thorough, fine-grained source of truth in YAML could let us do that entirely automatically, and so on.
> Otherwise, I think I mildly prefer bionic's approach of just having unit tests. We already have this concept in the tests under `libc/test/include/`. I don't see why this couldn't just be:
>
> ```c
> #ifndef RTLD_LAZY
> #error "WTF"
> #endif
> #ifndef ...
> ```
>
> which is effectively what the bionic tests are doing. That way, we keep our generated headers smaller (FWIW), [...]
I agree separate tests are better. Unlike normal unit tests, this kind of test is entirely mechanical and contains no semantic checking. That means hand-maintained tests of this kind are especially prone to cut-and-paste and omission errors that leave them testing less than intended. So I think it's better to generate these tests from the YAML files as the single source of truth for which header is supposed to define what.
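Here is a minimal sketch of what generating such a test could look like. It is written in C only for illustration. `struct header_macros` and `gen_macro_test` are hypothetical names, and the table stands in for the `macro_name` entries a generator would read from YAML. The output has the same shape as the bionic-style test quoted above, but no list of macros is maintained by hand.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the YAML `macro_name` entries of one header. */
#define MAX_MACROS 8
struct header_macros {
  const char *header;             /* e.g. "dlfcn.h" */
  const char *macros[MAX_MACROS]; /* macro names, NULL-terminated */
};

/* Write into buf a test source that fails to compile unless the header
 * defines every listed macro. Returns the number of characters written. */
static size_t gen_macro_test(const struct header_macros *spec, char *buf,
                             size_t cap) {
  int n = snprintf(buf, cap, "#include <%s>\n", spec->header);
  assert(n >= 0 && (size_t)n < cap);
  size_t used = (size_t)n;
  for (size_t i = 0; i < MAX_MACROS && spec->macros[i]; ++i) {
    const char *m = spec->macros[i];
    n = snprintf(buf + used, cap - used,
                 "#ifndef %s\n#error \"%s is not defined by <%s>\"\n#endif\n",
                 m, m, spec->header);
    assert(n >= 0 && (size_t)n < cap - used);
    used += (size_t)n;
  }
  return used;
}
```

For `{"dlfcn.h", {"RTLD_LAZY", "RTLD_NOW"}}` this emits `#include <dlfcn.h>` followed by one `#ifndef`/`#error`/`#endif` group per macro. When the generated file is compiled as part of the test suite, a missing macro surfaces as a compile error that names both the header and the macro.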
> [...] and can remove macro_name/macro_value support from hdrgen (#124957). cc @enh-google
I am not in favor of that direction. As I outlined above, I want to push very much toward having more of the sources of truth in YAML, not less, at least for the "what is the API contract?" truths. For macros, so far that's just the name. There are some macros whose value is also standards-required, such as the `_POSIX_*` constants in `<limits.h>` under POSIX. Eventually I think it will make sense for the "API spec" in YAML to be the source of truth for those values too, though I'm not worrying about that right now. As I mentioned above for all kinds of symbols, I want the "what standard requires it" bit to go here too. But right now, we have `macro_name` as the "it is part of the API" indicator, and I don't want to lose that. Many of the standard-specified macros also have requirements on the C type of their expression, and that's something we should probably encode here as well, for the benefit of generated conformance tests.
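As a sketch of the kind of check a generated conformance test could contain, C11 `_Generic` can verify the type of a macro's expansion at compile time. This is hypothetical and not something hdrgen emits today. `HAS_TYPE` is an illustrative name. The requirements encoded come from the standards: C requires most `<limits.h>` macros to have the type of the corresponding integer type after integer promotion, and requires `EOF` to be a negative integer constant expression of type `int`. The `_POSIX_ARG_MAX` check illustrates a standards-required value (4096 under POSIX) and is compiled only when the host `<limits.h>` exposes POSIX names.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* 1 if `expr` has type T, else 0; resolved at compile time. */
#define HAS_TYPE(expr, T) _Generic((expr), T: 1, default: 0)

/* C: these <limits.h> macros have the type of the corresponding integer
 * type after integer promotion (so on common platforms CHAR_MAX is an int). */
static_assert(HAS_TYPE(INT_MAX, int), "INT_MAX must have type int");
static_assert(HAS_TYPE(UINT_MAX, unsigned int), "UINT_MAX must be unsigned int");
static_assert(HAS_TYPE(LONG_MAX, long), "LONG_MAX must have type long");
static_assert(HAS_TYPE(CHAR_MAX, int), "CHAR_MAX must have type int");

/* C: EOF is a negative integer constant expression of type int. */
static_assert(HAS_TYPE(EOF, int), "EOF must have type int");
static_assert(EOF < 0, "EOF must be negative");

/* POSIX fixes the value of _POSIX_ARG_MAX at 4096; only checked when the
 * host <limits.h> exposes POSIX names. */
#ifdef _POSIX_ARG_MAX
static_assert(_POSIX_ARG_MAX == 4096, "_POSIX_ARG_MAX must be 4096");
#endif
```

A generator would emit one such assertion per macro whose YAML entry records a required type or value. A header that defines the right name with the wrong type, such as `LONG_MAX` as a plain `int` constant, would then be caught, whereas an `#ifndef` presence check would not catch it.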
The `foo-macros.h` pattern is used fairly commonly, but it's not universal, and I don't think it's necessarily the best pattern. If we do eventually go for fine-grained (generated) conditionals on which names a header defines, then for macros whose values are implementation details (and perhaps not the same for all llvm-libc configurations) we may want to move to a per-macro (or per-few-macros) header model, as we do for types. I don't know what we'll want to do, but I think the direction here with `macro_header` gives us the freedom to explore that in whatever ways we might want.
https://github.com/llvm/llvm-project/pull/123265