[llvm-dev] Crefl - a Clang plug-in and C-type-reflection-API

Sun May 9 19:51:22 PDT 2021

Hi Folks,

Writing to let you know about a component I have been working on that 
could be of interest to the LLVM and Clang community:

 > "Crefl - a Clang plug-in and C-type-reflection-API"

The Crefl API and plugin provide access to runtime reflection metadata 
for C interfaces supporting arbitrarily nested combinations of: 
intrinsic, enum, struct, union, field, array, constant, and function.

Crefl focuses on addressing the following areas:

- a Clang plug-in that outputs portable reflection metadata.
- a reflection database format for portable reflection metadata.
- an API that provides task-oriented access to reflection metadata.

I am aware that the C++ standards committee is focusing on compile-time 
type reflection for C++, and of similar in-tree work on Clang AST 
serialization. This work is intended to be complementary. Crefl is a 
small runtime dependency, and I am focusing on a portable format and on 
exposing reflection metadata to C. Crefl itself is written in C++.

The Crefl plugin is nearing beta level interface stability and 
reliability. I have been ironing out bugs and API usability issues. 
There is now a rudimentary reflection metadata linker that performs 
de-duplication using recursive tree hash sums. Linking still needs work 
with regard to incomplete types, modules, name indexing and namespaces.
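
As a rough illustration of de-duplication by recursive tree hashing, the 
sketch below hashes a node together with the hashes of its children, so 
two structurally identical subtrees get the same sum. The `struct node` 
layout and the choice of FNV-1a are assumptions made for brevity; they 
are not Crefl's actual node format or hash function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type for illustration; not Crefl's actual layout. */
struct node {
    uint32_t tag;               /* kind of declaration */
    uint64_t value;             /* payload, e.g. a width or a count */
    const struct node *child;   /* first child, or NULL */
    const struct node *next;    /* next sibling, or NULL */
};

/* FNV-1a over the 8 bytes of v, continuing from hash state h. */
static uint64_t fnv1a_u64(uint64_t h, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        h ^= (v >> (i * 8)) & 0xff;
        h *= 0x100000001b3ULL;          /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Hash a node together with the hashes of all its children, in order. */
static uint64_t tree_hash(const struct node *n)
{
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    assert(n != NULL);
    h = fnv1a_u64(h, n->tag);
    h = fnv1a_u64(h, n->value);
    for (const struct node *c = n->child; c != NULL; c = c->next) {
        h = fnv1a_u64(h, tree_hash(c));
    }
    return h;
}
```

A linker could then keep one copy of each subtree per distinct hash, 
ideally with a stronger hash or a structural comparison to rule out 
collisions.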

The Crefl repo contains samples/example2_embed which shows reflection 
metadata embedding into a binary. An ASN.1 implementation exists in the 
tree and I am currently working on structure packing and alignment. The 
intention is to write an ASN.1 serializer for C structures and then use 
that to read and write the reflection metadata itself. The metadata is 
currently stored using C native structure packing and alignment. The 
reflection API is nearly complete and I am now starting on structure 
serialization which is the major planned use of the API.

There is a CMake macro that invokes the Crefl plugin, merges the 
metadata and embeds it into a linkable object file.

     include(cmake/crefl_macro.cmake)

     add_executable(example2_embed samples/example2_embed/main.c)
     crefl_target_reflect(example2_embed example2_embed_refl)
     target_link_libraries(example2_embed example2_embed_refl cmodel)

Here is a sample showing how to access the embedded metadata:

     #include <assert.h>
     #include <stdlib.h>
     /* Crefl headers and the _print helper are omitted here. */

     int main(int argc, const char **argv)
     {
         decl_db *db = crefl_db_new();
         crefl_db_read_mem(db, __crefl_main_data, __crefl_main_size);

         /* first call obtains the count, second call fills the array */
         size_t nsources = 0;
         crefl_archive_sources(crefl_root(db), NULL, &nsources);
         assert(nsources == 1);
         decl_ref *_sources = calloc(nsources, sizeof(decl_ref));
         assert(_sources);
         crefl_archive_sources(crefl_root(db), _sources, &nsources);

         size_t ntypes = 0;
         crefl_source_decls(_sources[0], NULL, &ntypes);
         decl_ref *_types = calloc(ntypes, sizeof(decl_ref));
         assert(_types);
         crefl_source_decls(_sources[0], _types, &ntypes);

         for (size_t i = 0; i < ntypes; i++) {
             _print(_types[i], 0);
         }

         free(_types);
         free(_sources);
         crefl_db_destroy(db);
         return 0;
     }

I would also like to implement a reference counted allocator wrapper 
providing for serialization of arbitrary C object graphs. Handling 
arrays would need some sort of `alloc(T)(n)` for typed array buffers, 
perhaps using negative array indices to find object metadata containing 
the count. That would be to support serialization of strings and arrays ...

   ((struct _alloc*)ptr)[-1].rc

... using some structures to support reference counting:

   struct _alloc { size_t count; dtor_notify_t dtor_notify; rc_t rc; };
   struct _ref { void* obj; size_t base; };
   struct _weakref { void* obj; size_t base; dtor_notify_t dtor_notify; };
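
A minimal sketch of how the negative-index header lookup could work is 
shown below. None of this exists in Crefl yet: `struct alloc_hdr` mirrors 
`struct _alloc` under a non-reserved name, and `rc_alloc`, `rc_ref`, 
`rc_unref`, `hdr_of`, and the `dtor_notify_t` and `rc_t` definitions are 
all assumptions for illustration. The header is placed immediately before 
the payload, with padding in front so the payload stays suitably aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (*dtor_notify_t)(void *obj);   /* assumed signature */
typedef size_t rc_t;                        /* assumed counter type */

struct alloc_hdr { size_t count; dtor_notify_t dtor_notify; rc_t rc; };

/* Header size rounded up so the payload is aligned for any object type. */
#define HDR_PAD ((sizeof(struct alloc_hdr) + _Alignof(max_align_t) - 1) \
                 / _Alignof(max_align_t) * _Alignof(max_align_t))

/* The header sits directly before the payload: ((hdr *)ptr)[-1]. */
static struct alloc_hdr *hdr_of(void *ptr)
{
    return &((struct alloc_hdr *)ptr)[-1];
}

/* Allocate a zeroed array of count elements with reference count 1. */
static void *rc_alloc(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > (SIZE_MAX - HDR_PAD) / elem_size)
        return NULL;                        /* size would overflow */
    unsigned char *base = calloc(1, HDR_PAD + count * elem_size);
    if (base == NULL) return NULL;
    void *ptr = base + HDR_PAD;
    hdr_of(ptr)->count = count;
    hdr_of(ptr)->rc = 1;
    return ptr;
}

static void *rc_ref(void *ptr)
{
    hdr_of(ptr)->rc++;
    return ptr;
}

/* Drop one reference; returns 1 if the object was destroyed. */
static int rc_unref(void *ptr)
{
    struct alloc_hdr *h = hdr_of(ptr);
    assert(h->rc > 0);
    if (--h->rc > 0) return 0;
    if (h->dtor_notify) h->dtor_notify(ptr);
    free((unsigned char *)ptr - HDR_PAD);
    return 1;
}
```

With this layout a serializer holding only the payload pointer can still 
recover the element count through `hdr_of(ptr)->count`.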

Support for arrays and references has not been implemented. To keep the 
memory overhead down it may be necessary to compress array dimensions 
using a scheme similar to ASN.1 object identifiers, a reverse LEB 
encoding or the sds string library scheme for compressed array sizes.
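
For reference, here is a sketch of the standard (forward) unsigned LEB128 
encoding, which stores small counts in a single byte. It is not the 
reverse variant mentioned above, which would presumably be laid out so 
it can be read backwards from the payload pointer; the function names 
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode v as unsigned LEB128 into buf, which must hold up to 10 bytes.
   Returns the number of bytes written. */
static size_t uleb128_encode(uint64_t v, uint8_t *buf)
{
    size_t n = 0;
    assert(buf != NULL);
    do {
        uint8_t byte = v & 0x7f;        /* low 7 bits */
        v >>= 7;
        if (v != 0) byte |= 0x80;       /* continuation bit */
        buf[n++] = byte;
    } while (v != 0);
    return n;
}

/* Decode an unsigned LEB128 value from buf of length len.
   Returns bytes consumed, or 0 if the input is truncated or too long. */
static size_t uleb128_decode(const uint8_t *buf, size_t len, uint64_t *out)
{
    uint64_t v = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        v |= (uint64_t)(buf[i] & 0x7f) << (7 * i);
        if ((buf[i] & 0x80) == 0) {
            *out = v;
            return i + 1;
        }
    }
    return 0;
}
```

Counts below 128 cost one byte and counts below 16384 cost two, compared 
with a fixed 8-byte `size_t` on a 64-bit target.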

The idea to use destructor notifiers is to support zeroing weak 
references, and to avoid needing to maintain a secondary weak reference 
count and separate allocations for the reference count. This assumes 
code using this library to serialize object graphs would use a reference 
counting object allocator consistently, i.e. a shared_from_this would 
conceptually be a no-op and references could degrade to pointers. The 
challenge would be cramming what we need into say 16 bytes and 
minimizing alloc/unref overhead. Destructor notifiers might need 32-bit 
relative addresses to sufficiently compress function pointers. One might 
even need the support of some special relocations in the linker.
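
To make the zeroing weak reference idea concrete, here is a small sketch 
in which an object carries a single destructor notifier that clears a 
weak reference when the last strong reference is dropped. All names 
(`struct object`, `struct weakref`, `weakref_init`, `object_unref`) are 
hypothetical, the notifier is a plain function pointer rather than a 
compressed relative address, and a real design would need more than one 
notifier slot per object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical notifier: called with the registered target before free. */
typedef void (*dtor_fn)(void *target);

struct object {
    size_t rc;        /* strong reference count */
    dtor_fn notify;   /* single notifier slot, for brevity */
    void *target;     /* argument passed to the notifier */
    int payload;
};

struct weakref { struct object *obj; };

/* Notifier that zeroes the weak reference it was registered for. */
static void weakref_clear(void *target)
{
    ((struct weakref *)target)->obj = NULL;
}

static struct object *object_new(int payload)
{
    struct object *o = calloc(1, sizeof *o);
    if (o != NULL) {
        o->rc = 1;
        o->payload = payload;
    }
    return o;
}

/* Point w at o and register the notifier that will zero w. */
static void weakref_init(struct weakref *w, struct object *o)
{
    assert(o->notify == NULL);   /* only one slot in this sketch */
    w->obj = o;
    o->notify = weakref_clear;
    o->target = w;
}

/* Drop a strong reference; on the last one, notify and free. */
static void object_unref(struct object *o)
{
    assert(o->rc > 0);
    if (--o->rc > 0) return;
    if (o->notify) o->notify(o->target);
    free(o);
}
```

No weak count or separate control block is needed here: the weak 
reference is simply zeroed at destruction time.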

The ultimate goal is to expose something like the following to C:

   obj_write(T)(stream, obj)
   obj_read(T)(stream, obj)

and a reference counting interface with destructor notification:

   obj_alloc(T)(obj)
   obj_ref(T)(obj)
   obj_unref(T)(obj)
   obj_dtor_notify(T)(obj,target,func)
   obj_dtor_denotify(T)(obj,target,func)
   obj_weakref(T)(obj)
   obj_weakunref(T)(obj)
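
One way the `name(T)(args)` call syntax could be expressed in plain C is 
token pasting, sketched below. The `struct stream` buffer, the `point` 
type and the hand-written `obj_write_point` and `obj_read_point` functions 
are assumptions for illustration; in the intended design the per-type 
functions would presumably be driven by the reflection metadata rather 
than a raw `memcpy` of the native layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stream and example type, for illustration only. */
struct stream { unsigned char buf[64]; size_t pos; };
typedef struct { int x, y; } point;

/* obj_write(point) expands to obj_write_point, so a call site reads
   obj_write(point)(stream, obj). T must be a single identifier. */
#define obj_write(T) obj_write_##T
#define obj_read(T)  obj_read_##T

static int obj_write_point(struct stream *s, const point *p)
{
    assert(s != NULL && p != NULL);
    if (sizeof *p > sizeof s->buf - s->pos) return -1;   /* no room */
    memcpy(s->buf + s->pos, p, sizeof *p);   /* placeholder: native layout */
    s->pos += sizeof *p;
    return 0;
}

static int obj_read_point(struct stream *s, point *p)
{
    assert(s != NULL && p != NULL);
    if (sizeof *p > sizeof s->buf - s->pos) return -1;   /* short read */
    memcpy(p, s->buf + s->pos, sizeof *p);
    s->pos += sizeof *p;
    return 0;
}
```

The same pattern would extend to `obj_alloc(T)`, `obj_ref(T)` and the 
other entry points, at the cost of requiring a typedef name for each 
serializable type.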

The plan is to make something that can be used from C++ as a foundation 
for simulation object state serialization, but also making sure that 
components using this architecture can be written in C. It could be that 
we model simplified classic inheritance and Rust or Zig style traits and 
interfaces, the main rationale being that whatever we implement, we 
expose a mapping to C. The ideas regarding arrays and references are 
still somewhat sketchy. It might be that we need compiler support for 
references, bounded arrays and closure scoped destructors in C. Pie in 
the sky?

In any case, this is mainly a heads up to see if folk are interested and 
would care to give feedback or collaborate. The Git repository is here:

- https://github.com/michaeljclark/crefl/

Regards,
Michael