[PATCH] D124221: Add new builtin __builtin_reflect_struct.
Erich Keane via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Apr 25 06:20:41 PDT 2022
erichkeane added a comment.
In D124221#3470550 <https://reviews.llvm.org/D124221#3470550>, @aaron.ballman wrote:
> In D124221#3469251 <https://reviews.llvm.org/D124221#3469251>, @rsmith wrote:
>> In D124221#3468706 <https://reviews.llvm.org/D124221#3468706>, @aaron.ballman wrote:
>>> Thank you for looking into this! I've also thought that the `__builtin_dump_struct` implementation hasn't been quite as robust as it could be. However, the primary use case for that builtin has been for kernel folks to debug when they have no access to an actual debugger -- it being a super limited interface is actually a feature rather than a bug, in some ways.
>> I'm more concerned about the design of `__builtin_dump_struct` than the implementation -- we can always fix the implementation. Right now the design is hostile to any use other than the very simplest one where the function you give it is essentially `printf`. You can't even dump to a log file or to a string without having to fight the design of the builtin. And if you want anything other than the exact recursion and formatting choices that we've arbitrarily made, you're out of luck. And of course, if you're in C++, you'll want to be able to print out `std::string` and `std::optional<int>` and similar types too, which are not supported by the current design at all. Perhaps this is precisely what a specific set of kernel folks want for a specific use case, but if the only purpose is to address that one use case, then I don't really see how we can justify keeping this builtin in-tree. So I think we should either expand the scope beyond the specific kernel folks or we should remove it.
> Yes, I agree that the current builtin is limited and not particularly well designed for general use. I would be fine removing support for it. I'd be happier augmenting it so it works for more use cases. However, I don't think what you've got here so far "expands the scope" so much as "moves the goalposts" -- if the replacement can't be used by the kernel folks, then we're losing the primary use case we have for that builtin. However, it sounds like you may have some ideas on that so we can make it more usable in C.
In general I've not been a fan of `__builtin_dump_struct` (I wasn't paying attention when it got merged), but I was assured in the meantime that it was JUST for printf-debugging in places without a debugger. I found its inflexibility, changing format, and inconsistent functionality to be a 'feature' at that point, as it would discourage non-printf-debugging use cases. BUT I guess you're seeing Hyrum's law in action :D It would be a shame to LOSE the builtin, since it is apparently REALLY useful in the debugging case, but I definitely don't think I would have +1'ed it if I were to see/understand it as a new addition.
>>> I think what you're designing here is straight-up reflection, which is a different use case. If we're going to get something that's basically only usable in C++, I'm not certain there's much value in adding builtins for reflection until WG21 decides what reflection looks like, and then we can design around that. (Personally, I think designing usable reflection interfaces for C would be considerably harder but would provide considerably more benefit to users given the lack of reflection capabilities. There's almost no chance WG14 and WG21 would come up with the same interfaces for reflection, so I think we've got some opportunity for exploration here.)
>> What I set out to build is something that lets you implement struct dumping without all the shortcomings I see in `__builtin_dump_struct`. I think it's inevitable that that ends up being substantially a struct reflection system, and indeed `__builtin_dump_struct` is a reflection system, too, just a really awkward one -- people have already started parsing its format strings to extract struct field names, for example. (The usage I've seen actually *is* struct dumping, but that usage can't use `__builtin_dump_struct` directly because of its limitations, so `__builtin_dump_struct` is just used to extract the field names, and the field values and types are extracted via structured bindings instead.)
> Agreed that the current builtin is effectively (really bad) reflection as well, and I'm sad to hear people are trying to parse its output to get structure field names (we have a JSON AST dump for exactly that kind of situation, so I'm not super sympathetic to people using builtin dump struct in that way unless they also need the runtime value information of the fields).
I agree completely, over the weekend I also came to the conclusion that any attempt to do ANYTHING in this space is likely going to be used as a poor-man's reflection. At that point, I consider the awkwardness to be a bit of a 'feature' here, as I'd hope anyone trying to use it for its not-intended-purpose would at one point pause thinking whether they COULD, and think briefly whether they SHOULD.
>> I do agree that we shouldn't be providing a full reflection mechanism here, given both that one is coming anyway, and that whatever we design, people will inevitably ask for more, and we don't want to be maintaining our own reflection technology.
I VERY much don't want to be spending cycles maintaining our own reflection, part of why I am/was so concerned with `__builtin_reflect_struct`. Though, I'm coming to terms with the fact that `__builtin_dump_struct` _IS_ 'crappy reflection' already.
>> So, I think we should either roll back `__builtin_dump_struct` or fix forward. This patch attempted to do the latter, but maybe it's gone too far in the direction of reflection. I think we can address most of my concerns with `__builtin_dump_struct` without a new builtin, by incorporating things from this implementation as follows:
>> - Desugar it in Sema rather than in CodeGen -- this is necessary to enable following things as well as for constant evaluation
I agree here, I found the implemented-in-codegen bizarre and limiting, so even this would be an improvement without the others.
>> - Take any callable as the printing function and pass it the fields with their correct types, so that a C++ implementation can dispatch based on the type and print the values of types that we don't hard-code (eg, we'd generate calls like `user_dump_function("%s = %p", "field_name", &p->field_name);`, and a smart C++ formatting function can preserve the field type and do something suitable when formatting a corresponding `%p`).
> +1, though I'm curious what we want to do about things like bit-fields (do we want to alert the caller that it's a bit-field and what the width is?), anonymous bit-fields, base classes vs fields of class type, etc.
I suspect we'd want some sort of 'depth' parameter to that as well so we can handle the 'tabbing' correctly, and am also concerned with the things Aaron brought up (particularly the anonymous bit-fields/zero length bitfields), as they seem to be things that make current users persnickety.
>> - Take additional arguments to be passed onto the provided function, so that C users can easily format to a different file or to a string or whatever else
> +1, that seems pretty sensible for what is effectively a callback function.
I am somewhat concerned with these as they get confusing/obtuse to use sometimes (particularly with variadic functions as the callback, where they cause strange non-compile-time errors with minor, seemingly innocuous changes), but am reasonably sure we could come up with some level of design that would make it not-awful. If this was JUST C++, I'd say "we should not, and the user should use a functor/lambda for state", but, of course, this is C :/
>> The main concern that I think we can't address this way is that `__builtin_dump_struct` decides for itself how to format the struct. But if we keep that property, that might actually help to keep us in the bounds of a builtin that's good for dumping but nothing else?
> I think that could be workable. Given that it was intended as a debugging interface, I don't think the format of the struct is critical so long as the default format isn't too awful (for some definition of "too awful") and we provide at least the same amount of information as before.
Yep, agreed. The guarantee that we give is "human readable", anything we do beyond that encourages folks trying to see if it is "computer readable", which is immediately "bad reflection". The BOFH-compiler-guy part of me (B-Compiler-Dev-FH ?) thinks we should start randomizing the output to discourage this behavior, but I suspect that would be an arms-race we would ultimately lose.
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
More information about the cfe-commits