[PATCH] D124221: Reimplement `__builtin_dump_struct` in Sema.

Erich Keane via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu May 5 08:05:13 PDT 2022


erichkeane added a comment.

In D124221#3494000 <https://reviews.llvm.org/D124221#3494000>, @aaron.ballman wrote:

> In D124221#3493792 <https://reviews.llvm.org/D124221#3493792>, @erichkeane wrote:
>
>> FWIW, I'm in favor of the patch as it sits.
>>
>> As a followup: So I was thinking about the "%s" specifier for string types.  Assuming char-ptr types are all strings is a LITTLE dangerous, but more so the way we're doing it.  It's a shame we don't have some way of setting a 'max' limit on the number of characters we print, for 2 reasons:
>>
>> 1- For safety: If the char-ptr points to non-null-terminated memory, it'll stop us from just arbitrarily printing into space by limiting at least the NUMBER of characters we print into nonsense.
>> 2- For readability: printing a 'long' string likely makes this output look like nonsense and breaks everything up.  Limiting us to only a few characters is likely a good idea.
>> 3- <Bonus #3 from @aaron.ballman >: It might discourage SOME level of attempts at using this for reflection, or at least make it a little harder.
>>
>> What I would love would be for something like a 10 char max:
>>
>>   struct S {
>>     char *C;
>>   };
>>   S s { "The Rest of this string is cut off" };
>>
>> to print as:
>>
>>   struct S s = {
>>     .C = 0x1234 "The Rest o"
>>   };
>>
>> Sadly, I don't see something like that in printf specifiers?  Unless someone smarter than me can come up with some trickery.  PERHAPS have the max-limit compile-time configurable, but I don't feel strongly.
>
> The C Standard has this in the specification of the %s format specifier:
>
>   If no l length modifier is present, the argument shall be a pointer to storage of character
>   type. Characters from the storage are written up to (but not including) the terminating
>   null character. If the precision is specified, no more than that many bytes are written. If
>   the precision is not specified or is greater than the size of the storage, the storage shall
>   contain a null character.
>
> So you can use the precision modifier on %s to limit the length to a particular number of bytes. The only downside I can think of to picking a limit is: what happens when the user stores valid UTF-8 data in their string and prints it via `%.10s`? We could then be splitting a code point in half, and that might do something bad.

Ah! TIL!  That I think would be an excellent improvement to this builtin.  I suspect we could just choose a fixed value to start (maybe even in this patch @rsmith ?), then add a compile-time option in the future if folks care to extend it.  I would posit that 10-15 would be a good start?  Any shorter would be prohibitive I think?  I could be talked into a little longer...
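
For concreteness, here is a minimal sketch of what the precision trick looks like in plain C. This is not code from the patch: `format_limited` and `MAX_FIELD_CHARS` are hypothetical names I made up for illustration, and the limit of 10 is just the number floated above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical limit for illustration only; nothing in the patch defines this. */
enum { MAX_FIELD_CHARS = 10 };

/* Writes s into out, wrapped in double quotes and truncated to at most
   max_bytes bytes. The precision on %s caps how many bytes are written, and
   per the C wording quoted above the storage only needs a terminating null
   when the precision is missing or larger than the storage. */
static int format_limited(char *out, size_t out_size, const char *s,
                          int max_bytes) {
    /* "%.*s" takes the precision from an int argument, which is one way a
       configurable limit could be passed instead of hard-coding "%.10s". */
    return snprintf(out, out_size, "\"%.*s\"", max_bytes, s);
}
```

Calling `format_limited(buf, sizeof buf, "The Rest of this string is cut off", MAX_FIELD_CHARS)` leaves `"The Rest o"` (quotes included) in `buf`. The limit counts bytes, not characters, so the UTF-8 concern is real: a precision of 3 applied to `"na\xc3\xafve"` keeps the `0xC3` lead byte and drops its continuation byte.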


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124221/new/

https://reviews.llvm.org/D124221


