[PATCH] D124221: Reimplement `__builtin_dump_struct` in Sema.

Aaron Ballman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu May 5 07:59:25 PDT 2022


aaron.ballman added a comment.

In D124221#3493792 <https://reviews.llvm.org/D124221#3493792>, @erichkeane wrote:

> FWIW, I'm in favor of the patch as it sits.
>
> As a follow-up: So I was thinking about the "%s" specifier for string types.  Assuming char-ptr types are all strings is a LITTLE dangerous, but more so the way we're doing it.  It's a shame we don't have some way of setting a 'max' limit to the number of characters we print, for a few reasons:
>
> 1- For safety: If the char-ptr points to non-null-terminated memory, it'll stop us from just arbitrarily printing into space by limiting at least the NUMBER of characters we print into nonsense.
> 2- For readability: printing a 'long' string likely makes this output look like nonsense and breaks everything up.  Limiting us to only a few characters is likely a good idea.
> 3- <Bonus #3 from @aaron.ballman >: It might discourage SOME level of attempts at using this for reflection, or at least make it a little harder.
>
> What I would love would be for something like a 10 char max:
>
>   struct S {
>      char *C;
>    };
>    S s { "The Rest of this string is cut off"};
>    print as:
>    struct S s = {
>      .C = 0x1234 "The Rest o"
>    };
>
> Sadly, I don't see something like that in printf specifiers?  Unless someone smarter than me can come up with some trickery.  PERHAPS have the max-limit compile-time configurable, but I don't feel strongly.

The C Standard has this in the specification of the %s format specifier:

  If no l length modifier is present, the argument shall be a pointer to storage of character
  type. Characters from the storage are written up to (but not including) the terminating
  null character. If the precision is specified, no more than that many bytes are written. If
  the precision is not specified or is greater than the size of the storage, the storage shall
  contain a null character.

So you can use the precision modifier on %s to limit the length to a particular number of bytes. The only downside I can think of to picking a limit is what happens when the user stores valid UTF-8 data in their string and we print it via `%.10s`: will we then potentially split a code point in half, and does that do something bad?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124221/new/

https://reviews.llvm.org/D124221


