[PATCH] D124221: Reimplement `__builtin_dump_struct` in Sema.

Wang Yihan via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed May 4 22:25:05 PDT 2022


yihanaa added inline comments.


================
Comment at: clang/lib/Sema/SemaChecking.cpp:433
+
+  llvm::StringRef getFormatSpecifier(QualType T) {
+    if (auto *BT = T->getAs<BuiltinType>()) {
----------------
yihanaa wrote:
> erichkeane wrote:
> > rsmith wrote:
> > > yihanaa wrote:
> > > > rsmith wrote:
> > > > > rsmith wrote:
> > > > > > aaron.ballman wrote:
> > > > > > > yihanaa wrote:
> > > > > > > > I think this is better maintained in "clang/AST/FormatString.h". For example analyze_printf::PrintfSpecifier can get format specifier, what do you all think about?
> > > > > > > +1 to this suggestion -- my hope is that we could generalize it more then as I notice there are missing specifiers for things like intmax_t, size_t, ptrdiff_t, _Decimal types, etc. Plus, that will hopefully make it easier for __builtin_dump_struct to benefit when new format specifiers are added, such as ones for printing a _BitInt.
> > > > > > I am somewhat uncertain: every one of these is making arbitrary choices about how to format the value, so it's not clear to me that this is general logic rather than something specific to `__builtin_dump_struct`. For example, using `%f` rather than one of the myriad other ways of formatting `double` is somewhat arbitrary. Using `%s` for any `const char*` is *extremely* arbitrary and will be wrong and will cause crashes in some cases, but it may be the pragmatically correct choice for a dumping utility. A general-purpose mechanism would use `%p` for all kinds of pointer.
> > > > > > 
> > > > > > We could perhaps factor out the formatting for cases where there is a clear canonical default formatting, such as for integer types and probably `%p` for pointers, then call that from here with some special-casing, but without a second consumer for that functionality it's really not clear to me what form it should take.
> > > > > I went ahead and did this, mostly to match concurrent changes to the old implementation. There are a few cases where our existing "guess a format specifier" logic does the wrong thing for dumping purposes, which I've explicitly handled -- things like wanting to dump a `char` / `signed char` / `unsigned char` member as a number rather than as a (potentially non-printable or whitespace) character.
> > > >  When I was patching that old implementation, I found that for uint8_t, int8_t, Clang's existing "guess a format specifier" logic would treat the value as an integer, but for unsigned char, signed char, char types, it would Treat it as a character, please look at this example ( https://godbolt.org/z/ooqn4468T ), I guess this existing logic may have made some special judgment.
> > > Yeah. I think in the case where we see some random call to `printf`, `%c` is probably the right thing to guess here, but it doesn't seem appropriate to me to use this in a dumping routine. If we could dump as `'x'` for printable characters and as `'\xAB'` for everything else, that'd be great, but `printf` can't do that itself and I'm not sure we want to be injecting calls to `isprint` or whatever to make that work. Dumping as an integer always seems like it's probably the least-bad option.
> > > 
> > > Somewhat related: I wonder if we should use `"\"%s\""` instead of simply `"%s"` when dumping a `const char*`. That's not ideal but probably clearer than the current dump output.
> > I see value to having strings with SOME level of delimiter, if at least to handle cases when the string itself has a newline in it.
> This looks good, but the string may need to be escaped (if there are some special characters in the original string)
Maybe users should wrap the printf function, and then pass the wrapper function as a parameter, so that they can do some custom things


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124221/new/

https://reviews.llvm.org/D124221



More information about the cfe-commits mailing list