[PATCH] D124221: Reimplement `__builtin_dump_struct` in Sema.

Richard Smith - zygoloid via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue May 3 13:33:08 PDT 2022


rsmith added inline comments.


================
Comment at: clang/lib/Sema/SemaChecking.cpp:433
+
+  llvm::StringRef getFormatSpecifier(QualType T) {
+    if (auto *BT = T->getAs<BuiltinType>()) {
----------------
rsmith wrote:
> aaron.ballman wrote:
> > yihanaa wrote:
> > > I think this is better maintained in "clang/AST/FormatString.h". For example analyze_printf::PrintfSpecifier can get format specifier, what do you all think about?
> > +1 to this suggestion -- my hope is that we could generalize it more then as I notice there are missing specifiers for things like intmax_t, size_t, ptrdiff_t, _Decimal types, etc. Plus, that will hopefully make it easier for __builtin_dump_struct to benefit when new format specifiers are added, such as ones for printing a _BitInt.
> I am somewhat uncertain: every one of these is making arbitrary choices about how to format the value, so it's not clear to me that this is general logic rather than something specific to `__builtin_dump_struct`. For example, using `%f` rather than one of the myriad other ways of formatting `double` is somewhat arbitrary. Using `%s` for any `const char*` is *extremely* arbitrary and will be wrong and will cause crashes in some cases, but it may be the pragmatically correct choice for a dumping utility. A general-purpose mechanism would use `%p` for all kinds of pointer.
> 
> We could perhaps factor out the formatting for cases where there is a clear canonical default formatting, such as for integer types and probably `%p` for pointers, then call that from here with some special-casing, but without a second consumer for that functionality it's really not clear to me what form it should take.
I went ahead and did this, mostly to match concurrent changes to the old implementation. There are a few cases where our existing "guess a format specifier" logic does the wrong thing for dumping purposes, which I've explicitly handled -- things like wanting to dump a `char` / `signed char` / `unsigned char` member as a number rather than as a (potentially non-printable or whitespace) character.


================
Comment at: clang/lib/Sema/SemaChecking.cpp:573
+          // We don't know how to print this field. Print out its address
+          // with a format specifier that a smart tool will be able to
+          // recognize and treat specially.
----------------
aaron.ballman wrote:
> rsmith wrote:
> > yihanaa wrote:
> > > Can we generate a warning diagnostic message when we see an unsupported type (perhaps temporarily not supported), what do you all think about?
> > We could, but I'm leaning towards thinking that we shouldn't warn by default at least. That would get in the way of the C++ use case where the type actually is supported by the printing function; I'm not sure that we can easily tell those cases apart. And it's probably not all that useful to someone who just wants whatever information we can give them while debugging. I'd be OK with adding a warning that's off by default, though. What would be the intended use case here? Would an off-by-default warning satisfy it?
> So long as we don't trigger UB, I think it's fine to not diagnose for unsupported types. But the UB cases do strike me as something we might want to warn about if we think they can be hit (specifically, these kinds of UB: https://eel.is/c++draft/cstdarg.syn#1).
> 
> As an example, do we properly handle dumping this case:
> ```
> struct S {
>   int &i;
>   S(int &ref) : i(ref) {}
> };
> ```
> where the vararg argument is of reference type? I suppose another interesting question is:
> ```
> struct S {
>   int i;
> };
> 
> void func() {
>   S s;
>   [=]() {
>     __builtin_dump_struct(&s, ::printf);
>   };
> }
> ```
> (is S.i "an entity resulting from a lambda capture"?)
The only things we pass to the formatting function are:
* The small number of builtin types for which we have known formatting specifiers.
* Pointers to `void`, using `%p`.
* Pointers to `[const] char`, using `%s`.
* Pointers to user-defined types, using `*%p`.
All of these can be passed through a `...` without problems.

We can potentially introduce the following sources of UB:
* Indirection through a struct pointer that doesn't point to a struct object.
* Read of an inactive union member.
* Read of an uninitialized member.
* Printing something other than a pointer to a nul-terminated byte sequence with `%s`.
I'm not sure how much we should be worrying about those. It'd be nice to avoid them, though I don't think we can fix the fourth issue without giving up the `%s` formatting entirely.

Reference types aren't on our list of types with known formatting specifiers, so in your first example we'd call `printf("%s%s = *%p", "int &", "i", &p->i)`, producing output like `int & i = *0x1234abcd`. We could special-case that to also dereference the reference and include the value (`int & i = *0x1234abcd (5)`, but for now I'm treating it like any other type for which we don't have a special formatting rule. I've added test coverage for this case.

In your second example, we rewrite into `::printf("%s%s = %s", "int", "i", (&s)->i)`, which is fine. The `parmN` restriction is about calls to `va_start`:
```
void f(int a, ...) {
  [&] {
    va_list va;
    va_start(va, a); // error, `a` is captured here
    // ...
  };
}
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124221/new/

https://reviews.llvm.org/D124221



More information about the cfe-commits mailing list