[PATCH] D124221: Reimplement `__builtin_dump_struct` in Sema.

Richard Smith - zygoloid via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon May 2 14:45:36 PDT 2022


rsmith added inline comments.


================
Comment at: clang/docs/LanguageExtensions.rst:2442
+
+This builtin does not return a value.
+
----------------
erichkeane wrote:
> I don't know if anyone would be using this value, but I wonder if there is value to making this a 'sum' of the results of `f`?  Less useful for the purposes of printf, but perhaps to a sprintf-type-function?  
> 
> Not sure if we have the use cases to support this however.
I don't think you can use this directly with `sprintf` -- you'd need a wrapper to update the write position as part of each call to `f`, and that wrapper can handle accumulating the value returned from `sprintf`. In order to do that, you'd want to provide an additional input argument that can be modified as the dump progresses, and that can be used to return the overall (desired) size too. So I don't think even the `sprintf` case provides motivation for this, and it would add complexity (as well as making the builtin less usable in the case where the printing function doesn't return something addable).

I could see this being at least a little useful for a formatting function that returns `std::string` or similar, but ... I think it's probably better to let the formatting function deal with that itself through argument passing instead of assuming that we can apply `+` to the results. WDYT?


================
Comment at: clang/docs/ReleaseNotes.rst:174
   - Improve the dump format, dump both bitwidth(if its a bitfield) and field value.
-  - Remove anonymous tag locations.
-  - Beautify dump format, add indent for nested struct and struct members.
+  - Remove anonymous tag locations and flatten anonymous struct members.
+  - Beautify dump format, add indent for struct members.
----------------
yihanaa wrote:
> I tried it, D124221 will still generate anonymous tag locations, this place needs to be corrected, or removed anonymous tag locations.
> 
> int main() {
>   struct U20A {
>     _Bool a;
>     unsigned : 32;
>     struct {
>       int x;
>       int y;
>     } c;
>   };
> 
>   struct U20A a = {
>       .a = 1,
>       .c.x = 1998
>   };
>     __builtin_dump_struct(&a, printf);
> }
> 
> struct U20A {
>     _Bool a = 1
>     struct (unnamed struct at ./builtin_dump_struct.c:10:5) c = {
>         int x = 1998
>         int y = 0
>     }
> }
Thanks, fixed.


================
Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:8470-8473
+def err_expected_struct_pointer_argument : Error<
+  "expected pointer to struct as %ordinal0 argument to %1, found %2">;
+def err_expected_callable_argument : Error<
+  "expected a callable expression as %ordinal0 argument to %1, found %2">;
----------------
aaron.ballman wrote:
> Should we combine these into:
> ```
> def err_builtin_dump_struct_expects : Error<
>   "expected %select{pointer to struct|a callable expression}0 as %ordinal1 argument to %2, found %3">;
> ```
> to reduce the number of diagnostics we have to tablegen?
I'm indifferent. That kind of refactoring will probably decrease the chance of these being reused for any other purpose, which might end up costing us more diagnostics, and will need a magic number in the source code to select between alternatives. Happy to do it if you think it's worthwhile, but I don't have any measurements for how much each diagnostic ID costs.


================
Comment at: clang/lib/AST/Expr.cpp:2742
+    // Otherwise, warn if the result expression would warn.
+    const Expr *Result = POE->getResultExpr();
+    return Result && Result->isUnusedResultAWarning(WarnE, Loc, R1, R2, Ctx);
----------------
aaron.ballman wrote:
> I don't think the changes here are incorrect, but I'm wondering if they should be split into a different patch as they seem to be largely unrelated to the builtin?
Yeah, will do. I think these changes may actually turn out to not be necessary for `__builtin_dump_struct`, because it returns `void`  (they were necessary for __builtin_reflect_struct`).


================
Comment at: clang/lib/Sema/SemaChecking.cpp:390
+    Expr *Lit = S.Context.getPredefinedStringLiteralFromCache(Str);
+    // Wrap the literal in parentheses to attach a source location.
+    return new (S.Context) ParenExpr(Loc, Loc, Lit);
----------------
aaron.ballman wrote:
> Haha, neat trick. Perhaps someday we should allow `getPredefinedStringLiteralFromCache()` to specify a `SourceLocation`?
The desire for caching and the desire for a source location are a bit at odds here, unfortunately. Also unfortunate: if you actually put a source location on a `StringLiteral`, other parts of Clang (especially relevant for this patch: format string checking!) will assume that there's a string literal token that they can re-lex from that location, and will get very unhappy if there isn't. Perhaps a no-op `LocatedExpr` that just wraps an expression and gives it a different location would work, but of course we'd need to teach various things to look through it.


================
Comment at: clang/lib/Sema/SemaChecking.cpp:427
+    llvm::SmallString<32> Indent;
+    Indent.resize(Depth * 4, ' ');
+    return getStringLiteral(Indent);
----------------
yihanaa wrote:
> What do you think of PrintingPolicy.Indentation here?
It looks like nothing sets that outside of libclang, so I don't think that'd be especially useful, but it seems fine to me to use it here. We're not exactly pretty-printing source code, but that doesn't seem like a problem. Done.


================
Comment at: clang/lib/Sema/SemaChecking.cpp:433
+
+  llvm::StringRef getFormatSpecifier(QualType T) {
+    if (auto *BT = T->getAs<BuiltinType>()) {
----------------
aaron.ballman wrote:
> yihanaa wrote:
> > I think this is better maintained in "clang/AST/FormatString.h". For example analyze_printf::PrintfSpecifier can get format specifier, what do you all think about?
> +1 to this suggestion -- my hope is that we could generalize it more then as I notice there are missing specifiers for things like intmax_t, size_t, ptrdiff_t, _Decimal types, etc. Plus, that will hopefully make it easier for __builtin_dump_struct to benefit when new format specifiers are added, such as ones for printing a _BitInt.
I am somewhat uncertain: every one of these is making arbitrary choices about how to format the value, so it's not clear to me that this is general logic rather than something specific to `__builtin_dump_struct`. For example, using `%f` rather than one of the myriad other ways of formatting `double` is somewhat arbitrary. Using `%s` for any `const char*` is *extremely* arbitrary and will be wrong and will cause crashes in some cases, but it may be the pragmatically correct choice for a dumping utility. A general-purpose mechanism would use `%p` for all kinds of pointer.

We could perhaps factor out the formatting for cases where there is a clear canonical default formatting, such as for integer types and probably `%p` for pointers, then call that from here with some special-casing, but without a second consumer for that functionality it's really not clear to me what form it should take.


================
Comment at: clang/lib/Sema/SemaChecking.cpp:572-574
+          // We don't know how to print this field. Print out its address
+          // with a format specifier that a smart tool will be able to
+          // recognize and treat specially.
----------------
aaron.ballman wrote:
> Should we handle array types specifically, or should we continue to punt on those for now?
I'd prefer to punt for now, if that's OK. I don't think this is hard to add, but it's probably clearer to do it in a separate patch. There are also some nontrivial questions there, eg: should we dump all elements or stop after some limit? Should we dump trailing zeroes in an integer array? How should we dump arrays of `char`? I'd prefer to not deal with those questions in this patch :)


================
Comment at: clang/lib/Sema/SemaChecking.cpp:573
+          // We don't know how to print this field. Print out its address
+          // with a format specifier that a smart tool will be able to
+          // recognize and treat specially.
----------------
yihanaa wrote:
> Can we generate a warning diagnostic message when we see an unsupported type (perhaps temporarily not supported), what do you all think about?
We could, but I'm leaning towards thinking that we shouldn't warn by default at least. That would get in the way of the C++ use case where the type actually is supported by the printing function; I'm not sure that we can easily tell those cases apart. And it's probably not all that useful to someone who just wants whatever information we can give them while debugging. I'd be OK with adding a warning that's off by default, though. What would be the intended use case here? Would an off-by-default warning satisfy it?


================
Comment at: clang/lib/Sema/SemaChecking.cpp:595
+                                             PseudoObjectExpr::NoResult);
+    TheCall->setType(Wrapper->getType());
+    TheCall->setValueKind(Wrapper->getValueKind());
----------------
erichkeane wrote:
> What is happening here?  I would expect the opposite to need to happen here (wrapper be set to match the call, not the other way around?).  
This is the normal way we handle builtins with custom type checking: the `SemaChecking` logic analyzes the builtin and sets the result type of the call as appropriate, because the type we get from `Builtins.def` is meaningless. As it happens, this is a degenerate case (compared to `__builtin_reflect_struct` where this did something more interesting), because the type of the `PseudoObjectExpr` is always `void` here; I can just set that directly if you think that's clearer. It shouldn't matter much anyway, given that the call is only used as the syntactic form of the `PseudoObjectExpr`, so its type doesn't really affect anything. but following the type and value kind of the `PseudoObjectExpr` seems "right" to me.


================
Comment at: clang/lib/Sema/SemaChecking.cpp:603
+static ExprResult SemaBuiltinDumpStruct(Sema &S, CallExpr *TheCall) {
+  if (TheCall->getNumArgs() < 2 && checkArgCount(S, TheCall, 2))
+    return ExprError();
----------------
erichkeane wrote:
> This is unfortunate.  Do we really not have a `checkAtLeastArgCount` type function here?  I know we have something similar for attributes.
We do now. :)


================
Comment at: clang/lib/Sema/SemaChecking.cpp:614
+  if (!PtrArgType->isPointerType() ||
+      !PtrArgType->getPointeeType()->isRecordType()) {
+    S.Diag(PtrArgResult.get()->getBeginLoc(),
----------------
erichkeane wrote:
> Is this sufficient?  Couldn't this be a pointer to a union?  Do you mean the suggestion?
Allowing this is what the existing builtin did, and there were tests for it.

I'm not sure what the best thing to do with unions is -- especially when a union appears as a struct member. Perhaps the least bad approach is to print out all the union members and hope that nothing goes wrong. That'll crash in some cases (eg, `union { const char *p; int n; };` when the `int` is the active member), but that seems to be the nature of this builtin.

I've added a FIXME to `dumpRecordValue`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124221/new/

https://reviews.llvm.org/D124221



More information about the cfe-commits mailing list