[PATCH] D154784: [clang] Fix crash caused by PseudoObjectExprBitfields::NumSubExprs overflow

Sat Aug 19 08:33:52 PDT 2023

yronglin marked an inline comment as done.
yronglin added inline comments.

================
Comment at: clang/include/clang/AST/Stmt.h:596-597

-    // These don't need to be particularly wide, because they're
-    // strictly limited by the forms of expressions we permit.
-    unsigned NumSubExprs : 8;
-    unsigned ResultIndex : 32 - 8 - NumExprBits;
+    unsigned NumSubExprs : 16;
+    unsigned ResultIndex : 16;
   };
----------------
dblaikie wrote:
> yronglin wrote:
> > dblaikie wrote:
> > > aaron.ballman wrote:
> > > > yronglin wrote:
> > > > > dblaikie wrote:
> > > > > > dblaikie wrote:
> > > > > > > aaron.ballman wrote:
> > > > > > > > dblaikie wrote:
> > > > > > > > > Could/should we add some error checking in the ctor to assert that we don't overflow these longer values/just hit the bug later on?
> > > > > > > > > 
> > > > > > > > > (& could we use `unsigned short` here rather than bitfields?)
> > > > > > > > We've already got them packed in with other bit-fields from the expression bits, so I think it's reasonable to continue the pattern of using bit-fields (that way we don't accidentally end up with padding between the unnamed bits at the start and the named bits in this object).
> > > > > > > > 
> > > > > > > > I think adding some assertions would not be a bad idea as a follow-up.
> > > > > > > Maybe some unconditional (rather than only in asserts builds) error handling? (report_fatal_error, if this is low priority enough to not have an elegant failure mode, but something where we don't just overflow and carry on would be good... )
> > > > > > Ping on this? I worry this code has just punted the same bug further down, but not plugged the hole/ensured we don't overflow on novel/larger inputs.
> > > > > Sorry for the late reply, I was looking through the emails and found this. I agree add some assertions to check the value is a good idea, It's easy to help people catch bugs, at least with when `-DLLVM_ENABLE_ASSERTIONS=ON`, and I'm glad to work on it, but one thing that worries me is that, in ASTReader, we access this field directly, not through the constructor or accessor, and we have to add assertions everywhere. https://github.com/llvm/llvm-project/blob/05b4310c8aec7a050574277ced08a0ab86b27681/clang/lib/Serialization/ASTReaderStmt.cpp#L1382
> > > > I don't think we have to add too many assertions. As best I can tell, we'll need one in each of the `PseudoObjectExpr` constructors and one in `ASTStmtReader::VisitPseudoObjectExpr()`, but those are the only places we assign a value into the bit-field. Three assertions isn't a lot, but if we're worried, we could add a setter method that does the assertion and use the setter in all three places.
> > > My concern wasn't (well, wasn't entirely) about adding more assertions - but about having a reliable error here. The patch only makes the sizes larger, but doesn't have a hard-stop in case those sizes are exceeded again (which, admittedly, is much harder to do - maybe it's totally unreachable now, for all practical purposes?) 
> > > 
> > > I suspect with more carefully constructed recursive inputs could still reach the higher limit & I think it'd be good to fail hard in that case in some way? (it's probably rare enough that a report_fatal_error would be not-the-worst-thing-ever)
> > > 
> > > But good assertions would be nice too (the old code only failed when you hit /exactly/ on just the overflow value, and any more than that the wraparound would not crash/fail, but misbehave) - I did add the necessary assertion to ArrayRef (begin <= end) which would've helped detect this more reliably, but some assert checking for overflow in the ctor would be good too (with all the usual nuance/care in checking for overflow) - unless we're going to make that into a fatal or other real error.
> > Sorry for the very late reply. I have no preference between assertion and `llvm_unreachable`, if error then fail fast is looks good. I have a patch D158296 to add assertion.
> Thanks for the assertions - though they still haven't met my main concern that this should have a hard failure even in a non-assertions build.
> 
> I know we don't have a perfect plan/policy for these sort of "run out of resources/hit a representational limit" issues (at least I don't think we do... do we, @aaron.ballman ? I know we have some limits (recursion, template expansion, etc) but they're fairly specific/aren't about every possible case of integer overflow in some representational element, etc) but we've seen this one is pretty reachable. 
> 
> Here's a test case that would still trigger the assertion, and trigger UB in a non-assertions build:
> ```
> truct t1 { };
> template<typename T1>
> struct templ {
>     T1 v1;
>     T1 v2;
>     T1 v3;
>     T1 v4;
> };
> 
> struct t2 {
>   templ<templ<templ<templ<templ<templ<t1>>>>>> c0;
>   templ<templ<templ<templ<templ<templ<t1>>>>>> c1;
>   templ<templ<templ<templ<templ<templ<t1>>>>>> c2;
> };
> 
> void aj(...);
> void f1(t2 w) { __builtin_dump_struct(&w, aj); }
> ```
> (used templates to pack this a bit more densely than the original test case) - the `sizeof` the struct is certainly a bit outlandish (~12kbytes) bit not, I think, totally unreasonable?
Thanks for your example. I have three ways:
1. use `llvm_unreachable` to emit a hard failure but not an assertion.
2. extend these two field to 32-bit unsigned, it's may big enough.
3. limit the functionality of `__builtin_dump_struct`, if there are too many fields in a struct, the part exceeding the limit will not be output, and replaced with `...`(maybe). 

WDYT? You guys are expert in clang, and I would like to wait for your guidance :)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154784/new/

https://reviews.llvm.org/D154784