[PATCH] D111417: [Demangle] Add support for D symbols back referencing

Tue Jan 11 18:04:15 PST 2022

dblaikie accepted this revision.
dblaikie added a comment.
This revision is now accepted and ready to land.

Looks good, thanks!

================
Comment at: llvm/lib/Demangle/DLangDemangle.cpp:80
+  ///
+  /// \note a result <= 0 is a failure.
+  ///
----------------
Since the function already communicates failure via a null return - should this maybe be an invariant/assertion/postcondition ("Ret is always >=0 on success, and unspecified on failure" or something like that?)

================
Comment at: llvm/lib/Demangle/DLangDemangle.cpp:386-387
+      Val += Mangled[0] - 'a';
+      if ((long)Val <= 0)
+        break;
+      *Ret = Val;
----------------
ljmf00 wrote:
> dblaikie wrote:
> > ljmf00 wrote:
> > > dblaikie wrote:
> > > > is this an overflow check? Can it be reached if the other overflow check at 379 succeeds? (is it tested?)
> > > This is a safe check when casting to long. The above overflow check is for unsigned long, but Ret is a long value. This is being done that way because signed overflow checks are considered undefined behaviour, since there is various ways to represent a signed value in different CPU targets. I checked locally, `_D8demangleQDXXXXXXXXXXXXx` triggers it with the fuzzer when that code is commented, so I'm going to add this to the test suite.
> > The fuzzer trips ubsan with this `<= 0` check commented out, for this test case? OK, great - thanks for explaining & adding the test case!
> > 
> > Hmm, I'm still not quite sure how this overflow could occur, if the check on line 214 passed. Could you walk me through a specific value of `Val` at the start of the loop that passes the initial overflow check, but then fails this one? And/or could you explain what the `- 25` is for in the earlier overflow check. I'm reading that as "make sure it's not too close to the end, so there's room to add the next value" but maybe it's for some other purpose?
> The fuzzer trips with Asan, since Ret value is used to offset the current position to discover the back reference. Since it is documented in decodeBackrefPos that Ret is required to be > 0 on success, negative values aren't checked, bad offset memory is dereferenced later, and it eventually segfaults.
> 
> All the math operations are made with unsigned long and then casted to a long value, at the end. Lets consider a 64-bit unsigned long, with max capacity of (2^64 - 1). If we consider `((ulong.max - 25) / 26) - 10` (`709490156681136589`), I can safely do both operations of multiply and add (`709490156681136589 * 26 + 25 == 18446744073709551339`) and still have a negative value when casting it to long. This would obviously depend on how signed numbers are represented in hardware but will likely produce `-277` if two's complement. https://godbolt.org/z/1GaMzzcxj
> 
> In another perspective, the last check is here to make sure that a value in unsigned long or long values are the same.
Ah, thanks for walking me through it!

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111417/new/

https://reviews.llvm.org/D111417