[Lldb-commits] [PATCH] D79645: [lldb/test] Fix for flakiness in TestNSDictionarySynthetic

Fri May 8 12:20:26 PDT 2020

vsk created this revision.
vsk added reviewers: JDevlieghere, jingham, shafik.
Herald added a project: LLDB.
Herald added a subscriber: lldb-commits.

TestNSDictionarySynthetic sets up an NSURL which does not initialize its
_baseURL member. When the test runs and we print out the NSURL, we print
out some garbage memory pointed-to by the _baseURL member, like:

  _baseURL = 0x0800010020004029 @"d��qX"

and this can cause a python unicode decoding error like:

  UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position
  10309: invalid start byte

There's a discrepancy here because lldb's StringPrinter facility tries
to only print out "printable" sequences (see: isprint32()), whereas python
rejects the StringPrinter output as invalid utf8. For the specific error
seen above, lldb's `isprint32(0xa0) = true`, even though 0xa0 is not
really "printable" in the usual sense.
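
For illustration, here is a minimal sketch of the failure mode. The byte string is a hypothetical stand-in for the StringPrinter output, not the actual test output, and `decode_output` is a made-up helper name. A lone 0xa0 byte is a UTF-8 continuation byte with no lead byte, so strict decoding rejects it, while the 'replace' error handler substitutes U+FFFD and carries on:

```python
# Hypothetical bytes standing in for lldb's output; the lone 0xa0 is the
# byte that strict UTF-8 decoding rejects.
raw = b'_baseURL = @"d\xa0qX"'


def decode_output(data, errors="strict"):
    """Decode captured output as UTF-8 using the given error handler."""
    return data.decode("utf-8", errors)


# Strict mode (the default) raises on the stray continuation byte.
try:
    decode_output(raw)
    strict_reason = None
except UnicodeDecodeError as exc:
    strict_reason = exc.reason
print("strict decoding failed:", strict_reason)

# 'replace' swaps each undecodable byte for U+FFFD instead of raising.
lenient = decode_output(raw, "replace")
print(repr(lenient))

# By contrast, the code point U+00A0 encoded properly as UTF-8 (0xc2 0xa0)
# decodes without complaint.
nbsp = decode_output(b"\xc2\xa0")
```

Note that the code point U+00A0 and the raw byte 0xa0 are different things: the former is valid once encoded as two bytes, the latter is not valid on its own.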

The problem is that lldb and python disagree on what exactly is
"printable". Both have dismayingly hand-rolled utf8 validation code
(cf. _Py_DecodeUTF8Ex), and I can't really tell which one is more
correct.

I tried replacing lldb's isprint32() with a call to libc's iswprint():
this satisfied python, but broke emoji printing :|.

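Now, I believe that lldb (and python too) ought to just call into some
battle-tested utf library, and that we shouldn't aim for compatibility
with python's strict unicode decoding mode until then. So this patch
makes the test harness decode the output with the 'replace' error
handler instead of the default strict one.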

FWIW I ran this test under an ASanified lldb hundreds of times but
didn't turn up any other issues.

rdar://62941711


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D79645

Files:
  lldb/test/API/lldbtest.py


Index: lldb/test/API/lldbtest.py
===================================================================

--- lldb/test/API/lldbtest.py
+++ lldb/test/API/lldbtest.py
@@ -120,8 +120,8 @@
 
         if sys.version_info.major == 2:
             # In Python 2, string objects can contain Unicode characters.
-            out = out.decode('utf-8')
-            err = err.decode('utf-8')
+            out = out.decode('utf-8', 'replace')
+            err = err.decode('utf-8', 'replace')
 
         output = """Script:\n--\n%s\n--\nExit Code: %d\n""" % (
             ' '.join(cmd), exitCode)

