[lldb-dev] PATCH for REVIEW - Fix [Bug 15038] LLDB does not support printing wide-character variables on Linux

Tue Apr 2 19:19:34 PDT 2013

Hi Enrico,

Your suggestions got me thinking that the version of ReadCStringFromMemory that takes a char* can be extended to support null terminators of any width.  The attached patch is therefore a variation on what you suggested that consolidates the read and validation into one method, and handles strings of any type width.  In addition, this implementation provides a null terminator when a partial read occurs.

Incidentally, in the old implementation, strlen would always succeed on the first call, because curr_dst is set to dst, which is memset to 0.  The current implementation will scan for a null terminator within bytes_read and stop reading when one is found.

If this is the right approach, let me know if I should modify SBProcess to match (i.e to add the new parameter type_width).  Also, I noticed that this code is largely duplicated in methods of the same name in class Target.  Can I assume that a complete patch would include changes to all of those methods?  Thanks in advance,

-        Ashok

From: Enrico Granata [mailto:egranata at apple.com]
Sent: Monday, April 01, 2013 6:49 PM
To: Thirumurthi, Ashok
Cc: lldb-dev at cs.uiuc.edu<mailto:lldb-dev at cs.uiuc.edu>
Subject: Re: [lldb-dev] PATCH for REVIEW - Fix [Bug 15038] LLDB does not support printing wide-character variables on Linux

On Apr 1, 2013, at 2:50 PM, "Thirumurthi, Ashok" <ashok.thirumurthi at intel.com<mailto:ashok.thirumurthi at intel.com>> wrote:

Hi Enrico,

> If we go this route, the same thing might have to be done for ValueObject::ReadPointedString() which does the same kind of “unbounded read”
I see.  Looking through the code base for methods that call GetMaximumSizeOfStringSummary, I see that there are a number of code paths to handle.  The attached patch lowers the fix so that this functionality can be used as needed via a helper method in Process to validate an unbounded read.

We already have a Process::ReadCStringFromMemory - that does read&validation
You might instead want to consider making a Process::ReadWStringFromMemory
The mere act of validating that a buffer is 0-terminated is not a Process specific operation at all

>  But I am inclined to believe the best option would be to adopt your ProcessMonitor fix that reads smaller and smaller chunks and then returns the number of bytes read and no error
Well, the unbounded read is a characteristic of the use case (i.e. string reads), not the implementation.  The behavior of the plug-in under this type of request is implementation specific, and an error is always reasonable since there was a failure to read the requested size in bytes.  My point is that the corner case of null terminator in the last 8-byte read needs to be tested and fixed.

> What troubles me a little with the overall idea of this patch is that we are putting a workaround for a lower-level issue in higher-level code.
> I would prefer, if we do this, that at least we put the code around #if defined (__linux__), as in:
> What we have here is a Linux-specific implementation issue
Note that the issue can appear with any ptrace implementation.  So, I lowered the fix to the ProcessPOSIX plugin, which is consistent with the case of eErrorTypePOSIX.

I don’t think that is a Process responsibility to validate a buffer
If you want to make it a new type of request, ReadWString, and pass it a size so it knows how many consecutive bytes must be 0 to mean EndOfData, that would be fine.
It would look like
size_t Process::ReadWStringFromInferior(const void* buffer, size_t max_size, Error& error, size_t sizeofitem);
and there you could check the end-of-string
Then the WString data formatter can be moved to use this new read request, but you will have to ensure that partial strings are accepted :-)

Thanks,
Enrico Granata
✉ egranata@.com<mailto:egranata@.com>
✆ 27683

Thanks again for the feedback,
-        Ashok

From: Enrico Granata [mailto:egranata at apple.com]
Sent: Thursday, March 28, 2013 2:39 PM
To: Thirumurthi, Ashok
Cc: lldb-dev at cs.uiuc.edu<mailto:lldb-dev at cs.uiuc.edu>
Subject: Re: [lldb-dev] PATCH for REVIEW - Fix [Bug 15038] LLDB does not support printing wide-character variables on Linux

Hi,
now I pretty much see where you are coming from.
What we have here is a Linux-specific implementation issue (which of course is not going to get fixed anytime soon :-)

What troubles me a little with the overall idea of this patch is that we are putting a workaround for a lower-level issue in higher-level code.
If we go this route, the same thing might have to be done for ValueObject::ReadPointedString() which does the same kind of “unbounded read”
I would prefer, if we do this, that at least we put the code around #if defined (__linux__), as in:
     if (error.Fail() || data_read == 0)
     {
+        bool found = false;
#if defined (__linux__)
+        if (data_read && error.GetType() == eErrorTypePOSIX) {
+            // Determine if a null terminator was found in spite of an out-of-bounds read
+            if (error.GetError() == EIO || error.GetError() == EFAULT) {
+                char* buffer = reinterpret_cast<char *>(buffer_sp->GetBytes());
+                char terminator[4] = {'\0', '\0', '\0', '\0'};
+                assert(sizeof(terminator) >= type_size && "Atempting to read a wchar with more than 4 bytes per character!");
+
+                for (size_t i = 0; !found && (i + type_size <= data_read); i += type_size)
+                    if (::strncmp(&buffer[i], terminator, type_size) == 0)
+                        found = true; // null terminator
+            }
+        }
#endif
+        if (!found) {
+            stream.Printf("unable to read data");
+            return true;
         }
}

This will maintain a clean behavior for non-Linux systems (and other systems with a similarly broken API can opt-in this behavior by adding themselves to the ifdef)

But I am inclined to believe the best option would be to adopt your ProcessMonitor fix that reads smaller and smaller chunks and then returns the number of bytes read and no error - that would make all unbounded-read-based calls just work without having to add Linux-specific patches around (main issue being that now we all have to remember that anything that reads a string-like thing has to add a similar workaround or cause Linux unhappiness)

If that is not going to be viable, or will take some time, I don’t want to keep you blocked, so I would suggest:
- commit this patch with #ifdef markers to keep it Linux-only
- file a bugzilla issue to make process reads as functional as possible on Linux

Thoughts?

Enrico Granata
✉ egranata@.com<mailto:egranata@>
✆ 27683

On Mar 28, 2013, at 8:39 AM, "Thirumurthi, Ashok" <ashok.thirumurthi at intel.com<mailto:ashok.thirumurthi at intel.com>> wrote:

Thanks for the feedback, Enrico,

I’m relying on the fact that when the wstring is larger than the default size (1GB in your example), we won’t get a ptrace error like EIO or EFAULT.  So, there will be no search for a null terminator.

If we can’t rely on the null terminator, it’s hard to know if a different error occurred.  The trouble is that there is inconsistency in the PTRACE_PEEK implementation on Linux that is more or less arbitrary (sigh), so the out-of-bounds read can generate EIO or EFAULT.  In particular, EIO can mean a number of things:
  “request is invalid, or an attempt was made to read from or write to an invalid area in the parent's or child's memory, or there was a word-alignment violation, or an invalid signal was specified during a restart request.” –http://linux.die.net/man/2/ptrace

In addition, the Linux ProcessMonitor reads in 8-byte chunks, so ReadUTFBufferAndDumpToStream will probably fail (for the right reasons) in the case where at least part of the null terminator was in that block.  I see that as an implementation detail that can be resolved in ProcessMonitor by reading smaller chunks until PTRACE_PEEK succeeds (i.e. a good follow-up patch).

Do let me know if you feel this is okay to commit as is.  Cheers,

-        Ashok

From: Enrico Granata [mailto:egranata at apple.com]
Sent: Wednesday, March 27, 2013 6:18 PM
To: Thirumurthi, Ashok
Cc: lldb-dev at cs.uiuc.edu<mailto:lldb-dev at cs.uiuc.edu>
Subject: Re: [lldb-dev] PATCH for REVIEW - Fix [Bug 15038] LLDB does not support printing wide-character variables on Linux

Hi Ashok,
thanks for submitting an LLDB patch :)

I have looked at the code and while I see where the patch is coming from, I see a potential issue with it.

The code in ReadUTFBufferAndDumpToStream, which is what you are editing, is meant to work on a partial string (i.e. one that does not have a NULL terminator at the end). The reason for that is that you might have a wstring that is 1GB long and we would not want to try and read all of it and then display it. What we do is pick a size and only extract that much data. For obvious reasons, your string might be longer than our upper boundary, so you would get a chunk of valid bytes and then no end-of-buffer marker.

It looks like your code would fail in that case and produce no summary for a string if an EIO was received before \0.

Is there any reason why you can’t just check if (error == EIO and data_read > 0) and if so treat this as a “partial string” condition and keep going?
Would that break/crash anything?

Best,
Enrico Granata
✉ egranata@.com<mailto:egranata@>
✆ 27683

On Mar 27, 2013, at 2:52 PM, "Thirumurthi, Ashok" <ashok.thirumurthi at intel.com<mailto:ashok.thirumurthi at intel.com>> wrote:

The root cause for Bug 15038 is a ptrace EIO that can occur because we don't know the size of a (UTF) string and so read a fixed number of characters for strings.  The attached fix accepts EIO in the case where data has been read and a null terminator of the correct size and alignment was found.

- Ashok

-----Original Message-----
From: lldb-dev-bounces at cs.uiuc.edu<mailto:lldb-dev-bounces at cs.uiuc.edu> [mailto:lldb-dev-bounces at cs.uiuc.edu] On Behalf Ofbugzilla-daemon at llvm.org<mailto:bugzilla-daemon at llvm.org>
Sent: Tuesday, January 22, 2013 10:13 AM
To: lldb-dev at cs.uiuc.edu<mailto:lldb-dev at cs.uiuc.edu>
Subject: [lldb-dev] [Bug 15038] New: LLDB does not support printing wide-character variables on Linux

http://llvm.org/bugs/show_bug.cgi?id=15038

            Bug #: 15038
          Summary: LLDB does not support printing wide-character
                   variables on Linux
          Product: lldb
          Version: unspecified
         Platform: PC
       OS/Version: Linux
           Status: NEW
         Severity: enhancement
         Priority: P
        Component: All Bugs
       AssignedTo: lldb-dev at cs.uiuc.edu<mailto:lldb-dev at cs.uiuc.edu>
       ReportedBy: daniel.malea at intel.com<mailto:daniel.malea at intel.com>
   Classification: Unclassified

Printing a variable of wchar_t type does not behave as expected and results in garbage being printed on the screen.

To reproduce, remove the @expectedFailureLinux decorator from TestChar1632T.py and TestCxxWCharT.py and run:

python dotest.py --executable <path-to-lldb> lang/cpp/char1632_t lang/cpp/wchar_t

--
Configure bugmail: http://llvm.org/bugs/userprefs.cgi?tab=email
------- You are receiving this mail because: ------- You are the assignee for the bug.
_______________________________________________
lldb-dev mailing list
lldb-dev at cs.uiuc.edu<mailto:lldb-dev at cs.uiuc.edu>
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
<read-strings.diff>_______________________________________________
lldb-dev mailing list
lldb-dev at cs.uiuc.edu<mailto:lldb-dev at cs.uiuc.edu>
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev

<read-strings-2.diff>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20130403/98ad27bd/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: read-strings-4.diff
Type: application/octet-stream
Size: 8259 bytes
Desc: read-strings-4.diff
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20130403/98ad27bd/attachment.obj>