[lldb-dev] PATCH for REVIEW - Fix [Bug 15038] LLDB does not support printing wide-character variables on Linux

Thu Apr 11 14:05:42 PDT 2013

Hey there,
sorry for the delay. I have been busy over the last couple of days with some high priority work.
I have discussed this idea with a coworker. Our idea is that your “extended reader” is a good idea.
What we would prefer to see is committing it as a new function (e.g. Process::ReadNullTerminatedStringFromMemory) - that would allow an experimentation phase where we can use the old and new function to make sure things work correctly with it. Eventually, once we are all fully satisfied with the new API, the existing CString reader would simply become a wrapper for the NullTerminated reader passing a size of 1.
Feel free to change your patch in this way and check that in.We will then take it from there.

Thanks,
Enrico Granata
✉ egranata@.com
✆ 27683

On Apr 11, 2013, at 1:09 PM, "Thirumurthi, Ashok" <ashok.thirumurthi at intel.com> wrote:

> Ping!
>  
> From: lldb-dev-bounces at cs.uiuc.edu [mailto:lldb-dev-bounces at cs.uiuc.edu] On Behalf Of Thirumurthi, Ashok
> Sent: Tuesday, April 02, 2013 10:20 PM
> To: Enrico Granata
> Cc: lldb-dev at cs.uiuc.edu
> Subject: Re: [lldb-dev] PATCH for REVIEW - Fix [Bug 15038] LLDB does not support printing wide-character variables on Linux
>  
> Hi Enrico,
>  
> Your suggestions got me thinking that the version of ReadCStringFromMemory that takes a char* can be extended to support null terminators of any width.  The attached patch is therefore a variation on what you suggested that consolidates the read and validation into one method, and handles strings of any type width.  In addition, this implementation provides a null terminator when a partial read occurs.
>  
> Incidentally, in the old implementation, strlen would always succeed on the first call, because curr_dst is set to dst, which is memset to 0.  The current implementation will scan for a null terminator within bytes_read and stop reading when one is found.
>  
> If this is the right approach, let me know if I should modify SBProcess to match (i.e to add the new parameter type_width).  Also, I noticed that this code is largely duplicated in methods of the same name in class Target.  Can I assume that a complete patch would include changes to all of those methods?  Thanks in advance,
>  
> -        Ashok
>  
>  
> From: Enrico Granata [mailto:egranata at apple.com] 
> Sent: Monday, April 01, 2013 6:49 PM
> To: Thirumurthi, Ashok
> Cc: lldb-dev at cs.uiuc.edu
> Subject: Re: [lldb-dev] PATCH for REVIEW - Fix [Bug 15038] LLDB does not support printing wide-character variables on Linux
>  
> On Apr 1, 2013, at 2:50 PM, "Thirumurthi, Ashok" <ashok.thirumurthi at intel.com> wrote:
>  
> 
> Hi Enrico,
>  
> Ø If we go this route, the same thing might have to be done for ValueObject::ReadPointedString() which does the same kind of “unbounded read”
> I see.  Looking through the code base for methods that call GetMaximumSizeOfStringSummary, I see that there are a number of code paths to handle.  The attached patch lowers the fix so that this functionality can be used as needed via a helper method in Process to validate an unbounded read.
>  
>  
> We already have a Process::ReadCStringFromMemory - that does read&validation
> You might instead want to consider making a Process::ReadWStringFromMemory
> The mere act of validating that a buffer is 0-terminated is not a Process specific operation at all
>  
> Ø  But I am inclined to believe the best option would be to adopt your ProcessMonitor fix that reads smaller and smaller chunks and then returns the number of bytes read and no error
> Well, the unbounded read is a characteristic of the use case (i.e. string reads), not the implementation.  The behavior of the plug-in under this type of request is implementation specific, and an error is always reasonable since there was a failure to read the requested size in bytes.  My point is that the corner case of null terminator in the last 8-byte read needs to be tested and fixed.
>  
> Ø What troubles me a little with the overall idea of this patch is that we are putting a workaround for a lower-level issue in higher-level code.
> Ø I would prefer, if we do this, that at least we put the code around #if defined (__linux__), as in:
> Ø What we have here is a Linux-specific implementation issue
> Note that the issue can appear with any ptrace implementation.  So, I lowered the fix to the ProcessPOSIX plugin, which is consistent with the case of eErrorTypePOSIX.
>  
> I don’t think that is a Process responsibility to validate a buffer
> If you want to make it a new type of request, ReadWString, and pass it a size so it knows how many consecutive bytes must be 0 to mean EndOfData, that would be fine.
> It would look like
> size_t Process::ReadWStringFromInferior(const void* buffer, size_t max_size, Error& error, size_t sizeofitem);
> and there you could check the end-of-string
> Then the WString data formatter can be moved to use this new read request, but you will have to ensure that partial strings are accepted :-)
>  
> Thanks,
> Enrico Granata
> ✉ egranata@.com
> ✆ 27683
>  
> 
>  
> Thanks again for the feedback,
> -        Ashok
>  
>  
> From: Enrico Granata [mailto:egranata at apple.com] 
> Sent: Thursday, March 28, 2013 2:39 PM
> To: Thirumurthi, Ashok
> Cc: lldb-dev at cs.uiuc.edu
> Subject: Re: [lldb-dev] PATCH for REVIEW - Fix [Bug 15038] LLDB does not support printing wide-character variables on Linux
>  
> Hi,
> now I pretty much see where you are coming from.
> What we have here is a Linux-specific implementation issue (which of course is not going to get fixed anytime soon :-)
>  
> What troubles me a little with the overall idea of this patch is that we are putting a workaround for a lower-level issue in higher-level code.
> If we go this route, the same thing might have to be done for ValueObject::ReadPointedString() which does the same kind of “unbounded read”
> I would prefer, if we do this, that at least we put the code around #if defined (__linux__), as in:
>      if (error.Fail() || data_read == 0)
>      {
> +        bool found = false;
> #if defined (__linux__)
> +        if (data_read && error.GetType() == eErrorTypePOSIX) {
> +            // Determine if a null terminator was found in spite of an out-of-bounds read
> +            if (error.GetError() == EIO || error.GetError() == EFAULT) {
> +                char* buffer = reinterpret_cast<char *>(buffer_sp->GetBytes());
> +                char terminator[4] = {'\0', '\0', '\0', '\0'};
> +                assert(sizeof(terminator) >= type_size && "Atempting to read a wchar with more than 4 bytes per character!");
> +
> +                for (size_t i = 0; !found && (i + type_size <= data_read); i += type_size)
> +                    if (::strncmp(&buffer[i], terminator, type_size) == 0)
> +                        found = true; // null terminator
> +            }
> +        }
> #endif
> +        if (!found) {
> +            stream.Printf("unable to read data");
> +            return true;
>          }
> }
>  
> This will maintain a clean behavior for non-Linux systems (and other systems with a similarly broken API can opt-in this behavior by adding themselves to the ifdef)
>  
> But I am inclined to believe the best option would be to adopt your ProcessMonitor fix that reads smaller and smaller chunks and then returns the number of bytes read and no error - that would make all unbounded-read-based calls just work without having to add Linux-specific patches around (main issue being that now we all have to remember that anything that reads a string-like thing has to add a similar workaround or cause Linux unhappiness)
>  
> If that is not going to be viable, or will take some time, I don’t want to keep you blocked, so I would suggest:
> - commit this patch with #ifdef markers to keep it Linux-only
> - file a bugzilla issue to make process reads as functional as possible on Linux
>  
> Thoughts?
>  
> Enrico Granata
> ✉ egranata@.com
> ✆ 27683
>  
> On Mar 28, 2013, at 8:39 AM, "Thirumurthi, Ashok" <ashok.thirumurthi at intel.com> wrote:
> 
> 
> 
> Thanks for the feedback, Enrico,
>  
> I’m relying on the fact that when the wstring is larger than the default size (1GB in your example), we won’t get a ptrace error like EIO or EFAULT.  So, there will be no search for a null terminator.
>  
> If we can’t rely on the null terminator, it’s hard to know if a different error occurred.  The trouble is that there is inconsistency in the PTRACE_PEEK implementation on Linux that is more or less arbitrary (sigh), so the out-of-bounds read can generate EIO or EFAULT.  In particular, EIO can mean a number of things:
>   “request is invalid, or an attempt was made to read from or write to an invalid area in the parent's or child's memory, or there was a word-alignment violation, or an invalid signal was specified during a restart request.” –http://linux.die.net/man/2/ptrace            
>  
> In addition, the Linux ProcessMonitor reads in 8-byte chunks, so ReadUTFBufferAndDumpToStream will probably fail (for the right reasons) in the case where at least part of the null terminator was in that block.  I see that as an implementation detail that can be resolved in ProcessMonitor by reading smaller chunks until PTRACE_PEEK succeeds (i.e. a good follow-up patch).
>  
> Do let me know if you feel this is okay to commit as is.  Cheers,
>  
> -        Ashok
>  
> From: Enrico Granata [mailto:egranata at apple.com] 
> Sent: Wednesday, March 27, 2013 6:18 PM
> To: Thirumurthi, Ashok
> Cc: lldb-dev at cs.uiuc.edu
> Subject: Re: [lldb-dev] PATCH for REVIEW - Fix [Bug 15038] LLDB does not support printing wide-character variables on Linux
>  
> Hi Ashok,
> thanks for submitting an LLDB patch :)
>  
> I have looked at the code and while I see where the patch is coming from, I see a potential issue with it.
>  
> The code in ReadUTFBufferAndDumpToStream, which is what you are editing, is meant to work on a partial string (i.e. one that does not have a NULL terminator at the end). The reason for that is that you might have a wstring that is 1GB long and we would not want to try and read all of it and then display it. What we do is pick a size and only extract that much data. For obvious reasons, your string might be longer than our upper boundary, so you would get a chunk of valid bytes and then no end-of-buffer marker.
>  
> It looks like your code would fail in that case and produce no summary for a string if an EIO was received before \0.
>  
> Is there any reason why you can’t just check if (error == EIO and data_read > 0) and if so treat this as a “partial string” condition and keep going?
> Would that break/crash anything?
>  
> Best,
> Enrico Granata
> ✉ egranata@.com
> ✆ 27683
>  
> On Mar 27, 2013, at 2:52 PM, "Thirumurthi, Ashok" <ashok.thirumurthi at intel.com> wrote:
> 
> 
> 
> 
> The root cause for Bug 15038 is a ptrace EIO that can occur because we don't know the size of a (UTF) string and so read a fixed number of characters for strings.  The attached fix accepts EIO in the case where data has been read and a null terminator of the correct size and alignment was found.
> 
> - Ashok
> 
> -----Original Message-----
> From: lldb-dev-bounces at cs.uiuc.edu [mailto:lldb-dev-bounces at cs.uiuc.edu] On Behalf Ofbugzilla-daemon at llvm.org
> Sent: Tuesday, January 22, 2013 10:13 AM
> To: lldb-dev at cs.uiuc.edu
> Subject: [lldb-dev] [Bug 15038] New: LLDB does not support printing wide-character variables on Linux
> 
> http://llvm.org/bugs/show_bug.cgi?id=15038
> 
>             Bug #: 15038
>           Summary: LLDB does not support printing wide-character
>                    variables on Linux
>           Product: lldb
>           Version: unspecified
>          Platform: PC
>        OS/Version: Linux
>            Status: NEW
>          Severity: enhancement
>          Priority: P
>         Component: All Bugs
>        AssignedTo: lldb-dev at cs.uiuc.edu
>        ReportedBy: daniel.malea at intel.com
>    Classification: Unclassified
> 
> 
> Printing a variable of wchar_t type does not behave as expected and results in garbage being printed on the screen.
> 
> To reproduce, remove the @expectedFailureLinux decorator from TestChar1632T.py and TestCxxWCharT.py and run:
> 
> python dotest.py --executable <path-to-lldb> lang/cpp/char1632_t lang/cpp/wchar_t
> 
> --
> Configure bugmail: http://llvm.org/bugs/userprefs.cgi?tab=email
> ------- You are receiving this mail because: ------- You are the assignee for the bug.
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
> <read-strings.diff>_______________________________________________
> lldb-dev mailing list
> lldb-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>  
> <read-strings-2.diff>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20130411/fa185bde/attachment.html>