[Lldb-commits] [PATCH] D17107: [lldb] Unicode support on Win32

Cameron via lldb-commits lldb-commits at lists.llvm.org
Mon Feb 22 15:01:11 PST 2016


cameron314 added a comment.

I've addressed more feedback. I'm working on a second version of the patch with a lot less `#ifdefs` and nicer wrappers (and better error handling where possible).


================
Comment at: lldb/trunk/source/Commands/CommandCompletions.cpp:171
@@ -168,1 +170,3 @@
         {
+#ifdef _WIN32
+            llvm::SmallVector<UTF16, 128> wpartial_name_copy;
----------------
zturner wrote:
> cameron314 wrote:
> > zturner wrote:
> > > I'm not fond of this approach of sprinkling #ifdefs around in every location where we call into a file system API.  We have `lldb/Host/FileSystem`, probably some of these methods should be moved over there, and we should provide a windows implementation and a non-windows implementation.  This way in places like this, you simply write:
> > > 
> > >     isa_directory = FileSystem::GetPermissions(partial_name_copy) & FileSystem::eDirectory);
> > > 
> > > and all this ugliness is hidden.
> > I agree. Again, my goal was not to refactor (I don't know this codebase very well at all and can't test it in much depth), but to fix existing code. It happens that `stat` is particularly awkward on Win32, so yes, it should probably be wrapped instead (or better yet replaced with `GetFileAttributes()`). But this is outside the scope of the patch.
> I know your intent was to just fix existing code, but when those fixes make the code harder to maintain or do things the wrong way, then I think it's worth asking whether the scope should expand.  Introducing ifdefs in common code is not a direction we want to move in unless it's absolutely necessary, so somehow we have to address that before this can go in, even if it means expanding the scope of the patch slightly.
I understand. I'll try to refactor out the ifdefs.

================
Comment at: lldb/trunk/source/Core/ConnectionSharedMemory.cpp:140
@@ +139,3 @@
+    llvm::SmallVector<UTF16, 128> wname;
+    if (llvm::convertUTF8ToUTF16String(name, wname))
+    {
----------------
zturner wrote:
> cameron314 wrote:
> > amccarth wrote:
> > > It's too bad there isn't an overload that gives the wide string as a std::wstring.  Then we wouldn't need all the ugly C-style casts of wname.data().
> > I didn't want to add/change the string conversion support libraries too much (there's already too many functions with overlapping functionality IMO), but on the whole this would indeed yield much nicer and simpler code. I'll make this change.
> It's going to be difficult to make this change.  Because that will require a separate review on the LLVM side with different reviewers.  You can certainly try though.
Hmm. Alright then. I'll put the wrappers in LLDB for now.

================
Comment at: lldb/trunk/source/Core/Disassembler.cpp:881
@@ -878,3 +880,3 @@
     }
-        
-    FILE *test_file = fopen (file_name, "r");
+    FILE *test_file = nullptr;
+#if _WIN32
----------------
cameron314 wrote:
> zturner wrote:
> > Would LLDB's `File` class work here instead of using raw APIs?  I see you fixed the code in `lldb_private::File`, so perhaps simply using that class here instead of a `FILE*` would allow you to reuse the fix from that class.
> No idea. Probably :-) I'll look into it, and make this change if it's trivial.
I don't trust myself to make this change without introducing a bug. But I'll get rid of the `#ifdef`.

================
Comment at: lldb/trunk/source/Host/common/File.cpp:353
@@ +352,3 @@
+        int stat_result;
+#ifdef _WIN32
+        llvm::SmallVector<UTF16, 128> wpath;
----------------
cameron314 wrote:
> zturner wrote:
> > Ahh, here's that method I was looking for earlier when I mentioend `FileSystem`.  So I guess you don't have to implement it, it's already here.  So you could just use this method earlier in the part that gets file permissions by manually calling stat.
> This method only returns the permissions -- the other one you're referring to checks if a path is a directory.
...which turns out to exist here too. I call that instead of using stat now, much cleaner!

================
Comment at: lldb/trunk/source/Host/common/FileSpec.cpp:108
@@ +107,3 @@
+        if (stat_result == 0)
+            *stats_ptr = *reinterpret_cast<struct stat *>(&file_stats);
+        return stat_result == 0;
----------------
zturner wrote:
> cameron314 wrote:
> > zturner wrote:
> > > I don't think this line is correct.  The source and destination struct types do not always have the same layout, so I think you may need to copy the values out one by one.
> > It's probably better if I copy the fields for the sake of forwards compatibility, but I checked the definitions of these structures and they should be 100% compatible. I'll make the change, though, the `reinterpret_cast` is really ugly.
> Apparently the CRT uses a reinterpret_cast too.  As long as you put:
> 
>     static_assert(sizeof(struct stat) == sizeof(struct _stat64i32));
> 
> then the cast is probably fine.
Makes sense. Done.

================
Comment at: lldb/trunk/source/Host/common/FileSpec.cpp:237
@@ +236,3 @@
+    int stat_result;
+#ifdef _WIN32
+    llvm::SmallVector<UTF16, 128> wpath;
----------------
zturner wrote:
> cameron314 wrote:
> > amccarth wrote:
> > > zturner wrote:
> > > > Maybe you can just call `GetStats()` here
> > > Instead of another #ifdef here, can't we just re-use the GetFileStats function above?
> > Probably, yes, but it requires a FileSpec. I'm not sure what creating a temporary one implicates. Surely there was a reason it wasn't already called before?
> This is another reason I argued for moving this to `FileSystem` so that we wouldn't need a `FileSpec` here.  
Actually, it seems the stat is just done to see if the path exists. `llvm::sys::fs::exists` does this already, I'll call that instead. It does the proper string conversion and uses the right Win32 API call on Windows.

================
Comment at: lldb/trunk/source/Host/common/FileSpec.cpp:829
@@ -799,2 +828,3 @@
 #ifdef _WIN32
-    auto attrs = ::GetFileAttributes (resolved_path);
+    llvm::SmallVector<UTF16, 128> wpath;
+    if (!llvm::convertUTF8ToUTF16String (resolved_path, wpath))
----------------
cameron314 wrote:
> zturner wrote:
> > I think a `FileSystem::IsSymbolicLink()` here would be better, and then have `File` call `FileSpec::IsSymbolicLink()`
> > 
> > There's currently some confusion and overlap about what goes in `FileSpec` and what goes in `FileSystem`, but my general opinion is that wherever possible, anything that queries the filesystem should be a static method in `Host/FileSystem`, and any other classes who want to do the same thing can call the methods in `FileSystem`.  This way you don't have to, for example, construct a `FileSpec` from a string just to check if something is a symbolic link since you could just call the low level method directly.
> I agree, but this is outside the scope of my patch :-)
> This method already existed, and was already written this way. I merely fixed the way it calls into the OS by doing a string conversion, which is completely independent of the existing file system code's design.
By the way, all the `FileSystem` methods seem to accept `FileSpec`s as input, not raw paths. I agree it would be nice if `FileSystem` abstracted the OS-level FS, which all the other higher-level parts (`FileSpec`, `File`) then depended on. But that would require changing the API rather significantly.

================
Comment at: lldb/trunk/source/Host/windows/Windows.cpp:34
@@ +33,3 @@
+{
+    bool utf8ToWide(const char* utf8, wchar_t* buf, size_t bufSize)
+    {
----------------
zturner wrote:
> Do the LLVM functions not work here for some reason?
They would, but these are called from functions that are trying to emulate libc functions, by avoiding calling malloc among other things. Hence these wrappers that work with buffers.

================
Comment at: lldb/trunk/source/Host/windows/Windows.cpp:210
@@ +209,3 @@
+    wchar_t wpath[PATH_MAX];
+    if (wchar_t* wresult = _wgetcwd(wpath, PATH_MAX))
+    {
----------------
zturner wrote:
> Why can't we just convert to the result of `_wgetcwd` directly to UTF8?
Not sure I understand. That's what this code does? There's a tad of extra logic needed because the `path` passed in is allowed to be `NULL`, indicating we should allocate a buffer for the caller.

================
Comment at: lldb/trunk/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp:1175
@@ -1166,1 +1174,3 @@
+    fp = _wfopen((const wchar_t *)wpath.data(), (const wchar_t *)wmode.data());
+#else
     fp = fopen(path, mode);
----------------
zturner wrote:
> Here's another example of an #ifdef in common code that we will need to do something about.  Is there a constructor of the `File` class that takes a path and a mode?
There is, but the mode is an enum and not a string. I'll get rid of the `#ifdef`, though.

================
Comment at: lldb/trunk/source/Target/ProcessLaunchInfo.cpp:457
@@ +456,3 @@
+                    const char *curr_path = nullptr;
+#ifdef _WIN32
+                    const wchar_t *wcurr_path = _wgetenv(L"PATH");
----------------
zturner wrote:
> Anothe here.
Quite so. I'll add a portable `getenv` wrapper in `HostInfo`.


Repository:
  rL LLVM

http://reviews.llvm.org/D17107





More information about the lldb-commits mailing list