[Lldb-commits] [PATCH] D17107: [lldb] Unicode support on Win32

Cameron via lldb-commits lldb-commits at lists.llvm.org
Thu Feb 11 10:22:52 PST 2016


cameron314 added a comment.

I've replied to most of your comments, thanks everyone. I'll make most of the suggested changes, but some of them have nothing to do with correct Unicode handling and are rather pre-existing design issues, and so are out of the scope of this patch.

@zturner: I agree that the conversion as it is now is ugly, but I'm not sure a `WinUtfString` would be the best way to clean it up, considering the unfortunate necessity of error handling (which is still context-dependent 90% of the time). I'm also slightly allergic to adding more string classes, especially platform-specific ones ;-) I'm more inclined to simply adopt a cleaner set of std::string<->std::wstring conversion function wrappers as suggested by @amccarth. I'll start with that and we'll see from there.


================
Comment at: cfe/trunk/tools/libclang/CIndexer.cpp:57
@@ -55,3 +56,3 @@
   MEMORY_BASIC_INFORMATION mbi;
-  char path[MAX_PATH];
+  wchar_t wpath[MAX_PATH];
   VirtualQuery((void *)(uintptr_t)clang_createTranslationUnit, &mbi,
----------------
zturner wrote:
> You'll probably need to submit this as a separate patch to cfe-commits, since it goes in a separate repo.  You can then commit them independently.
Alright, thanks. I'll do that. Thought it would make sense to sneak it in here ;-)

================
Comment at: lldb/trunk/source/Commands/CommandCompletions.cpp:171
@@ -168,1 +170,3 @@
         {
+#ifdef _WIN32
+            llvm::SmallVector<UTF16, 128> wpartial_name_copy;
----------------
zturner wrote:
> I'm not fond of this approach of sprinkling #ifdefs around in every location where we call into a file system API.  We have `lldb/Host/FileSystem`, probably some of these methods should be moved over there, and we should provide a windows implementation and a non-windows implementation.  This way in places like this, you simply write:
> 
>     isa_directory = FileSystem::GetPermissions(partial_name_copy) & FileSystem::eDirectory);
> 
> and all this ugliness is hidden.
I agree. Again, my goal was not to refactor (I don't know this codebase very well at all and can't test it in much depth), but to fix existing code. It happens that `stat` is particularly awkward on Win32, so yes, it should probably be wrapped instead (or better yet replaced with `GetFileAttributes()`). But this is outside the scope of the patch.

================
Comment at: lldb/trunk/source/Core/ConnectionSharedMemory.cpp:138
@@ -134,10 +137,3 @@
 #ifdef _WIN32
-    HANDLE handle;
-    if (create) {
-        handle = CreateFileMapping(
-            INVALID_HANDLE_VALUE,
-            NULL,
-            PAGE_READWRITE,
-            llvm::Hi_32(size),
-            llvm::Lo_32(size),
-            name);
+    HANDLE handle = INVALID_HANDLE_VALUE;
+    llvm::SmallVector<UTF16, 128> wname;
----------------
zturner wrote:
> Same thing applies here, we should probably have a `Host/SharedMemory.h` that exposes a `Create()` method.  Although, in this particular case the `#ifdef` was already here so I can't entirely complain, but it would be nice to improve the code as we go and get ifdefs like this out of generic code.
Again, I agree, but this is outside the scope of my patch.

================
Comment at: lldb/trunk/source/Core/ConnectionSharedMemory.cpp:140
@@ +139,3 @@
+    llvm::SmallVector<UTF16, 128> wname;
+    if (llvm::convertUTF8ToUTF16String(name, wname))
+    {
----------------
amccarth wrote:
> It's too bad there isn't an overload that gives the wide string as a std::wstring.  Then we wouldn't need all the ugly C-style casts of wname.data().
I didn't want to add/change the string conversion support libraries too much (there's already too many functions with overlapping functionality IMO), but on the whole this would indeed yield much nicer and simpler code. I'll make this change.

================
Comment at: lldb/trunk/source/Core/Disassembler.cpp:881
@@ -878,3 +880,3 @@
     }
-        
-    FILE *test_file = fopen (file_name, "r");
+    FILE *test_file = nullptr;
+#if _WIN32
----------------
zturner wrote:
> Would LLDB's `File` class work here instead of using raw APIs?  I see you fixed the code in `lldb_private::File`, so perhaps simply using that class here instead of a `FILE*` would allow you to reuse the fix from that class.
No idea. Probably :-) I'll look into it, and make this change if it's trivial.

================
Comment at: lldb/trunk/source/Host/common/File.cpp:330
@@ +329,3 @@
+        }
+        m_descriptor = ::_wopen((const wchar_t *)wpath.data(), oflag, mode);
+#else
----------------
amccarth wrote:
> _wopen is officially deprecated.  How about _wsopen_s?
Sure. I'll make this change. `open` was deprecated too...

================
Comment at: lldb/trunk/source/Host/common/File.cpp:353
@@ +352,3 @@
+        int stat_result;
+#ifdef _WIN32
+        llvm::SmallVector<UTF16, 128> wpath;
----------------
zturner wrote:
> Ahh, here's that method I was looking for earlier when I mentioend `FileSystem`.  So I guess you don't have to implement it, it's already here.  So you could just use this method earlier in the part that gets file permissions by manually calling stat.
This method only returns the permissions -- the other one you're referring to checks if a path is a directory.

================
Comment at: lldb/trunk/source/Host/common/File.cpp:357
@@ +356,3 @@
+        {
+            error.SetError(1, lldb::eErrorTypeGeneric);
+            return 0;
----------------
amccarth wrote:
> Why is the error reported here different than many of the other locations?
About a quarter of the functions report errors through an `Error` (like this one), so I followed suit in this context. Might be better if I set it with a message instead, though -- I'll make that change.

The rest report errors either through `errno`; by some context-dependent return value (e.g. functions that return false on any failure); or don't check for errors at all.

================
Comment at: lldb/trunk/source/Host/common/FileSpec.cpp:108
@@ +107,3 @@
+        if (stat_result == 0)
+            *stats_ptr = *reinterpret_cast<struct stat *>(&file_stats);
+        return stat_result == 0;
----------------
zturner wrote:
> I don't think this line is correct.  The source and destination struct types do not always have the same layout, so I think you may need to copy the values out one by one.
It's probably better if I copy the fields for the sake of forwards compatibility, but I checked the definitions of these structures and they should be 100% compatible. I'll make the change, though, the `reinterpret_cast` is really ugly.

================
Comment at: lldb/trunk/source/Host/common/FileSpec.cpp:237
@@ +236,3 @@
+    int stat_result;
+#ifdef _WIN32
+    llvm::SmallVector<UTF16, 128> wpath;
----------------
amccarth wrote:
> zturner wrote:
> > Maybe you can just call `GetStats()` here
> Instead of another #ifdef here, can't we just re-use the GetFileStats function above?
Probably, yes, but it requires a FileSpec. I'm not sure what creating a temporary one implicates. Surely there was a reason it wasn't already called before?

================
Comment at: lldb/trunk/source/Host/common/FileSpec.cpp:829
@@ -799,2 +828,3 @@
 #ifdef _WIN32
-    auto attrs = ::GetFileAttributes (resolved_path);
+    llvm::SmallVector<UTF16, 128> wpath;
+    if (!llvm::convertUTF8ToUTF16String (resolved_path, wpath))
----------------
zturner wrote:
> I think a `FileSystem::IsSymbolicLink()` here would be better, and then have `File` call `FileSpec::IsSymbolicLink()`
> 
> There's currently some confusion and overlap about what goes in `FileSpec` and what goes in `FileSystem`, but my general opinion is that wherever possible, anything that queries the filesystem should be a static method in `Host/FileSystem`, and any other classes who want to do the same thing can call the methods in `FileSystem`.  This way you don't have to, for example, construct a `FileSpec` from a string just to check if something is a symbolic link since you could just call the low level method directly.
I agree, but this is outside the scope of my patch :-)
This method already existed, and was already written this way. I merely fixed the way it calls into the OS by doing a string conversion, which is completely independent of the existing file system code's design.

================
Comment at: lldb/trunk/source/Host/common/FileSpec.cpp:1141
@@ -1103,1 +1140,3 @@
+
         WIN32_FIND_DATA ffd;
+        HANDLE hFind = FindFirstFileW((LPCWSTR)wszDir.data(), &ffd);
----------------
amccarth wrote:
> WIN32_FIND_DATATW to be consistent with calling the wide versions of the functions explicitly.
Right, thanks, missed this one. I'll make this change.

================
Comment at: lldb/trunk/source/Host/common/FileSpec.cpp:1173-1180
@@ -1134,4 +1172,10 @@
 
-            char child_path[MAX_PATH];
-            const int child_path_len = ::snprintf (child_path, sizeof(child_path), "%s\\%s", dir_path, ffd.cFileName);
+            std::string fileName;
+            if (!llvm::convertUTF16ToUTF8String((const UTF16*)ffd.cFileName, fileName))
+            {
+                continue;
+            }
+
+            char child_path[PATH_MAX];
+            const int child_path_len = ::snprintf (child_path, sizeof(child_path), "%s\\%s", dir_path, fileName.c_str());
             if (child_path_len < (int)(sizeof(child_path) - 1))
----------------
zturner wrote:
> sprintf'ing parent and child paths together to concatenate them doesn't seem like the best idea.  Can we use `llvm::sys::path::append()` here instead?
Probably, but a lot of this function's code would have to be changed to work with those objects instead of fixed buffers. The `snprintf` was there before, so I conserved it. The more changes I make, the more likely I am to introduce a bug, especially since I don't know the codebase very well at all.

================
Comment at: lldb/trunk/source/Host/common/FileSpec.cpp:1179
@@ +1178,3 @@
+
+            char child_path[PATH_MAX];
+            const int child_path_len = ::snprintf (child_path, sizeof(child_path), "%s\\%s", dir_path, fileName.c_str());
----------------
amccarth wrote:
> MAX_PATH in Windows.  (And someday, we should get rid of these fixed buffers so we can use UNC paths.)
I absolutely agree re the fixed buffers. I'll try to make this change in all of the functions that can allocate dynamic memory.

`MAX_PATH` is not `PATH_MAX`, though -- `MAX_PATH` is a Win32 constant of 260, whereas `PATH_MAX` is an LLDB constant of 32K. (Actually PATH_MAX is defined in multiple places and was either that or MAX_PATH depending on which header was included, but mostly it was 32K so I changed them all to at least be consistently 32K.)

================
Comment at: lldb/trunk/source/Host/windows/FileSystem.cpp:231
@@ -191,3 +230,3 @@
 
-    char buf[PATH_MAX];
+    wchar_t buf[PATH_MAX + 1];
     // Subtract 1 from the path length since this function does not add a null terminator.
----------------
amccarth wrote:
> MAX_PATH
`PATH_MAX` :-)

Like most of the wide Win32 API, this function can return nice long UNC paths.

`PATH_MAX` should be better named to avoid confusion, though. Maybe `LLDB_MAX_PATH` instead?

================
Comment at: lldb/trunk/source/Host/windows/Host.cpp:73
@@ -70,3 +72,3 @@
         // Get the process image path.  MAX_PATH isn't long enough, paths can actually be up to 32KB.
-        std::vector<char> buffer(32768);
+        std::vector<UTF16> buffer(PATH_MAX);
         DWORD dwSize = buffer.size();
----------------
amccarth wrote:
> The comment above says why you can't use MAX_PATH/PATH_MAX.  If that's no longer relevant, please update the comment.
It's still relevant, I just changed it to use the symbolic constant of the same size instead of a hard-coded constant. MAX_PATH != PATH_MAX, unfortunately.


Repository:
  rL LLVM

http://reviews.llvm.org/D17107





More information about the lldb-commits mailing list