[Lldb-commits] [PATCH] D17107: [lldb] Unicode support on Win32

Zachary Turner via lldb-commits lldb-commits at lists.llvm.org
Wed Feb 10 16:31:46 PST 2016


zturner requested changes to this revision.
zturner added a comment.
This revision now requires changes to proceed.

Overall I like the idea and I'm happy to get Unicode paths and strings working properly.  I'll look at the rest in more detail later, but I have some starting comments for now.

One thing I'm I'm not real excited about is how ugly this makes the code, but then again there's not a great way to handle this in an elegant fashion.  But I do have one suggestion.  Suppose we had a class like this:

  class WinUtfString
  {
  public:
      explicit WinUtfString(const char *utf8)
          : m_utf8(utf8), m_utf16(nullptr)
      { }
      explicit WinUtfString(const UTF16 *utf16)
          : m_utf8(nullptr), m_utf16(utf16)
      { }
      // Probably would want to add more useful constructors here that take StringRefs, std::strings, ConstStrings, and whatever else.
  
      const char *GetUtf8()
      {
          if (!m_utf8)
              ConvertToUtf8();
          return m_utf8;
      }
      const UTF16 *GetUtf16()
      {
          if (!m_utf16)
              ConvertToUtf16();
          return m_utf16;
      }
  
  private:
      void ConvertToUtf8();  // initialize m_utf8 from m_utf16
      void ConvertToUtf16();  // initialize m_utf16 from m_utf8
  
      llvm::SmallVector<UTF8, 128> m_utf8;
      llvm::SmallVector<UTF16, 128> m_utf16;
  };

The point of all this?  Much of the code becomes cleaner and easier to read.  For example, this:

  llvm::SmallVector<UTF16, 128> srcBuffer, dstBuffer;
  if (!llvm::convertUTF8ToUTF16String(src.GetCString(), srcBuffer))
      error.SetError(1, lldb::eErrorTypeGeneric);
  else if (!llvm::convertUTF8ToUTF16String(dst.GetCString(), dstBuffer))
      error.SetError(1, lldb::eErrorTypeGeneric);
  else if (!::CreateHardLinkW((LPCWSTR)srcBuffer.data(), (LPCWSTR)dstBuffer.data(), nullptr))
      error.SetError(::GetLastError(), lldb::eErrorTypeWin32);

Becomes this:

  WinUtfString utfSrc(src.GetCString());
  WinUtfString utfDest(dest.GetCString());
  if (!CreateHardLinkW(utfSrc.GetUtf16(), utfDest.GetUtf16(), nullptr))
      error.SetError(::GetLastError(), lldb::eErrorTypeWin32);

and this:

  llvm::SmallVector<UTF16, 128> wexecutable;
  llvm::convertUTF8ToUTF16String(executable, wexecutable);
  
  launch_info.GetArguments().GetQuotedCommandString(commandLine);
  llvm::SmallVector<UTF16, 64> wcommandLine;
  llvm::convertUTF8ToUTF16String(commandLine, wcommandLine);
  
  llvm::SmallVector<UTF16, 128> wworkingDirectory;
  llvm::convertUTF8ToUTF16String(launch_info.GetWorkingDirectory().GetCString(), wworkingDirectory);
  
  BOOL result = ::CreateProcessW((LPCWSTR)wexecutable.data(), (LPWSTR)wcommandLine.data(), NULL, NULL, TRUE, flags,
                                 env_block, (LPCWSTR)wworkingDirectory.data(), &startupinfo, &pi);

becomes this:

  launch_info.GetArguments().GetQuotedCommandString(commandLine);
  WinUtfString utfExecutable(executable);
  WinUtfString utfCommandLine(commandLine);
  WinUtfString utfWorkingDirectory(launch_info.GetWorkingDirectory().GetCString());
  BOOL result = ::CreateProcessW(utfExecutable.GetUtf16(), utfCommandLine.GetUtf16(), NULL, NULL, TRUE, flags,
                                 env_block, utfWorkingDirectory.GetUtf16(), &startupinfo, &pi);

etc.

I know this adds more work for you, but this is the type of thing that if it's not done right hte first time, it will stagnate and be gross forever.  So we should think about how to do this in the least-bad way possible.


================
Comment at: cfe/trunk/tools/libclang/CIndexer.cpp:57
@@ -55,3 +56,3 @@
   MEMORY_BASIC_INFORMATION mbi;
-  char path[MAX_PATH];
+  wchar_t wpath[MAX_PATH];
   VirtualQuery((void *)(uintptr_t)clang_createTranslationUnit, &mbi,
----------------
You'll probably need to submit this as a separate patch to cfe-commits, since it goes in a separate repo.  You can then commit them independently.

================
Comment at: lldb/trunk/source/Commands/CommandCompletions.cpp:171
@@ -168,1 +170,3 @@
         {
+#ifdef _WIN32
+            llvm::SmallVector<UTF16, 128> wpartial_name_copy;
----------------
I'm not fond of this approach of sprinkling #ifdefs around in every location where we call into a file system API.  We have `lldb/Host/FileSystem`, probably some of these methods should be moved over there, and we should provide a windows implementation and a non-windows implementation.  This way in places like this, you simply write:

    isa_directory = FileSystem::GetPermissions(partial_name_copy) & FileSystem::eDirectory);

and all this ugliness is hidden.

================
Comment at: lldb/trunk/source/Core/ConnectionSharedMemory.cpp:138
@@ -134,10 +137,3 @@
 #ifdef _WIN32
-    HANDLE handle;
-    if (create) {
-        handle = CreateFileMapping(
-            INVALID_HANDLE_VALUE,
-            NULL,
-            PAGE_READWRITE,
-            llvm::Hi_32(size),
-            llvm::Lo_32(size),
-            name);
+    HANDLE handle = INVALID_HANDLE_VALUE;
+    llvm::SmallVector<UTF16, 128> wname;
----------------
Same thing applies here, we should probably have a `Host/SharedMemory.h` that exposes a `Create()` method.  Although, in this particular case the `#ifdef` was already here so I can't entirely complain, but it would be nice to improve the code as we go and get ifdefs like this out of generic code.

================
Comment at: lldb/trunk/source/Core/Disassembler.cpp:881
@@ -878,3 +880,3 @@
     }
-        
-    FILE *test_file = fopen (file_name, "r");
+    FILE *test_file = nullptr;
+#if _WIN32
----------------
Would LLDB's `File` class work here instead of using raw APIs?  I see you fixed the code in `lldb_private::File`, so perhaps simply using that class here instead of a `FILE*` would allow you to reuse the fix from that class.

================
Comment at: lldb/trunk/source/Host/common/File.cpp:353
@@ +352,3 @@
+        int stat_result;
+#ifdef _WIN32
+        llvm::SmallVector<UTF16, 128> wpath;
----------------
Ahh, here's that method I was looking for earlier when I mentioend `FileSystem`.  So I guess you don't have to implement it, it's already here.  So you could just use this method earlier in the part that gets file permissions by manually calling stat.

================
Comment at: lldb/trunk/source/Host/common/FileSpec.cpp:108
@@ +107,3 @@
+        if (stat_result == 0)
+            *stats_ptr = *reinterpret_cast<struct stat *>(&file_stats);
+        return stat_result == 0;
----------------
I don't think this line is correct.  The source and destination struct types do not always have the same layout, so I think you may need to copy the values out one by one.

================
Comment at: lldb/trunk/source/Host/common/FileSpec.cpp:237
@@ +236,3 @@
+    int stat_result;
+#ifdef _WIN32
+    llvm::SmallVector<UTF16, 128> wpath;
----------------
Maybe you can just call `GetStats()` here

================
Comment at: lldb/trunk/source/Host/common/FileSpec.cpp:829
@@ -799,2 +828,3 @@
 #ifdef _WIN32
-    auto attrs = ::GetFileAttributes (resolved_path);
+    llvm::SmallVector<UTF16, 128> wpath;
+    if (!llvm::convertUTF8ToUTF16String (resolved_path, wpath))
----------------
I think a `FileSystem::IsSymbolicLink()` here would be better, and then have `File` call `FileSpec::IsSymbolicLink()`

There's currently some confusion and overlap about what goes in `FileSpec` and what goes in `FileSystem`, but my general opinion is that wherever possible, anything that queries the filesystem should be a static method in `Host/FileSystem`, and any other classes who want to do the same thing can call the methods in `FileSystem`.  This way you don't have to, for example, construct a `FileSpec` from a string just to check if something is a symbolic link since you could just call the low level method directly.

================
Comment at: lldb/trunk/source/Host/common/FileSpec.cpp:1173-1180
@@ -1134,4 +1172,10 @@
 
-            char child_path[MAX_PATH];
-            const int child_path_len = ::snprintf (child_path, sizeof(child_path), "%s\\%s", dir_path, ffd.cFileName);
+            std::string fileName;
+            if (!llvm::convertUTF16ToUTF8String((const UTF16*)ffd.cFileName, fileName))
+            {
+                continue;
+            }
+
+            char child_path[PATH_MAX];
+            const int child_path_len = ::snprintf (child_path, sizeof(child_path), "%s\\%s", dir_path, fileName.c_str());
             if (child_path_len < (int)(sizeof(child_path) - 1))
----------------
sprintf'ing parent and child paths together to concatenate them doesn't seem like the best idea.  Can we use `llvm::sys::path::append()` here instead?


Repository:
  rL LLVM

http://reviews.llvm.org/D17107





More information about the lldb-commits mailing list