[Lldb-commits] [PATCH] D45977: Always normalize FileSpec paths.

Greg Clayton via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Tue Apr 24 07:34:20 PDT 2018

clayborg added a comment.

I am trying to switch to using llvm::sys::path::remove_dots() and we will see where we end up. By switching to this it means:

"./foo.c" --> "foo.c"
"./foo/bar.c" --> "foo/bar.c"

This makes is easier to see if a relative path matches another FileSpec since we don't have to remove the "./" from the beginning of the path.

Comment at: source/Breakpoint/BreakpointResolverFileLine.cpp:131
+    while (relative_path.consume_front("."))
+      /* Do nothing. */;
+  }
I will be removing this after I switch to using llvm::sys::path::remove_dots() instead of the Normalize() I converted.

Comment at: source/Utility/FileSpec.cpp:107
+  for (auto i = path.find('/'); i != llvm::StringRef::npos;
+       i = path.find('/', i + 1)) {
+    const auto nextChar = safeCharAtIndex(path, i+1);
labath wrote:
> aprantl wrote:
> > clayborg wrote:
> > > clayborg wrote:
> > > > aprantl wrote:
> > > > > clayborg wrote:
> > > > > > I am fine switching to using the llvm functions for removing, this is only detecting if any normalization needs to happen. If there is an equivalent LLVM function that will only run through the string one time to detect all needs for normalization (no multiple passes looking for "..", then for "." etc like we used to have), I would be happy to use it, but there doesn't seem to be. We are trying to avoid having to create a "SmallVectorImpl< char >" for all paths if they don't need fixing here. This function is just iterating through an llvm::StringRef just to see if normalization needs to happen. If it does, then we make a SmallVectorImpl< char > and we fix the path. We can easily use llvm functions for the fixing part. 
> > > > > I see, you want to avoid copying the string when it isn't necessary to do so. IMHO the best way to do this is to add a `bool hasDots(const Twine &)` function to llvm::sys::path and add a `bool remove_dot_dot` parameter to `removeDots()`. Since you have already written the unittests, adding the function to LLVM shouldn't be any extra work (I'll help reviewing), and we'll have 100 LoC less to maintain.
> > > > > 
> > > > > This way we can avoid having two functions that perform path normalization that superficially look like they might do the same thing, but a year from now nobody can really tell whether they behave exactly the same, and whether a bug found in one implementation should also be fixed in the other one, etc... .
> > > > I want to know if there are redundant slashes as well. I want something like "bool llvm::sys::has(const Twine &, bool dots, bool dot_dots, bool redundant_slashes)". But I don't see that getting accepted? I just want a "bool llvm:sys::path::contains_redundant_stuff(const Twine &)".  Not sure on the name though. needs_normalization? can_be_shortened? Everything in LLVM right now assumes that dots, dot dots, slashes and realpath stuff happen in the same function. Not sure how to break that up without ruining the performance of the one and only loop that should be happening. I like the current approach since it doesn't require chopping up the string into an array and iterating through the array.
> > > Also not sure if the LLVM stuff will try to substitute in the current working directory for "."?
> > I just read the source code of remove_dots() (http://llvm.org/doxygen/Path_8cpp_source.html#l00699) and if I'm not mistaken, it actually also removes double-separators as a side-effect, since it iterates over the path components of the string as it constructs the copy. It also seems to avoid the copy operation if it isn't necessary.
> > 
> > Could you take another look? Perhaps I'm missing something, (or perhaps we can just turn this into a small addition to remove_dots).
> > 
> > > Everything in LLVM right now assumes that dots, dot dots, slashes and realpath stuff happen in the same function.
> > I'm not sure I understand.
> FWIW, this could be implemented in a much simpler way, while still making sure we run through the string only once. I'm thinking of something like:
> ```
> bool first = true;
> for(it = llvm::sys::path::begin(path, style), end = llvm::sys::path::end(path); it != end; ++it) {
>   if (!first && (*it == "." || *it == "..")) return true;
>   first = false;
> }
> return false;
> ```
This will all go away, I am just going to call llvm::sys::path::remove_dots()...

Comment at: source/Utility/FileSpec.cpp:187
+        continue; // Skip '.' unless it is the first thing
+    }
+    else if (component != "..") {
Yes we would need to remove this. Again, I will switch to using llvm::sys::path::remove_dots and we can deal with the fallout there,.

Comment at: source/Utility/FileSpec.cpp:406
+      m_directory.Clear();
+  }
> they resolve to the same file in a virtual filesystem, where all referenced path components exist and are directories

Normalizing happens after resolving and if we don't resolve a path, we have no way to know what is a directory and what isn't. We will be setting breakpoints for remote targets quite a bit in LLDB and ww can't assume or stat anything in a path. So I would say FileSpec's are equivalent if the relative paths match all components.

> Now the (c) part would imply that foo/.. should normalize to ./. and not ., which is a bit odd, but it is consistent with our stated intention of preserving directory information. If we do not want to have ./. here, then we need to come up with a different definition of what it means to be "normalized".

I guess we could make the  m_directory contain "." and m_filename contain nothing for the "./." case. It doesn 't make  sense to have "." in both places.

Comment at: source/Utility/FileSpec.cpp:857
+        !IsPathSeparator(components[i].back(), syntax) &&
+        (result.empty() || !IsPathSeparator(result.back(), syntax)))
       result += GetPreferredPathSeparator(syntax);
labath wrote:
> Why is this necessary? `result` can't be empty because we have just appended something to it (and we've checked that `components[i]` is non-empty).
Because if you don't check this joining {"/', "foo.txt"} will result in "//foo.txt" which is wrong,.


More information about the lldb-commits mailing list