[libcxx-commits] [PATCH] D91176: [20/N] [libcxx] Implement parsing of root_name for paths on windows

Martin Storsjö via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Fri Jan 8 08:22:24 PST 2021


mstorsjo added a comment.

In D91176#2486783 <https://reviews.llvm.org/D91176#2486783>, @curdeius wrote:

> I may be repeating myself, but my biggest remark is that there are no tests. I know there's no libcxx CI on Windows, but still, that will make reviewing easier, looking what are the new test cases and what has been forgotten.

Thanks for having a look!

Yeah, that's a good point, and I'm sorry that they aren't available for viewing here. Making the tests compatible with windows requires a huge number of patches (50+), and reviewer time is very limited. I started out there, but as it was clear that they weren't getting handled at a rate where it'd be possible get them beforehand, I postponed occupying reviewer time with them. I hoped to at least get the actual functionality merged by the LLVM 12 branch date (getting reviewer time for that alone is a stretch).

Also for the tests, there's no clear policy for what way of handling the differing expected behaviours is acceptable - I tried the waters with e.g. D89943 <https://reviews.llvm.org/D89943>, which hasn't gotten an opinion yet. (E.g. is it ok with this kind of lots of ifdefs in the tables, or should they be split out into separate files, or what.) FWIW, that particular test update actually happens to cover mostly the things that this patch touches.

There's a bit of nuance difference between what's returned from MSVC STL and libc++ with these patches though, in a mostly pathological case though. If iterating over the path `///foo`, libcxx today only returns the tokens `/` and `foo` while MSVC STL returns `///` and `foo`. But that's more of an overall libcxx implementation issue (or intentional design?) than an issue about how to implement parsing of the windows specific features. I'd love to get to nailing down those fine details when the feature actually can be compiled and there's actual motion in getting the tests fixed for windows, but I don't want to postpone getting a generally decent implementation merged until the next release in 6 months, given the extremely scarce reviewer availability.



================
Comment at: libcxx/src/filesystem/operations.cpp:202-205
+        PosPtr TkStart = consumeRootName(SepEnd, REnd);
+        if (TkStart == REnd)
+          return makeState(PS_InRootDir, RStart, RStart + 1);
         return makeState(PS_InTrailingSep, SepEnd + 1, RStart + 1);
----------------
curdeius wrote:
> This looks strange. IIUC it means that you consumed a separator, and then, if `TkStart != nulltptr`, you consumed drive (e.g. "C:").
> Can it happen anywhere? `/C:`?
> 
> Apart from that, TkStart returned by `consumeRootName` isn't used after the if (in `makeState`).
> Shouldn't it be used instead of `SepEnd + 1` (if non null)?
This is for the decrement function, which iterates backwards. So when parsing "c:/" backwards, this first consumes the separator. If we hit the end (which would happen for a path like "/" but not for "c:/"), it's considered a root dir. If it's not the last token, but followed by (parsing-wise, i.e. backwards) a root name, and that root name is at the start of the string, then the separator also is a root dir.

When parsing forward, the use of `consumeRootName` is easy as we only need to try to call it at the very beginning. When parsing backwards, we need to speculatively test it at a few places, in order to know how to classify a previous separator.

So here, the root name parsed by `consumeRootName` isn't actually consumed, i.e. we don't really use the `TkStart` it returns for returning parts of the string to the caller, but only for classifying the separator.


================
Comment at: libcxx/src/filesystem/operations.cpp:207-209
+        PosPtr TkStart = consumeRootName(RStart, REnd);
+        if (TkStart == REnd)
+          return makeState(PS_InRootName, TkStart + 1, RStart + 1);
----------------
curdeius wrote:
> You don't check for `TkStart` not being null.
If `TkStart == REnd`, it can't be null at the same time.


================
Comment at: libcxx/src/filesystem/operations.cpp:210
+          return makeState(PS_InRootName, TkStart + 1, RStart + 1);
+        TkStart = consumeName(RStart, REnd);
         return makeState(PS_InFilenames, TkStart + 1, RStart + 1);
----------------
curdeius wrote:
> Why `RStart`? You should use `TkStart` (+ 1?) if it's non-null.
No, this is a case of speculative use of `consumeRootName` backwards - if we didn't end up at `REnd` it wasn't actually a root name for real, and we just handle it like we did before.


================
Comment at: libcxx/src/filesystem/operations.cpp:222
+      PosPtr TkStart = consumeRootName(SepEnd ? SepEnd : RStart, REnd);
+      if (TkStart == REnd) {
+        if (SepEnd)
----------------
curdeius wrote:
> Pretty much the same remarks as above. No null check of `TkStart`.
> Then, `SepEnd` is used instead of `TkStart`.
If `TkStart == REnd`, it can't be null.

If `TkStart` returned from `consumeRootName` is something else than `REnd`, then this reduces down to exactly the same code as before (modulo renaming the variable from `TkEnd` to `TkStart`, making it consistent with the other cases in this function, as the pointer is to the start of a token, when tokenizing backwards).

If `consumeRootName` returned `REnd`, then we found a root name. If we consumed a separator in `SepEnd`, we should return this separately as a root dir, otherwise return only the root name.

I.e. possible cases are "foo/bar" (normal case), "foo/c:/bar" (`consumeRootName` returns a non-null `TkStart`, but it isn't `REnd`, so we proceed like we used to do before), `c:/bar` (we've consumed `bar` before, `SepEnd` points at, or actually one char before, the separator, and we return the separator as root dir) and `c:bar` (we've consumed `bar` and now consume the root name `c:`).


================
Comment at: libcxx/src/filesystem/operations.cpp:347-359
+  PosPtr consumeSeparators(PosPtr P, PosPtr End, int N) const noexcept {
+    PosPtr Ret = consumeSeparator(P, End);
+    if (Ret == nullptr)
+      return nullptr;
+    if (P < End) {
+      if (Ret == P + N)
+        return Ret;
----------------
curdeius wrote:
> This function, when called with N==2 will consume e.g. `//`, `\\`, but also `/x`, and `\x`.
> Is it intended?
> I was rather expecting a loop calling `consumeSeparator` up to `N` times.
The existing function `consumeSeparator` actually is a bit misnamed, it consumes one or more separators. This function checks that it consumed exactly `N` separators, otherwise we return null and don't consume anything.

Should this maybe be called `consumeExactlyNSeparators` for clarity? (That name is a bit clumsy though.)


================
Comment at: libcxx/src/filesystem/operations.cpp:363
+    PosPtr Start = P;
+    if (P == nullptr || P == End || isSeparator(*P))
       return nullptr;
----------------
curdeius wrote:
> I'm not a big fan of allowing P being nullptr.
This (and the other case) was added for making `consumeNetworkRoot` more elegant. I could also spell it out like this:
```
    if (P < End) {
      P = consumeSeparators(P, End, 2);
      if (P == nullptr)
        return nullptr;
      return consumeName(P, End);
    } else {
      P = consumeName(P, End);
      if (P == nullptr)
        return nullptr;
      return consumeSeparators(P, End, 2);
    }

```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91176/new/

https://reviews.llvm.org/D91176



More information about the libcxx-commits mailing list