[libcxx-commits] [PATCH] D64818: [libcxx] Construct path using a instance of std::locale

Jean Guegant via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Mon Jul 22 12:55:04 PDT 2019


jguegant marked 8 inline comments as done.
jguegant added a comment.

Thanks a lot @Quuxplusone! I am aware that you cannot take any final decision on what gets in but these are really helpful comments here.
I will fix some of you remarks and upload a better diff (with proper context to get a better picture) afterwards.

Any way to gently attract the three reviewers here?



================
Comment at: libcxx/include/__locale:1367
+        const int __sz = 32;
+        _InternT __buf[__sz];
+        _InternT* __bn;
----------------
Quuxplusone wrote:
> Peanut gallery says: I'm not 100% sure of the rules about `const`. I observe that this line seems to be okay even in `-pedantic` mode, but intuitively I would expect `__buf` to be a VLA. IMHO it would be clearer to make `__sz` a `constexpr int` instead of just a `const int` — or even my generally preferred formulation,
> 
>     _InternT __buf[32];
>     constexpr ptrdiff_t __sz = sizeof(__buf) / sizeof(__buf[0]);
This function is a dumb generalisation of the functions 1389, 1426 which had this pattern.

Correct me if I am wrong but this will work as C-arrays have such syntax:

```
D1 [ constant-expressionopt ] attribute-specifier-seqopt
```

Where:

```
The constant-expression shall be a converted constant expression of type std::size_­t ([expr.const]).
```

Here const int should work thanks to a special rule for const-qualified integrals:

```
A variable is usable in constant expressions after its initializing declaration is encountered if it is a constexpr variable, or it is of reference type or of const-qualified integral or enumeration type, and its initializer is a constant initializer.
```


================
Comment at: libcxx/include/__locale:1374
+            __throw_runtime_error("locale not supported");
+        for (const _InternT* __p = __buf; __p < __bn; ++__p, ++__s)
+            *__s = (wchar_t)*__p;
----------------
Quuxplusone wrote:
> `__s` is of user-provided type. Depending on level of library-maintainer paranoia, you might want to make this `++__p, void(), ++__s`, or pull `++__s` down into the body of the for-loop (and add curly braces). The current formulation will call a user-defined `operator,(const char*, _OutputIterator)` if one exists. (And will perform ADL to find out if one exists, regardless.)
Good idea! Now that this function accept more than one type, it seems a reasonable to be paranoid.


================
Comment at: libcxx/include/filesystem:749
+template <class _InputIt, 
+         class = typename enable_if<!is_constructible<const char*, _InputIt>::value>::type>
+inline _LIBCPP_INLINE_VISIBILITY
----------------
Quuxplusone wrote:
> `class = _EnableIf<!is_constructible<const char*, _InputIt>::value>`
> But also, how is that constructibility relevant? Does the Standard say something like "If the input iterator is explicitly convertible to `const char*` then do X, else do Y"?
> 
> Furthermore, this codepath is going to do heap-allocation with `string __s(__b, __e)`. Can we avoid the heap-allocation somehow?
The idea here is to give the priority to the overload with `const char*` which effectively make both `char*` and `const char*`being routed by the first overload line 739. Obviously, my intent is not really clear here, I wonder how I could express that in a better way if I continue in that direction. Maybe having 3 overloads? const char*, char* and generic? A more descriptive trait name?


================
Comment at: libcxx/include/filesystem:753
+  const string __s(__b, __e);
+  return __widen_from_char_iter_pair(__s.data(), __s.data() + __s.size(), __loc);
+}
----------------
Quuxplusone wrote:
> I'm worried about the way you use overloading on the name `__widen_from_char_iter_pair`. If some well-intentioned maintainer removed the `const` on line 752, then line 753 would turn into an infinite recursion. I don't see any reason to use overloading here. Maybe change `__widen_from_char_iter_pair` on lines 739, 753, 766, 776, 793, and 811 to `__widen_from_char_pointer_pair`? That would leave only line 889 as a potential source of extra heap-allocations...
I believe that this would not create an infinite recursion, the enable_if is sending both char* and const char* to the first overload  line 739.

I am not sure if there is a smart way to avoid allocation here. Maybe if I call `codecvt::in` on every character from the iterator. But wouldn't that be slow? What happen if your code-point is made of 3-4 char instead of one? Should I then copy X amounts of characters into a buffer allocated on the stack and send process the iterator chunks by chunks?
I have a similar issue line 808 and if anyone has a clever solution, I am all for it!

I had a look at how other constructors for path already in place (like line 681 or 692) do it and they already heap-allocate on even more code-paths:
```
  template <class _Iter>
  static void __append_range(string& __dest, _Iter __b, _Iter __e) {
    static_assert(!is_same<_Iter, _ECharT*>::value, "Call const overload");
    if (__b == __e)
      return;
    basic_string<_ECharT> __tmp(__b, __e);
    _Narrower()(back_inserter(__dest), __tmp.data(),
                __tmp.data() + __tmp.length());
  }
```

If we go in your direction, 889 would now heap-allocation in all scenarios, whereas now the 889 overload supplied with const char* and char* would avoid allocating a string.


================
Comment at: libcxx/include/filesystem:792
+    for (; *__e != '\0'; ++__e)
+      ;
+    return __widen_from_char_iter_pair(__b, __e, __loc);
----------------
Quuxplusone wrote:
> This could probably be more simply expressed as
> ```
> string_view __sv(__b);
> return __widen_from_char_iter_pair(__sv.data(), __sv.data() + __sv.size(), __loc);
> ```
Neat! Will do that.


================
Comment at: libcxx/include/filesystem:838
+  using _EnableIfWidenableFromCharSource =
+      typename enable_if<__is_widenable_from_char_source<_SourceOrIter>::value, _Tp>::type;
+  
----------------
Quuxplusone wrote:
> ```
> template <class _SourceOrIter>
> using _EnableIfWidenableFromCharSource =
>     _EnableIf<__is_widenable_from_char_source<_SourceOrIter>::value>;
> ```
> You don't seem to use the `_Tp` parameter for anything at the moment.
Indeed, I will remove it!


================
Comment at: libcxx/test/std/input.output/filesystems/class.path/path.member/path.construct/source_and_locale.pass.cpp:32
+template <class ...Args>
+void RunTestCase(const char* TestPath, const char* Expect, std::locale Locale, Args... args) {
+  using namespace fs;
----------------
Quuxplusone wrote:
> Should `std::locale` be passed as `const std::locale&` instead? Does it matter? (I don't know.)
I was thinking that locale is a pretty cheap type (a sort of reference counted pointer), but I might be wrong. Taking it by reference cannot make it worst ;)


================
Comment at: libcxx/test/std/input.output/filesystems/class.path/path.member/path.construct/source_and_locale.pass.cpp:97
+    static_assert(std::__is_input_iterator<It>::value, "");
+    //static_assert(std::is_constructible<path, It, std::locale>::value, "");
+  }
----------------
Quuxplusone wrote:
> Why is this commented out?
This should not be here. Thanks for spotting it!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64818/new/

https://reviews.llvm.org/D64818





More information about the libcxx-commits mailing list