[libcxx-commits] [PATCH] D98207: [SystemZ][z/OS] Missing wchar functions libc++

Daniel McIntosh via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Wed Sep 8 14:35:05 PDT 2021


DanielMcIntosh-IBM requested changes to this revision.
DanielMcIntosh-IBM added a comment.
This revision now requires changes to proceed.

As discussed over slack, there is an edge case that needs to be addressed (the case where `ps == nullptr` on line 55 of wcsnrtombs and/or on line 61 of mbsnrtowcs).
Once that's fixed, LGTM. I've left several comments for a few more ways it could be tidied up, but none of them are functional changes.



================
Comment at: libcxx/src/support/ibm/mbsnrtowcs.cpp:29-35
+  size_t dest_converted = 0;
+  size_t source_converted = 0;
+  size_t source_remaining = src_size_bytes;
+  size_t result = 0;
+
+  wchar_t buff[MB_LEN_MAX];
+  mbstate_t mbstate_tmp;
----------------
Similar to my comments for wcsnrtombs.cpp, we might want to move some of these so they have a narrower scope. (buff to line 60, mbstate_tmp to line 61, and source_remaining is discussed on line 42)


================
Comment at: libcxx/src/support/ibm/mbsnrtowcs.cpp:42
+
+  while (source_remaining && (!dst || dest_converted < max_dest_chars)) {
+    // Converts one multi byte character.
----------------
By tweaking this condition a bit we can get rid of `source_remaining` as a state variable: `source_converted < src_size_bytes && (!dst || dest_converted < max_dest_chars)`

You may still want to keep `source_remaining` around, but declare it inside the loop so it's no longer used as a state variable (`size_t source_remaining = src_size_bytes - source_converted;`). 

This also allows you to make it obvious that `source_converted` and `dest_converted` are only declared outside the for loop so that we can read from them after: After moving all the declarations except `terminated_sequence`, `invalid_sequence`, `incomplete_sequence`, `source_converted`, and `dest_converted`, the only non-const variables you've declared before the loop are `source_converted`, and `dest_converted`. If you move the zero-initialization to the init statement of the for loop, this signals that all writes to these two variables happen within the for loop, and after we only read from them (even better, we'd actually be doing all the writing **within the init and increment of the loop statement**, showing the reader right away what the pre-conditions and post-conditions are for each loop iteration).


================
Comment at: libcxx/src/support/ibm/mbsnrtowcs.cpp:48
+
+    if (dst) {
+      size_t dest_remaining = max_dest_chars - dest_converted;
----------------
See my comment in `wcsnrtombs.cpp`.


================
Comment at: libcxx/src/support/ibm/mbsnrtowcs.cpp:80
+    }
+    source_remaining -= result;
+    source_converted += result;
----------------
If you move source_remaining into a for loop init statement, this should probably become the increment statement of the for loop. (probably along with the next 2 lines)


================
Comment at: libcxx/src/support/ibm/wcsnrtombs.cpp:27-33
+  size_t source_converted = 0;
+  size_t dest_converted = 0;
+  size_t dest_remaining = dst_size_bytes;
+  size_t char_size = 0;
+
+  char buff[MB_LEN_MAX];
+  mbstate_t mbstate_tmp;
----------------
Is there some reason we need to preserve all of these across loop iterations? For example, could we move mbstate_tmp's declaration to line 55 to limit the scope, and thereby reduce the number of variables the reader needs to keep track of. What about moving the declaration of char_size to line 42, and buff to line 54?


================
Comment at: libcxx/src/support/ibm/wcsnrtombs.cpp:37-40
+  if (!dst)
+      dest_remaining = static_cast<size_t>(-1);
+
+  while (source_converted != max_source_chars && dest_remaining) {
----------------
If you change this around so it's more like mbsnrtowcs, you can get rid of `dest_remaining` as a state variable: instead of `dest_remaining`, you set `dst_size_bytes = static_cast<size_t>(-1)`, then the loop condition can be changed to `source_converted < max_source_chars && (!dst || dest_converted < dst_size_bytes)`.

You may still want to keep `dest_remaining` around, but declare it inside the loop so it's no longer used as a state variable (`size_t dest_remaining = dst_size_bytes - dest_converted;`). This also gets rid of line 87 and 88, allowing you to cleanly turn line 86 and 89 into the increment statement in a for loop.

If you move the declarations as I suggested, then you can go one step further and move the zero-initialization of `source_converted` and `dest_converted` to the init statement of the for loop for all the reasons I explained in my comments on line 42 of mbsnrtowcs.


================
Comment at: libcxx/src/support/ibm/wcsnrtombs.cpp:43
+
+    if (dst) {
+      if (dest_remaining >= static_cast<size_t>(MB_CUR_MAX)) {
----------------
I think we could make things a bit more clear here by inverting this if and reducing the level of nesting a little. i.e.

```
if (dst == nullptr) {
  char_size = wcrtomb(nullptr, c, ps);
} else if (dest_remaining >= static_cast<size_t>(MB_CUR_MAX)) {
  char_size = wcrtomb(dst + dest_converted, c, ps);
} else {
  //...
}
```


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98207/new/

https://reviews.llvm.org/D98207



More information about the libcxx-commits mailing list