[llvm] [libc] Add monadic functions to cpp::expected (PR #66523)

Guillaume Chatelet via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 20 07:00:32 PDT 2023


gchatelet wrote:

> In the example you provided, it doesn't seem like there would be any difference between calling `and_then` on the next function and just calling that function on its own. None of the intermediate functions (e.g. `consume_optional_sign`) can fail, and they also don't return anything, so they'll always advance to the next one.

Not all of them can fail, but some do: `assert_not_empty`, `consume_digits` and `materialize_value`. You can search for `return std::unexpected` to find them. These are the ones that can fail and short-circuit the parsing.

> Additionally I like the idea of limiting which functions interact with which pieces of state, but the final number parsing function always needs to interact with all of them and that seems to defeat the purpose.

In this design the monadic operations are used to propagate errors and make the parsing steps stand out. Not returning a value is a deliberate choice: it allows the steps to run in any order and keeps them as isolated as possible. A pure functional approach would require each call to return a different type, with the state augmented from type to type, setting the order of operations in stone. But we may want to skip prefix or sign parsing, add a mandatory prefix / suffix, or consume leading or trailing whitespace.

In the end, the pieces of state have to be aggregated somehow; this logic is tied to a particular parsing implementation (overflow strategy, error strategy: `errno` / `ErrorOr` / error code / no error). The reusable parts are all the functions above that interact with part of the state; they are testable in isolation, but the final operation has to depend on all of the state. There's no way around it. This design is still useful (I believe) because only the state required to take the final decision is exposed. Also note that the needed state can differ between parser instances. For instance, we may want a parser that only deals with positive decimal strings, making the state (and the code) much smaller.

> What I do like a lot is the idea of templating the reading mechanism, and having a way for it to propagate errors. The current reading mechanism in scanf's int converter is very ugly to handle this, see https://github.com/llvm/llvm-project/blob/main/libc/src/stdio/scanf_core/int_converter.cpp#L96 . The problem is that each read first needs to check if the max width has been hit, leading to several repetitions of that "if (max_width > 1) then read a char, else return 0" pattern. By adding another templated reader layer we could use the same code for scanf and the simple strtol that doesn't need any error checking on read at all.

This is the reasoning behind `LimitingReader`: it hides the complexity of dealing with the length limitation from the parser itself.
Other benefits of templated readers:
 - the char type can be a template parameter, so wide characters are supported automatically.
 - the reader can be instrumented in tests to make sure the stream is not read more than strictly required.
 - the compiler can optimize through the abstraction (different code for `FILE*` or C string).
Downside:
 - it can lead to code bloat for embedded systems.
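A rough sketch of the idea (hypothetical names and shapes, not the PR's actual API): the parser is templated on its reader, a `LimitingReader`-style wrapper enforces the max width, and the parser never repeats the "have we hit max width?" check.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Unbounded reader over a string_view (stand-in for a C string source).
struct StringReader {
  std::string_view str;
  std::optional<char> read() {
    if (str.empty())
      return std::nullopt;
    char c = str.front();
    str.remove_prefix(1);
    return c;
  }
};

// Wraps any reader and stops after `limit` characters, so the width
// check lives here instead of being repeated throughout the parser.
template <typename Reader> struct LimitingReader {
  Reader inner;
  size_t limit;
  std::optional<char> read() {
    if (limit == 0)
      return std::nullopt;
    --limit;
    return inner.read();
  }
};

// The parser only sees the reader interface; the compiler specializes
// and optimizes it for each concrete reader type.
template <typename Reader> int count_digits(Reader &reader) {
  int count = 0;
  while (auto c = reader.read()) {
    if (*c < '0' || *c > '9')
      break;
    ++count;
  }
  return count;
}
```

The same `count_digits` body then serves both the unbounded `strtol`-style case and the width-limited scanf case, picked at compile time.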



https://github.com/llvm/llvm-project/pull/66523
