[llvm] [libc] Add monadic functions to cpp::expected (PR #66523)

Mon Sep 25 07:25:15 PDT 2023

gchatelet wrote:

> There are some steps that need to be executed conditionally. In your example, you always run the `consume_optional_prefix` step, which sets the base based on the prefix. It's an important step, but only relevant when the starting base is 0 or 16. For any other base, it's unnecessary, and as it's written right now it will always treat the base as if it's 0.

Conditional code is not a problem at all. It's quite easy to have an early return in `consume_optional_prefix` that will drop some of the work if unneeded.

> Additionally, there are some steps that may succeed before the final step. As an example, in string to float, there should be a step that checks for "inf" and "nan". If one of those special cases is found, it should return immediately, since no further parsing is necessary.

Now that's a fair point. The _choice_ between "inf" / "nan" and a valid float cannot be expressed directly as a single sequence of continuations. The _choice_ is usually done by
 1. having several parsers run in lockstep (through Deterministic Finite Automaton), or
 2. use a greedy approach where we test them one by one by looking ahead and backtracking (if parsing is unambiguous).

---
**Small digression**

This second method is what's currently implemented. Unfortunately it doesn't play well with the `FILE*` interface which only allows for a backtracking of exactly one char. To compensate for that, the current parser copies the characters over to a `char` buffer. The `char` buffer then offers random access and enables the greedy approach.
A direct consequence of the copy though is that it needs a buffer to write to, and since the string is potentially very large we have to depend on heap allocation. This adds "out of memory" as a possible outcome of parsing a number. I believe this is unfortunate (especially for embedded) so I would like to explore the first approach instead.

---

Back to the main discussion, I agree that we won't be able to express the _choice_ concept  with a sequence of monadic operations but I don't think it's a big deal, we can still compose the parser out of reusable pieces. The current code could already be written like so.

pseudo code:
```
cpp::expected<T, error> parse_float(Reader& reader, bool is_positive) {
  CharVector buffer;
  // fill buffer with reader
  if (buffer.starts_with("inf")) return is_positive ? std::numeric_limits<T>::infinity : -std::numeric_limits<T>::infinity;
  if (buffer.starts_with("nan")) return std::numeric_limits<T>::quiet_NaN;
  CStringReader r(buffer);
  return assert_not_empty(r)
         .and_then(....);
}

```

To move forward (and if you agree), I will put this patch on hold and start working on a full implementation (integer and floating point). This way I think we will be in a better position to discuss the approach. In case we don't reach an agreement, I think we could still include some of the parts you found useful (templated code, limiting reader, etc...)

https://github.com/llvm/llvm-project/pull/66523