[libcxx-commits] [libcxx] [libc++][chrono] Loads leap-seconds.list in tzdb. (PR #82113)
via libcxx-commits
libcxx-commits at lists.llvm.org
Mon Apr 1 10:56:41 PDT 2024
EricWF wrote:
Code review is a consensus building activity. There are two or more engineers, both with different skills and experience, and the goal is utilize the knowledge of both parties to validate changes. I have not always been easy to make concede and I'm working to improve that. I would appreciate a similar effort on your part.
> > I think we should split out the interface for parsing TZDB-type strings.
> > I've noticed a number of complications arising from testing TZDB internals using only the public standard interface. The public interface is simply insufficient to allow proper testing and validation of the libraries internals.
> > I think we should split our concerns into two, They are:
> > ```
> > 1. Implement & Test the C++ Standard interface.
> >
> > 2. Implement a secure, platform/system agnostic TZDB parser/adapter.
> > ```
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > The latter of the two problems is much hairier of a task. There are a number of formats, each with a number of revisions, that we need the parser to accept. And more importantly, there's an infinite number of inputs we need the parser to securely reject.
>
> The binary format files have different revisions. From what I can tell we only need to parse one revision for the text format.
We should meet in private to discuss the security issues further.
>
> > I say "securely" reject because the time zone database parser provides a potential attack vector for any attacker with some control over the bytes we read from disk. So validation using both unit tests and fuzzing tests is essential.
> > We can achieve this testibility without affecting the public interface by abstracting out the parser into a separate testable unit.
>
> That would give a public symbol to the parser. I kept it all internal so we can adapt the code if the IANA database changes its format. Currently I need to parse two files, but who knows what happens in a few years time. Since we can already feed hand-crafted databases to these tests I think we don't need to change anything for the testability.
You misunderstand me. The point of separating out the interface is to avoid public symbols in the library. When testing the parser, we would compile the source code into Here's how you do it without exposing any public symbols:
We could use it to remove the public symbols that are
```c++
// src/tzdb_parser.hpp
// We can add a mechanism to ensure this is only included by one source file in the build.
namespace { // anonymous namespace, internal linkage
struct Parser {
// implementation
};
}
```
```c++
// src/tzdb.cpp
#include "tzdb_parser.hpp"
// This is the only source file which uses the header (outside of the tests).
```
```c++
// test/libcxx/test_tzdb_parser.pass.cpp
// INCLUDE_DIRECTORIES: %LIBCXX_ROOT/
#include "src/tzdb_parser.hpp" // Do this in as many tests as you want.
void test() {
TZDBParser Parser;
// tests...
}
```
Since exporting private symbols is a concern, you can use this method to remove the private symbols TZDB already exports like `__init_tzdb` and all of the symbols from `std::__1::chrono::tzdb_list::__impl`. Further you can now use regular names and put these symbols outside of namespace `std` making it clear that these symbols are fully internal.
>
> > The remaining tests, for conformance of the public API, would look almost identical to what we have now, except with fewer places needing to reference internal implementation details.
>
> I'm not sure what you exactly mean. There are public tests that don't use internals. There libc++ specific tests that use the internals. These private tests test boundary conditions. I think it would be too much effort to manually validate every permutation; we should use fuzz tests instead.
Your right, the `std/` tests that don't reference internals.
What I meant to say is that you can write tests for the internals that don't reference the public API or even private bits of namespace `std`. This makes it clearer what the testable surface of the internals is meant to be, and what's just normal internal code tested via the public interface.
https://github.com/llvm/llvm-project/pull/82113
More information about the libcxx-commits
mailing list