[libcxx-commits] [PATCH] D98703: [libcxx] [test] Readd randomness to temporary directory names
Arthur O'Dwyer via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Tue Mar 16 14:52:27 PDT 2021
Quuxplusone added inline comments.
================
Comment at: libcxx/test/support/filesystem_test_helper.h:324-331
+#if !defined(_LIBCPP_HAS_NO_RANDOM_DEVICE)
+ base += "-" + random_utils::random_string(4);
+#endif
int i = 0;
fs::path p = tmp / (base + "-static_env." + std::to_string(i));
while (utils::exists(p.string())) {
p = tmp / (base + "-static_env." + std::to_string(++i));
----------------
mstorsjo wrote:
> Quuxplusone wrote:
> > mstorsjo wrote:
> > > Quuxplusone wrote:
> > > > How about (suggested edit)? My logic is that if this piece of code ever produces a collision EVER, then we need a programmer to revisit this piece of code; there's no sense in trying to make the code "work around" a collision, because the whole point is that collisions shouldn't happen.
> > > >
> > > > However, I still don't understand why there's a collision here in the first place. You say "As some multiple test subdirectories contain tests with the same file names..." but shouldn't we just be using the entire (unique) path, then? Why (and where) are we throwing away the parts of the path that aren't the filename?
> > > >
> > > > Also, is it possible that we could still have collisions when the same llvm-lit process spawns two copies of the same test (does it ever do that? idk) or when two different llvm-lit processes spawn the same test at the same time (do we support running two llvm-lit runs simultaneously? idk)
> > > No, we can't just add an incrementing integer and forcibly require it to be unique - it's a race condition.
> > >
> > > Originally this used random names and all was well - but to simplify building in environments without randomness available, it was chosen to use `cwd.filename()` as uniquefier, under the assumption that `cwd.filename()` is unique within the set of tests that use these helpers.
> > >
> > > At this point, cwd would be something like `<build-dir>/test/std/input.output/filesystems/class.directory_iterator/directory_iterator.members/Output/copy_assign.pass.cpp.dir`. The assumption made in the commit that removed randomness was that `cwd.filename()`, i.e. `copy_assign.pass.cpp.dir` would be unique (i.e. the only possible race condition is if there's multiple instances in parallel within the same test), but in this case, there's 3 `copy_ssign.pass.cpp` under test/std/input.output/filesystems.
> > >
> > > Including the full path from `cwd` on top of `tmp` is a huge mess (also see D98702 about an issue with leftover intermediate directories - we don't want to create a deep tree of possibly leftover intermediate directoreis).
> > >
> > > Adding a small bit of randomness should guard against the few testcases with similar names hitting the same dir at the same time.
> > >
> > > The alternative woule perhaps be to create the temp dir tree directly under cwd instead of under the system temp root - maybe that would be ok? That would at least avoid littering that dir excessively. But maybe there's other reasons why the filesystem tests wants to keep these temp files outside of the normal test file tree?
> > >
> > > (One run of llvm-lit normally doesn't launch the same test twice afaik, but two runs of llvm-lit in parallel on the same tree would trigger the same issues. I don't think we explicitly forbid being able to run two instances of llvm-lit with overlapping sets of tests at the same time - being able to do that sounds handy to me.)
> > >
> > > it's a race condition
> >
> > Do you mean because the integer isn't atomic? I thought of that, but I figured that incrementing an unsigned int is at least //vastly more likely// to be safe in practice than calling `operator()` on an `mt19937`, which is what you're proposing to do in this patch right now.
> >
> > Would it make you feel better to stick a `std::atomic` around that `unsigned` (on platforms that support `atomic`)? And/or have I misunderstood the race you're worried about?
> >
> > > Including the full path from cwd on top of tmp is a huge mess
> >
> > Is it, though? Can't you just `replace('/', '-')` (and maybe also `replace('\\', '-')` for Windows) to turn any full path into a single suitable directory name? Or is the problem that it quickly becomes longer than `PATH_MAX`? Could we take the //hash// of the whole path name, using SHA1 if we've got it handy, or `std::hash<std::string>` if we haven't?
> No, there's no race within the process (unless tests use this in threads, but I don't think any test do) so atomicity doesn't matter, it's races between processes that is the issue.
>
> I guess a hash of the cwd could work for deterministically distinguishing between cases without creating an unwieldy path though...
Okay, so how about
```
fs::path const cwd = utils::getcwd();
fs::path const tmp = fs::temp_directory_path();
std::string base = cwd.filename().string();
size_t i = std::hash<std::string>()(cwd.string());
fs::path p = tmp / (base + "-static_env." + std::to_string(i));
while (utils::exists(p.string())) { ...
```
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D98703/new/
https://reviews.llvm.org/D98703
More information about the libcxx-commits
mailing list