[libcxx-commits] [PATCH] D74997: [libc++] Bugfix to std::binomial_distribution<int>

Wed Mar 11 18:20:40 PDT 2020

atmnpatel added a comment.

Sorry, I was transitioning between machines while updating the commit and I couldn't figure out arc diff - I'd like it to be committed under a335pate at uwaterloo.ca.

I have access to the paper and can confirm that although the implementation is a convoluted implementation of the algorithm in the paper, it does indeed follow the algorithm as best as it should - the most direct implementation of the algorithm is at least partially incorrect (I implemented it and it doesn't pass the most basic tests we have as far as I can tell). The original algorithm consists of a few for-loops (with return conditions) as well as a catch-all return condition, which as I understand it, has been merged rather cleverly into a single while loop while maintaining its behavior (the original algorithm as written is almost certainly erroneous probably due to typographical mistakes). This edit should cause the while loop to terminate when it has looped through an empty loop - which should correspond to the case in the original algorithm where it terminates when it is known that a good sample cannot be constructed given the initial sample from uniform distribution, and returns a fairly default value.

If we assume that the current implementation is in fact a functional implementation of the binomial_distribution, this edit will not change that fact, and will only return 0 in the cases where the current implementation will not terminate. Also I'd like to be really upfront about this - It returns 0 as opposed to following the paper in the catch-all return condition because if the paper is followed, the mean is nowhere near it should be, and this is particular evident in the case of the bug when the probability is set to be extremely small. If instead we return 0, we do end up with a better approximation to the binomial distribution in terms of mean, kurtosis, etc. Since this bug is really hard to reproduce, I can't speak to how generically this will remain true but it is empirically true given what we know now. I did find newer papers with authors that I can contact in case of confusion on the same topic of sampling from a binomial distribution, but given that we have an implementation that is known to work, I wasn't sure if it was appropriate to change the sampling method entirely while claiming to fix a bug that is clearly caused by a simple case of missing a termination guarantee. That is, I would be open to implementing a newer method for binomial_distribution if this seems like an unsatisfactory resolution.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74997/new/

https://reviews.llvm.org/D74997