[PATCH] D88283: [NFCI][IR] ConstantRangeTest: add basic scaffolding for next-gen precision/correctness testing

Nikita Popov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Sep 26 11:47:51 PDT 2020


nikic added a comment.

Thanks for putting up the review! I agree that the current testing situation is not ideal, but I don't think the approach used here moves us in the right direction.

> It neither tells us without doubt that the result is
> conservatively correct, nor the precise match to the ConstantRange
> result tells us that the result is precise.

To start with, the current testing code does check for conservative correctness -- the problem is just that it does not check for it separately, so it's not obvious whether a failure is a failure of correctness or a failure of precision.
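
For reference, the correctness half on its own could look something like the following minimal sketch at a small bit width (the helper name and structure are made up for illustration; this is not the actual test code):

```
#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"
#include <cassert>
using namespace llvm;

// Hypothetical helper: exhaustively verify that binaryNot() is
// conservatively correct, without saying anything about precision.
static void checkBinaryNotCorrectness(unsigned Bits) {
  unsigned Max = 1u << Bits;
  for (unsigned Lo = 0; Lo != Max; ++Lo) {
    for (unsigned Hi = 0; Hi != Max; ++Hi) {
      if (Lo == Hi && Lo != 0 && Lo != Max - 1)
        continue; // [N, N) only encodes the empty/full sets
      ConstantRange CR(APInt(Bits, Lo), APInt(Bits, Hi));
      ConstantRange Result = CR.binaryNot();
      // Correctness only: every ~X for X in CR must be in Result.
      for (unsigned N = 0; N != Max; ++N) {
        APInt X(Bits, N);
        if (CR.contains(X))
          assert(Result.contains(~X) && "conservatively incorrect");
      }
    }
  }
}
```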

The question of precision is more complicated, because there is no generally applicable notion of a precise range. In many cases the "best" range is the one with the smallest size, but in some cases we also have an explicit sign preference (e.g. for intersection). In D88356 <https://reviews.llvm.org/D88356> I've put up a more general testing framework that checks correctness and optimality under a preference function. The preference function is something like "smallest" or "smallest non-wrapping", which allows us to precisely specify what it is we're testing for.
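
To illustrate, such a preference function can be sketched as a simple binary predicate over candidate ranges (illustrative signatures only, not necessarily the exact D88356 API):

```
#include "llvm/IR/ConstantRange.h"
using namespace llvm;

// Prefer the range with the smaller number of elements.
static bool PreferSmallest(const ConstantRange &CR1,
                           const ConstantRange &CR2) {
  return CR1.isSizeStrictlySmallerThan(CR2);
}

// Prefer smaller, but never a wrapped range, mirroring the
// "smallest non-wrapping" preference mentioned above.
static bool PreferSmallestNonWrapping(const ConstantRange &CR1,
                                      const ConstantRange &CR2) {
  if (CR1.isWrappedSet() != CR2.isWrappedSet())
    return !CR1.isWrappedSet();
  return CR1.isSizeStrictlySmallerThan(CR2);
}
```

Given such a predicate, the framework can separately check that the computed range covers all exact results (correctness) and that no strictly preferred covering range exists (optimality).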

This is an alternative to the approach you are proposing here. There are two reasons why I think it is more useful:

- In this patch, you determine precision by counting how many values there are in the value set and how many there are in the range. For the binaryNot() implementation, the two counts will coincide for an optimal implementation. However, this is due to a special property of binaryNot() and will not generalize to other operations: binaryNot() maps a contiguous range of values onto another contiguous range, so there are no gaps. In the general case, even an optimal range will contain elements that are not in the exact set. If the precise result set is just the two elements `0` and `0x7f`, then the smallest covering range still contains 64x more elements than the exact set (a worked snippet follows this list). So if at the end of the test you get the result that, on average, the computed range has 3.7x more elements than the exact set -- what does that actually tell you? Depending on the operation, this could easily be either a very good or a very bad result. For this reason, I don't think value counting is a useful metric in this context.
- This testing approach still requires explicitly specifying how the "exact range" is constructed. That is easy for signed/unsigned non-wrapping ranges, but gets tricky for "smallest" ranges, and trickier still if the preference criterion is more complicated than that. The implementation of `testBinarySetOperandExhaustive` is probably the best example of this: https://github.com/llvm/llvm-project/blob/915310bf14cbac58a81fd60e0fa9dc8d341108e2/llvm/unittests/IR/ConstantRangeTest.cpp#L481 Enumerating all possible ranges and then picking out the best one by preference allows us to move that code onto the normal testing framework (a sketch follows this list).
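
To make the 64x figure from the first point concrete, here is a small standalone snippet (it relies only on ConstantRange's public API and counts by brute force):

```
#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"
#include <cstdio>
using namespace llvm;

int main() {
  // The exact result set {0, 0x7f} in i8 has 2 elements; the
  // smallest range covering it is [0, 0x80), which has 128.
  ConstantRange Smallest(APInt(8, 0), APInt(8, 0x80));
  unsigned Count = 0;
  for (unsigned N = 0; N != 256; ++N)
    if (Smallest.contains(APInt(8, N)))
      ++Count;
  printf("covering range: %u elements, exact set: 2 (%ux larger)\n",
         Count, Count / 2); // prints 128 and 64
  return 0;
}
```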
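
And here is a hedged sketch of the enumerate-and-pick-best idea from the second point. `findOptimalRange` and its signature are invented for illustration, and the brute force is only feasible at small bit widths:

```
#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"
#include <functional>
#include <vector>
using namespace llvm;

// Hypothetical helper: find the best conservatively correct range for
// an exact result set by enumerating every possible ConstantRange,
// rather than constructing the "exact range" with per-operation logic.
static ConstantRange findOptimalRange(
    const std::vector<APInt> &Exact, unsigned Bits,
    const std::function<bool(const ConstantRange &, const ConstantRange &)>
        &IsBetter) {
  unsigned Max = 1u << Bits;
  // The full set is always a correct (if maximally imprecise) cover.
  ConstantRange Best = ConstantRange::getFull(Bits);
  for (unsigned Lo = 0; Lo != Max; ++Lo) {
    for (unsigned Hi = 0; Hi != Max; ++Hi) {
      if (Lo == Hi && Lo != 0 && Lo != Max - 1)
        continue; // [N, N) only encodes the empty/full sets
      ConstantRange CR(APInt(Bits, Lo), APInt(Bits, Hi));
      // A candidate must cover every element of the exact set.
      bool Covers = true;
      for (const APInt &V : Exact)
        Covers &= CR.contains(V);
      if (Covers && IsBetter(CR, Best))
        Best = CR;
    }
  }
  return Best;
}
```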


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88283/new/

https://reviews.llvm.org/D88283


