[llvm-dev] [RFC] Updating googletest to non-release tagged version

Thu Mar 22 08:03:59 PDT 2018

Doesn't sound like there's opposition - can't say whether or not it's wort
hit, but if you're willing to give it a go/send a patch, probably can't
hurt. Might be a bit of work to get all the corners right - I think there
are a few local patches to the gtest in llvm, unfortunately, that you'd
have to port over, etc.

On Thu, Mar 22, 2018 at 3:59 AM James Henderson <
jh7370.2008 at my.bristol.ac.uk> wrote:

> Thanks for the comments all. I've thought about it and I agree that I can
> probably reduce the number of test cases, so that it is not combinatorial,
> in my use case.
>
> I guess the broader question still is there as to whether a) people would
> be opposed to updating to a non-official-release version of googletest, and
> b) whether it would be worthwhile doing so. As I noted previously, there is
> at least one other place in the tests that said it would like to use
> Combine, so it still seems like there would be some benefit in it, although
> I admit that I haven't looked to make sure that this is a reasonable
> request there.
>
> James
>
> On 20 March 2018 at 20:30, <paul.robinson at sony.com> wrote:
>
>> >> In my particular case from https://reviews.llvm.org/D44560, I
>> currently
>> >> test the following 3 different cases across the full set of DWARF
>> >> versions and formats:
>> >>  - Parsing a valid line table
>> >>  - Emitting an error if the stated prologue length is greater than the
>> >>    actual length
>> >>  - Emitting an error if the stated prologue length is shorter than the
>> >>    actual length
>> >> The first is just testing the positive cases for each possible input. I
>> >> guess a single test for DWARF64 could be written for versions 2-4 and
>> >> another for version 5 (where there is more stuff between the two length
>> >> fields so this becomes interesting), similarly for DWARF32. To
>> summarise,
>> >> I think that these cases are interesting: V5 + 32, V5 + 64, V2 + 32/64,
>> >> V3 + 32/64, V4 + 32/64. The biggest issue I have with cutting down to
>> >> just this set is that it makes the tests less specific, e.g. at a
>> glance,
>> >> what is important about the test - the fact that it is v4, or DWARF64,
>> >> or both independently, or the combination?
>> >>
>> >> Related aside: I've realised since earlier that there is scope for
>> >> version 2 tests, distinct from version 3: v2 tests test the lower
>> >> boundary on valid versions, and v3 the upper boundary on versions
>> >> without maximum_operations_per_instruction.
>> >>
>> >> The latter two test cases are important because a) the length field has
>> >> a different size for DWARF32/64 and therefore the prologue length needs
>> >> to be measured from a different point between the different formats,
>> and
>> >> b) the contents of the prologue are different in each of version 3, 4,
>> >> and 5, and thus the amount read will be different. We could test each
>>
>>                       |
>> >> individual version, independently of the format, but it is
>> theoretically
>> >> possible for an error to sneak in whereby the two different failure
>> >> points cancel each other out. The benefit is admittedly small, but
>> these
>> >> tests are fast, so I don't think it hurts to have them.
>> >
>> > Still not sure I follow all of this - though perhaps the test design
>> > discussion might be better in its own thread. But the broad
>> subject/topic/
>> > name of the stuff I'm interested in applying here is equivalence
>> > partitioning: https://en.wikipedia.org/wiki/Equivalence_partitioning -
>> > helps explain the general idea of reducing "test all combinations/
>> > permutations of cases" to "test representative cases from each class".
>> >
>> > - Dave
>>
>> Equivalence classes are a fine way to reduce duplication in manually
>> written tests.  They tend to depend on a white-box understanding of the
>> system-under-test to define the equivalences reasonably.  LLVM regression
>> tests tend to be written very much white-box, as it's the developer who
>> is writing the test. :-)  The problem there is that as the code evolves,
>> the original equivalence classes may change, but that doesn't necessarily
>> make any tests fail, it just ends up with reduced coverage.  Without good
>> coverage analysis we can easily find tests no longer verifying everything
>> we wanted them to.
>>
>> (And in fact back at HP I had an interesting chat with a product owner
>> who was very much a fan of combinatoric testing; he said every single
>> time he added a new dimension, he found new bugs, and didn't think he
>> could have predicted which instances would have found the bugs.)
>>
>> That said, in the case at hand I think looking at the testing at the
>> granularity of "parsing a line table" is too coarse.  There are several
>> different bits to it and we can reasonably anticipate that they will
>> remain relatively independent bits:
>> - detecting the 32/64 format
>> - extracting the version-dependent sets of fields accurately
>> - detecting mismatches between the stated length and the internally
>>   consistent length (e.g., the header says length X, but the file
>>   table has N entries which runs it to length > X).
>>
>> To the extent that new DWARF versions have different fields, we should
>> test that the new fields are extracted and reported (the success cases).
>> If the new fields are 32/64-sensitive, those should be tested in both
>> formats to make sure that sensitivity is handled (more success cases).
>> The notion of running past the stated length (conversely, finishing the
>> parsing before reaching the stated length) intuitively isn't either
>> version-sensitive or format-sensitive, i.e. all those cases look like
>> they should be part of the same equivalence class, so you don't need
>> to replicate those across all versions and formats.  With the caveat
>> that if white-box examination shows that some case takes a very
>> different path, it's worth replicating the test.  In particular the
>> v5 directory/file tables are very different from prior versions, so
>> it's worth having separate bad-length tests for pre-v5 and v5.
>>
>> --paulr
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180322/4e9e09d1/attachment.html>