[llvm-dev] [RFC] Updating googletest to non-release tagged version

James Henderson via llvm-dev llvm-dev at lists.llvm.org
Fri Mar 23 08:03:13 PDT 2018


Yes, I imagine there will be some interesting issues. I'm away for a couple
of weeks around Easter, so it will likely be a few weeks before I or anybody
in my team gets a chance to look at this further, but we'll certainly keep
people updated here on how it goes (or via reviews).

James

On 22 March 2018 at 15:03, David Blaikie <dblaikie at gmail.com> wrote:

> Doesn't sound like there's opposition - can't say whether or not it's worth
> it, but if you're willing to give it a go/send a patch, probably can't
> hurt. Might be a bit of work to get all the corners right - I think there
> are a few local patches to the gtest in llvm, unfortunately, that you'd
> have to port over, etc.
>
> On Thu, Mar 22, 2018 at 3:59 AM James Henderson <
> jh7370.2008 at my.bristol.ac.uk> wrote:
>
>> Thanks for the comments all. I've thought about it, and I agree that, in my
>> use case, I can probably reduce the number of test cases so that the set is
>> not combinatorial.
>>
>> I guess the broader question still stands: a) would people be opposed to
>> updating to a non-official-release version of googletest, and b) would it
>> be worthwhile to do so? As I noted previously, there is at least one other
>> place in the tests with a comment saying it would like to use Combine, so
>> there still seems to be some benefit in the update, although I admit that I
>> haven't checked whether it is a reasonable request there.
>>
>> James
>>
>> On 20 March 2018 at 20:30, <paul.robinson at sony.com> wrote:
>>
>>> >> In my particular case from https://reviews.llvm.org/D44560, I
>>> >> currently test the following 3 different cases across the full set of
>>> >> DWARF versions and formats:
>>> >>  - Parsing a valid line table
>>> >>  - Emitting an error if the stated prologue length is greater than the
>>> >>    actual length
>>> >>  - Emitting an error if the stated prologue length is shorter than the
>>> >>    actual length
>>> >> The first is just testing the positive cases for each possible input.
>>> >> I guess a single test for DWARF64 could be written for versions 2-4 and
>>> >> another for version 5 (where there is more stuff between the two length
>>> >> fields so this becomes interesting), similarly for DWARF32. To
>>> >> summarise, I think that these cases are interesting: V5 + 32, V5 + 64,
>>> >> V2 + 32/64, V3 + 32/64, V4 + 32/64. The biggest issue I have with
>>> >> cutting down to just this set is that it makes the tests less specific,
>>> >> e.g. at a glance, what is important about the test - the fact that it
>>> >> is v4, or DWARF64, or both independently, or the combination?
>>> >>
>>> >> Related aside: I've realised since earlier that there is scope for
>>> >> version 2 tests, distinct from version 3: v2 tests test the lower
>>> >> boundary on valid versions, and v3 the upper boundary on versions
>>> >> without maximum_operations_per_instruction.
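One way to keep a reduced set readable at a glance is to record, next to each
hand-picked combination, the reason it is interesting, and feed that list to
::testing::ValuesIn instead of a full Combine. A sketch only, with made-up
names:

  #include "gtest/gtest.h"

  // Hypothetical: one entry per equivalence class, with the rationale
  // kept next to the combination it exercises.
  struct LineTableCase {
    unsigned Version;
    bool IsDWARF64;
    const char *Why;
  };

  static const LineTableCase RepresentativeCases[] = {
      {2, false, "lower bound on supported versions"},
      {3, false, "highest version without max_ops_per_instruction"},
      {4, false, "plain v4 prologue"},
      {5, false, "v5 has extra fields between the two length fields"},
      {5, true,  "v5 again, but with DWARF64 offset sizes"},
  };

  class ReducedLineTableTest
      : public ::testing::TestWithParam<LineTableCase> {};

  TEST_P(ReducedLineTableTest, ParsesValidPrologue) {
    const LineTableCase &C = GetParam();
    // Build and parse a prologue for (C.Version, C.IsDWARF64); C.Why
    // documents what this combination is meant to cover.
    (void)C;
  }

  INSTANTIATE_TEST_CASE_P(Representative, ReducedLineTableTest,
                          ::testing::ValuesIn(RepresentativeCases));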
>>> >>
>>> >> The latter two test cases are important because a) the length field
>>> >> has a different size for DWARF32/64 and therefore the prologue length
>>> >> needs to be measured from a different point between the different
>>> >> formats, and b) the contents of the prologue are different in each of
>>> >> version 3, 4, and 5, and thus the amount read will be different. We
>>> >> could test each individual version, independently of the format, but
>>> >> it is theoretically possible for an error to sneak in whereby the two
>>> >> different failure points cancel each other out. The benefit is
>>> >> admittedly small, but these tests are fast, so I don't think it hurts
>>> >> to have them.
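For anyone not steeped in the format, the 32/64 difference referred to here
comes from the DWARF initial-length encoding. A standalone sketch of that
rule, written from the DWARF spec rather than taken from the actual LLVM
parser (the function and struct names are hypothetical):

  #include <cstdint>
  #include <cstring>

  struct InitialLength {
    uint64_t UnitLength;
    bool IsDWARF64;
  };

  // Hypothetical little-endian reader: a 32-bit unit length, unless it is
  // 0xffffffff, in which case a 64-bit length follows and offset-sized
  // fields (including the prologue length) are 8 bytes instead of 4 - so
  // the point the prologue length is measured from moves between formats.
  InitialLength readInitialLength(const uint8_t *&P) {
    uint32_t First;
    std::memcpy(&First, P, sizeof(First));
    P += sizeof(First);
    if (First != 0xffffffffu)
      return {First, false};
    uint64_t Length;
    std::memcpy(&Length, P, sizeof(Length));
    P += sizeof(Length);
    return {Length, true};
  }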
>>> >
>>> > Still not sure I follow all of this - though perhaps the test design
>>> > discussion might be better in its own thread. But the broad
>>> > subject/topic/name of the stuff I'm interested in applying here is
>>> > equivalence partitioning:
>>> > https://en.wikipedia.org/wiki/Equivalence_partitioning - helps explain
>>> > the general idea of reducing "test all combinations/permutations of
>>> > cases" to "test representative cases from each class".
>>> >
>>> > - Dave
>>>
>>> Equivalence classes are a fine way to reduce duplication in manually
>>> written tests.  They tend to depend on a white-box understanding of the
>>> system-under-test to define the equivalences reasonably.  LLVM regression
>>> tests tend to be written very much white-box, as it's the developer who
>>> is writing the test. :-)  The problem there is that as the code evolves,
>>> the original equivalence classes may change, but that doesn't necessarily
>>> make any tests fail; it just ends up with reduced coverage.  Without good
>>> coverage analysis, we can easily end up with tests that no longer verify
>>> everything we wanted them to.
>>>
>>> (And in fact back at HP I had an interesting chat with a product owner
>>> who was very much a fan of combinatoric testing; he said every single
>>> time he added a new dimension, he found new bugs, and didn't think he
>>> could have predicted which instances would have found the bugs.)
>>>
>>> That said, in the case at hand I think looking at the testing at the
>>> granularity of "parsing a line table" is too coarse.  There are several
>>> different bits to it and we can reasonably anticipate that they will
>>> remain relatively independent bits:
>>> - detecting the 32/64 format
>>> - extracting the version-dependent sets of fields accurately
>>> - detecting mismatches between the stated length and the internally
>>>   consistent length (e.g., the header says length X, but the file
>>>   table has N entries which runs it to length > X).
>>>
>>> To the extent that new DWARF versions have different fields, we should
>>> test that the new fields are extracted and reported (the success cases).
>>> If the new fields are 32/64-sensitive, those should be tested in both
>>> formats to make sure that sensitivity is handled (more success cases).
>>> The notion of running past the stated length (conversely, finishing the
>>> parsing before reaching the stated length) intuitively isn't either
>>> version-sensitive or format-sensitive, i.e. all those cases look like
>>> they should be part of the same equivalence class, so you don't need
>>> to replicate those across all versions and formats.  With the caveat
>>> that if white-box examination shows that some case takes a very
>>> different path, it's worth replicating the test.  In particular the
>>> v5 directory/file tables are very different from prior versions, so
>>> it's worth having separate bad-length tests for pre-v5 and v5.
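Translated into test structure, the partition described above might look
something like the following. This is a sketch under assumed names, not the
layout of the actual unit test file:

  #include "gtest/gtest.h"

  // Success cases: version-dependent prologue fields, one instance per
  // version (and per format only where a field is offset-size sensitive).
  class DebugLineVersionTest : public ::testing::TestWithParam<unsigned> {};

  TEST_P(DebugLineVersionTest, ExtractsVersionSpecificFields) {
    // Build a prologue for version GetParam() and check each field
    // round-trips through the parser.
  }

  INSTANTIATE_TEST_CASE_P(Versions, DebugLineVersionTest,
                          ::testing::Values(2u, 3u, 4u, 5u));

  // Format detection and length mismatches get their own classes rather
  // than being replicated across every (version, format) combination; the
  // v5 directory/file tables differ enough to earn a separate bad-length
  // case.
  TEST(DebugLineTest, DetectsDWARF64InitialLength) { /* ... */ }
  TEST(DebugLineTest, BadPrologueLengthPreV5) { /* ... */ }
  TEST(DebugLineTest, BadPrologueLengthV5) { /* ... */ }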
>>>
>>> --paulr
>>>
>>
>>