[llvm-dev] [RFC] Updating googletest to non-release tagged version

James Henderson via llvm-dev llvm-dev at lists.llvm.org
Mon May 21 02:35:46 PDT 2018


Hi all,

Just wanted to give a quick update on this: only a few weeks after I
started this email thread, the gtest developers announced tentative plans
for their next two releases. Full details can be seen on various issues on
the gtest github page, but basically the current plan is to create their
next release (tentatively called 1.8.1) in the coming months, with another
release (tentatively 1.9) some time thereafter, which will migrate to
C++11. It's a little unclear how much of current master will be in 1.8.1,
but it looks like all of it (though I'm not 100% certain). Given that other
things have kept me from looking at this, and given the now-planned
release, I don't think there's any point in updating to a copy from master
(except maybe as a local experiment).

James

On 23 March 2018 at 15:03, James Henderson <jh7370.2008 at my.bristol.ac.uk>
wrote:

> Yes, I imagine there will be some interesting issues. I'm away for a
> couple of weeks around Easter, so it will likely be a few weeks before I or
> anybody in my team get a chance to look at this further, but we'll
> certainly keep people updated on here with how it goes (or via reviews).
>
> James
>
> On 22 March 2018 at 15:03, David Blaikie <dblaikie at gmail.com> wrote:
>
>> Doesn't sound like there's opposition - can't say whether or not it's
>> worth it, but if you're willing to give it a go/send a patch, it probably
>> can't hurt. Might be a bit of work to get all the corners right - I think
>> there are a few local patches to the gtest in llvm, unfortunately, that
>> you'd have to port over, etc.
>>
>> On Thu, Mar 22, 2018 at 3:59 AM James Henderson <
>> jh7370.2008 at my.bristol.ac.uk> wrote:
>>
>>> Thanks for the comments, all. I've thought about it and I agree that,
>>> in my use case, I can probably reduce the number of test cases so that
>>> it is not combinatorial.
>>>
>>> I guess the broader question still stands: a) would people be opposed
>>> to updating to a non-official-release version of googletest, and b)
>>> would it be worthwhile to do so? As I noted previously, there is at
>>> least one other place in the tests that says it would like to use
>>> Combine, so it still seems like there would be some benefit, although I
>>> admit that I haven't checked that it is a reasonable request there.
>>>
>>> James
>>>
>>> On 20 March 2018 at 20:30, <paul.robinson at sony.com> wrote:
>>>
>>>> >> In my particular case from https://reviews.llvm.org/D44560, I
>>>> >> currently test the following 3 different cases across the full set
>>>> >> of DWARF versions and formats:
>>>> >>  - Parsing a valid line table
>>>> >>  - Emitting an error if the stated prologue length is greater than
>>>> >>    the actual length
>>>> >>  - Emitting an error if the stated prologue length is shorter than
>>>> >>    the actual length
>>>> >> The first is just testing the positive cases for each possible
>>>> >> input. I guess a single test for DWARF64 could be written for
>>>> >> versions 2-4 and another for version 5 (where there is more stuff
>>>> >> between the two length fields so this becomes interesting), and
>>>> >> similarly for DWARF32. To summarise, I think that these cases are
>>>> >> interesting: V5 + 32, V5 + 64, V2 + 32/64, V3 + 32/64, V4 + 32/64.
>>>> >> The biggest issue I have with cutting down to just this set is that
>>>> >> it makes the tests less specific, e.g. at a glance, what is
>>>> >> important about the test - the fact that it is v4, or DWARF64, or
>>>> >> both independently, or the combination?
>>>> >>
>>>> >> Related aside: I've realised since earlier that there is scope for
>>>> >> version 2 tests, distinct from version 3: v2 tests test the lower
>>>> >> boundary on valid versions, and v3 the upper boundary on versions
>>>> >> without maximum_operations_per_instruction.
>>>> >>
>>>> >> The latter two test cases are important because a) the length field
>>>> >> has a different size for DWARF32/64 and therefore the prologue
>>>> >> length needs to be measured from a different point between the
>>>> >> different formats, and b) the contents of the prologue are different
>>>> >> in each of versions 3, 4, and 5, and thus the amount read will be
>>>> >> different. We could test each individual version, independently of
>>>> >> the format, but it is theoretically possible for an error to sneak
>>>> >> in whereby the two different failure points cancel each other out.
>>>> >> The benefit is admittedly small, but these tests are fast, so I
>>>> >> don't think it hurts to have them.
>>>> >
>>>> > Still not sure I follow all of this - though perhaps the test design
>>>> > discussion might be better in its own thread. But the broad
>>>> > subject/topic/name of the stuff I'm interested in applying here is
>>>> > equivalence partitioning:
>>>> > https://en.wikipedia.org/wiki/Equivalence_partitioning - helps
>>>> > explain the general idea of reducing "test all combinations/
>>>> > permutations of cases" to "test representative cases from each
>>>> > class".
>>>> >
>>>> > - Dave
>>>>
>>>> Equivalence classes are a fine way to reduce duplication in manually
>>>> written tests.  They tend to depend on a white-box understanding of the
>>>> system-under-test to define the equivalences reasonably.  LLVM
>>>> regression tests tend to be written very much white-box, as it's the
>>>> developer who is writing the test. :-)  The problem there is that as
>>>> the code evolves, the original equivalence classes may change, but that
>>>> doesn't necessarily make any tests fail; it just ends up with reduced
>>>> coverage.  Without good coverage analysis we can easily end up with
>>>> tests no longer verifying everything we wanted them to.
>>>>
>>>> (And in fact back at HP I had an interesting chat with a product owner
>>>> who was very much a fan of combinatoric testing; he said every single
>>>> time he added a new dimension, he found new bugs, and didn't think he
>>>> could have predicted which instances would have found the bugs.)
>>>>
>>>> That said, in the case at hand I think looking at the testing at the
>>>> granularity of "parsing a line table" is too coarse.  There are several
>>>> different bits to it and we can reasonably anticipate that they will
>>>> remain relatively independent bits:
>>>> - detecting the 32/64 format
>>>> - extracting the version-dependent sets of fields accurately
>>>> - detecting mismatches between the stated length and the internally
>>>>   consistent length (e.g., the header says length X, but the file
>>>>   table has N entries which runs it to length > X).
>>>>
>>>> To the extent that new DWARF versions have different fields, we should
>>>> test that the new fields are extracted and reported (the success cases).
>>>> If the new fields are 32/64-sensitive, those should be tested in both
>>>> formats to make sure that sensitivity is handled (more success cases).
>>>> The notion of running past the stated length (conversely, finishing the
>>>> parsing before reaching the stated length) intuitively isn't either
>>>> version-sensitive or format-sensitive, i.e. all those cases look like
>>>> they should be part of the same equivalence class, so you don't need
>>>> to replicate those across all versions and formats.  With the caveat
>>>> that if white-box examination shows that some case takes a very
>>>> different path, it's worth replicating the test.  In particular the
>>>> v5 directory/file tables are very different from prior versions, so
>>>> it's worth having separate bad-length tests for pre-v5 and v5.
>>>>
>>>> --paulr
>>>>
>>>
>>>
>