[PATCH] D16726: [Profiling] Speed up unittests by ~5x

Mon Feb 8 17:17:54 PST 2016

On Mon, Feb 8, 2016 at 4:42 PM, Xinliang David Li <xinliangli at gmail.com>
wrote:

>
>
> On Mon, Feb 8, 2016 at 4:38 PM, David Blaikie via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>>
>>
>> On Mon, Feb 8, 2016 at 4:25 PM, Xinliang David Li <davidxl at google.com>
>> wrote:
>>
>>> On Mon, Feb 8, 2016 at 4:12 PM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>> >
>>> >
>>> > On Mon, Feb 8, 2016 at 4:00 PM, Xinliang David Li <davidxl at google.com>
>>> > wrote:
>>> >>
>>> >> Yes, if we talk about all tests in different repos collectively, 0 is
>>> >> probably covered -- but that is still an assumption from the
>>> >> unittest's point of view.
>>> >
>>> >
>>> > Ah, sure - I agree it (along with any other API surface area) should be
>>> > covered by a unit test, even if the codepath is also covered by a
>>> higher
>>> > level test of some kind.
>>> >
>>> > But it looks like the 0 padding case is covered by other /unit/ tests
>>> in the
>>> > same file.
>>>
>>> AFAIK, this is the only unittest that covers 0 padding case.
>>>
>>
>> Presumably any other test testing InstrProfSymtab::create tests the zero
>> padding case, so far as I'm understanding?
>> (instr_prof_symtab_test, instr_prof_symtab_module_test?)
>>
>> (also, as I look at this - the tests seem generally pretty repetitive and
>> brittle - repeating the same string in many places, for example & seem
>> overly verbose, testing several more inputs than seem
>> necessary/constructive (instr_prof_symtab_test tests 5 strings - why not
>> one or two? instr_prof_symtab_module_test tests even more... - no tests
>> seem to test a failed lookup (getFuncName with a hash that is not present))
>>
>>
> No -- those tests do not test serialization nor testing combining two
> serial segments together which is tested here (simulating what linker does).
>

Ah, I see - there are different forms of 'create'.

Might it be easier to follow if each test case for different forms of
create were otherwise the same? (eg: we just try feeding in different
sources of the same data (from a Module, from a String, from a compressed
String, from a compressed string containing two distinct chunks) & check
that everything coming out the other end is identical?)

As it stands it's hard for me to tell what's being tested in each case.
(perhaps this is a case of the {a,x},{b,y} where different creation tests
are simultaneously testing different other features of the Symtab as well -
if that's the case, some documentation of the test strategy might be
helpful to make it clear why the differences are there/what they're
exercising)

>
>
>> It looks like this could be simplified across the board & make it clearer
>> what's being tested and how it's covering all the functionality of the
>> InstrProfRecord class.
>>
>>
>>>
>>> > This seems like a reasonable case of test case reduction by
>>> > combination (if you have {a, b} x {x, y} to test you can test it in two
>>> > cases {a, x}, {b, y} rather than running 4 tests, since the features
>>> are
>>> > independent).
>>>
>>> We need to be very careful about reducing {a,b}x{x,y} into {a,x} and
>>> {b,y} though -- it makes explicit assumption about the implementation
>>> -- there might be subtle dependencies there..
>>>
>>
>> Agreed, though it is impractical to assume everything's related to
>> everything else - testing would be prohibitive. So we generally test fairly
>> precisely/narrowly within the regression suite, and leave it to integration
>> tests (like the test-suite, self hosts, etc) to test more broadly for the
>> cross-functional interactions.
>>
>> That's not to say that some tests don't end up a bit redundant so as to
>> get more explicit coverage.
>>
>
> yes -- that is why I said 'we need to be careful' -- one of the biggest
> pitfalls of writing tests is making assumptions above coverage.  In lack of
> strong proof, I would err on the more conservative side.
>

> David
>
>
>
>>
>> - David
>>
>>
>>>
>>> David
>>>
>>> >
>>> >>
>>> >>
>>> >> David
>>> >>
>>> >> On Mon, Feb 8, 2016 at 3:55 PM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>> >> >
>>> >> >
>>> >> > On Mon, Feb 8, 2016 at 3:53 PM, Xinliang David Li <
>>> davidxl at google.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> On Mon, Feb 8, 2016 at 3:30 PM, David Blaikie <dblaikie at gmail.com>
>>> >> >> wrote:
>>> >> >> >
>>> >> >> >
>>> >> >> > On Mon, Feb 8, 2016 at 3:17 PM, Xinliang David Li
>>> >> >> > <davidxl at google.com>
>>> >> >> > wrote:
>>> >> >> >>
>>> >> >> >> On Mon, Feb 8, 2016 at 3:12 PM, David Blaikie <
>>> dblaikie at gmail.com>
>>> >> >> >> wrote:
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > On Mon, Feb 8, 2016 at 3:08 PM, Xinliang David Li
>>> >> >> >> > <davidxl at google.com>
>>> >> >> >> > wrote:
>>> >> >> >> >>
>>> >> >> >> >> On Mon, Feb 1, 2016 at 10:44 AM, David Blaikie
>>> >> >> >> >> <dblaikie at gmail.com>
>>> >> >> >> >> wrote:
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >> > On Fri, Jan 29, 2016 at 5:58 PM, Xinliang David Li
>>> >> >> >> >> > <davidxl at google.com>
>>> >> >> >> >> > wrote:
>>> >> >> >> >> >>
>>> >> >> >> >> >> To clarify, it is not 128 iterations, but creating a
>>> symbol
>>> >> >> >> >> >> table
>>> >> >> >> >> >> with
>>> >> >> >> >> >> 128 entries -- which is a reasonable size.
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >> > We don't generally test on "realistic" sized inputs in the
>>> >> >> >> >> > regression
>>> >> >> >> >> > suite.
>>> >> >> >> >> > We write targeted tests for functionality. Broad testing is
>>> >> >> >> >> > done
>>> >> >> >> >> > in
>>> >> >> >> >> > the
>>> >> >> >> >> > test-suite and other integration level testing.
>>> >> >> >> >> >
>>> >> >> >> >> >>
>>> >> >> >> >> >> Test coverage wise, it is probably the same as a 3-entry
>>> >> >> >> >> >> symtab.
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >> > Then let's use a 3-entry symtab.
>>> >> >> >> >> >
>>> >> >> >> >> > (why 3? Because it tests the boundaries (first and last)
>>> and
>>> >> >> >> >> > one
>>> >> >> >> >> > "normal"
>>> >> >> >> >> > case of a non-boundary value - while the boundaries
>>> probably
>>> >> >> >> >> > aren't
>>> >> >> >> >> > interesting in this algorithm, it's cheap enough to just
>>> follow
>>> >> >> >> >> > that
>>> >> >> >> >> > common
>>> >> >> >> >> > practice in test case design)
>>> >> >> >> >>
>>> >> >> >> >> Will update it to 3.
>>> >> >> >> >>
>>> >> >> >> >> >
>>> >> >> >> >> > I'm also curious about the padding parameter - what does
>>> it do?
>>> >> >> >> >> > Choose
>>> >> >> >> >> > how
>>> >> >> >> >> > many null characters go between each value? What effect
>>> does
>>> >> >> >> >> > that
>>> >> >> >> >> > have/why
>>> >> >> >> >> > is that a tuning parameter? (understanding what it's for
>>> can
>>> >> >> >> >> > help
>>> >> >> >> >> > us
>>> >> >> >> >> > choose
>>> >> >> >> >> > appropriate test cases/coverage for that functionality)
>>> >> >> >> >>
>>> >> >> >> >> Internal padding bytes (for alignment to 4 bytes) can be
>>> zero to
>>> >> >> >> >> 3.
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > Any idea what's particularly useful to test here? (does it
>>> just
>>> >> >> >> > assert
>>> >> >> >> > that
>>> >> >> >> > the parameter is [0,3] ? Or does it have well defined behavior
>>> >> >> >> > (returning an
>>> >> >> >> > error code? doing something else?) outside that range? is any
>>> case
>>> >> >> >> > more
>>> >> >> >> > interesting than any other - or just a simple loop for
>>> [0,Padding]
>>> >> >> >> > done
>>> >> >> >> > at
>>> >> >> >> > some point in the algorithm? Does anything test that the
>>> algorithm
>>> >> >> >> > emitted
>>> >> >> >> > the right padding?)
>>> >> >> >>
>>> >> >> >> It tests that the reader is (flexible) and capable of handing
>>> >> >> >> padding
>>> >> >> >> bytes not produced by the writer.  How many paddings should be
>>> >> >> >> emitted
>>> >> >> >> is not specified. For instance, if some producer forces 8 byte
>>> >> >> >> alignment, it should be handled too.
>>> >> >> >
>>> >> >> >
>>> >> >> > Ah, OK - perhaps we could just test one pseudo-random (if it's
>>> really
>>> >> >> > just a
>>> >> >> > "while (null byte)" loop to ignore the padding - I'd probably
>>> pick 2
>>> >> >> > bytes
>>> >> >> > of padding, but don't mind any small number) amount of padding to
>>> >> >> > test
>>> >> >> > that
>>> >> >> > the reader ignores it, rather than testing several amounts of
>>> >> >> > padding?
>>> >> >> > Alternatively/in addition, might be good to test these features
>>> >> >> > separately
>>> >> >> > to make triage easier? Rather than combining compression and
>>> padding
>>> >> >> > together - unless there's an interesting interaction between the
>>> two
>>> >> >> > features in the implementation?
>>> >> >> >
>>> >> >>
>>> >> >> I think 0 is more special here, so I would pick 0 and 1 byte.
>>> >> >
>>> >> >
>>> >> > Is zero bytes of padding not already covered by any other tests? (I
>>> >> > assume
>>> >> > it's covered by most tests as it sounds like it's the common case?)
>>> >> >
>>> >> >>
>>> >> >> > You say "padding bytes not produced by the writer" - does the
>>> writer
>>> >> >> > produce
>>> >> >> > zero bytes of padding, or some amount of padding that's just not
>>> the
>>> >> >> > same
>>> >> >> > amounts as are being tested here?
>>> >> >>
>>> >> >> The writer can produce 0 or more padding bytes, the assembler and
>>> >> >> linker may or may not pad more. The purpose of the testing is that
>>> the
>>> >> >> reader does not depend/care about those behavior.
>>> >> >>
>>> >> >> David
>>> >> >>
>>> >> >> >
>>> >> >> > - David
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160208/621ed0df/attachment.html>