<div dir="ltr">Grep the *.py files for test_with_dsym. One example picked at random from the search results is lldb\test\expression_command\call-function\TestCallStdStringFunction.py<br><div><br></div><div>In it you'll see this:</div><div><br></div><pre>    <b>@unittest2.skipUnless(sys.platform.startswith("darwin"), "requires Darwin")</b>
    @dsym_test
    @expectedFailureDarwin(16361880) # &lt;rdar://problem/16361880&gt;, we get the result correctly, but fail to invoke the Summary formatter.
    def test_with_dsym(self):
        """Test calling std::String member function."""
        self.buildDsym()
        self.call_function()

    @dwarf_test
    @expectedFailureFreeBSD('<a href="http://llvm.org/pr17807">llvm.org/pr17807</a>') # Fails on FreeBSD buildbot
    @expectedFailureGcc # <a href="http://llvm.org/pr14437">llvm.org/pr14437</a>, fails with GCC 4.6.3 and 4.7.2
    @expectedFailureIcc # <a href="http://llvm.org/pr14437">llvm.org/pr14437</a>, fails with ICC 13.1
    @expectedFailureDarwin(16361880) # &lt;rdar://problem/16361880&gt;, we get the result correctly, but fail to invoke the Summary formatter.
    def test_with_dwarf(self):
        """Test calling std::String member function."""
        self.buildDwarf()
        self.call_function()</pre><div><br></div><div>The LLDB test runner considers any class that derives from TestBase to be a "test case" (so ExprCommandCallFunctionTestCase in this file is a test case), and within each test case, any member function whose name starts with "test" to be a single test. So here we've got two tests: ExprCommandCallFunctionTestCase.test_with_dsym and ExprCommandCallFunctionTestCase.test_with_dwarf. The first runs only on Darwin (and is currently XFAILed there); the second runs on all platforms but is XFAILed on FreeBSD, when compiling with GCC or ICC, and on Darwin.</div><div><br></div><div>(I'm not sure what the @dsym_test and @dwarf_test annotations are for.)</div></div><br><div class="gmail_quote">On Sat, Mar 14, 2015 at 10:05 AM Jonathan Roelofs &lt;<a href="mailto:jonathan@codesourcery.com">jonathan@codesourcery.com</a>&gt; wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

<br>

On 3/13/15 9:10 PM, Zachary Turner wrote:<br>

><br>

><br>

> On Fri, Mar 13, 2015 at 4:01 PM Jonathan Roelofs<br>
> &lt;<a href="mailto:jonathan@codesourcery.com" target="_blank">jonathan@codesourcery.com</a>&gt; wrote:<br>

><br>

>     +ddunbar<br>

><br>

>     On 3/13/15 9:53 AM, <a href="mailto:jingham@apple.com" target="_blank">jingham@apple.com</a> wrote:<br>

>     >>> Depending on how different the different things are.  Compiler tests<br>
>     >>> tend to have input, output and some machine that converts the input<br>
>     >>> to the output.  That is one very particular model of testing.<br>
>     >>> Debugger tests need to do: get to stage 1, if that succeeded, get to<br>
>     >>> stage 2, if that succeeded, etc.  Plus there's generally substantial<br>
>     >>> setup code to get somewhere interesting, so while you are there you<br>
>     >>> generally try to test a bunch of similar things.  Plus, the tests<br>
>     >>> often have points where there are several success cases, but each<br>
>     >>> one requires a different "next action", stepping being the prime<br>
>     >>> example of this.  These are very different models and I don't see<br>
>     >>> that trying to smush the two together would be a fruitful exercise.<br>

><br>

>     I think LIT does make the assumption that one "test file" has one "test<br>

>     result". But this is a place where we could extend LIT a bit. I don't<br>

>     think it would be very painful.<br>

><br>

>     For me, this would be very useful for a few of the big libc++abi tests,<br>
>     like the demangler one, as currently I have to #ifdef out a couple of<br>
>     the cases that can't possibly work on my platform. It would be much<br>
>     nicer if that particular test file output multiple test results, so<br>
>     that I could XFAIL the ones I know will never work. (For anyone who is<br>
>     curious, the one that comes to mind needs the C99 %a printf format,<br>
>     which my libc doesn't have. It's a bare-metal target, and binary size<br>
>     is really important.)<br>

><br>

>     How much actual benefit is there in having lots of results per test<br>
>     case, rather than having them all &amp;&amp;'d together into one result?<br>
><br>
>     Out of curiosity, does lldb's existing test suite allow you to run<br>
>     individual tests within a test case that produces more than one test<br>
>     result?<br>

><br>

><br>

>   I'm not sure I'm following this line of discussion; it's possible<br>
> you and Jim are talking about different things here.<br>

<br>

I think that's the case. I was imagining the "logic of the test"<br>
as something like this:<br>

<br>

   1) Set 5 breakpoints<br>

   2) Continue<br>

   3) Assert that the debugger stopped at the first breakpoint<br>

   4) Continue<br>

   5) Assert that the debugger stopped at the second breakpoint<br>

   6) etc.<br>
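Jonathan's straight-line reading can be sketched in Python, the language of LLDB's tests. This is a minimal sketch under stated assumptions: `FakeProcess`, `linear_test` and `run` are hypothetical names invented here, not LLDB's API, and the fake process simply replays breakpoint hits in a fixed order. Because the first mismatch raises, the whole script collapses to exactly one PASS/FAIL result, which is the LIT model.

```python
# Minimal sketch of a straight-line debugger test.
# FakeProcess is a hypothetical stand-in for a debugger session, NOT LLDB's API.


class FakeProcess:
    """Pretends to run a program that hits breakpoints in a fixed order."""

    def __init__(self, hit_order):
        self._hits = iter(hit_order)
        self.stopped_at = None

    def continue_(self):
        # Resume until the next breakpoint; None means the program exited.
        self.stopped_at = next(self._hits, None)
        return self.stopped_at


def linear_test(process, breakpoints):
    """Continue, then check where we stopped, once per expected breakpoint."""
    for expected in breakpoints:  # breakpoints the test set, in expected order
        process.continue_()
        if process.stopped_at != expected:
            # The first mismatch aborts everything that follows.
            raise AssertionError(
                f"expected stop at {expected!r}, got {process.stopped_at!r}")


def run(test, *args):
    """Collapse the whole test into a single result, LIT-style."""
    try:
        test(*args)
    except AssertionError:
        return "FAIL"
    return "PASS"


BREAKPOINTS = ["bp1", "bp2", "bp3", "bp4", "bp5"]
print(run(linear_test, FakeProcess(BREAKPOINTS), BREAKPOINTS))     # prints PASS
print(run(linear_test, FakeProcess(["bp1", "bp3"]), BREAKPOINTS))  # prints FAIL
```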

<br>

Reading Jim's description again, with the help of your speculative<br>
example, it sounds like the test logic itself isn't straight-line<br>
code. That's okay too. What I was speaking to is a perceived<br>
difference in what the "results" of running such a test are.<br>

<br>

In LLVM, the assertions are CHECK lines. In libc++, the assertions are<br>
calls to `assert` from assert.h, as well as `static_assert`s. In both<br>
cases, failing any one of those checks in a test makes the whole test<br>
fail. For some reason I had the impression that in lldb there isn't a<br>
single test result per *.py test file. Perhaps that's not the case? Either<br>
way, what I want to emphasize is that LIT doesn't care about the "logic<br>
of the test", as long as there is one test result per test (and even<br>
that condition could be relaxed, if it would be useful for lldb).<br>

<br>

><br>

> If I understand correctly (and maybe I don't), what Jim is saying is<br>

> that a debugger test might need to do something like:<br>

><br>

> 1) Set 5 breakpoints<br>

> 2) Continue<br>

> 3) Depending on which breakpoint gets hit, take one of 5 possible "next"<br>

> actions.<br>

><br>

> But I'm having trouble coming up with an example of why this might be<br>

> useful.  Jim, can you make this a little more concrete with a specific<br>

> example of a test that does this, how the test works, and what the<br>

> different success / failure cases are so we can be sure everyone is on<br>

> the same page?<br>
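Zachary's speculative example, where the next action depends on which breakpoint was hit, might look like the hypothetical sketch below. The breakpoint names, the `NEXT_ACTION` table and the action functions are all made up for illustration and do not come from LLDB's test suite; the point is only that several stop locations count as success, each with its own follow-up.

```python
# Hypothetical sketch: after "continue", the test branches on which of
# several breakpoints was hit. None of these names come from LLDB.


def check_inlined_frame(log):
    log.append("verify inlined frame")
    return True


def step_over_call(log):
    log.append("step over call")
    return True


def step_into_call(log):
    log.append("step into call")
    return True


# Each acceptable stop location maps to the follow-up action it requires.
NEXT_ACTION = {
    "bp_inline": check_inlined_frame,
    "bp_call_site": step_over_call,
    "bp_callee_entry": step_into_call,
}


def branching_test(stop_location):
    """Return (passed, actions taken) for a given stop location."""
    log = [f"stopped at {stop_location}"]
    action = NEXT_ACTION.get(stop_location)
    if action is None:
        # Stopping anywhere not listed in the table is a real failure.
        log.append("unexpected stop")
        return False, log
    return action(log), log


print(branching_test("bp_call_site"))
# prints (True, ['stopped at bp_call_site', 'step over call'])
print(branching_test("bp_elsewhere")[0])  # prints False
```

Such a test still yields one verdict per run. The difference is that the set of acceptable stop locations is part of the test logic rather than a single fixed expected output.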

><br>

> In the case of the libc++abi tests, I'm not sure what is meant by<br>
> "multiple results per test case".  Do you mean (for example) that you'd<br>
> like to be able to XFAIL individual RUN lines based on some condition?  If<br>

<br>

I think this means I should make the libc++abi example even more<br>
concrete. In libc++/libc++abi tests, the "RUN" line is implicit<br>
(aside from the few ShTest tests ericwf has added recently). Every<br>
*.pass.cpp test is a file that the test harness knows it has to compile,<br>
run, and check the exit status of. That said,<br>
libcxxabi/test/test_demangle.pass.cpp has a huge array like this:<br>

<br>

       const char* cases[][2] =<br>
       {<br>
           {"_Z1A", "A"},<br>
           {"_Z1Av", "A()"},<br>
           {"_Z1A1B1C", "A(B, C)"},<br>
           {"_Z4testI1A1BE1Cv", "C test&lt;A, B&gt;()"},<br>
<br>
           // ... snip (the array runs to line 29596 of the file) ...<br>
<br>
           {"_Zli2_xy", "operator\"\" _x(unsigned long long)"},<br>
           {"_Z1fIiEDcT_", "decltype(auto) f&lt;int&gt;(int)"},<br>
       };<br>

<br>

Then there's some logic in `main()` that runs __cxa_demangle on<br>
`cases[i][0]` and asserts that the result is the same as `cases[i][1]`.<br>
If any of those assertions fails, the entire test is marked as failing,<br>
and no further entries in that array are verified. For the sake of<br>
discussion, let's call each entry in `cases` a "subtest", and the<br>
entirety of test_demangle.pass.cpp a test.<br>

<br>

The sticky issue is that there are a few subtests in this test that<br>
don't make sense on various platforms, so currently they are #ifdef'd<br>
out. If the LIT TestFormat and the tests themselves had a way to<br>
report that a subtest failed and then keep running the remaining<br>
subtests, we could XFAIL these platform-specific subtests individually.<br>
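The subtest model described above can be sketched as follows, in Python rather than C++ for brevity. Everything here is hypothetical: `FAKE_DEMANGLER` stands in for `__cxa_demangle`, and `XFAIL_CASES` and `run_subtests` are illustrative names, not part of LIT or libc++abi. Every subtest runs even after a failure, XFAILed subtests are reported individually, and only unexpected outcomes fail the test as a whole.

```python
# Hypothetical subtest runner; the real test is C++.
# FAKE_DEMANGLER stands in for __cxa_demangle on a platform that
# cannot handle one of the inputs.
CASES = [
    ("_Z1A", "A"),
    ("_Z1Av", "A()"),
    ("_Z1A1B1C", "A(B, C)"),
    ("_Zli2_xy", 'operator"" _x(unsigned long long)'),
]

FAKE_DEMANGLER = {
    "_Z1A": "A",
    "_Z1Av": "A()",
    "_Z1A1B1C": "A(B, C)",
    # "_Zli2_xy" is deliberately missing: pretend this platform can't do it.
}

# Subtests known to fail here, XFAILed instead of #ifdef'd out.
XFAIL_CASES = {"_Zli2_xy"}


def run_subtests(cases, demangle, xfail):
    """Run every subtest, even after a failure; return (results, overall_ok)."""
    results = {}
    for mangled, expected in cases:
        passed = demangle(mangled) == expected
        if mangled in xfail:
            results[mangled] = "XPASS" if passed else "XFAIL"
        else:
            results[mangled] = "PASS" if passed else "FAIL"
    # FAIL and XPASS (an unexpected pass) both fail the test as a whole,
    # mirroring how LIT treats them.
    overall_ok = all(r in ("PASS", "XFAIL") for r in results.values())
    return results, overall_ok


results, ok = run_subtests(CASES, FAKE_DEMANGLER.get, XFAIL_CASES)
print(results["_Zli2_xy"], ok)  # prints XFAIL True
```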

<br>

Keep in mind, though, that I'm not really advocating we go and change<br>
test_demangle.pass.cpp to suit that model, because #ifdefs work<br>
reasonably well there, and relatively few subtests have these<br>
platform differences. It's just the first example of the<br>
test/subtest relationship that I could think of.<br>

<br>

> so, LLDB definitely needs that.  One example that LLDB uses almost<br>
> everywhere is running the same test with dSYM and with DWARF debug<br>
> info.  On Apple platforms, tests generally need to run with both dSYM<br>
> and DWARF debug info (literally just repeating the same test twice), and on<br>
> non-Apple platforms, only the DWARF tests ever need to be run.  So there<br>
> would need to be a way to express this.<br>
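The "same test, once per debug-info format" requirement can be sketched with the standard `unittest` module: write the body once and generate one method per format, skipping the dSYM variant off Darwin. `CallFunctionTestCase`, `make_variant`, `FORMATS` and the build/check stand-ins are hypothetical. The TestCallStdStringFunction.py excerpt at the top of the thread instead spells out `test_with_dsym` and `test_with_dwarf` by hand.

```python
import sys
import unittest

# Debug-info formats to exercise; dSYM only exists on Apple platforms.
FORMATS = ("dsym", "dwarf")
IS_DARWIN = sys.platform.startswith("darwin")


class CallFunctionTestCase(unittest.TestCase):
    """Hypothetical test case: one body, generated once per format."""

    def build(self, debug_format):
        # Stand-in for a real build step (e.g. buildDsym()/buildDwarf()).
        self.debug_format = debug_format

    def call_function(self):
        # Stand-in for the shared test body.
        self.assertIn(self.debug_format, FORMATS)


def make_variant(debug_format):
    """Return a test method that builds with `debug_format`, then runs the body."""
    def test(self):
        self.build(debug_format)
        self.call_function()
    if debug_format == "dsym":
        test = unittest.skipUnless(IS_DARWIN, "requires Darwin")(test)
    return test


# Attach test_with_dsym and test_with_dwarf to the class.
for fmt in FORMATS:
    setattr(CallFunctionTestCase, f"test_with_{fmt}", make_variant(fmt))

suite = unittest.defaultTestLoader.loadTestsFromTestCase(CallFunctionTestCase)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, len(result.skipped))  # prints 2 1 off Darwin, 2 0 on Darwin
```

Generating the variants keeps a single test body, while each variant is still reported, skipped or XFAILed as a separate test.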

<br>

Can you point me to an example of this?<br>

<br>

><br>

> There are plenty of other one-off examples.  Debuggers have a lot of<br>
> platform-specific code, and the different platforms support different<br>
> amounts of functionality (especially things like Android / Windows<br>
> that are works in progress).  So we frequently need a single test<br>
> file that has, say, 10 tests in it, where specific tests can be<br>
> XFAILed or even disabled individually based on conditions (usually<br>
> which platform is running the test suite, but not always).<br>
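Per-test conditions like these map naturally onto decorators. The sketch below uses a hypothetical `expected_failure_if(condition)` helper, written in the spirit of the `@expectedFailureDarwin`-style decorators quoted at the top of the thread but not LLDB's actual implementation, together with the standard `unittest.skipIf` for tests that should not run at all. `TARGET_PLATFORM` is an assumed setting, hard-coded here so the output is deterministic.

```python
import unittest

# Hypothetical: the platform the suite is targeting (it could come from a
# command-line option rather than the host, e.g. for remote/Android runs).
TARGET_PLATFORM = "windows"


def expected_failure_if(condition):
    """Mark a test as XFAIL only when `condition` holds."""
    def decorator(func):
        return unittest.expectedFailure(func) if condition else func
    return decorator


class PlatformSpecificTestCase(unittest.TestCase):

    def test_everywhere(self):
        self.assertTrue(True)

    @expected_failure_if(TARGET_PLATFORM == "windows")
    def test_not_yet_on_windows(self):
        # Pretend this feature is unimplemented on the Windows port.
        self.assertNotEqual(TARGET_PLATFORM, "windows")

    @unittest.skipIf(TARGET_PLATFORM == "android", "not supported on Android")
    def test_not_on_android(self):
        self.assertTrue(True)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PlatformSpecificTestCase)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, len(result.expectedFailures), len(result.skipped))  # prints 3 1 0
```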

<br>

--<br>

Jon Roelofs<br>

<a href="mailto:jonathan@codesourcery.com" target="_blank">jonathan@codesourcery.com</a><br>

CodeSourcery / Mentor Embedded<br>

</blockquote></div>