<div dir="ltr">I agree that if all the tests look like the ones in this review, there will be no way to test 100% of the things we currently test.  At the same time, I see this as just a beginning: a replacement for lldbinline, but one that is more extensible.  To test things like the multi-threaded stepping behavior you mentioned, a simple sequential list of actions and pattern matches won't be sufficient.  <div><br></div><div>So I think a path forward is something like this:</div><div><br></div><div>1. Get all the easy stuff out of the way first, such as the tests in this patch (and there are probably at least 100 more that are just as easy).</div><div>2. Once we have ported a reasonable number of the easy cases, do some benchmarking (at this point the original versions of these tests are still in the tree).  Is it faster?  Slower?  This just gives us actionable data.</div><div>3. Enable these tests on the bots, let them run for a while, and see how stable they are.</div><div>4. If all goes well after a week or two, delete the ones that have been ported over, and start using this style of test whenever possible and wherever one would previously have used an lldbinline test.</div><div>5. Analyze the remaining difficult tests, and find areas where we can make simple, incremental changes to LLDB's lit test runner to support new types of tests.  Some will be easier than others.  At every point we're just picking low-hanging fruit: small, non-controversial improvements that enable writing new tests in this style.</div><div><br></div><div>And then just see how far we can get.  
I could throw out crazy sledgehammer ideas right now, like "allow people to embed SB API code directly in the lit test file", but it seems unproductive since we don't even know exactly what problems we'll encounter.</div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Sep 14, 2016 at 5:38 PM Jim Ingham <<a href="mailto:jingham@apple.com">jingham@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br class="gmail_msg">

> On Sep 14, 2016, at 5:13 PM, Zachary Turner <<a href="mailto:zturner@google.com" class="gmail_msg" target="_blank">zturner@google.com</a>> wrote:<br class="gmail_msg">

><br class="gmail_msg">

> I'm only saying that we should have an open mind.  Obviously there are (valid!) concerns.  If we can't solve them then we can't solve them.  The goal (my goal anyway) is always to make things better, not to use X because it's X.  There's value in consistency, but that doesn't mean that the value from consistency always outweighs the value you get from doing a custom thing.<br class="gmail_msg">

><br class="gmail_msg">

> So what I'm saying is: IF we can find a way to have one test suite across all of LLVM and its subprojects, AND it is sufficiently powerful to test all the things we need to test while remaining maintainable in the long term, we should absolutely jump on the opportunity.<br class="gmail_msg">

><br class="gmail_msg">

> But this is one of those things that requires a lot of upfront time investment before you can actually know if it can work for 100% of things.  Obviously some people don't want to invest their time that way when they're already satisfied, and I don't blame them.<br class="gmail_msg">

><br class="gmail_msg">

> But for the people who do, and who think they can solve the problem, what's the harm?  Obviously the burden is on those people to prove that their vision can be realized.<br class="gmail_msg">

><br class="gmail_msg">

> But if it is successful, then there's no denying the benefits.  1) Tests become easier to write.  2) Tests become easier to debug.  3) Consistency encourages people who have traditionally stayed away from LLDB to contribute.  4) All the people pouring their effort into the custom thing can now pour it into the shared thing, so everybody benefits.<br class="gmail_msg">

<br class="gmail_msg">

I disagree that one of the benefits will be #2.  I worry that what will really happen is "easy tests will become slightly easier to write, and complex tests will be fragile when they aren't impossible", which will mean we end up only writing easy tests.  Lots of easy tests, but we won't test complex multi-threaded stepping behavior, etc., so we'll end up breaking that sort of thing, and we won't find out about it quickly because the failures are intermittent and hard to write bugs about.<br class="gmail_msg">

<br class="gmail_msg">

Jim<br class="gmail_msg">

<br class="gmail_msg">

<br class="gmail_msg">

><br class="gmail_msg">

> I don't blame you for being scared of command tests.  I don't support their use in the current LLDB test suite either, for exactly the same reasons you and Jason have expressed.  But I do think it's possible to come up with something that a) doesn't suffer from the same problems, b) allows testing a ton of extra functionality that is not currently testable through the API, and c) doesn't rely on Python at all.  If I'm wrong I'll eat crow :)<br class="gmail_msg">

><br class="gmail_msg">

> On Wed, Sep 14, 2016 at 5:00 PM Jim Ingham <<a href="mailto:jingham@apple.com" class="gmail_msg" target="_blank">jingham@apple.com</a>> wrote:<br class="gmail_msg">

> Also, w.r.t.:<br class="gmail_msg">

><br class="gmail_msg">

> >  Aside from writing imperative control flow constructs, which I see as a positive rather than a negative.<br class="gmail_msg">

><br class="gmail_msg">

> I wrote a bunch of tests to check that stepping behavior for Swift and C was reasonable.  When stepping through source code, there is not one correct way to write the line tables, and in fact clang &amp; swiftc change how they describe the source through the line tables all the time.  So the test has to say: I stepped, and sometimes I'll get to A, sometimes to B; both are "right", but I have to do different things in each case.  If A, step again before the next test; if B, go on to the next test.<br class="gmail_msg">

><br class="gmail_msg">

> You could "fix" that by only doing one step per test, and counting each of these outcomes as a success.  But then you wouldn't test that a series of steps doesn't accumulate errors; you'd only test "run to a breakpoint and step once."  That would not be good.  So your positive would be very much a negative for this kind of test.<br class="gmail_msg">

><br class="gmail_msg">

> Traditionally the answer to this has been: we know we have to keep the current test suite around, but we're just adding new, different ways to write tests.  Now you are saying something very different.  Do you really mean that?<br class="gmail_msg">

><br class="gmail_msg">

> Jim<br class="gmail_msg">

><br class="gmail_msg">

<br class="gmail_msg">

</blockquote></div>