[lldb-dev] Testing through api vs. commands

Wed Oct 7 11:54:24 PDT 2015

> On Oct 7, 2015, at 11:40 AM, Zachary Turner <zturner at google.com> wrote:
> 
> 
> 
> On Wed, Oct 7, 2015 at 11:26 AM Jim Ingham <jingham at apple.com> wrote:
> 
> > On Oct 7, 2015, at 11:16 AM, Jim Ingham <jingham at apple.com> wrote:
> >
> >>
> >> On Oct 7, 2015, at 10:37 AM, Zachary Turner via lldb-dev <lldb-dev at lists.llvm.org> wrote:
> >>
> >>
> >>
> >> On Wed, Oct 7, 2015 at 10:17 AM Greg Clayton <gclayton at apple.com> wrote:
> >>
> >>
> >>> On Oct 7, 2015, at 10:05 AM, Zachary Turner via lldb-dev <lldb-dev at lists.llvm.org> wrote:
> >>>
> >>> Jim, Greg,
> >>>
> >>> Can I get some feedback on this?  I would like to start enforcing this moving forward.  I want to make sure we're in agreement.
> >>>
> >>> On Mon, Oct 5, 2015 at 12:30 PM Todd Fiala <todd.fiala at gmail.com> wrote:
> >>> IMHO that all sounds reasonable.
> >>>
> >>> FWIW - I wrote some tests for the test system changes I put in (for the pure-python impl of timeout support), and in the process, I discovered a race condition in using a python facility that there really is no way I would have found anywhere near as reasonably without having added the tests.  (For those of you who are test-centric, this is not a surprising outcome, but I'm adding this for those who may be inclined to think of it as an afterthought).
> >>>
> >>> -Todd
> >>>
> >>> On Mon, Oct 5, 2015 at 11:24 AM, Zachary Turner via lldb-dev <lldb-dev at lists.llvm.org> wrote:
> >>> On Fri, Sep 11, 2015 at 11:42 AM Jim Ingham <jingham at apple.com> wrote:
> >>> I have held from the beginning that the only tests that should be written using HandleCommand are those that explicitly test command behavior, and if it is possible to write a test using the SB API you should always do it that way for the very reasons you cite.  Not everybody agreed with me at first, so we ended up with a bunch of tests that do complex things using HandleCommand where they really ought not to.  I'm not sure it is worth the time to go rewrite all those tests, but we shouldn't write any new tests that way.
> >>>
> >>> I would like to revive this thread, because there doesn't seem to be consensus that this is the way to go.  I've suggested on a couple of reviews recently that people put new command api tests under a new top-level folder under tests, and so far the responses I've gotten have not indicated that people are willing to do this.
> >>>
> >>> Nobody chimed in on this thread with a disagreement, which indicates to me that we are ok with moving this forward.  So I'm reviving this in hopes that we can come to agreement.  With that in mind, my goal is:
> >>>
> >>> 1) Begin enforcing this on new CLs that go in.  We need to maintain a consistent message and direction for the project, and if this is a "good idea", then it should be applied and enforced consistently. Command api tests should be the exception, not the norm.
> >>
> >> You mean API tests should be the norm right? I don't want people submitting command line tests like "file a.out", "run", "step". I want the API to be used. Did you get this reversed?
> >> I didn't get it reversed, but I agree my wording wasn't clear.  By "command api", I meant HandleCommand / etc.  I *do* want the SB API to be used.
> >>
> >>>
> >>> 2) Begin rejecting or reverting changes that go in without tests.  I understand there are some situations where tests are difficult.  Core dumps and unwinding come to mind.  There are probably others.  But this is the exception, and not the norm.  Almost every change should go in with tests.
> >>
> >> As long as it can be tested reasonably I am fine with rejecting changes going in that don't have tests.
> >> One of the problem is that most changes go in without review.  I understand why this is, because Apple especially are code owners of more than 80% of LLDB, so people adhere to the post-commit review.  This is fine in principle, but if changes go in without tests and there was no corresponding code review, then my only option is to either keep pinging the commit thread in hopes I'll get a response (which I sometimes don't get), or revert the change.  Often though I get a response that says "Yea I'll get to adding tests eventually".  I especially want this last type of response to go the way of the dinosaur.  I don't know how to change peoples' habits, but if you could bring this up at you daily/weekly standups or somehow make sure everyone is on the same page, perhaps that would be a good start.  Reverting is the best way I know to handle this, because it forces a change.  But at the same time it's disruptive, so I really don't want to do it.
> >
> > I agree that reversion is aggressive and it would be better to have some nicer way to enforce this.  It is also a bit rigid and sometimes people's schedules make delaying the test-writing seem a very persuasive option.  We want to take that into account.  Maybe every time a change goes in that warrants a test but doesn't have one we file a bug against the author to write the tests, and mark it some way that is easy to find.  Then if you have more than N such bugs you can't make new checkins till you get them below N?  That way this work can be batch more easily to accommodate schedules but there are still consequences if you put it off too long.
> That seems like a reasonable idea.  We'd need to decide what N was (Is 3 too low?), but in principle it sounds like a good idea.

Yeah, 3 sounds like a reasonable number.

>  
> >
> >>
> >>>
> >>> 3) If a CL cannot be tested without a command api test due to limitations of the SB API, require new changes to go in with a corresponding SB API change.
> >>
> >> One issue here is I don't want stuff added to the SB API just so that it can be tested. The SB API must remain clean and consistent and remain an API that makes sense for debugging. I don't want internal goo being exposed just so we can test things. If we run into this a lot, we might need to make an alternate binary that can test internal unit tests. We could make a lldb_internal.so/lldb_internal.dylib/lldb_internal.dll that can be linked to by internal unit tests and then those unit tests can be run as part of the testing process. So lets keep the SB API clean and sensible with no extra fluff, and find a way to tests internal stuff in a different way.
> >> Agree that the SB API should be treated with care.  Unit tests are a good way around this as you mention.  We already have them running under the CMake build.  If you want to try it out, build with CMake and run "ninja check-lldb-unit".
> >
> > Another way to solve this (which we used in the Tcl/Tk project which has an extensive test suite written in Tcl) is to have a testing API that is for poking at internal bits explicitly for testing.  This is more stable than using the internal API's since presumably the testing API's are more surgical about what they need and so less exposed to changes.  It also means you can use the conveniences of working with Python where there are lots of examples and a more user-friendly environment when you have to do complex setup or test evaluation.  In the Tcl case it also would occasionally uncover functionality that was useful in the SB API's, since you might originally get a bunch of disparate bits of information in small special-purpose commands, but after a bit they will coalesce into a reasonable SB API.
>  
> The best place for using unit tests is in internal data structures that are self-contained by their very nature.  For example, I have a CL that I'm working on to get the python extension module working under Python 3.  I had to refactor / rewrite some of the stuff in PythonDataObjects.  Then a bunch of tests started failing.  It's really annoying to debug Python code that calls through to native code.  If there were unit tests that tested that all the different types of PythonObject worked correctly (string, list, int, array, etc) I would have found the bugs immediately.
> 
> So I'm actually working on adding these unit tests right now before I upload the CL for review.
> 
> Another example is back when I first started working on LLDB, there were a bunch of windows bugs in the IRMemoryMap part of the expression evaluator.  But they were in really low level stuff like finding an item in a binary tree, or allocating a block of memory.  It's way too low level to be exposed through a language as high level as Python or Tcl, but at the same time the tests are really useful because they make it really easy to find bugs when you change something.

Sorry, I didn't mean to argue against unit tests.  I can see cases where you want to poke at the implementation details of some bit of lldb, and the only reasonable way to do that is to grab the data structure directly.  I was addressing the orthogonal concern where there's a bit of information that some object holds which doesn't fit naturally in the SB API, but isn't really an implementation detail in the way that unit tests are useful for.  In that case having a way to easily add access to it in the test-suite objects is a much lighter-weight way to do this.  Seems like there's room for both approaches, where unit tests are better for really low level testing, and a "test library" is better for things that don't depend on internal details so much as on data which it isn't worth figuring out a good way to expose but is useful to test.

Jim