[Lldb-commits] [lldb] r296495 - Fix incorrect logic in StackFrame::Disassemble.

Jim Ingham via lldb-commits <lldb-commits at lists.llvm.org>
Tue Feb 28 16:43:41 PST 2017


> On Feb 28, 2017, at 4:15 PM, Zachary Turner <zturner at google.com> wrote:
> 
> 
> 
> On Tue, Feb 28, 2017 at 3:49 PM Jim Ingham <jingham at apple.com> wrote:
> 
> > On Feb 28, 2017, at 3:36 PM, Zachary Turner <zturner at google.com> wrote:
> >
> > That patch was not really about early returns; it was about using StringRef.  So my comment about the aesthetics was referring to the patch in general.  The early return was more of a drive-by, since I was already touching the code.  In that respect I was just following the well-documented and widely accepted LLVM policy.
> 
> I do not think it is widely accepted LLVM policy to go changing logic in code you don't really understand without testing the results of that change.  It is much better to leave a little bit of code not looking quite the way you would like than to add a bug to it.
>  
> It seems like this is going to end up in a circular argument.  In LLVM, testing those changes happens automatically, because there is good test coverage.  So it's difficult, if not impossible, to change logic without testing the results, because everything is tested.  As a result, there is a widespread mentality that "if the tests pass, I didn't break anything".  That is, in fact, the main benefit of having tests.
> 
> Especially when changing a function like StackFrame::Disassemble, you would think that something, somewhere, is testing disassembly, right?  We can't expect everyone to have the entire code coverage of LLDB mapped out in their heads, so it's reasonable to make assumptions, especially about such core functionality as "is disassembly completely broken?"
> 
> We should not be adopting an exclusionary policy of gating changes to the code based on how much code coverage there is or how much expertise you have in a particular area.  We should instead be finding ways to make it easier to contribute.

You live in the world you have, not the one you wish you had.  The world you have is one where the lldb code coverage is at present good but not exhaustive; exhaustive coverage would be quite difficult to achieve, by the way, because this is by necessity a tool with a huge surface area.  More importantly, it is a tool that many thousands of people rely on to get their jobs done.  So if you are going to edit lldb code responsibly, you can't pretend you can do so without ensuring that there is testing for the change you made, or adding that testing yourself.  Otherwise you are actively harming folks, and, for this sort of change, returning no real benefit for that harm.

> 
>  
> 
> >
> > Regarding testing, I honestly don't see a reasonable path forward for increasing test coverage until we have a better testing infrastructure in place, one that is less platform-dependent, less flaky, and makes it much, much easier to write small, targeted tests.
> 
> That seems to me to be a cop-out.  Every time we've actually chased down a "flaky" test on OS X, we've found the origin in some race condition in lldb - i.e. the test suite was not flaky; it was working quite correctly in showing a place where lldb was flaky.  There still seem to be instances of this: the multi-debugger test case - another of the "flaky" ones - turns out to be a race condition in accessing the section lists in the debugger.  I doubt that anything which actually runs the debugger up to some point and then does testing is going to end up significantly simpler than the current test suite.  And I really doubt that the effort of doing that would end up having more benefit than adding tests to the current suite.
> Right, my point is just that the tests are so all-encompassing that if one fails it often gives you no clue about the reason for the failure, because everything is a full-scale integration test.

I almost never find this to be the case, except for tests - like the multi-debugger test, which really did take some work to track down - where the failure is at the integration level.  Otherwise, the failure log points you to exactly the statement that went wrong, and with a few printf's in the Python code - or just running that test case in the debugger - you can easily figure out what went wrong.

The one weakness in this sort of test is that if you break something fundamental in the debugger you will get a huge cascade of failures, because none of the other tests can get to the point where they start to do their real work.  But given the way the debugger works, there are large parts of it that just don't work if you break, say, the ability to stop at a breakpoint...  That was probably a pain when bringing up a new port, because you have to solve a few hard problems before you can begin to see any light.  But even in that case, it's pretty clear what doesn't work yet ("look, all the tests that try to hit a breakpoint fail at the point where they try to do this").

> On the other hand, that's the only type of test it's practical to write.  We have some gtests now, but it's still often very hard to write them because the system is not modular enough.  Since A depends on B, which depends on C, which depends on D, ad infinitum, you can't even create an instance of A to test it without spinning up 100 other things.

Note that the non-modularity you are talking about is build-time non-modularity.  I can't see how that affects writing gtests.  The parts of the code that a test touches are no different whether the test binary links in all of liblldbcore.a or just some subset of it.

BTW, I don't think Jason had all that much difficulty writing gtests for the unwinder.  That seemed a perfect case where it was easy to isolate the inputs and provide faked versions of them.  He said it was a really convenient way to add tests for a pretty complex part of the debugger.
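Roughly the shape such a test takes - to be clear, the names below are made up for illustration, not the actual unwinder tests; the point is just that you hand the code canned input and check its answer, with no live process anywhere:

    #include "gtest/gtest.h"

    #include <cstdint>
    #include <map>

    // Hypothetical function under test: given a canned table of
    // saved-register locations (standing in for an unwind plan),
    // report where the caller's PC was stored.
    static bool FindCallerPC(const std::map<int, uint64_t> &saved_regs,
                             int pc_regnum, uint64_t &location) {
      auto it = saved_regs.find(pc_regnum);
      if (it == saved_regs.end())
        return false;
      location = it->second;
      return true;
    }

    TEST(UnwindPlanTest, FindsSavedPC) {
      // Faked input: pretend register 16 (the PC) was saved at this address.
      std::map<int, uint64_t> saved_regs = {{16, 0x7fffffffe0f8ull}};
      uint64_t loc = 0;
      ASSERT_TRUE(FindCallerPC(saved_regs, 16, loc));
      EXPECT_EQ(loc, 0x7fffffffe0f8ull);
    }

    TEST(UnwindPlanTest, MissingPCIsAnError) {
      // No PC entry at all: the lookup should fail rather than guess.
      std::map<int, uint64_t> empty;
      uint64_t loc = 0;
      EXPECT_FALSE(FindCallerPC(empty, 16, loc));
    }

No target, no platform dependence, and when something breaks the failing EXPECT tells you exactly which expectation went wrong.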

> 
> There should be a system in place that allows one to write a test that tests *almost nothing* other than the specific functionality you are trying to exercise.  You can still have the full-scale tests, but you need a first line of defense, and integration tests should not be it.  Anyway, this is starting to sound like a discussion we've had many times before.

Again, I am pretty sure that's what the gtests do, and they are appropriate for parts of the debugger that can do their job with almost nothing but the functionality they provide.  The problem with extending this more broadly in lldb is that many parts of the debugger rely on complex input from many sources.  So you have to figure out how to mock up those inputs in a way that isn't so artificial that you end up testing your mock-up more than the code it is driving.  At some point it is easier and more reliable to use real objects for these inputs.
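For instance - again with made-up names, just to show the flavor of the problem - to unit-test something that reads target memory you end up carving out an interface and writing a fake behind it, and every behavior of the real thing that the code under test depends on (partial reads, unmapped ranges, and so on) has to be reproduced in the fake before the test is telling you anything about lldb rather than about the fake:

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Hypothetical seam: the code under test reads memory through this
    // interface instead of through a live process.
    struct MemoryReader {
      virtual ~MemoryReader() = default;
      virtual size_t ReadMemory(uint64_t addr, void *buf, size_t size) = 0;
    };

    // A canned-bytes fake.  Fine for simple cases, but each quirk of
    // real memory reads that matters to the code under test has to be
    // re-implemented here by hand.
    struct FakeMemoryReader : MemoryReader {
      uint64_t base = 0;
      std::vector<uint8_t> bytes;
      size_t ReadMemory(uint64_t addr, void *buf, size_t size) override {
        if (addr < base || addr + size > base + bytes.size())
          return 0; // pretend the range isn't mapped
        std::memcpy(buf, bytes.data() + (addr - base), size);
        return size;
      }
    };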

Jim


