[LLVMdev] DWARF 2/3 backwards compatibility?

Rick Foos rfoos at codeaurora.org
Thu Oct 18 14:11:35 PDT 2012

On 10/18/2012 01:39 PM, David Blaikie wrote:
> On Thu, Oct 18, 2012 at 11:19 AM, Rick Foos<rfoos at codeaurora.org>  wrote:
>> On 10/18/2012 10:36 AM, David Blaikie wrote:
>>> On Thu, Oct 18, 2012 at 12:48 AM, Renato Golin<rengolin at systemcall.org>
>>> wrote:
>>>> On 18 October 2012 05:11, Rick Foos<rfoos at codeaurora.org>   wrote:
>>>>> I don't think GDB testsuite should block a commit, it can vary by a few
>>>>> tests, they rarely if ever all pass 100%. Tracking the results over time
>>>>> can
>>>>> catch big regressions, as well as the ones that slowly increase the
>>>>> failed
>>>>> tests.
>>>> Agreed. It should at least serve as comparison between two branches,
>>>> but hopefully, being actively monitored.
>>>> Maybe would be good to add a directory (if there isn't one yet) to the
>>>> testsuite repository, or at least the code necessary to make it run on
>>>> LLVM.
>>> The clang-tests repository (
>>> http://llvm.org/viewvc/llvm-project/clang-tests/ ) includes an Apple
>>> GCC 4.2 compatible version of the GCC and GDB test suites that Apple
>>> run internally. I'm working on bringing up an equivalent public
>>> buildbot at least for the GDB suite here (
>>> http://lab.llvm.org:8011/builders/clang-x86_64-darwin10-gdb-gcc ) -
>>> just a few timing out tests I need to look at to get that green.
>>> Apparently it's fairly stable.
>>> Beyond that I'll be trying to bring up one with the latest suite (7.4
>>> is what I've been playing with) on Linux as well.
>> Since you're going to bring a bot up in zorg, I'll stop working on bringing
>> my testsuite runner forward.
> I'm still interested in any details you have about issues you've
> resolved/learnt, etc.
>> A couple thoughts:
>> 1) I've been running on the latest test suite, polling once a day. I think
>> Eric and anyone working on DWARF 4/5 should be running against the upstream
>> testsuite. (I have no problems with running 7.4 too)
> Interesting thought. (just so we're all on the same page when you say
> "test suite" you're talking about the GDB dejagnu test suite (the same
> one (well, more recent version of it) that's in clang-tests)) Though I
> hesitate to have such a moving target, I can see how it could be
> useful.
Yes, the sourceware.org site. I hesitated as well, but I tried it and 
it's OK.
>> It's been stable to run at the tip of GDB this way, the test results aren't
>> varying much.
> With the right heuristics I suppose this could be valuable, but will
> require more work to find the right signal in the (even small) noise.
I wrote a 10-line awk script to create a CSV file out of the test 
summaries, to make a one-to-one comparison of tests. It's over 70 lines 
now... It's not that I should have used Python; everything is an 
exception to the rules. You can get close, but I can't say you can get 
perfect signals.
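For reference, the summary-to-CSV step Rick describes could be sketched as follows. This is a hypothetical Python version (not his actual awk script), assuming the standard DejaGnu `.sum` format where each result line starts with a keyword like `PASS:` or `FAIL:`; the file names are illustrative.

```python
# Hypothetical sketch of the summary-to-CSV step (not the awk script
# from the thread). Assumes the standard DejaGnu .sum format, where a
# result line looks like "PASS: gdb.base/foo.exp: test name".
import csv
import re
import sys

# Result keywords DejaGnu emits at the start of a result line.
RESULTS = ("PASS", "FAIL", "XPASS", "XFAIL", "KPASS", "KFAIL",
           "UNRESOLVED", "UNTESTED", "UNSUPPORTED")
LINE_RE = re.compile(r"^(%s): (.*)$" % "|".join(RESULTS))

def parse_sum(path):
    """Map each test name in a .sum file to its result keyword."""
    results = {}
    with open(path) as f:
        for line in f:
            m = LINE_RE.match(line)
            if m:
                results[m.group(2).strip()] = m.group(1)
    return results

def to_csv(sum_path, csv_path):
    """Write one (test, result) row per test for easy diffing."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["test", "result"])
        for name, result in sorted(parse_sum(sum_path).items()):
            writer.writerow([name, result])

if __name__ == "__main__" and len(sys.argv) == 3:
    to_csv(sys.argv[1], sys.argv[2])
```

As the thread notes, real `.sum` output is full of exceptions (duplicate test names, multi-line messages), so a robust version grows well beyond a sketch like this.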

From a compiler developer's point of view, the spreadsheet was worthless. 
We're not testing GDB, but rather what the compiler feeds to GDB.

Take the log file, check out the suite, rerun a failing test, use 
dwarfdump and llvm-dwarfdump, and find the "bad" DWARF records produced 
by the compiler.

All the eventual bugs come down to DWARF records, plus a GDB testsuite 
test that reproduces them.

A bad or confused DWARF record fails multiple tests, with no way to map 
a failure back to the DWARF.

In the end, a fine grained signal doesn't do what you might want.

>> 2) A surprise benefit of running this way is that hundreds of obsolete
>> tests, or broken tests are getting removed. This hasn't resulted in any
>> broken backwards compatibility here at least. Saves tons of time debugging
>> tests that don't work, and developing around compatible things that
>> reasonable people have decided no longer matter.
> Fair point.
>> 3) Running the testsuite against two compilers at a time makes it easier to
>> see regressions. By comparing against a known stable compiler, or GCC,
>> regressions are visible in the summary numbers.
> I assume GDB runs their own test suite against some version (or the
> simultaneous latest) of GCC? If we can't scrape those existing results
> we can reproduce them (running the full suite with both GCC & Clang
> side-by-side).
gdb-testers at sourceware.org has a run every night. Yes, I reproduce. A 
non-x86 target has very different results, so I look for a good gcc 
cross compiler to establish a baseline.

In the case of Clang, all the architectures share the DWARF processing, 
so an x86 run covers most of the DWARF processing without worrying too 
much about a cross-compiler run. (But some worry about limiting fixes to 
regressions from a cross-compiled GCC, so I have to run that as well.)
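The two-compiler comparison amounts to diffing per-test results against the stable baseline. A minimal Python sketch, assuming each run has already been reduced to a dict of test name to DejaGnu result keyword (the test names and results below are made up for illustration):

```python
# Minimal sketch of the side-by-side regression check, assuming each
# testsuite run has been reduced to {test name: result keyword}.
# The test names and results below are made up for illustration.
def find_regressions(baseline, candidate):
    """Tests that pass with the baseline compiler but fail with the
    candidate compiler -- the stable failures worth looking at first."""
    return sorted(
        name for name, result in baseline.items()
        if result == "PASS"
        and candidate.get(name) in ("FAIL", "UNRESOLVED")
    )

gcc_results = {
    "gdb.base/break.exp: run to breakpoint": "PASS",
    "gdb.base/list.exp: list main": "PASS",
}
clang_results = {
    "gdb.base/break.exp: run to breakpoint": "FAIL",
    "gdb.base/list.exp: list main": "PASS",
}

print(find_regressions(gcc_results, clang_results))
# prints ['gdb.base/break.exp: run to breakpoint']
```

A trend over many runs, as described below, is more informative than any single diff, since one bad day for GDB or the compiler can skew a single comparison.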
>> 4) I have plots of the summary numbers online with a window of a month or
>> two. The trend allows you to see regressions occurring and remaining as
>> regressions. Sometimes the GDB testsuite or a compiler has a bad day. The
>> trend lets you see a stable regression and, when you get around to it, tells
>> you when the regression started.
> Yep. Also, if we're trying to address all these issues, it would make
> sense to prioritize the very stable failures (where Clang fails a test
> that GCC passes & does so consistently for a long time) first. Then look
> at the unstable ones last - figure out which compiler is to blame,
> "XFAIL: clang" them or whatever is necessary.
I avoid the XFAIL thing. Plotting all the lines from the summary makes 
more sense, and it's what you see when you run the testsuite.

When you move a FAIL artificially to XFAIL, the plot just has a few V's 
in it where the FAIL line drops and the XFAIL line goes up. No new 
information is added.

I prefer leaving the actual summary numbers in place. All the data you 
need is there.

As you might have guessed, I like tests that fail, and want to get rid 
of the ones that pass too often :)

>> <soapbox>
>> I've been doing this with Jenkins. It's fairly easy to set up, and does the
>> plotting. Developers can grab a copy of the script to duplicate a run on
>> their broken compiler. Running the testsuite under JNLP increased the number
>> of executed tests - I don't know why, it just did.
>> </soapbox>
> I wouldn't mind seeing your jenkins setup/config/tweaks/etc as a
> reference point, if you've got it lying around.
I'll see what I can send, or just as easy to walk through it. Jenkins 
isn't really like buildbot. Do you have Jenkins running there?

> - David

Rick Foos
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
