[cfe-dev] [llvm-dev] RFC: End-to-end testing

Mehdi AMINI via cfe-dev cfe-dev at lists.llvm.org
Wed Oct 9 16:43:07 PDT 2019


On Wed, Oct 9, 2019 at 2:31 PM David Greene <dag at cray.com> wrote:

> Mehdi AMINI via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
> >> I absolutely disagree about vectorization tests.  We have seen
> >> vectorization loss in clang even though related LLVM lit tests pass,
> >> because something else in the clang pipeline changed that caused the
> >> vectorizer to not do its job.
> >
> > Of course, and as I mentioned I tried to add these tests (probably 4
> > or 5 years ago), but someone (I think Chandler?) asked me at the
> > time: does it affect benchmark performance? If so, why isn't it
> > tracked there? And if not, does it matter?
> > The benchmark was presented as the actual way to check this invariant
> > (because you only vectorize to get performance, not for the sake of
> > it).
> > So I never pursued it, even if I'm a bit puzzled that we don't have
> > such tests.
>
> Thanks for explaining.
>
> Our experience is that relying solely on performance tests to uncover
> such issues is problematic for several reasons:
>
> - Performance varies from implementation to implementation.  It is
>   difficult to keep tests up-to-date for all possible targets and
>   subtargets.
>
> - Partially as a result, but also for other reasons, performance tests
>   tend to be complicated, either in code size or in the numerous code
>   paths tested.  This makes such tests hard to debug when there is a
>   regression.
>
> - Performance tests don't focus on the why/how of vectorization.  They
>   just check, "did it run fast enough?"  Maybe the test ran fast enough
>   for some other reason but we still lost desired vectorization and
>   could have run even faster.
>
> With a small asm test one can document, right in the test, why
> vectorization is desired and how it comes about.  Then when it breaks
> it's usually pretty obvious what the problem is.
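>
> For illustration, here is a minimal sketch of what such a test could
> look like (the target, flags, and CHECK lines below are placeholders,
> not an actual test from a tree):
>
>     // RUN: %clang -O2 -target x86_64-unknown-linux-gnu -S -o - %s \
>     // RUN:   | FileCheck %s
>     //
>     // The elementwise loop below should be auto-vectorized at -O2; if
>     // the pipeline clang builds stops running the loop vectorizer,
>     // the packed instructions disappear and the test fails loudly.
>     void saxpy(float *restrict y, float *restrict x, float a, int n) {
>       for (int i = 0; i < n; ++i)
>         y[i] += a * x[i];
>     }
>     // CHECK-LABEL: saxpy:
>     // CHECK: mulps
>     // CHECK: addps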
>
> They don't replace performance tests; they complement each other, just
> as end-to-end and component tests complement each other.
>
> >> Debugging and PGO involve other components, no?
> >
> > I don't think that you need anything other than LLVM core (which is
> > a dependency of clang) itself?
>
> What about testing that what clang produces is debuggable with lldb?
> The debuginfo tests do that now, but AFAIK they are not end-to-end.
>
> > Things like PGO (unless you're using frontend instrumentation) don't
> > even have anything to do with clang, so we may get into the situation
> > David mentioned where we would rely on clang to test LLVM features,
> > which I find undesirable.
>
> We would still expect component-level tests.  This would be additional
> end-to-end testing, not replacing existing testing methodology.  I agree
> the concern is valid but shouldn't code review discover such problems?
>

Yes, I agree: this concern is not a blocker for doing end-to-end testing,
but more of a "we will need to be careful about scoping the role of
end-to-end testing versus component-level testing".



>
> >> > Actually the first one seems more of a pure LLVM test.
> >>
> >> Definitely not.  It would test the pipeline as constructed by clang,
> >> which is very different from the default pipeline constructed by
> >> opt/llc.
> >
> > I am not convinced it is "very" different (they are both using the
> > PassManagerBuilder AFAIK); I am only aware of very subtle differences.
>
> I don't think clang exclusively uses PassManagerBuilder but it's been a
> while since I looked at that code.
>

Here is the code:
https://github.com/llvm/llvm-project/blob/master/clang/lib/CodeGen/BackendUtil.cpp#L545

All the extension points where passes are hooked in are likely places
where the pipeline would differ from the default LLVM one.
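
To make this concrete, here is roughly what the legacy API looks like (a
sketch only: the extension-point mechanism is the real one, but the
hooked-in pass is hypothetical):

    #include "llvm/IR/LegacyPassManager.h"
    #include "llvm/Transforms/IPO/PassManagerBuilder.h"

    using namespace llvm;

    void buildModulePipeline(legacy::PassManager &MPM) {
      PassManagerBuilder PMBuilder;
      PMBuilder.OptLevel = 3;

      // Frontends and plugins hook extra passes in at extension points
      // like this one; anything registered here makes the resulting
      // pipeline diverge from a plain `opt -O3` run.
      PMBuilder.addExtension(
          PassManagerBuilder::EP_VectorizerStart,
          [](const PassManagerBuilder &, legacy::PassManagerBase &PM) {
            // PM.add(createMyInstrumentationPass()); // hypothetical
          });

      PMBuilder.populateModulePassManager(MPM);
    }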


>
> > But more fundamentally: *should* they be different? I would want
> > `opt -O3` to be able to reproduce the clang pipeline 1:1.
>
> Which clang pipeline?  -O3?  -Ofast?  opt currently can't do -Ofast.  I
> don't *think* -Ofast affects the pipeline itself but I am not 100%
> certain.
>

If -Ofast affects the pipeline, I think I would expose it on the
PassManagerBuilder.
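
(As an aside, one way to eyeball the divergence today is to dump the pass
structure from both drivers and diff the output; with the legacy pass
manager that would be something like:

    $ clang -O3 -mllvm -debug-pass=Structure -c test.c -o /dev/null
    $ opt -O3 -debug-pass=Structure -disable-output test.ll

I would expect the two listings to be close but not identical, which is
rather the point.)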


>
> > Isn't it the role of LLVM PassManagerBuilder to expose what is the "-O3"
> > pipeline?
>
> Ideally, yes.  In practice, it's not.
>
> > If we see the PassManagerBuilder as the abstraction for the pipeline,
> > then I don't see what testing belongs to clang here; this seems like
> > a layering violation (and, maintaining the PassManagerBuilder in
> > LLVM, I wouldn't want to have to update the tests of every subproject
> > using it because they retest the same feature).
>
> If nothing else, end-to-end testing of the pipeline would uncover
> layering violations.  :)
>
> >> The old and new pass managers also construct different
> >> pipelines.  As we have seen with various mailing list messages, this is
> >> surprising to users.  Best to document and check it with testing.
> >>
> >
> > Yes: both old and new pass managers are LLVM components, so hopefully
> > they are documented and tested in LLVM :)
>
> But we have nothing to guarantee that what clang does matches what opt
> does.  Currently they do different things.


My point is that this should be guaranteed by refactoring and using the
right APIs, not by duplicating the testing. But I may also be
misunderstanding what it is that you would test on the clang side. For
instance, I wouldn't want to duplicate testing of the -O3 pass pipeline,
which is already covered here:
https://github.com/llvm/llvm-project/blob/master/llvm/test/Other/opt-O3-pipeline.ll

But testing that a specific pass is added in response to a particular
clang option is fair, and I believe this is actually *already* what we
do, for example here:
https://github.com/llvm/llvm-project/blob/master/clang/test/CodeGen/thinlto-debug-pm.c#L11-L14
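
A focused test in that spirit could look like the following sketch (the
exact pass name in the CHECK line is an assumption about what
-fdebug-pass-manager prints at -O2):

    // RUN: %clang_cc1 -triple x86_64-linux-gnu -O2 \
    // RUN:   -fexperimental-new-pass-manager -fdebug-pass-manager \
    // RUN:   -emit-llvm -o /dev/null %s 2>&1 | FileCheck %s
    // CHECK: Running pass: LoopVectorizePass
    void foo(int *a, int n) {
      for (int i = 0; i < n; ++i)
        a[i] = i;
    }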

I don't think these particular tests are the most controversial though,
and they are really still fairly "focused" testing. I'm much more curious
about a larger end-to-end scope: for instance, since you mention debug
info and LLDB, what about a test verifying that LLDB can print the
content of a particular variable, starting from a source program? Such
tests are valuable in the absolute, but it isn't clear to me that we
could in practice block every commit that breaks one: a bug fix or an
improvement in one of the passes may be perfectly correct in isolation,
yet make the test fail by exposing a bug where we were already losing
some debug-info precision in a totally unrelated part of the codebase.
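
For concreteness, such a test could be as small as the sketch below,
using lldb's batch mode (the noinline attribute is only there to keep the
example robust; a real suite would want to cover inlined frames too):

    // RUN: %clang -g -O2 %s -o %t
    // RUN: lldb -b -o 'breakpoint set -n use' -o run \
    // RUN:   -o 'frame variable x' %t | FileCheck %s
    // CHECK: (int) x = 42
    int sink;
    __attribute__((noinline)) void use(int x) { sink = x; }
    int main(void) { use(42); return 0; }
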
I wonder how you see this managed in practice: would you gate any change
to InstCombine (or another mid-level pass) on not regressing any of the
debug-info quality tests, on any of the backends, and from any frontend
(not only clang)? Or worse: a middle-end change could end up producing a
slightly different DWARF construct on this particular test, which trips
up LLDB but not GDB (basically exposing a bug in LLDB). Should we require
the InstCombine contributor to debug LLDB and fix it first?

Best,

-- 
Mehdi

