<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Oct 9, 2019 at 2:31 PM David Greene <<a href="mailto:dag@cray.com">dag@cray.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Mehdi AMINI via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> writes:<br>
>
> >> I absolutely disagree about vectorization tests. We have seen
> >> vectorization loss in clang even though related LLVM lit tests pass,
> >> because something else in the clang pipeline changed that caused the
> >> vectorizer to not do its job.
> >
> > Of course, and as I mentioned I tried to add these tests (probably 4 or 5
> > years ago), but someone (I think Chandler?) was asking me at the time: does
> > it affect a benchmark's performance? If so, why isn't it tracked there? And
> > if not, does it matter?
> > The benchmark was presented as the actual way to check this invariant
> > (because you're only vectorizing to get performance, not for the sake of
> > it). So I never pursued it, even if I'm a bit puzzled that we don't have
> > such tests.
>
> Thanks for explaining.
>
> Our experience is that relying solely on performance tests to uncover
> such issues is problematic for several reasons:
>
> - Performance varies from implementation to implementation. It is
>   difficult to keep tests up-to-date for all possible targets and
>   subtargets.
>
> - Partially as a result, but also for other reasons, performance tests
>   tend to be complicated, either in code size or in the number of code
>   paths tested. This makes such tests hard to debug when there is a
>   regression.
>
> - Performance tests don't focus on the why/how of vectorization. They
>   just check, "did it run fast enough?" Maybe the test ran fast enough
>   for some other reason, but we still lost the desired vectorization and
>   could have run even faster.
>
> With a small asm test one can document why vectorization is desired
> and how it comes about right in the test. Then when it breaks it's
> usually pretty obvious what the problem is.
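>
> As a sketch of what I mean (the target, RUN lines, and checked
> mnemonics here are illustrative, not an actual in-tree test):
>
>   // RUN: %clang -O2 --target=x86_64-unknown-linux-gnu -S -o - %s \
>   // RUN:   | FileCheck %s
>   //
>   // The loop below has unit-stride accesses and no loop-carried
>   // dependences, so we expect the loop vectorizer to fire and packed
>   // SSE instructions to appear in the output.
>   // CHECK-LABEL: saxpy:
>   // CHECK: mulps
>   // CHECK: addps
>   void saxpy(float *restrict y, float *restrict x, float a, int n) {
>     for (int i = 0; i < n; ++i)
>       y[i] += a * x[i];
>   }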
>
> They don't replace performance tests; they complement each other, just
> as end-to-end and component tests complement each other.
>
> >> Debugging and PGO involve other components, no?
> >
> > I don't think that you need anything other than LLVM core (which is a
> > dependency of clang) itself?
>
> What about testing that what clang produces is debuggable with lldb?
> The debuginfo tests do that now, but AFAIK they are not end-to-end.
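>
> Something like this hypothetical shell sketch is what I have in mind
> (the clang/lldb flags are real; the program and check are made up):
>
>   clang -g -O0 test.c -o test.exe
>   lldb --batch -o 'breakpoint set --name main' -o 'run' \
>        -o 'frame variable x' test.exe | FileCheck test.c
>   # where test.c contains a directive like:  // CHECK: (int) x = 42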
>
> > Things like PGO (unless you're using frontend instrumentation) don't even
> > have anything to do with clang, so we may get into the situation David
> > mentioned where we would rely on clang to test LLVM features, which I find
> > undesirable.
>
> We would still expect component-level tests. This would be additional
> end-to-end testing, not a replacement for the existing testing
> methodology. I agree the concern is valid, but shouldn't code review
> discover such problems?

Yes, I agree: this concern is not a blocker for doing end-to-end testing; it
is more a case of "we will need to be careful about scoping the role of
end-to-end testing versus component-level testing".
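To make the frontend-vs-IR distinction above concrete, here is a sketch (the
flags are the usual spellings; file names are made up):

  # Frontend instrumentation: counters are inserted by clang's CodeGen.
  clang -O2 -fprofile-instr-generate foo.c -o foo
  # IR-level instrumentation: inserted by LLVM passes, so it can be
  # exercised without clang at all:
  opt -pgo-instr-gen -instrprof foo.ll -S -o foo.instr.ll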
>
> >> > Actually the first one seems more of a pure LLVM test.
> >>
> >> Definitely not. It would test the pipeline as constructed by clang,
> >> which is very different from the default pipeline constructed by
> >> opt/llc.
> >
> > I am not convinced it is "very" different (they are both using the
> > PassManagerBuilder AFAIK); I am only aware of very subtle differences.
>
> I don't think clang exclusively uses the PassManagerBuilder, but it's
> been a while since I looked at that code.

Here is the code: https://github.com/llvm/llvm-project/blob/master/clang/lib/CodeGen/BackendUtil.cpp#L545

All the extension points where passes are hooked in are likely places where
the pipeline would differ from LLVM's default.
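One way to compare the two empirically is to dump both pipelines and diff
them (a sketch using the legacy pass manager's debugging output):

  # The pipeline clang actually builds and runs:
  clang -O3 -mllvm -debug-pass=Structure -c foo.c -o /dev/null
  # The pipeline opt builds for the "same" level:
  opt -O3 -debug-pass=Structure foo.ll -disable-output

Any divergence, including passes added at the extension points, should show
up directly in the diff.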
>
> > But more fundamentally: *should* they be different? I would want `opt -O3`
> > to be able to reproduce the clang pipeline 1-1.
>
> Which clang pipeline? -O3? -Ofast? opt currently can't do -Ofast. I
> don't *think* -Ofast affects the pipeline itself, but I am not 100%
> certain.

If -Ofast affects the pipeline, I would expose it on the PassManagerBuilder,
I think.
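For what it's worth, the closest approximation today is to split the
compilation in two (a sketch; it still won't account for passes clang adds
at its extension points):

  # Get the IR exactly as clang hands it to the optimizer:
  clang -O3 -Xclang -disable-llvm-passes -S -emit-llvm foo.c -o foo.ll
  # Then run the optimizer pipeline separately:
  opt -O3 foo.ll -S -o foo.opt.ll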
>
> > Isn't it the role of LLVM's PassManagerBuilder to expose what the "-O3"
> > pipeline is?
>
> Ideally, yes. In practice, it's not.
>
> > If we see the PassManagerBuilder as the abstraction for the pipeline, then
> > I don't see what testing belongs to clang here; this seems like a layering
> > violation (and, maintaining the PassManagerBuilder in LLVM, I wouldn't
> > want to have to update the tests of all the subprojects using it because
> > they re-test the same feature).
>
> If nothing else, end-to-end testing of the pipeline would uncover
> layering violations. :)
>
> >> The old and new pass managers also construct different
> >> pipelines. As we have seen with various mailing-list messages, this is
> >> surprising to users. Best to document and check it with testing.
> >
> > Yes: both old and new pass managers are LLVM components, so hopefully they
> > are documented and tested in LLVM :)
>
> But we have nothing to guarantee that what clang does matches what opt
> does. Currently they do different things.

My point is that this should be guaranteed by refactoring and by using the
right APIs, not by duplicating the testing. But I may also be
misunderstanding what it is that you would test on the clang side. For
instance, I wouldn't want to duplicate testing of the O3 pass pipeline,
which is covered here:
https://github.com/llvm/llvm-project/blob/master/llvm/test/Other/opt-O3-pipeline.ll
But testing that a specific pass is added in response to a particular clang
option is fair, and actually this is *already* what we do, I believe, as
here:
https://github.com/llvm/llvm-project/blob/master/clang/test/CodeGen/thinlto-debug-pm.c#L11-L14

I don't think these particular tests are the most controversial though, and
they are really still fairly "focused" testing. I'm much more curious about
the larger end-to-end scope: for instance, since you mention debug info and
LLDB, what about a test that verifies that LLDB can print a particular
variable's content, starting from a source program? Such tests are valuable
in the absolute, but it isn't clear to me that we could in practice block
every commit that breaks one: a bug fix or an improvement in one pass may be
perfectly correct in isolation and still make the test fail, by exposing a
spot where we were already losing debug-info precision in a totally
unrelated part of the codebase.
I wonder how you see this managed in practice: would you gate any change to
InstCombine (or another mid-level pass) on not regressing any of the
debug-info quality tests on any backend, and from any frontend (not only
clang)? Or worse: a middle-end change that ends up producing a slightly
different DWARF construct on this particular test, which trips up LLDB but
not GDB (basically exposing a bug in LLDB). Should we require the
InstCombine contributor to debug LLDB and fix it first?

Best,

-- 
Mehdi