<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Oct 9, 2019 at 2:31 PM David Greene <<a href="mailto:dag@cray.com">dag@cray.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Mehdi AMINI via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> writes:<br>

<br>

>> I absolutely disagree about vectorization tests.  We have seen<br>

>> vectorization loss in clang even though related LLVM lit tests pass,<br>

>> because something else in the clang pipeline changed that caused the<br>

>> vectorizer to not do its job.<br>

><br>

> Of course, and as I mentioned I tried to add these tests (probably 4 or 5<br>

> years ago), but someone (I think Chandler?) was asking me at the time: does<br>

> it affect a benchmark performance? If so why isn't it tracked there? And if<br>

> not does it matter?<br>

> The benchmark was presented as the actual way to check this invariant<br>

> (because you're only vectoring to get performance, not for the sake of it).<br>

> So I never pursued, even if I'm a bit puzzled that we don't have such tests.<br>

<br>

Thanks for explaining.<br>

<br>

Our experience is that relying solely on performance tests to uncover<br>

such issues is problematic for several reasons:<br>

<br>

- Performance varies from implementation to implementation.  It is<br>

  difficult to keep tests up-to-date for all possible targets and<br>

  subtargets.<br>

<br>

- Partially as a result, but also for other reasons, performance tests<br>

  tend to be complicated, either in code size or in the numerous code<br>

  paths tested.  This makes such tests hard to debug when there is a<br>

  regression.<br>

<br>

- Performance tests don't focus on the why/how of vectorization.  They<br>

  just check, "did it run fast enough?"  Maybe the test ran fast enough<br>

  for some other reason but we still lost desired vectorization and<br>

  could have run even faster.<br>

<br>

With a small asm test one can documented why vectorization is desired<br>

and how it comes about right in the test.  Then when it breaks it's<br>

usually pretty obvious what the problem is.<br>

<br>

They don't replace performance tests, they complement each other, just<br>

as end-to-end and component tests complement each other.<br>

<br>

>> Debugging and PGO involve other components, no?<br>

><br>

> I don't think that you need anything else than LLVM core (which is a<br>

> dependency of clang) itself?<br>

<br>

What about testing that what clang produces is debuggable with lldb?<br>

debuginfo tests do that now but AFAIK they are not end-to-end.<br>

<br>

> Things like PGO (unless you're using frontend instrumentation) don't even<br>

> have anything to do with clang, so we may get into the situation David<br>

> mentioned where we would rely on clang to test LLVM features, which I find<br>

> non-desirable.<br>

<br>

We would still expect component-level tests.  This would be additional<br>

end-to-end testing, not replacing existing testing methodology.  I agree<br>

the concern is valid but shouldn't code review discover such problems?<br></blockquote><div><br></div><div>Yes I agree, this concern is not a blocker for doing end-to-end testing, but more a "we will need to be careful about scoping the role of the end-to-end testing versus component level testing".</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<br>

>> > Actually the first one seems more of a pure LLVM test.<br>

>><br>

>> Definitely not.  It would test the pipeline as constructed by clang,<br>

>> which is very different from the default pipeline constructed by<br>

>> opt/llc.<br>

><br>

> I am not convinced it is "very" difference (they are using the<br>

> PassManagerBuilder AFAIK), I am only aware of very subtle difference.<br>

<br>

I don't think clang exclusively uses PassManagerBuilder but it's been a<br>

while since I looked at that code.<br></blockquote><div><br></div><div>Here is the code: <a href="https://github.com/llvm/llvm-project/blob/master/clang/lib/CodeGen/BackendUtil.cpp#L545">https://github.com/llvm/llvm-project/blob/master/clang/lib/CodeGen/BackendUtil.cpp#L545</a></div><div><br></div><div>All the extension point where passes are hooked in are likely things where the pipeline would differ from LLVM.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<br>

> But more fundamentally: *should* they be different? I would want `opt -O3`<br>

> to be able to reproduce 1-1 the clang pipeline.<br>

<br>

Which clang pipeline?  -O3?  -Ofast?  opt currently can't do -Ofast.  I<br>

don't *think* -Ofast affects the pipeline itself but I am not 100%<br>

certain.<br></blockquote><div><br></div><div>If -Ofast affects the pipeline, I would expose it on the PassManagerBuilder I think.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<br>

> Isn't it the role of LLVM PassManagerBuilder to expose what is the "-O3"<br>

> pipeline?<br>

<br>

Ideally, yes.  In practice, it's not.<br>

<br>

> If we see the PassManagerBuilder as the abstraction for the pipeline, then<br>

> I don't see what testing belongs to clang here, this seems like a layering<br>

> violation (and maintaining the PassManagerBuilder in LLVM I wouldn't want<br>

> to have to update the tests of all the subproject using it because they<br>

> retest the same feature).<br>

<br>

If nothing else, end-to-end testing of the pipeline would uncover<br>

layering violations.  :)<br>

<br>

>> The old and new pass managers also construct different<br>

>> pipelines.  As we have seen with various mailing list messages, this is<br>

>> surprising to users.  Best to document and check it with testing.<br>

>><br>

><br>

> Yes: both old and new pass managers are LLVM components, so hopefully that<br>

> are documented and tested in LLVM :)<br>

<br>

But we have nothing to guarantee that what clang does matches what opt<br>

does.  Currently they do different things.</blockquote><div><br></div><div>My point is that this should be guaranteed by refactoring and using the right APIs, not duplicate the testing. But I can also misunderstand what it is that you would test on the clang side. For instance I wouldn't want to duplicate testing the O3 pass pipeline which is covered here: <a href="https://github.com/llvm/llvm-project/blob/master/llvm/test/Other/opt-O3-pipeline.ll">https://github.com/llvm/llvm-project/blob/master/llvm/test/Other/opt-O3-pipeline.ll</a> </div><div>But testing that a specific pass is added with respect to a particular clang option is fair, and actually this is *already* what we do I believe, like here: <a href="https://github.com/llvm/llvm-project/blob/master/clang/test/CodeGen/thinlto-debug-pm.c#L11-L14">https://github.com/llvm/llvm-project/blob/master/clang/test/CodeGen/thinlto-debug-pm.c#L11-L14</a></div><div><br></div><div>I don't think these particular tests are the most controversial though, and it is really still fairly "focused" testing. I'm much more curious about larger end-to-end scope: for instance since you mention debug info and LLDB, what about a test that would verify that LLDB can print a particular variable content from a test that would come as a source program for instance. Such test are valuable in the absolute, it isn't clear to me that we could in practice block any commit that would break such test though: this is because a bug fix or an improvement in one of the pass may be perfectly correct in isolation but make the test fail by exposing a bug where we are already losing some debug info precision in a totally unrelated part of the codebase. </div><div>I wonder how you see this managed in practice: would you gate any change on InstCombine (or other mid-level pass) on not regressing any of the debug-info quality test on any of the backend, and from any frontend (not only clang)? Or worse: a middle-end change that would end-up with a slightly different Dwarf construct on this particular test, which would trip LLDB but not GDB (basically expose a bug in LLDB). Should we require the contributor of inst-combine to debug LLDB and fix it first?</div><div><br></div><div>Best,</div><div><br></div><div>-- </div><div>Mehdi</div><div><br></div><div><br></div></div></div></div></div></div>