<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Aug 8, 2018 at 9:08 AM, David Greene via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">Simon Pilgrim <<a href="mailto:llvm-dev@redking.me.uk">llvm-dev@redking.me.uk</a>> writes:<br>

<br>

> Changing a test's IR to avoid an issue in a patch is very problematic,<br>

> but if any test's codegen changes because of a patch then it just<br>

> needs to be reviewed, preferably by someone who has touched that test<br>

> in the past.<br>

<br>

</span>But wouldn't it be even better if that output didn't need to be changed<br>

at all and therefore didn't need to be reviewed?  Right now a lot of X86<br>

changes have a lot of noise in the diff due to test output (asm)<br>

changes.  A fair majority of those changes are incidental to a given<br>

patch.<br></blockquote><div><br></div><div>I'm working on the RISC-V port, and I find the tests extremely brittle: tests that should be independent get "broken" (not really broken, the codegen just changed a little) all the time.</div><div><br></div><div>Some examples I've come across recently:</div><div><br></div><div>- 128x128 multiply on rv32. It generates several dozen inline instructions. Any change in which temporary register the compiler picks for any single instruction breaks the test, and so does any instruction scheduling change. I think an operational test that checks the results for some random values and some extreme values under QEMU would be much better. All the major LLVM target ISAs are supported in QEMU, and I suppose the rest have some other emulator.</div><div><br></div><div>- Several months ago the code generator started using the alias "ret" instead of "jalr x0,x1,0" (jump to the address in register x1 (+0 offset) and write the return address to register x0, i.e. discard it). That broke literally every test. Other recent changes, such as outputting "mv a,b" instead of "addi a,b,0", broke tests all over the place, though of course fewer of them.</div><div><br></div><div>- Another patch (not yet merged to master) outputs stack use metadata just after the addi that adjusts the stack pointer at the start of a function. It's not even actual code, but it breaks every test that makes a stack frame (thankfully most tests are too simple to need one).</div><div><br></div><div>Yes, there are scripts to automatically regenerate tests, hopefully run only after making sure that the changes to the output are not in fact bugs :-) It seems pretty cumbersome to use the output from llvm-lit to track down what it's actually complaining about. I'm gravitating to just running the update script for all tests and then using git diff to see what it changed. 
And, as mentioned, the diffs in the tests are often orders of magnitude bigger than the diffs for the actual code.</div><div><br></div><div>Is this the same for all ISAs? Is it really the best way? Maybe it calms down once a back-end is more mature.</div><div><br></div></div></div></div>
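<div>To make the 128x128 multiply example concrete, here is a rough sketch of what I mean by an operational test: compare results against a reference for some extreme values plus seeded random values, rather than matching the exact instruction sequence. Everything here is an assumption for illustration. The names (check_mul128, mul128_limbs) are hypothetical, and mul128_limbs is only a pure-Python stand-in built from 32-bit limbs; in a real setup that callable would run the compiled code under an emulator.</div>

```python
import random

MASK32 = 0xFFFFFFFF
MASK128 = (1 << 128) - 1


def to_limbs(x):
    """Split a 128-bit value into four 32-bit limbs, least significant first."""
    return [(x >> (32 * i)) & MASK32 for i in range(4)]


def from_limbs(limbs):
    """Reassemble a value from 32-bit limbs, least significant first."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))


def mul128_limbs(a, b):
    """Low 128 bits of a*b via schoolbook multiplication on 32-bit limbs.

    Hypothetical stand-in for the code under test; a real harness would
    invoke the compiled function under an emulator instead.
    """
    al, bl = to_limbs(a), to_limbs(b)
    out = [0, 0, 0, 0]
    for i in range(4):
        carry = 0
        for j in range(4 - i):
            t = out[i + j] + al[i] * bl[j] + carry
            out[i + j] = t & MASK32
            carry = t >> 32
        # Any carry out of the top limb is dropped (result is mod 2**128).
    return from_limbs(out)


def check_mul128(mul, trials=1000, seed=0):
    """Return the (a, b) pairs for which mul disagrees with the reference."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    extremes = [0, 1, MASK32, 1 << 32, 1 << 64, 1 << 127, MASK128 - 1, MASK128]
    cases = [(a, b) for a in extremes for b in extremes]
    cases += [(rng.getrandbits(128), rng.getrandbits(128)) for _ in range(trials)]
    return [(a, b) for a, b in cases if mul(a, b) != (a * b) & MASK128]


if __name__ == "__main__":
    failures = check_mul128(mul128_limbs)
    print("failures:", len(failures))
```

<div>A check like this does not care which temporary registers get picked or how the instructions are scheduled, so it would only fail when the computed result is actually wrong.</div>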