[PATCH] D35598: Rework machine creation strategy

Fri Aug 11 11:07:45 PDT 2017

MatzeB added a comment.

Hi James,

this is very interesting to hear as I would not have expected the previous behavior to be desirable. Just to explain some more where I am coming from:

- LNT submissions are typically performed by CI jobs, which for us are required to have a unique name, so it is natural to use the CI job's name as the LNT machine name.
- When selecting a machine in LNT the only thing to go on is the machine name. If for example I have 7 different machines named "gcc7" (with some of the other fields differing), I would need to click 7 times today to figure out which machine is the one I want.
- Similarly, when connecting 3rd party visualisation/analysis to LNT it is convenient to have unique machine names that you can reference. Machine id numbers are only valid within one LNT database, and are also not predictable.
- Looking at `lnt runtest test-suite` mode, nobody even bothered filling out the machine fields and to my knowledge nobody complained to this day.

To your points:

In https://reviews.llvm.org/D35598#838818, @jgreenhalgh wrote:

> In https://reviews.llvm.org/D35598#819677, @grosser wrote:
>
> > Hi Matze,
> >
> > I did not check the implementation in detail, but this makes total sense to me. From my perspective this is a clear improvement and should go in.
>
>
> A more useful flag for our use would restore the previous behavior, rather than always update the machine. We have a lot of historical data, crossing a number of kernel versions and other machine characteristics to import to LNT. The old behavior was very convenient for this data set - we automatically ended up with new machines after each system configuration change (actually, this automatic disambiguation of machines with variations was a key benefit of LNT for us, and caught a number of bugs in our test infrastructure). We don't want to always update the machine, that would "poison" the quality of the historical data.

- Note that submissions are rejected if the machine data doesn't match the previous data, so bugs are caught and incompatible/incomparable data is avoided. (The `--update-machine` flag is not intended for the regular CI job, but rather to be used manually after updating a machine in a way that changes the data but is believed not to change performance, i.e. the data stays comparable.)
- If after changing a machine the new data is not comparable to the historical data I would expect the user to choose a new machine name (which is also nice as it makes the fact of the changed configuration more obvious).
- I also created the `lnt admin` subcommands to rename, merge, and delete machines, which allows cleanup/reorganisation of the data.
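
To make the first point concrete, here is a minimal sketch of the reject-unless-updated check. It is an illustration only, not the code from the patch: the function `resolve_machine`, the exception `MachineMismatch`, and the dict-based store are all hypothetical names.

```python
class MachineMismatch(Exception):
    """Raised when submitted machine fields differ from the stored ones."""


def resolve_machine(stored, name, fields, update_machine=False):
    """Return the field dict kept for `name` after a submission.

    `stored` is a hypothetical in-memory store mapping a unique machine
    name to its dict of machine fields; it stands in for the database.
    """
    existing = stored.get(name)
    if existing is None:
        # First submission under this name creates the machine.
        stored[name] = dict(fields)
    elif existing != fields:
        if not update_machine:
            # Default: reject, so configuration drift is noticed.
            raise MachineMismatch(name)
        # Manual override: accept the new fields for the same machine.
        stored[name] = dict(fields)
    return stored[name]
```

A regular CI submission would call this with `update_machine=False`; the override would only be passed by hand after a change that is believed to keep the data comparable.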

> Inventing unique names for each of the variants would be difficult but possible. It feels like it somewhat defeats the point of the machine field - I would be re-encoding the same information in to a machine name (for example something like gcc7-cortex-a57-ubuntu-14.04-linux-4.13-64k-pages ). That makes all other interactions with the system (e.g. choosing runs for comparison)  very cumbersome.
> 
> I agree that the old behavior could be confusing, but I don't really know how to sensibly interact with the new design in a way that preserves data quality without needing an explosion in naming complexity. For me, this is not a clear improvement.

So I am not completely convinced that the automatic machine name creation is desirable behavior. I can see the convenience of machines getting created automatically, at the cost of the machine names becoming less meaningful.

Having said all that, I'd be fine with adding a flag supporting a variation of the previous behavior where we create new machines if the machine data doesn't match (however, I'd slightly change the behavior to append a number to the new machine's name to maintain the property that machine names are unique). Would that be fine with you?
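
As a rough sketch of what that variation could look like (hypothetical names again, and the `-2`, `-3` suffix format is only an assumption for illustration, not a decided naming scheme):

```python
def split_machine(stored, name, fields):
    """Return the machine name a submission should be filed under.

    Reuses an existing machine whose fields match; otherwise creates a
    new machine with a number appended so that names stay unique.
    `stored` is a hypothetical dict of machine name -> machine fields.
    """
    candidate, suffix = name, 1
    while candidate in stored:
        if stored[candidate] == fields:
            return candidate  # same configuration seen before
        suffix += 1
        candidate = f"{name}-{suffix}"  # assumed suffix format
    stored[candidate] = dict(fields)
    return candidate
```

With this, a second configuration submitted as "gcc7" would land in a machine named "gcc7-2", while later submissions with the original fields would still go to "gcc7".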

Repository:
  rL LLVM

https://reviews.llvm.org/D35598