[PATCH] D35598: Rework machine creation strategy

Mon Aug 14 04:01:26 PDT 2017

jgreenhalgh added a comment.

In https://reviews.llvm.org/D35598#839397, @MatzeB wrote:

> Hi James,

Hi,

> So I am not completely convinced the automatic machine name creation is a desirable behavior. I can see the convenience of machines getting created automatically at the cost of the machine names becoming less meaningful.
> 
> Having said all that I'd be fine to add a flag supporting a variation of the previous behavior where we create new machines if the machine data doesn't match (however I'd slightly change the behavior to append a number to the new machines name to maintain the property that machine names are unique). Would that be fine with you?

Before I go in to more detail about how we're testing (for background, and your interest) - that sounds like a very helpful solution, thanks!

> this is very interesting to hear as I would not have expected the previous behavior to be desirable. Just to explain some more where I am coming from:
> 
> - LNT Submissions are typically performed by CI jobs which for us are required to have a unique name, so it is natural to use the same LNT machine name as the CI jobs name.

This is also true for us, however we are building nightly using 30 groupings of "machines" (which are themselves pools of real machines driven by buildbot), and with historical data going back 4 years. Each of these groupings of machines use names automatically derived from the key aspects of their hardware and release branch they track, and we're diligent at recording more subtle machine differences in the "machine" field.

> - When selecting a machine in LNT the only thing to go on is the machine name. If for example I have 7 different machines named "gcc7" (with some of the other fields differing), I would need to click 7 times today, to figure out which is the machine that I want.

That is probably where the difference in perspectives comes from. We would have 7 machines producing results, which we expect to have identical configuration, and which we would group under the name "gcc7.$target_board.$target_cpu". In normal use, we would not expect 7 machines named "gcc7" to produce a result in one night, we would expect there to be one "active" gcc7 at a time (measured in months), and so choosing the right machine would be a matter of picking the one which has been building most recently. Occasionally due to sysadmin/user error, one of the machines in the pool might malfunction and end up in an inappropriate configuration. As an example from today, one rogue machine in the pool was accidentally patched up to a newer kernel version. When we import a run from that machine, the old LNT behaviour would create a separate machine, ensuring that the data integrity was maintained with the new machine isolated (which it wouldn't be if we forced --update-machine) but that we still had data in the system that we could compare (which we wouldn't get with the new error behaviour). This becomes important to us when importing historical data, as we really do want a new machine every time configuration changes, but we don't want to have to encode that in the machine name. Put another way, when we migrate a grouping of boards to a more recent kernel version, we want that change to make the data sets disjoint, but without us having to invent a new name for the machine pool.

Automatically appending a number to the machine name would therefore work well for our use case.

> - Similarily when connecting 3rd party visualisation/analysis to LNT it is convenient to have unique machine names that you can reference. Machine id numbers are only valid within one LNT database, and are also not predictable.

We don't do this, but I can see why this would be useful to you.

> - Looking at `lnt runtest test-suite` mode, nobody even bothered filling out the machine fields and to my knowledge nobody complained to this day.

We're a somewhat unique consumer of LNT, in that we have a completely separate infrastructure for running and recording test results, we generate JSON from this infrastructure which is suitable for import to LNT for visualisation. We make heavy use of the machine fields.

I'm very grateful for your help in resolving this, I appreciate we're running a non-standard configuration over here!

James

Repository:
  rL LLVM

https://reviews.llvm.org/D35598