[PATCH] D28789: [lit] Support sharding testsuites, for parallel execution.

Tue Jan 17 21:33:03 PST 2017

graydon added inline comments.

================
Comment at: utils/lit/lit/main.py:436
+                            shard_begin+1, shard_end))
+            run.tests = run.tests[shard_begin:shard_end]
+
----------------
ddunbar wrote:
> graydon wrote:
> > ddunbar wrote:
> > > Would it be better to shard in a round robin fashion? There is some tendency for tests to be clumped by where they are defined, and where they are defined to be (weakly) correlated with how long they take to run, so that would distribute long running tests across machines, which should help reduce the deviation between total testing time among shards.
> > Considered it, but decided against based on the (possibly wrong) guess that the discovery-clumping order would have better locality in terms of what test-prerequisites are built, tested, and hot-in-cache. If you think round-robin will work better overall, I'm happy to change it.
> > 
> What do you mean by test prerequisites? lit currently doesn't really do any shared work on a per-test basis that could be cached.
> 
> One other advantage of the current clumping is you are more likely to get deterministic assignments to machines, which is a blessing and a curse. The blessing means you won't have weird configuration changes that users might not think to check, the curse means you are less likely to shake such things out.
> 
> I'm ok with the current patch unless you feel swayed the other direction.
I meant things like a module that gets cached as a .pcm and shared between a bunch of tests against it, or a .dylib or .a that's only generated on demand for running tests. In those cases there might be an advantage in running the dependent tests in only one spot.

I agree the likelihood of the same test running on the same machine over multiple runs might be either good or bad. My gut suggests it's better to actually have them move around some, to shake nondeterminism bugs out. Hard to say.

I just redid the code to support round-robin assignment (and fixed some bugs), and I think it actually reads a bit nicer. Thinking it over, it might also be a bit more useful as a smoke-test or profiling mode for users: you can run --num-shards=100 --run-shard=1 to run an evenly distributed 1% of the testsuite against a WIP change. Will post a revised patch once I've adjusted the tests.
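
For illustration, the two assignment strategies discussed in this thread can be sketched roughly as below. This is not the code from the patch: the function names contiguous_shard and round_robin_shard are hypothetical, the test names are made up, and the 1-based shard index is an assumption based on the --run-shard=1 example above.

```python
def contiguous_shard(tests, num_shards, run_shard):
    """Return the run_shard-th (1-based, assumed) contiguous slice of tests.

    Hypothetical helper: keeps tests that were discovered together
    on the same shard.
    """
    per_shard = -(-len(tests) // num_shards)  # ceiling division
    begin = (run_shard - 1) * per_shard
    return tests[begin:begin + per_shard]


def round_robin_shard(tests, num_shards, run_shard):
    """Return every num_shards-th test, starting at index run_shard - 1.

    Hypothetical helper: spreads neighbouring tests across shards.
    """
    return tests[run_shard - 1::num_shards]


# Made-up test names, clumped by directory as discovery order tends to be.
tests = ["a/1", "a/2", "a/3", "b/1", "b/2", "b/3"]

print(contiguous_shard(tests, 2, 1))   # all of directory "a"
print(round_robin_shard(tests, 2, 1))  # a mix of "a" and "b"
```

With the contiguous split, one shard gets every test under a/, so a slow directory lands on a single machine. With the round-robin split, each shard gets tests from both directories, which is the distribution effect ddunbar described.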

https://reviews.llvm.org/D28789