[llvm-dev] GitHub Survey?

Thu Oct 13 11:56:51 PDT 2016

Hi Duncan,

I don't understand your concerns.

First, sub-modules and mono-repo have been put forward as the only
two choices because people felt that, if we left it open, we'd have
too many different implementation details and we'd never get
anywhere.

So...

> - how much pain the transition would cause, instead of what they think the right final state is.

The final state is defined by submod vs. monorepo, and that's
represented in a different question. Those questions address the
additional work needed to get there, which many have said would be
the crucial decision point.

They also outline the cost of each respondent's preferred vs.
non-preferred solution, which gives us the aggregated cost over the
whole project for each decision.

> - what's good for the individuals responding, instead of what they think is best for the LLVM project; and

That's implied. I think it is clear enough, but we can always change
the wording if others feel confused.

> Secondly, I'm worried about this question: "How does the choice between a single repository with all projects and the use of sub-modules impact your usage of Git?"  I'm not sure we'll get good signal from this; it's essentially a vote on the two variants, but it doesn't force the respondent to think about the specific issues.  I'd rather find a way to ask about the specific concerns raised in the document.

It is a vote. The "thinking" goes in the extended answer that follows.
Answers with good extended reasoning will have a greater weight than
those without.

If you're worried about data mining, then leaving those questions to
full-text answers will require someone to read them all, interpret
them, and put their own bias on top. Given the nature of this problem,
we should avoid bias whenever possible, especially when interpreting
the answers.

> Thirdly, I'm worried that the follow-ups talk about "preferred" and "non-preferred" instead of "multirepo" and "monorepo".  This makes data-mining non-trivial (because the meaning depends on previous answers) and increases the chance of respondent confusion.

I see your point. We can re-word to make that clearer.

> 4. How often do you work on a small LLVM sub-project without using a checkout of LLVM itself?
> - Always.
> - Most of the time.
> - Sometimes.
> - Never.

Interesting. It covers the main problem with both proposals.

> 5. Please categorize how you interact with upstream.
> - I need read/write access, and I have limited disk space.
> - I need read/write access, but a 1GB clone doesn't scare me.
> - I only need read access.

I'm not sure that's critical. My current source repo has 35GB with
just a few worktrees.

Also, both solutions have low-disk-usage modes, so this would make no
difference to how we proceed.

> 6. How important is cross-project blame, grep, etc.?
> - Vital.  I already use SVN/monorepo/custom-tooling to accomplish this.
> - Extremely.  It should be easy enough that everyone does it by default.
> - Somewhat.  I would use it if it were easy, but it's just nice to have.
> - Not at all.  Anyone who cares can write their own tooling.

Based on other comments in the thread, we should leave this one out.

> 7. Single-commit cross-project refactoring designs away a class of build failures and simplifies making API changes.  How important is it?
> - Vital.  I already use SVN/monorepo/custom-tooling to accomplish this.
> - Extremely.  It should be easy enough that everyone does it by default.
> - Somewhat.  I would use it if it were easy, but it's just nice to have.
> - Not at all.  Anyone who cares can write their own tooling.

I don't like to assert my opinion and then ask how much people agree.
I prefer to ask the question directly, like:

How often do you need to commit across repositories (e.g. llvm+clang)
and how often are your builds broken because they're in separate
repositories?

Also, I think your scale of importance is skewed upward. Vital and
Extremely are at the top, Somewhat is right bang in the middle, and
Not at all is the very bottom.

You should either have two positive and two negative (very, somewhat,
not much, not at all) or add a fifth in the middle. I prefer 4 because
that makes people think harder.

> 8. The multirepo variant provides read-only umbrella repository to coordinate commits between the split sub-project repositories using Git submodules.  Assuming multirepo gets adopted, how do you expect to use the umbrella?
> // checkboxes:
> + Actively contribute tooling improvements to improve it.
> + Integrate it into our downstream fork.
> + Use it for upstream contributions.
> + Use it as the primary interface development environment.
> + Use it for bisection.

Good. (+ N/A, too)

> 9. If multirepo is adopted, how do you plan to contribute to upstream?
> - Using Git submodules.
> - Using the Git repos directly.
> - Using the SVN bridges.
> - n/a: I don't contribute.
>
> 10. The monorepo variant provides read/write access to sub-projects via an SVN bridge and git-svn.  Contributors will have the option to continue using repositories split on project boundaries.  Assuming monorepo gets adopted, how do you plan to contribute?
> - I'll use the monorepo as soon as it's possible, even before it's canonical.
> - I'll use the monorepo as soon as it's canonical.
> - I'll transition to monorepo eventually.
> - I'll use the SVN bridge on separated sub-projects forever.
> - I'll use a Git mirror (and/or git-svn) on separated sub-projects forever.
> - n/a: I don't contribute.
>
> 11. If monorepo is adopted, how do you plan to integrate it downstream?
> - We already use monorepo.
> - We'll switch to pulling from monorepo during the transition period.
> - We'll switch to pulling from monorepo eventually.
> - We'll integrate from the SVN bridge forever.
> - We'll integrate from the split sub-project Git mirror forever.
> - n/a: There is no downstream.

Good.

> 12. The multi/mono hybrid variant merges some sub-projects, but leaves runtimes in separate repositories using the umbrella to tie them together.  Is this the best or worst of both worlds?
> - This is great.  Native cross-project refactoring, without penalizing runtime-only developers.
> - Whatever.  I'll deal with it.
> - This is terrible.  All the transition pain of monorepo, without the advantages.

I didn't know we were proposing yet another variant. This seems like a
last-minute, rushed-in proposal and I don't want to endorse it in the
survey. We can discuss it in the BoF, though.

> 13. If multirepo is adopted, how much pain will there be in your transition?
> - Nothing consequential.
> - A little; but it'll be fine.
> - A lot; but it'll get done somehow.
> - Too much; I/we may stop contributing to LLVM.
>
> 14. If monorepo is adopted, how much pain will there be in your transition?
> - Nothing consequential.
> - A little; but it'll be fine.
> - A lot; but it'll get done somehow.
> - Too much; I/we may stop contributing to LLVM.

Those are already covered by the current bad/good questions, but I'll
change the wording to match these.

> 15. If we could go back in time and restart the project with today's technologies, which repository scheme would be best for the LLVM project?
> - CVS.
> - Subversion repository with split sub-projects (<sub-project>/trunk), with git-svn.
> - Subversion repository as a single project (trunk/<sub-project>), with git-svn.
> - Git: multirepo variant.
> - Git: monorepo variant.
> - Git: multi/mono hybrid variant.
> - Other.

Let's not put CVS in there, please. :)

So, what's the purpose of this question? I mean, we are "starting
fresh" in a way, and the responses of the rest of the survey would
make this question irrelevant, no?

I'll change the wording on the ones we all agree on and leave the
ones with open questions until they're resolved.

cheers,
--renato