[llvm-dev] GitHub Survey?

Duncan P. N. Exon Smith via llvm-dev llvm-dev at lists.llvm.org
Thu Oct 13 14:14:20 PDT 2016


> On 2016-Oct-13, at 11:56, Renato Golin <renato.golin at linaro.org> wrote:
> 
> Hi Duncan,
> 
> I don't understand your concerns.
> 
> First, the choice between sub-modules and mono-repo has been put
> forward as the only two choices because people felt that, if we let it
> open, we'd have too many different implementation details and we'd
> never get anywhere.

Yup, that makes sense.  I'm partly just trying to fill in the survey now that we've been informed by the proposal process (impossible to predict ahead of time).

BTW, this has now been committed, so you can read it here:
http://llvm.org/docs/Proposals/GitHubMove.html

> So...
> 
>> - how much pain the transition would cause, instead of what they think the right final state is.
> 
> The final state is defined by submod vs. monorepo, and that's
> represented in a different question. Those questions are addressing
> the additional work done to get there, as many have said would be the
> crucial decision point.
> 
> It also outlines the cost over their preferred vs non-preferred
> solutions, which leads to the aggregated cost over the whole project
> for each decision.

I guess my problem is that it's really hard for people to judge this, so I'm breaking it down to specific questions that are easy to judge.  I think if we can trust that people answered accurately, the data will be more useful for the BoF discussion.

> 
>> - what's good for the individuals responding, instead of what they think is best for the LLVM project; and
> 
> That's implied. I think it is clear enough, but we can always change
> the wording if others feel confused.
> 
> 
>> Secondly, I'm worried about this question: "How does the choice between a single repository with all projects and the use of sub-modules impact your usage of Git?"  I'm not sure we'll get good signal from this; it's essentially a vote on the two variants, but it doesn't force the respondent to think about the specific issues.  I'd rather find a way to ask about the specific concerns raised in the document.
> 
> It is a vote. The "thinking" is on the extended answer that follows.
> Answers with good extended reasoning will have a greater weight than
> those without.

I'd like to move it later.

Asking for a vote first will affect how respondents answer the rest of the questions; humans have a tendency to post-rationalize their decisions.  Asking for the vote at the end forces them to think through the issues first, informing their eventual decision/vote.

> 
> If you're worried about data mining, then leaving those questions to
> full text answers will require someone to read it all, interpret, and
> put their bias on top. Given the nature of this problem, we should
> avoid bias whenever possible, especially when interpreting the
> answers.

I agree that full text answers aren't good for data mining, and lead to bias.

That's why I argue for spelling out the known concerns with specific radio and/or checkbox questions.

> 
> 
>> Thirdly, I'm worried that the follow-ups talk about "preferred" and "non-preferred" instead of "multirepo" and "monorepo".  This makes data-mining non-trivial (because the meaning depends on previous answers) and increases the chance of respondent confusion.
> 
> I see your point. We can re-word to make that more clear.
> 
>> 4. How often do you work on a small LLVM sub-project without using a checkout of LLVM itself?
>> - Always.
>> - Most of the time.
>> - Sometimes.
>> - Never.
> 
> Interesting, it covers the main problem with both proposals.
> 
> 
>> 5. Please categorize how you interact with upstream.
>> - I need read/write access, and I have limited disk space.
>> - I need read/write access, but a 1GB clone doesn't scare me.
>> - I only need read access.
> 
> I'm not sure that's critical. My current source repo has 35GB with
> just a few worktrees.
> 
> Also, both solutions have low-disk-usage modes, and this would make no
> difference on how we proceed.

This is targeting the number one contentious issue about monorepo.  You can see more in the proposal:
http://llvm.org/docs/Proposals/GitHubMove.html#id12

Affected users that need read-write access can use the SVN bridge (or the git-svn layer on top of it).

There's another concern that the SVN bridge might somehow go away, killing the split sub-project option for write access.  However, we'll *always* be able to maintain a split sub-project Git mirror.

This question groups everyone into three categories:
- People that are worried about disk space and need read/write access.  They'll be relying somehow on the SVN bridge.
- People that are not worried about disk space.  Whether they decide to use monorepo or the SVN bridge, their disk space is not preventing them from using monorepo.  (It sounds like this is your category.)
- People that don't need write access, so split Git mirrors would be sufficient (they don't rely on the SVN bridge).
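
As a rough sketch of how the answers to this question could be binned afterwards: the enum, the function, and the workflow labels below are hypothetical names made up for illustration, not part of any survey tooling.  Only the three answer strings come from the question itself.

```python
from enum import Enum


class Access(Enum):
    """The three answers to question 5, verbatim."""
    RW_LIMITED_DISK = "I need read/write access, and I have limited disk space."
    RW_DISK_OK = "I need read/write access, but a 1GB clone doesn't scare me."
    READ_ONLY = "I only need read access."


def implied_workflows(answer):
    """Map an answer to the workflows this email associates with it.

    The labels ("svn-bridge", "monorepo", "split-git-mirror") are
    illustrative shorthand, not real tool or repository names.
    """
    if answer is Access.RW_LIMITED_DISK:
        # Worried about disk space and needs write access:
        # relies somehow on the SVN bridge.
        return ("svn-bridge",)
    if answer is Access.RW_DISK_OK:
        # Disk space does not prevent monorepo; either route works.
        return ("monorepo", "svn-bridge")
    # Read-only users: a split Git mirror is sufficient.
    return ("split-git-mirror",)


if __name__ == "__main__":
    for answer in Access:
        print(answer.name, "->", ", ".join(implied_workflows(answer)))
```

The point is only that each radio answer maps to a category on its own, without looking at any other answer.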

> 
> 
> 
>> 6. How important is cross-project blame, grep, etc.?
>> - Vital.  I already use SVN/monorepo/custom-tooling to accomplish this.
>> - Extremely.  It should be easy enough that everyone does it by default.
>> - Somewhat.  I would use it if it were easy, but it's just nice to have.
>> - Not at all.  Anyone who cares can write their own tooling.
> 
> Based on other comments in the thread, we should leave this one out.

I don't understand your reasoning.  Why?

This is targeting one of the benefits of monorepo.  I think it's important to know if anyone cares.
> 
>> 7. Single-commit cross-project refactoring designs away a class of build failures and simplifies making API changes.  How important is it?
>> - Vital.  I already use SVN/monorepo/custom-tooling to accomplish this.
>> - Extremely.  It should be easy enough that everyone does it by default.
>> - Somewhat.  I would use it if it were easy, but it's just nice to have.
>> - Not at all.  Anyone who cares can write their own tooling.
> 
> I don't like to assert my opinion and then ask how much people agree.

This doesn't strike me as opinion.  It does design away a class of build failures (whether it's an important class is opinion), and it does simplify making API changes (whether it matters is opinion).

> I prefer to ask the question directly, like:
> 
> How often do you need to commit across repositories (ex. llvm+clang)
> and how often are your builds broken because they're in separate
> repositories?

This affects more than just the people *making* the changes, so I'm not a big fan of your wording.

I also don't think the frequency of the problems necessarily reflects developers' opinions about how important they are.
- Some people may find that it happens "all the time", but think that it's not important.
- Others may find that it happens rarely, but think that it's devastating.

> Also, I think your scale of importance is somewhat skewed up. Vital and
> Extremely are at the top, somewhat is right bang in the middle and not
> at all is the very bottom.

How about "Quite" instead of "Extremely"?

> 
> You either have two positive and two negative (very, somewhat, not
> much, not at all) or you add a fifth in the middle. I prefer 4 because
> that makes people think harder.

I think they're all positive except "not at all" (which is 0).  Since "vital" is an absolute adjective, it clearly sets an upper limit.  But I'm happy to shift somewhat/extremely if you can think of better things in the middle.

> 
> 
>> 8. The multirepo variant provides a read-only umbrella repository to coordinate commits between the split sub-project repositories using Git submodules.  Assuming multirepo gets adopted, how do you expect to use the umbrella?
>> // checkboxes:
>> + Actively contribute tooling improvements to improve it.
>> + Integrate it into our downstream fork.
>> + Use it for upstream contributions.
>> + Use it as the primary interface development environment.
>> + Use it for bisection.
> 
> Good. (+ N/A, too)
> 

Sure, that works.  Although "leaving all boxes blank" means the same thing, I think.

> 
> 
>> 12. The multi/mono hybrid variant merges some sub-projects, but leaves runtimes in separate repositories using the umbrella to tie them together.  Is this the best or worst of both worlds?
>> - This is great.  Native cross-project refactoring, without penalizing runtime-only developers.
>> - Whatever.  I'll deal with it.
>> - This is terrible.  All the transition pain of monorepo, without the advantages.
> 
> I didn't know we were proposing yet another variant. This seems like a
> last minute rushed in proposal and I don't want to endorse it in the
> survey. We can discuss them in the BoF, though.

It was raised around a month ago in the proposal thread as a compromise solution.  Here's the description.
http://llvm.org/docs/Proposals/GitHubMove.html#multi-mono-hybrid-variant

Since this hasn't been carefully thought through, it risks wasting a *lot* of time at the BoF.  I'd like to raise it here so that we know if it's worth talking about there.  If this gets a lot of support, then we should talk about it at the BoF.  But if most people think it's the end of the world then we can skip the conversation.
> 
>> 13. If multirepo is adopted, how much pain will there be in your transition?
>> - Nothing consequential.
>> - A little; but it'll be fine.
>> - A lot; but it'll get done somehow.
>> - Too much; I/we may stop contributing to LLVM.
>> 
>> 14. If monorepo is adopted, how much pain will there be in your transition?
>> - Nothing consequential.
>> - A little; but it'll be fine.
>> - A lot; but it'll get done somehow.
>> - Too much; I/we may stop contributing to LLVM.
> 
> Those are already covered by the current bad/good, but I'll change the
> wording to be like this one.

Yes, these were basically rewordings of those questions :).

> 
> 
>> 15. If we could go back in time and restart the project with today's technologies, which repository scheme would be best for the LLVM project?
>> - CVS.
>> - Subversion repository with split sub-projects (<sub-project>/trunk), with git-svn.
>> - Subversion repository as a single project (trunk/<sub-project>), with git-svn.
>> - Git: multirepo variant.
>> - Git: monorepo variant.
>> - Git: multi/mono hybrid variant.
>> - Other.
> 
> Let's not put CVS in there, please. :)

I believe it was the choice Chris made, so it seemed worth mentioning ;).  If you really don't want it there, I'm fine with you taking it out.

> 
> So, what's the purpose of this question?

This is my wording for "the vote".

> I mean, we are "starting
> fresh" in a way, and the responses of the rest of the survey would
> make this question irrelevant, no?

This is wording to tease out: "If there were no transition pain, what do you think the best solution would be?"  I worded it differently so that I wasn't making people think about the pain while they thought about their answer.

(Might be good to have a text box for "other" in case the entire community wants Mercurial or something; up to you.)

> I'll be changing the wording on the ones we all agree on and leave the
> ones with questions until they're all solved.

Makes sense!

> 
> cheers,
> --renato


