[llvm-dev] Git move survey

Wed Aug 24 14:11:12 PDT 2016

I went back and looked at the current survey, and I have a lot of thoughts I wanted to share. I apologize for this being a dense response to a days-dormant thread.

What information do we want to get out of the survey? The current survey is mostly just giving people a way to vote for their preferred solution. I think this is a huge missed opportunity. One of the things I've found most frustrating about the Git-related threads is that there have been several assertions of the form "most people <blah>". This is a really great opportunity for us to actually get some real data to prove or disprove these assertions. For some data points I pulled a few assertions out of the mono-repo thread:

David Chisnall wrote:
"clang-tools-extra is explicitly a bunch of stuff that doesn’t belong in the main clang repo because it’s not of interest to most people doing clang work"

Paul Robinson wrote:
"I'm not clear why imposing this cost on everybody who wants less-than-all (which I'd think would be most people)"

Justin Lebar wrote:
"If you use the workflow that we currently have, then on the client side, there is no guarantee that your subprojects will be sync'ed.  (This is the same as most peoples' client-side git workflows today.)"

I wrote:
"I think we have some pretty strong evidence in the form of the github fork counts (https://github.com/llvm-mirror/) that most people aren’t using all of the LLVM projects."

In a very general sense, I think the survey as written is little more than a vote for which option people prefer, and an opportunity to rate how good or bad they think the alternative is. As a result, I think the current survey has a selection bias that will exclude people who may not have clear or strong opinions on the proposals. As I said in an earlier response, I also think the reliance on text fields will make the data harder to process and understand if we get a large number of responses (and I really hope we get a lot of responses).

I think we should consider approaching this problem differently. Instead of structuring a vote, we could focus on gathering data about users and workflows, and using that real-world data to guide a decision that is best for the most common use cases. Correlating information about people's workflow answers against their relationship to the community will allow us to categorize and weigh the results.
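To make the correlation idea concrete, here is a rough sketch of the kind of cross-tabulation I have in mind. Everything in it is an assumption for illustration: the field names ("role", "vcs") and the sample responses are made up, not taken from any real survey.

```python
from collections import Counter

# Hypothetical survey responses; the field names and values are
# invented for illustration and do not come from a real survey.
responses = [
    {"role": "contributor", "vcs": "git-svn"},
    {"role": "contributor", "vcs": "svn"},
    {"role": "downstream user", "vcs": "git"},
    {"role": "contributor", "vcs": "git-svn"},
    {"role": "packager", "vcs": "tarball"},
]


def cross_tabulate(rows, row_key, col_key):
    """Count how often each (row_key, col_key) pair of answers occurs."""
    return Counter((r[row_key], r[col_key]) for r in rows)


table = cross_tabulate(responses, "role", "vcs")

# Print one line per observed combination of answers.
for (role, vcs), count in sorted(table.items()):
    print(f"{role:16} {vcs:8} {count}")
```

Multiple-choice answers drop straight into a table like this, which is part of why I would prefer them to free-form text fields.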

I've compiled a list of a few pieces of data I think we should gather. If we took an approach like I'm proposing for the survey we would want more people in the community to suggest additional things to gather information around.

My list is:

(1) Which projects people contribute to, and which ones they use (separately)

By combining the projects you use or contribute to into a single question, we're actually losing a lot of relevant information. I believe a lot of people contribute to Clang, but only use libcxx. I believe this based on the number of contributors to clang and libcxx over the last year (284 and 41 respectively). In particular, I believe it is common for clang contributors to use projects that they don't contribute to, and we should try to quantify that. If we don't want to have multiple questions for this, we could infer the projects a person contributes to by matching the email address in the survey against the email address on commits, which would also be an acceptable route to this information.

(2) How many people build clang against an installed LLVM?

I know it does get used this way, but I have no idea how common it is. We recently had a series of changes because cc1_main.cpp was including llvm's Config.h, which isn't installed. I think this is a very uncommon use case; my evidence is that the change breaking the standalone build was months old before it was detected. Alternatively, it might be a common use case that is only used on the release branches (which would make some sense). Either way, it would be good to gather data around it. Knowing how end users and package maintainers are using our existing source distributions is useful information when thinking about infrastructure changes. This doesn't necessarily mean we shouldn't do something that impacts them, but it allows us to make informed decisions.

(3) How many people use runtime projects without LLVM or Clang?

There have been several discussions lately about supporting runtimes without LLVM sources, so we might want to figure out how common that desire is. It also might be nice to be able to correlate the people who want that support with the people who contribute to the runtimes.

Data points:

C Bergström wrote on llvm-dev:
/* Side rant - I wish I didn't even need the llvm sources. I just want to build libcxxrt */

Michał Górny filed:
Bug 18331 - [cmake] Please make compiler-rt's build system stand-alone
Bug 29109 - [cmake / compiler-rt] Please make tests runnable against installed LLVM

(4) How are people getting LLVM sources today?

Over the course of the many discussions on moving to Git, we still don't actually know how many people are using Git already. Asking how many people use Git or Git-SVN when interacting with LLVM sources is a really simple question that will tell us a lot about the impact of a move to Git on the wider community. We also don't know whether people are getting sources from the LLVM SVN repository, or the git mirrors, or the GitHub mirrors, or Takumi's mono-repo. It would be really great to gather information about where people are getting LLVM sources, and how they interact with them.
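Going back to (1), the email-matching route could be sketched roughly like this. It is only an illustration under assumptions: it presumes we can extract (project, author email) pairs from the commit history and that people answer the survey with the same address they commit under, which will not always hold. The function names and sample data are hypothetical.

```python
def projects_by_email(commits):
    """Map each normalized author email to the set of projects committed to."""
    index = {}
    for project, email in commits:
        index.setdefault(email.strip().lower(), set()).add(project)
    return index


def infer_contributions(survey_emails, commits):
    """For each survey email, list the projects that address has commits in."""
    index = projects_by_email(commits)
    return {
        email: sorted(index.get(email.strip().lower(), set()))
        for email in survey_emails
    }


# Hypothetical (project, author email) pairs, made up for illustration.
commits = [
    ("clang", "alice@example.com"),
    ("llvm", "Alice@Example.com"),
    ("libcxx", "bob@example.com"),
]
survey_emails = ["alice@example.com", "carol@example.com"]

inferred = infer_contributions(survey_emails, commits)
for email, projects in inferred.items():
    print(email, projects)
```

An address with no matching commits maps to an empty list, so respondents who only use the projects stay distinguishable from contributors.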

By structuring the survey primarily to gather information, either in addition to or instead of opinion, we can back any decision with data that justifies it.

-Chris

> On Aug 17, 2016, at 2:23 PM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On 17 August 2016 at 22:18, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> I think the survey should be regarding question based-off a single document
>> putting side-by-side the options that we came-up with on the mailing-list.
>> Indeed I don’t plan to write a document describing a “mono-repo” proposal to
>> counter the submodules one, but I plan instead to unify it with the existing
>> one (submodules…) along with the possible variants/options in a single
>> document.
> 
> I agree this is probably the most sensible solution. Thanks for
> merging the options.
> 
> 
>> I plan to include examples of workflow today and after for each scenario,
>> side-by-side. I hope to have it up for public review by the end of the
>> month.
> 
> Excellent! I'll get the form rolling in parallel, and hopefully we'll
> reach maturity around the same time.
> 
> 
>> I’d regret not having the results of the survey for the BoF as these data
>> seem critical to drive the discussion.
> 
> Agreed. Let's aim for that.
> 
> cheers,
> --renato
