[llvm-dev] Git move survey

Wed Aug 24 16:26:23 PDT 2016

On 24 August 2016 at 22:11, Chris Bieneman <beanz at apple.com> wrote:
> I went back and looked at the current survey, and I have a lot of thoughts I
> wanted to share. I apologize for this being a dense response to a
> days-dormant thread.

That's ok, I was just about to ask again. After a few long threads, I
learnt that silence rarely means consensus.

> What information do we want to get out of the survey? The current survey is
> mostly just giving people a way to vote for their preferred solution.

So, this is specifically what people wanted. I apologise for not doing
a full sweep on the old threads, but a few points were clear as I
asked around:

1. We don't want simple votes, we want to understand the vote (the survey).
2. We need concrete data to choose from. Each proposition needs a
clear and complete description (the docs).
3. Free text fields are required to allow people to expand on their choices.

and a few things are clear (to me, personally) from previous attempts
to gather opinions and consensus:

4. Too many questions dilute the statistical quality of the answers.

We won't have many more than 100 answers, so asking more than 10
relevant questions could get us in a situation where there is no clear
consensus.

5. Different people answer the same questions differently.

It is practically impossible to phrase a question in a way that
everyone will answer in the same way, and trying to capture all
possible ways will explode the number of multiple choices to
exhaustion, leading to the problem above (4).

6. Too many free text answers can be exhausting to read, interpret and classify.

100 answers may be too few for statistical relevance, but it's too
many to collate in a coherent and meaningful report. Free text is not
far from emails, where we'll end up quoting one line of a long thread
as more relevant than the rest of the thread itself. In a sense,
creating a survey of free text questions will be too much like a long
thread, but without digressions.

> I think this is a huge missed opportunity. One of the things I've found most
> frustrating about the Git-related threads is that there have been several
> assertions following the form "most people <blah>". This is a really great
> opportunity for us to actually get some real data to prove or disprove these
> assertions.

While I agree with you that we could learn so much more if we did a
more elaborate survey, the point of this one in particular was to know
what people prefer with regard to their version control system.

I think it would be amazing to learn all the different ways people use
LLVM, and that would give us a huge insight into how to organise the
repositories, websites, mailing lists and even the code itself and how
it's built.

But this is a different topic, and one that will take considerably
longer than two months to do. My first email was on the 31st of May
and we're planning to have a survey in September, a meeting in
November to maybe decide something in December. And this is *just*
about version control.

> In a very general sense I think the survey as written is little more than a
> vote for which option people prefer, and an opportunity to rate how good or
> bad they think the alternative is. As a result I think the current survey
> has a selection bias that will exclude people who may not have clear or
> strong opinions on the proposals.

I disagree.

People who don't have strong opinions *also* need to evaluate how
this will change their work. I myself don't have a strong opinion, and
I'm fine either way, but I *will* have to change my work and we are
already planning for both moves.

If people don't prefer any, but will have a much more serious problem
migrating to one and not the other, they should mark the right option
on the survey and then describe what the problem will be.

In the end, we are already ignoring the preference of a lot of people
that use Git today. So I don't see a way to cater for everyone, nor do
I see a way to weigh someone's opinions more heavily than others' based
on free text answers, in the same way I can't tell if a thread has
consensus or not by counting the number of people for and against.

The survey needs an element of a vote, but it also needs an element of
description, and it has both.

> As I said in an earlier response I also
> think the reliance on text fields will make the data harder to process and
> understand if we get a large number of responses (and I really hope we get a
> lot of responses).

Precisely.

> I think we should consider approaching this problem differently. Instead of
> structuring a vote, we could focus on gathering data about users and
> workflows, and using that real-world data to guide a decision that is best
> for the most common use cases. Correlating information about people's
> workflow answers against their relationship to the community will allow us
> to categorize and weigh the results.

This is a *completely* different problem and, while I can see it's
related to the one at hand, I don't think we can reliably assess
what's the best way forward in the particular case of version control
in any reasonable time.

> we could infer the projects a person contributes to if we match the email
> address in the survey against the email address on commits, which would also
> be an acceptable route to this information.

We could, for most cases.

> Knowing how
> end users and package maintainers are using our existing source
> distributions is useful information when thinking about infrastructure
> changes.

Again, a completely different problem. One I do want to solve, but I
really didn't want to start intermixing hard problems together.

> There have been several discussions lately about supporting runtimes without
> LLVM sources, we might want to figure out how common that desire is. It also
> might be nice to be able to correlate people who want that support with
> people who contribute to the runtimes.

There is enough evidence already in the lists and current downstream
users to assume it is a common use.

But this is being addressed already in other ways, i.e. discussion
between the interested parties (e.g. compiler-rt/libunwind split,
compiler-rt cross-build, test-suite cmake-ification, libc++ isolated
testing, etc.)

> (4) How are people getting LLVM sources today?

This is one question that we could easily add, giving a few options
and "other" with a free text field.

I foresee no complications from this additional question.

> Structuring a survey to gather primarily information either in addition to
> or instead of opinion we can augment any decision with data providing a
> justification.

It can also make the whole thing useless by not providing enough
information so that each biased opinion can be "proven" by munching
the data in a slightly different way.

In order to have a clear signal we need simple and aggregative
questions, with a free text question to complement.

I'm equally interested in the results of such a broader survey, but
not as a way to choose the version control system that we use.

Maybe we could do a parallel survey? One that wouldn't need to be
complete by the dev meeting to be of use? One that would be treated as
even more advisory than the current one?

cheers,
--renato