[llvm-foundation] LLVM Infrastructure

Wed May 18 14:38:03 PDT 2016

Just my 2 cents: the recent outage of the repositories was caused by
a DDoS targeting viewvc. There were no outages during the period
(almost 2 months) when viewvc was disabled. Recently viewvc was
re-enabled with a bunch of tweaks, and we have also banned some whole
/16 networks to stop the DDoS.

On Wed, May 18, 2016 at 5:26 PM, Renato Golin via llvm-foundation
<llvm-foundation at lists.llvm.org> wrote:
> Folks,
>
> I wanted to understand what the foundation's plans are for
> infrastructure.
>
> According to the announcement [1], the foundation is responsible for
> overseeing all LLVM tools, websites and general infrastructure.
>
> Although some work has been done already, I feel most of it was to
> remediate the worst immediate problems. I believe some investigation
> is necessary, with the results shared in the right forum. It might not
> be this list, but I'll take the risk of being wrong here, rather than
> start another giant thread on the main lists.
>
> Basically, I'm proposing a three-step process for all our infrastructure:
>
> 1. Investigate the problems and solutions we have, potentially asking
> on the main lists.
> 2. Propose a set of changes (publicly or privately), from cheapest to
> most expensive, and make the cut on what we can afford.
> 3. Draw a plan, and when publicly visible changes are due, make sure
> the proposal and schedule are fair, and do it.
>
> For example, moving the web server from one cloud to another makes no
> difference, so it doesn't need to be a public process, but changing
> our repository provider may affect a lot of internal processes, and
> people will react badly if nothing is shared with them beforehand, so
> we need some public exposure first.
>
> The areas I think we must improve:
>
>
>   A. Code repositories
>
> The SVN server is reasonably unstable, having outages that affect all buildbots.
>
> There was a migration earlier this year. I don't know if the repo was
> involved, but we still had an outage a few weeks ago. I also don't
> know what hosting our own "stable enough" repository would cost
> versus paying some other SVN hosting company to do that for us.
>
> The benefit of using code hosting companies is that they have a
> larger, distributed and more stable infrastructure specifically
> tailored to code hosting, which is something that would be very
> expensive for us to do. But we may get away with slightly more than
> what we have today and not pay too much.
>
> In my view, this requires a deeper investigation, with a list of
> prices versus features across various providers, to make an informed
> proposal. It'll also require community involvement, as this changes
> the core of what we do.
>
> The same goes for the Git repo, which is a lot better than SVN, but we
> could reduce costs if we moved it to a FOSS-friendly host.
>
>
>   B. Buildbots
>
> Our current build master is *very* slow. Buildbot itself is slow, I
> know, but with the number of people we have looking at those bots, it
> can take several seconds to a minute to get a page back. We may need
> an individual server (cloud instance) for this, and scale as
> necessary.
>
> Also, the current master is on version 0.8.5, which besides being
> ancient, has several drawbacks:
>  * It doesn't support newer SQLAlchemy, and it has trouble with newer
> buildslaves that use it.
>  * It doesn't support submitting patches to a particular build, making
> pre-commit buildbots impossible.
>  * Its support for Windows builders is worse than in newer versions.
>
> Plus a huge list of changes around SVN/Git actions, authentication,
> RSS feeds, stability, interaction with other systems (Gerrit, GitHub,
> etc).
>
> But a migration to a newer build master would probably require a large
> scale Zorg refactoring.
>
> I don't know the costs of the migration, nor do I know the costs of
> the current solution. We need a clear picture, and maybe to propose a
> few ways out of stagnation:
>  - Start a new master elsewhere, slowly move the bots towards it
>  - Refresh the master in compatibility mode, slowly move slaves, switch
>  - Deprecate buildbots and move everyone to Jenkins?
>
> Whatever works for everyone.
>
>
>   C. Bugzilla
>
> Bugzilla is great, I'm one of the weirdos that actually likes it. But
> our Bugzilla is also severely outdated, and its internal organisation
> is disconnected from how the project evolved over the years.
>
> I think we should update our Bugzilla purely in the interest of bug
> fixes and security issues, as I don't know of anything that we may
> want from a newer version. Some other people might...
>
> Also, if we're going to be writing scripts to automate creating bugs
> and scanning through Bugzilla web services, it may be more heavily
> loaded than it is now, and I don't think it's great as it is.
>
> Since this is mostly a web service anyway, upgrading the version
> should have very little community impact, so it's more about the cost
> of a new server (cloud instance) and the migration itself than
> anything else.
>
>
>   D. Phabricator
>
> Phab is a good tool, but I believe we have a copy that has been
> modified to suit our needs and thus diverged from whatever is
> upstream. I think it's ok to modify our tools, but very little effort
> has been put into finishing the modifications, for example,
> understanding inline email replies. This is not a trivial task, but
> far too many people have had this problem in the main Phab, and I
> remember reading that, due to our changes, it may not be easy to
> upgrade.
>
> I personally think that making local modifications without an effort
> to upstream them is against the goals of FOSS communities and we
> shouldn't be promoting it ourselves. All in all, I think this deserves
> at least a report on how good/bad it is, and how we should improve the
> bad things (i.e. lack of updates).
>
>
>   E. Others
>
> I believe that the email and web pages infrastructure is now good
> enough and fit for purpose, but it also needs to be part of the
> overall plan (below). If I'm not mistaken, this is the server that was
> just upgraded, so that shows some progress, which is very welcome.
>
>
>   Z. Update Plan
>
> In the end, I think we need to think about how much work to put into
> updating the infrastructure, from moving to new servers, to updating
> software, to changing into new solutions. Since this is something that
> can disrupt the community at large (either updating or not), this
> should also be shared with the larger community, so that we have a
> clear picture of what to expect and what to request, if serious
> problems arise.
>
> Until now, we've been very relaxed about using tools, like when
> Chandler introduced Phabricator. I think that's a perfectly valid way
> of introducing new concepts and tools, but not so much of maintaining
> them. The more we depend on the tools we add, the more care we
> need to put into keeping them available, fast, up-to-date and secure.
>
> With all that in mind, my question is: what is the Foundation's plan
> for maintaining and improving LLVM's core infrastructure, and
> how can the rest of the community help in defining and implementing
> those plans?
>
> cheers,
> --renato
>
> [1] http://blog.llvm.org/2014/04/the-llvm-foundation.html
> _______________________________________________
> llvm-foundation mailing list
> llvm-foundation at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-foundation

-- 
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University