[LLVMdev] http://llvm.org/perf/ instability: some clues

Kristof Beyls kristof.beyls at arm.com
Sun May 10 11:21:39 PDT 2015


Daniel, Tobias, Renato and I have been looking into the potential
underlying reason why http://llvm.org/perf/ is unstable, and have found
some clues. I want to share them here to give people with more experience
in the frameworks used by LNT (flask, sqlalchemy, wsgi, ...) a chance to
check whether our reasoning below seems plausible.

 

Daniel noticed the following backtrace in the log after
http://llvm.org/perf started giving "Internal Server Error"
again:

2015-05-08 22:57:05,309 ERROR: Exception on /db_default/v4/nts/287/graph [GET] [in /opt/venv/perf/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py:1423]
Traceback (most recent call last):
  File "/opt/venv/perf/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/venv/perf/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/venv/perf/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/venv/perf/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/venv/perf/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/venv/perf/lib/python2.7/site-packages/LNT-0.4.1dev-py2.7.egg/lnt/server/ui/decorators.py", line 67, in wrap
    result = f(**args)
  File "/opt/venv/perf/lib/python2.7/site-packages/LNT-0.4.1dev-py2.7.egg/lnt/server/ui/views.py", line 385, in v4_run_graph
    ts = request.get_testsuite()
  File "/opt/venv/perf/lib/python2.7/site-packages/LNT-0.4.1dev-py2.7.egg/lnt/server/ui/app.py", line 76, in get_testsuite
    testsuites = self.get_db().testsuite
  File "/opt/venv/perf/lib/python2.7/site-packages/LNT-0.4.1dev-py2.7.egg/lnt/server/ui/app.py", line 55, in get_db
    self.db = current_app.old_config.get_database(g.db_name, echo=echo)
  File "/opt/venv/perf/lib/python2.7/site-packages/LNT-0.4.1dev-py2.7.egg/lnt/server/config.py", line 148, in get_database
    return lnt.server.db.v4db.V4DB(db_entry.path, self, echo=echo)
  File "/opt/venv/perf/lib/python2.7/site-packages/LNT-0.4.1dev-py2.7.egg/lnt/server/db/v4db.py", line 108, in __init__
    .filter_by(id = lnt.testing.PASS).first()
  File "/opt/venv/perf/lib/python2.7/site-packages/SQLAlchemy-0.9.6-py2.7.egg/sqlalchemy/orm/query.py", line 2334, in first
    ret = list(self[0:1])
  File "/opt/venv/perf/lib/python2.7/site-packages/SQLAlchemy-0.9.6-py2.7.egg/sqlalchemy/orm/query.py", line 2201, in __getitem__
    return list(res)
  File "/opt/venv/perf/lib/python2.7/site-packages/SQLAlchemy-0.9.6-py2.7.egg/sqlalchemy/orm/query.py", line 2405, in __iter__
    return self._execute_and_instances(context)
  File "/opt/venv/perf/lib/python2.7/site-packages/SQLAlchemy-0.9.6-py2.7.egg/sqlalchemy/orm/query.py", line 2418, in _execute_and_instances
    close_with_result=True)
  File "/opt/venv/perf/lib/python2.7/site-packages/SQLAlchemy-0.9.6-py2.7.egg/sqlalchemy/orm/query.py", line 2409, in _connection_from_session
    **kw)
  File "/opt/venv/perf/lib/python2.7/site-packages/SQLAlchemy-0.9.6-py2.7.egg/sqlalchemy/orm/session.py", line 846, in connection
    close_with_result=close_with_result)
  File "/opt/venv/perf/lib/python2.7/site-packages/SQLAlchemy-0.9.6-py2.7.egg/sqlalchemy/orm/session.py", line 850, in _connection_for_bind
    return self.transaction._connection_for_bind(engine)
  File "/opt/venv/perf/lib/python2.7/site-packages/SQLAlchemy-0.9.6-py2.7.egg/sqlalchemy/orm/session.py", line 315, in _connection_for_bind
    conn = bind.contextual_connect()
  File "/opt/venv/perf/lib/python2.7/site-packages/SQLAlchemy-0.9.6-py2.7.egg/sqlalchemy/engine/base.py", line 1737, in contextual_connect
    self.pool.connect(),
  File "/opt/venv/perf/lib/python2.7/site-packages/SQLAlchemy-0.9.6-py2.7.egg/sqlalchemy/pool.py", line 332, in connect
    return _ConnectionFairy._checkout(self)
  File "/opt/venv/perf/lib/python2.7/site-packages/SQLAlchemy-0.9.6-py2.7.egg/sqlalchemy/pool.py", line 630, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/opt/venv/perf/lib/python2.7/site-packages/SQLAlchemy-0.9.6-py2.7.egg/sqlalchemy/pool.py", line 433, in checkout
    rec = pool._do_get()
  File "/opt/venv/perf/lib/python2.7/site-packages/SQLAlchemy-0.9.6-py2.7.egg/sqlalchemy/pool.py", line 945, in _do_get
    (self.size(), self.overflow(), self._timeout))
TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30
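
For readers less familiar with this error: the message says the pool allows
5 pooled connections plus 10 overflow connections, so at most 15 can be
checked out at once, and a further checkout waits 30 seconds before giving
up. The toy model below is not SQLAlchemy code; ToyPool and
count_checkouts_until_timeout are made-up names, and it only illustrates
how connections that are never returned would eventually exhaust such a
limit.

```python
import threading


class ToyPool:
    """Toy stand-in for a bounded connection pool (not SQLAlchemy code)."""

    def __init__(self, size=5, max_overflow=10, timeout=30.0):
        # Total number of connections that may be checked out at once.
        self.limit = size + max_overflow
        self.timeout = timeout
        self._slots = threading.BoundedSemaphore(self.limit)

    def checkout(self):
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=self.timeout):
            raise TimeoutError("pool limit of %d reached" % self.limit)
        return object()  # placeholder for a real connection

    def checkin(self, conn):
        # Returning a connection frees one slot for the next caller.
        self._slots.release()


def count_checkouts_until_timeout(pool):
    """Check out connections without returning them; count the successes."""
    held = []
    try:
        while True:
            held.append(pool.checkout())
    except TimeoutError:
        return len(held)
```

With the default sizes, count_checkouts_until_timeout(ToyPool(timeout=0.01))
stops after 15 checkouts, which matches the "size 5 overflow 10" in the log.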

 

After browsing through the sqlalchemy documentation and bits of the LNT
implementation, it seems so far that the following pieces may be the key
parts that cause the problem shown in the log.

 

The SQLAlchemy documentation seems to recommend having one sqlalchemy
session per web request. Looking at the following pieces of LNT, I got the
impression that instead a session is shared between many or all requests:

 

In ui/app.py, Request.get_db() basically caches the result of get_database
from "config":

...
class Request(flask.Request):
...
    def get_db(self):
...
        if self.db is None:
            echo = bool(self.args.get('db_log') or self.form.get('db_log'))
            self.db = current_app.old_config.get_database(g.db_name,
                                                          echo=echo)
...
        return self.db

 

In config.py, get_database returns a V4DB object by calling its
constructor:

...
    def get_database(self, name, echo=False):
...
        # Instantiate the appropriate database version.
        if db_entry.db_version == '0.4':
            return lnt.server.db.v4db.V4DB(db_entry.path, self,
                                           db_entry.baseline_revision,
                                           echo)
...

 

This constructor is in db/v4db.py:

...
class V4DB(object):
...
    def __init__(self, path, config, baseline_revision=0, echo=False):
...
        self.session = sqlalchemy.orm.sessionmaker(self.engine)()
...
        # Add several shortcut aliases.
        self.add = self.session.add
        self.commit = self.session.commit
        self.query = self.session.query
        self.rollback = self.session.rollback
...

 

 

 

It seems that a single session object is created in this constructor and
will ultimately be shared across all Requests. Instead, the request.get_db
method should probably create a new session for each request and close
that session when the request is finalized, which probably needs to be
done by hooking into something Flask-specific.
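
As a rough idea of what such a hook could look like, here is a minimal
sketch using Flask's teardown_request callback. This is not LNT code: the
app, the in-memory sqlite engine, get_session, close_session and the /ping
route are all placeholders invented for illustration, and LNT's real
structure (V4DB, Request.get_db) would need adapting.

```python
import flask
import sqlalchemy
import sqlalchemy.orm

app = flask.Flask(__name__)
engine = sqlalchemy.create_engine("sqlite://")  # placeholder database
Session = sqlalchemy.orm.sessionmaker(bind=engine)


def get_session():
    # Create the session lazily, at most once per request.
    if "db_session" not in flask.g:
        flask.g.db_session = Session()
    return flask.g.db_session


@app.teardown_request
def close_session(exc):
    # Runs at the end of every request, including ones that raised.
    session = flask.g.pop("db_session", None)
    if session is not None:
        session.close()  # hands the connection back to the pool


@app.route("/ping")
def ping():
    # Hypothetical view that uses the per-request session.
    value = get_session().execute(sqlalchemy.text("SELECT 1")).scalar()
    return str(value)
```

The point of the sketch is only that the session's lifetime is tied to the
request rather than to a long-lived object, so its connection goes back to
the pool when the request ends.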

 

The self.add and following lines in the constructor show that it will
probably be non-trivial to refactor the code so that there is no longer a
single session per v4db object.
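
To illustrate why those aliases get in the way, here is a small sketch
with made-up classes (FakeSession, AliasedDB and DelegatingDB are
hypothetical, not LNT or SQLAlchemy code). An alias such as
self.query = self.session.query is a bound method of the session that
existed at construction time, so swapping in a new session later does not
affect it; a method that looks up self.session on each call would.

```python
class FakeSession:
    """Hypothetical stand-in for a SQLAlchemy session."""

    def __init__(self, name):
        self.name = name

    def query(self):
        return "query via " + self.name


class AliasedDB:
    """Mirrors the alias pattern: the alias captures one session for good."""

    def __init__(self):
        self.session = FakeSession("first")
        self.query = self.session.query  # bound to the "first" session


class DelegatingDB:
    """Looks up the current session on every call instead."""

    def __init__(self):
        self.session = FakeSession("first")

    def query(self):
        return self.session.query()


aliased = AliasedDB()
aliased.session = FakeSession("second")
print(aliased.query())      # still goes through the first session

delegating = DelegatingDB()
delegating.session = FakeSession("second")
print(delegating.query())   # follows the replacement session
```

Under that assumption, one possible refactoring step would be to turn the
aliases into delegating methods first, and only then change where the
session comes from.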

 

We're not sure whether making separate sessions per Request is going to
solve the http://llvm.org/perf instability problems, but that's the best
idea we've got so far.

 

Thanks,

 

Kristof

 
