[cfe-dev] [RFC] Adding Thread Role Analysis

Dean Sutherland dsutherland at cert.org
Fri Jun 28 11:22:34 PDT 2013


On Mon, Jun 24, 2013 at 3:01 PM, Arthur O'Dwyer <arthur.j.odwyer at gmail.com> wrote:
> On Mon, Jun 24, 2013 at 11:16 AM, Delesley Hutchins <delesley at google.com> wrote:
>> 
>> In particular, I believe that "lock" and "role" are essentially the same
>> thing.  In both cases, the analysis is marking functions and data structures
>> as being "protected" by a particular lock/role, and ensuring that a thread
>> must hold/have the lock/role before calling protected functions, or
>> accessing protected data.
> 
> Dean, perhaps you could expand on how your "roles" fundamentally
> differ from "locks"; e.g. how is the statement "Thread A has the GUI
> role" fundamentally different from "Thread A notionally holds the GUI
> write-lock"; how is "Thread B has the Compute role" fundamentally
> different from "Thread B notionally holds a Compute read-lock"?

Ostensibly: locks are data-driven, and roles are code-driven.  That
being said, you can emulate one using the other.  You can gin up a
piece of data for each role to serve as its lock.  In that model,
thread_role_grant acquires the lock and thread_role_revoke releases
it, and requiring a role for one or more methods is equivalent to
requiring that the lock already be held when those methods are called.

One difference between roles and locks is cognitive.  Programmers are
taught to lock minimally -- grab the lock only when you need it, and
release it as soon as you're done with it.  Keeping your locks tight
reduces contention, etc.  Roles go to the other extreme: you define a
role that a thread performs, and stick with it for as long as the role is
relevant.  For roles like GUI (e.g., the event thread), that's likely to be a
very long time. It's slightly counter-intuitive to grab a notional lock
and hold it for the lifetime of a thread.  This really boils down to the idea
that when you lock something, you are protecting a resource (data, a code
section, etc.).  Defining roles is more contractual -- it doesn't focus on the
'how' like locking does.  You are saying "this method should only ever be used
from this role". Consequently, if that method is ever invoked from another role
(or, equivalently, from code that may be executed by a thread performing
another role), you've broken the contract.  Contrast this with locks where it
is expected that the data is going to be touched from two different contexts
(hence the need to protect in the first place).
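For what it's worth, the "this method should only ever be used from this role" contract can be made concrete with Clang's existing thread-safety attributes by treating the role as a capability. This is a hypothetical sketch -- the macro spellings follow the Clang thread-safety analysis documentation, and GuiRole/gui_role/repaint are invented names:

```cpp
// Minimal spellings of the Clang thread-safety attributes (normally taken
// from a shared annotation header); no-ops on other compilers.
#if defined(__clang__)
#define CAPABILITY(x) __attribute__((capability(x)))
#define REQUIRES(...) __attribute__((requires_capability(__VA_ARGS__)))
#define ACQUIRE(...)  __attribute__((acquire_capability(__VA_ARGS__)))
#define RELEASE(...)  __attribute__((release_capability(__VA_ARGS__)))
#else
#define CAPABILITY(x)
#define REQUIRES(...)
#define ACQUIRE(...)
#define RELEASE(...)
#endif

// A notional "lock" standing in for the GUI role.
struct CAPABILITY("role") GuiRole {
  void grant()  ACQUIRE();   // plays the part of thread_role_grant("GUI")
  void revoke() RELEASE();   // plays the part of thread_role_revoke("GUI")
};
GuiRole gui_role;

// The contract: "this method should only ever be used from the GUI role."
int repaint() REQUIRES(gui_role);

void GuiRole::grant()  {}           // no-op bodies; the contract is checked
void GuiRole::revoke() {}           // statically, not at runtime
int repaint() { return 1; }         // returns 1 as a testable stand-in for GUI work
```

With -Wthread-safety enabled, Clang would then flag any call to repaint() from code that has not "acquired" gui_role -- i.e., from another role.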

The underlying analysis technique is useful for specifying and enforcing a
variety of contracts. Thread roles happen to be a high-impact choice.

>> That said, you did point out some useful missing features, which are:
>> 
>>> We support more complex boolean expressions over thread roles. The
>>> existing Clang annotations cannot express multiple combinations of
>>> such properties.  For example, ((A & B & !C) | (D & !E)).
>> 
>> As you mention, the existing analysis does not support this case.  However,
>> this functionality would not be hard to add -- it's simply a boolean
>> constraint on allowable lock sets.  If you feel it's valuable, then let's do
>> that!
> 
> For reference, Dean gave an example of an (!A | !B) constraint that
> seems very valuable:
> 
> Dean Sutherland wrote:
>>> * TRA allows statements that certain thread roles are mutually
>>> exclusive (a GUI thread can't be a Compute thread, and vice
>>> versa). The Clang annotations lack support for this property.
> 
> Delesley, does Clang's current lock-based analysis really have no
> support for the annotation "No thread can hold both the GUI lock and
> the Compute lock at the same time"? You could sort of simulate this by
> annotating the invariants "Whenever you call the function lock_GUI(),
> you must not hold the Compute lock" and "Whenever you call the
> function lock_Compute(), you must not hold the GUI lock", but that's
> quite ugly, and also supposes that functions analogous to lock_GUI()
> and lock_Compute() exist for every role, which is probably false.
> 
> Dean Sutherland wrote:
>>> If this statement requires user-written annotations for all such
>>> methods, the required annotation effort will make adoption exceedingly
>>> difficult for most practicing programmers.  One of the benefits of TRA
>>> is the application of well-founded techniques to avoid the need for
>>> most user-written annotations.  Perhaps some of these techniques could
>>> be brought to bear on the existing thread safety analysis.
> 
> Dean, how does this work? Don't you need annotations somewhere to tell
> the analyzer what counts as a "GUI method", for example? You're saying
> that Clang's current analyzer would require a ton of annotation to
> mark all the GUI methods, which IMHO is true; but how would your
> solution avoid that burden?

Section 5.2 of the PPoPP paper gives the full explanation.  In brief, it builds
on the idea that annotations propagate from one function to another along
the call graph.  For example:

void h() {}
void g() { h(); }

[[cert::thread_role("GUI")]]
void f() { g(); }

When analyzing the flow through f, the functions g and h are automatically
given the GUI role through propagation.  This lets us infer the missing
annotations without additional user effort.  The inference happens via
iterative flow analysis with
Boolean expressions for the lattice values, and logical OR as the join
operator.  This is an ordinary inference analysis. The unusual part is that we
perform it over quite large swathes of code (call them "modules"). The enabling
conditions are that (1) all directly-invoked entry points to the module are
known at analysis time, (2) all indirect calls are adequately covered by thread
role annotations (by inheritance for OO, or by suitable genericity for function
pointers), and (3) all code that is "inside" the module is presented for
analysis (that is, the module is "sealed").  As you might expect, the user writes
annotations for the visible interface of the module; the tool infers (most of) the
annotations inside the module.  As with most such inference algorithms, the
worst case is exponential, but we never see it in practice; the typical case is
linear in program size.  We're happy to provide the full explanation
off-line -- this message is already too long.
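A toy sketch of the propagation step (not the actual PPoPP algorithm): here simple role *sets* approximate the Boolean-expression lattice, with set union playing the part of the logical-OR join, iterated to a fixpoint over the call graph.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using RoleSet = std::set<std::string>;

// Propagate explicit role annotations from callers to callees until
// nothing changes (a fixpoint); join = set union, standing in for OR.
std::map<std::string, RoleSet> infer_roles(
    const std::map<std::string, std::vector<std::string>>& calls,  // caller -> callees
    std::map<std::string, RoleSet> roles)                          // explicit annotations
{
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto& [caller, callees] : calls)
      for (const auto& callee : callees)
        for (const auto& r : roles[caller])
          changed |= roles[callee].insert(r).second;  // record any new role
  }
  return roles;
}
```

Running this on the f/g/h example above -- f annotated "GUI", f calls g, g calls h -- infers the GUI role for both g and h without user-written annotations.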

Past experiments show that dividing up programs in this fashion greatly reduces
the number of user-written annotations.  An intelligent division yields the best
results, but even random division helps.  Previous results show that the visible
interface between modules (when grouped intelligently) typically comprises <8%
of the functions (and data declarations) that were visible across TUs; random
groupings left about 25% of the functions in the visible interface.  That's
roughly a 4x (1/0.25) to 12x (1/0.08) reduction in the number of annotations to
be written.  See the PPoPP paper for descriptions of the other
annotation-reduction techniques.




