[cfe-dev] Fwd: [RFC] Adding Thread Role Analysis

Thu Jun 20 14:18:46 PDT 2013

Bringing this over to cfe-dev at the suggestion of the IRC channel.

---------- Forwarded message ----------
From: Aaron Ballman <aaron at aaronballman.com>
Date: Thu, Jun 20, 2013 at 5:03 PM
Subject: [RFC] Adding Thread Role Analysis
To: llvm cfe <cfe-commits at cs.uiuc.edu>

Hello!

We at CERT are interested in adding thread role analysis support (also
known as thread coloring) to clang.  This email is meant to briefly
introduce the subject and then open up to the community for comments.

Thread role analysis (or TRA), allows programmers to express their
thread usage policies in source code using simple and concise
annotations.  A static analyzer can check whether the expressed
policies and the as-written code are consistent, ensuring that the
annotated contract remains effective.

We've worked extensively with--and have demonstrated the utility
of--TRA for Java, including analyzing and finding significant problems
in dozens of fielded applications ranging from toy applets to
multi-million LOC commercial applications.  As part of that prior work
we demonstrated:
* Required annotation density (for Java) of ~6 annotations per KLOC
(and a design that should permit reducing this to <1 annotation per
KLOC)
* Composability of the analysis
* Adoptability by working programmers
* Capability to analyze large programs (>150KLOC in a single chunk;
multi-million LOC in multiple chunks)
* The annotation language and analysis suffice for analyzing a wide
variety of real programs

Publications regarding our previous work with TRA for Java include:
* PPoPP'10 paper (best short introduction) --
http://www.fluid.cs.cmu.edu:8080/Fluid/fluid-publications/p233-sutherland.pdf
* Ph.D. Dissertation (all the gory details, but LONG) --
http://reports-archive.adm.cs.cmu.edu/anon/isr2008/abstracts/08-112.html
* Blog post -- http://blog.sei.cmu.edu/post.cfm/thread-role-analysis

Thread usage policies describe what kinds of threads ("thread roles")
are permitted to invoke particular functions or methods.  TRA can be
used to enforce a wide variety of policies including (but not limited
to):
* Single-thread-only (e.g., use of threads is forbidden)
* Restricting use of GUI methods to the GUI event thread
* Ensuring that threads engaged in reader-writer locking restrict
themselves to calls that are consistent with their role -- e.g.,
readers never attempt to write and outside threads neither read nor
write.
* Ensuring that llvm Function passes never invoke llvm Module passes
(note that this would require C++ support; initial phases of this
project involve C only)

When combined with a sufficiently-powerful effects analysis, TRA can
also be applied to data objects. In this mode, it can be used to
either (1) prove thread confinement of data (which then need not be
locked), or (2) identify data that is potentially shared across
threads (and consequently requires concurrency control of some sort).
However, we are not aware of a suitable effects analysis for C or C++,
and consequently have no current plans to address this capability.

In principle, TRA can be performed entirely one translation-unit at a
time. However, this approach requires unrealistic numbers of
user-written annotations.  Past research has demonstrated that TRA is
most effective (in terms of bang for the end-user's effort) when
performed over large groups of translation units. In this mode, we
require annotations on the interfaces between groups of TUs, and infer
annotations within each group.  The potential for using Modules (or
something similar) should be obvious. Note that the cost of the
analysis is O(n) in program size in the typical case.

We realize that this RFC doesn't touch on implementation details (we
wanted to spare you the wall of text).  However, we're happy to
discuss our envisioned implementation strategy if you have any
questions.  We propose to proceed by giving small RFCs on
implementation details which may require us to introduce new concepts,
but otherwise go with post-commit reviews for the majority of the
work.

Our implementation strategy from 50,000 feet:
* Start by supporting C (only).
* Begin adding C++ support in US Govt. FY14 (starting Oct. 1, 2013)
* Add new declaration attributes
* Add new statement attributes and round out the statement attribute subsystem
* Emit YAML-based sidecar files (either during compilation or from the
Static Analyzer)
* Produce a utility application to read the sidecar files, perform the
analysis, and produce results.
* Add a static analyzer pass which reads the results and emits diagnostics

Our Team:
Dr. Dean Sutherland and I work for the securing coding group at CERT
(within the Software Engineering Institute of Carnegie Mellon
University) and will be the primary people working on
implementing thread role analysis.  We are funded to develop and
maintain this system for at least the next 16 months. You can find
more information about CERT and the secure coding group at:
http://www.cert.org/secure-coding/.

~Aaron