[cfe-dev] Proposal for Uniqueness Analysis

Mon Aug 12 09:59:12 PDT 2013

Arbitrary typestates are a very nice generalization of this work, and
as Dean mentions, there is plenty of prior work in the literature to
draw on.  I see two issues that need to be addressed, namely:  (1)
What is the best way to generalize the annotations to user-declarable
states?  (2) How do we handle aliasing?

With respect to (1), it seems pretty straightforward to parameterize
the attributes with arbitrary values, such as enums, and introduce one
predefined state, "Uknown", which represents the set of all states.
E.g.

enum ContainerState { CS_Empty, CS_NonEmpty };

class Stack {
  bool isEmpty() TEST_STATE(CS_Empty)
  { ... }
  T pop() STATE_TRANSITION(CS_NonEmpty, "Unknown"),
  { ... }
};

With respect to (2), the most tractable (although unsound) approach is
to assume no aliasing unless an object is explicitly aliased.  The
nice thing about the uniqueness property is that you don't have to
worry about aliasing, but that doesn't hold for other properties.  Any
other ideas?

I do want to mention one important caveat, which is that states like
"non-empty" are not always as useful in practice as one might hope.
The nice thing about the "consumed/unconsumed" analysis is that most
operations leave an object in a known state.  The pop() method, on the
other hand, transitions to an unknown state, which renders the
container unusable until a test method method is called.  That may, in
fact, be what you want; but it could also lead to a lot of false
positives, especially when users who don't fully understand the how
typestates work start implementing their own states.

Just my thoughts on the issue,

  -DeLesley

On Mon, Aug 12, 2013 at 7:46 AM, Dean Sutherland <dsutherland at cert.org> wrote:
> This is a GREAT idea, where there's a fair amount of prior work.  You could do worse than taking a look at the body of work on "typestates" by Bierhoff, Beckman and Aldrich.  They showed that checking properties such as those you describe is quite feasible.
>
> See "Practical API Protocol Checking with Access Permissions" from ECOOP '09 as an example; you can follow the citation chain from there.
>
> For unique ownership, there's a bunch of great work Boyland, later extended by Aldrich and crew.  A good starting point is "A type system for borrowing permissions " <http://dl.acm.org/citation.cfm?id=2103722> from POPL '12.
>
> I hope these references are useful.
>
> Dean
>
> On Aug 11, 2013, at 10:42 PM, David Blaikie <dblaikie at gmail.com>
>  wrote:
>
>> Of course we've discussed this at length & we're currently reviewing
>> the first stages of such a feature, so I'm not in any way attempting
>> to subvert/derail/distract from that work, but I had a thought that
>> Delesley might have some ideas on, or you, Christian:
>>
>> Could this analysis be generalized further to essentially a "state"
>> attribute system. I suppose then the defaults might become a bit
>> weird/non-general, but what I'm wondering is if it could be extended,
>> to, say, describe the state of a container as being non-empty (& thus
>> front/back functions are valid), or even other objects with more than
>> just an "empty" and "non-empty" state. Essentially there are functions
>> that transition between states, those that are valid only in certain
>> states, and those that query the state (OK, that would get painful -
>> describing how a given query function maps to the various states the
>> analysis knows about).
>>
>> Without a concrete use case I'm not rushing to do this - but I'm just
>> wondering if there's some obvious reason I've missed that makes this a
>> bad idea to begin with. Otherwise it's something I'll keep in mind,
>> should a use-case arise where we might want to generalize this over
>> more than a 2-state type.
>>
>> On Wed, Jul 10, 2013 at 11:58 AM, Christian Wailes
>> <chriswailes at google.com> wrote:
>>> As things currently stand, there is a problem with some move-only classes
>>> which model unique-ownership semantics: we are unable to determine which
>>> methods are safe to call only before a move, or both before and after a
>>> move.  An example of this is the std::unique_ptr class.  When a
>>> std::unique_ptr object is managing a pointer to a valid region of memory you
>>> may dereference it.  However, once the pointer has been moved to another
>>> object a warning should be generated if it is dereferenced again, e.g.:
>>>
>>> int main(void) {
>>>
>>> std::unique_ptr<int> p0(new int);
>>>
>>> std::unique_ptr<int> p1();
>>>
>>>
>>> *p0 = 0;
>>>
>>>
>>> p1 = std::move(p0);
>>>
>>>
>>> // Dereferencing p0 is illegal here, as it has a null value.
>>>
>>> printf(“p0 points to %d\n”, *p0);
>>>
>>> }
>>>
>>>
>>> To address this problem for std::unique_ptr we propose a new set of
>>> annotations and an analysis.  This work will later be applied to other use
>>> cases, such as ensuring continuations are called and that files are closed
>>> (and not accessed after they are closed).  We have designed the annotations
>>> and analysis to be applicable to any class that implements move-only
>>> unique-ownership semantics.  The std::unique_ptr class was chosen as the
>>> first step in this work due to its immediate usefulness and the simplicity
>>> of the analysis needed for this use case.
>>>
>>>
>>> The analysis itself is a data-flow analysis, and borrows both terminology
>>> and design aspects from literature on linear types [1,2].  It tracks the
>>> state (consumed, unconsumed, or unknown) of a std::unique_ptr through its
>>> lifetime by traversing the control flow graph (CFG).  A std::unique_ptr
>>> starts its lifetime in the consumed state if it is initialized using the
>>> default constructor.  If the non-default constructor is used it is
>>> considered to be in the unconsumed state.  Ownership of a value is
>>> transferred whenever a std::unique_ptr is passed as an rvalue reference;
>>> this transitions the object into a consumed state.  An object may also be
>>> transitioned into the consumed state when certain, annotated, methods are
>>> called.
>>>
>>>
>>> When a branch is encountered the state of the object is merged at the
>>> branch’s join point.  If the object is in the unconsumed state along one
>>> branch, and the consumed state along another branch the object will enter
>>> the unknown state; any further uses of the object must be guarded by tests
>>> for the object’s consumededness.
>>>
>>>
>>> To provide useful warnings we must also annotate methods as being able to be
>>> called only when the object is in the unconsumed state or callable when the
>>> object is in any state.  Annotating the dereference operators for
>>> std::unique_ptr as only being able to be called when the object is in the
>>> unconsumed state would allow us to generate the correct warning in the above
>>> example.
>>>
>>>
>>> Future work would be to allow functions that take move-semantic objects as
>>> parameters to specify the state that they expect the objects to be in;
>>> unannotated functions that take std::unique_ptr object parameters currently
>>> assume that the argument is in the unknown state.  If the same analysis is
>>> desired for C code and non-class/struct types we could add additional
>>> annotations and extend the analysis at a later date.
>>>
>>>
>>> The proposed annotations, which may be changed, are TESTS_UNCONSUMED,
>>> CALLABLE_ALWAYS, CALLABLE_WHEN_UNCONSUMED, and CONSUMES.  We are, of course,
>>> open to suggestions for better/different names.  Any class that has a method
>>> annotated with TESTS_UNCONSUMED, and either CALLABLE_ALWAYS or
>>> CALLABLE_WHEN_UNCONSUMED is subject to our analysis.  A class may have its
>>> methods annotated with either CALLABLE_ALWAYS or CALLABLE_WHEN_UNCONSUMED
>>> but not both, as if one annotation is used the other is assumed for all
>>> unannotated functions.  A brief description of each of the annotations
>>> follows:
>>>
>>>
>>> TESTS_UNCONSUMED - Marks methods that return true when the object is in an
>>> unconsumed state.
>>>
>>> CALLABLE_ALWAYS - Marks methods that can be called when the object is in any
>>> state.
>>>
>>> CALLABLE_WHEN_UNCONSUMED - Marks methods that can be called only when the
>>> object is unconsumed.
>>>
>>> CONSUMES - Marks a method that transitions an object from the unconsumed to
>>> the consumed state.
>>>
>>>
>>> Following shortly will be a patch that adds the appropriate annotations to
>>> Clang.  Next, we will refactor the existing thread safety analysis
>>> infrastructure to be usable for this analysis.  Following the refactoring we
>>> will implement the analysis within AnalysisBasedWarnings.  Any comments,
>>> questions, or concerns would be much appreciated.
>>>
>>>
>>> - Chris
>>>
>>>
>>> Wadler, Philip. "Linear types can change the world." In IFIP TC, vol. 2, pp.
>>> 347-359. 1990.
>>>
>>> Kobayashi, Naoki. "Quasi-linear types." In Proceedings of the 26th ACM
>>> SIGPLAN-SIGACT symposium on Principles of programming languages, pp. 29-42.
>>> ACM, 1999.
>>>
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>

-- 
DeLesley Hutchins | Software Engineer | delesley at google.com | 505-206-0315