[cfe-dev] Discussion: should we enforce access control in C++ attributes?

Wed Oct 31 15:48:16 PDT 2012

Hi Delesly,

Admittedly I've only briefly skimmed through this, but I did have a
few thoughts:

On Wed, Oct 31, 2012 at 3:21 PM, Delesley Hutchins <delesley at google.com> wrote:
> A number of users have recently expressed frustration with fact that
> access-control restrictions (i.e. public, private, etc.) are enforced
> within thread safety attributes. I am considering whether or not I
> should lift those restrictions, and this is not a decision that I want
> to make unilaterally.  There are some potentially far-reaching
> ramifications here in terms of how attributes are handled in C++ in
> general, so I want get feedback from cfe-dev before blindly forging
> ahead.  Be warned that this is a somewhat long post. I will try to lay
> out the problem, along with the pros and cons as I seem them.  I would
> love to hear what the rest of you think.
>
> Here's the problem.  Thread-safety attributes are attached to fields
> or methods, and name the protecting mutex for that field or method.
> Most often, the protecting mutex is a private or protected data
> member.  For example:
>
>
> class TaskQueue {
> public:
>   void lock() EXCLUSIVE_LOCK_FUNCTION(mu) {
>     mu.Lock();
>     logfile << "Task queue acquired by " << getCurrentThreadID() << "\n";
>   }
>
>   void unlock() UNLOCK_FUNCTION(mu) {
>     mu.Unlock();
>   }
>
>   virtual void addTask(int taskID) EXCLUSIVE_LOCKS_REQUIRED(mu) {
>     queue.push_back(taskID);
>   }
>
>   void addTaskSync(int taskID) {
>     lock();
>     addTask(taskID);
>     unlock();
>   }
>
> private:
>   Mutex mu;
>   std::deque<int> queue GUARDED_BY(mu);
> };
>
>
> class CheckedTaskQueue : public TaskQueue {
> public:
>   bool validTask(int id);
>
>   // error: TaskQueue::mu is private
>   virtual void addTask(int taskID) EXCLUSIVE_LOCKS_REQUIRED(mu) {
>     if (validTask(taskID)) TaskQueue::addTask(taskID);
>   }
> };
>
>
> // error: q->mu is private.
> void addTwoTasks(TaskQueue* q, int t1, int t2) EXCLUSIVE_LOCKS_REQUIRED(q->mu) {
>   q->addTask(t1);
>   q->addTask(t2);
> }
>
>
> This example illustrates how thread safety annotations are used, for
> readers who are unfamiliar with them.  TaskQueue provides two ways to
> add tasks to a queue in a thread-safe manner.  The addTaskSync()
> method takes care of locking and unlocking automatically, while the
> addTask() method requires the caller to lock and unlock the queue.
> Because addTask() is marked with EXCLUSIVE_LOCKS_REQUIRED,
> thread-safety analysis will issue a warning at compile time if
> addTask() is called when the lock is not held.
>
> A problem arises because mu is private.  It is an error to override
> addTask() in derived classes, because overriding requires mentioning
> "mu" again in the attribute, and that violates access control
> restrictions.  Similarly, it is impossible to create external
> functions like addTwoTasks(), because the locking requirements
> propagate up, and "mu" isn't visible outside of TaskQueue.
>
> Fortunately, there's a relatively simple workaround, which is to
> declare a public getter method for the mutex.  The main issue here is
> that "mu" is private for a reason; we want clients of the class to go
> through TaskQueue::lock() and unlock(), rather than acquiring the
> mutex directly.  There are two ways to keep the mutex from being
> misused:
>
>
> class Taskqueue {
> public:
>   // Option 1: const getter method.
>   const Mutex& getMu() const LOCK_RETURNED(mu) { return mu; }
>
>   // Option 2: fake getter that returns null.
>   const Mutex* getMu() const LOCK_RETURNED(mu) { return 0; }

Could this function not be defined? Possibly defined as deleted (=
delete)? (not sure whether it makes sense to allow calls to deleted
functions in the unevaluated context of a thread safety attribute (I
think/assume we don't allow calls to deleted functions in other
unevaluated contexts (which is where a deleted function would differ
from an undefined function))).

>
>   ...
> };
>
> class CheckedTaskQueue : public TaskQueue {
> public:
>   // error: TaskQueue::mu is private
>   virtual void addTask(int taskID) EXCLUSIVE_LOCKS_REQUIRED(getMu())

What about an Option 3?

Have you considered doing something like C++11's noexcept operator (
http://en.cppreference.com/w/cpp/language/noexcept )? In which you
could provide an expression & whatever locks are required by that
expression are required by this "Locks Required" clause (or other
properties you're interested in capturing in the locking annotations)

So in this case:

  virtual void addTask(int taskID)
EXCLUSIVE_LOCKS_REQUIRED(required_locks(TaskQueue::addTask(taskID)));

or something to that effect. This encapsulates the contract better &
without code duplication - if the contract of the base addTask
changes, so does the derived implementation (or the wrapper function,
or the whatever else needs this). Relying on the implementation
details of the functions you're calling & duplicating their contracts
explicitly seems sub-optimal compared to propagating the important
expressions up.

>
>   ...
> };
>
>
> Option 1 follows standard OO design practice -- use a public getter
> method to provide limited access to a private variable.  Attributes
> outside of TaskQueue can refer to getMu() rather than mu, and
> LOCK_RETURNED tells the analysis that mu and getMu() refer to the same
> lock.
>
> However, some users don't like Option 1, because a const reference can
> easily become a real reference with a simple const_cast, and they want
> stronger protection.  Option 2 provides the extra protection, but it's
> a bit weird.  It exploits the fact that thread-safety analysis only
> looks at the LOCK_RETURNED attribute, not the body of the method.
> Many users don't like the fact that their code becomes littered with
> public getter methods that are never actually called, and have bogus
> implementations.
>
> I can see both sides of the argument here, so I'll do my best to
> present a case both for and against enforcing access control
> restrictions.
>
>
> Argument in favor of disabling access control in attributes.
> ------------------------------------------------------------
>
> Thread-safety attributes are used only by thread-safety analysis at
> compile time; they do not affect the run-time behavior of the code.
> Referring to a private mutex in an attribute therefore does not break
> encapsulation, because the expression will never be evaluated.
>
> In practice, mutexes are almost always private, so the thread-safety
> analysis should natively support using private mutexes in attributes.
> It is ugly and unnecessary to force users to declare every mutex twice
> -- once as a data member, and once as a getter method.  This is
> especially true because the getter methods are never called; they're
> bogus methods that are present only to get around the public/private
> problem.
>
>
> Argument in favor of keeping access control enabled (which is the status quo):
> -------------------------------------------------------------------------------
>
> Thread-safety attributes, like type signatures, are part of the
> published interface to a class, and would (ideally) be part of any
> generated documentation, such as that produced by doxygen.  It is bad
> engineering practice to allow private members and other implementation
> details to appear in the public interface of a class.  Generated
> documentation does not mention private members, so we get into
> situations where the documented interface mentions names that are not
> declared anywhere in the documented interface.
>
> Moreover, thread-safety attributes may contain arbitrary C++
> expressions, e.g. this->field1->field2->someMethod()->mu.  If access
> control is turned off, it is turned off for the entire expression.  I
> have seen real-world use cases in which an attribute goes through 3 or
> 4 private fields in order to find the controlling mutex. That's bad
> programming practice, it exposes implementation details, and it
> shouldn't be allowed.
>
>
> Discussion: public & private in type signatures:
> ------------------------------------------------
>
> A final argument in favor of keeping access control enabled is that a
> good language extension should obey the semantics of the host
> language.  The closest analog in this case is the use of private type
> members within public method signatures.  The C++ language allows
> this, but clients of the class will encounter problems very similar to
> the ones described above.  For example:
>
>
> class Foo {
> private:
>   class PrivateFoo {
>   public:
>     int a;
>   };
>
>   PrivateFoo pf_;
>
> public:
>   virtual PrivateFoo* getPrivateFoo() { return &pf_; }
> };
>
>
> void test() {
>   Foo f;
>   f.getPrivateFoo()->a = 0;        // legal in C++, illegal in Java.
>   Foo::PrivateFoo* pf = f.getPrivateFoo();     // illegal
>   auto pf2 = f.getPrivateFoo();    // legal in C++; (no auto in Java)
> }
>
>
> class FooDerived : public Foo {
> public:
>   virtual PrivateFoo* getPrivateFoo() { return 0; }  // illegal
> };
>
>
> In Java, which arguably has the more consistent approach to private
> types, neither the name nor the structure of a private type are
> visible outside of the class in which it is defined.  Thus, a call to
> f.getPrivateFoo()->a  (or it's Java equivalent) is rejected by the
> compiler, because the structure of PrivateFoo (and thus the fact that
> it has a member named a) is not visible within the test() method.  In
> Java the rule is simple: don't use private types in public interfaces.
>
> In C++, the situation is a bit more confusing.  The name of a private
> type is restricted, but the structure of a private type is visible.
> So C++ allows the first call, because PrivateFoo::a is visible.
> However, you cannot create a variable to hold the intermediate value,
> because PrivateFoo itself is private.  Except in C++11, where you can
> use the auto keyword to avoid referring to the type by name.  But if
> you create a virtual method that refers to the private type, then C++
> does not allow you to override it, again because you have to refer to
> the type by name.  IMHO, the general rule in C++ also appears to be:
> don't use private types in public interfaces, because you will just
> cause trouble for yourself.
>
> Regardless of whether the C++ model makes sense, the current
> implementation of thread-safety analysis actually matches it quite
> closely.  As shown earlier, you can use private mutexes in attributes
> on public methods, but you will not be allowed to override those
> methods in derived classes, and you may or may not be able to use them
> outside of the class, depending on the situation.
>
>
> Discussion: argument from first principles.
> --------------------------------------------
>
> Assuming that we abandon established language precedent, what *should*
> thread-safety analysis be doing?
>
> The fundamental problem here is that thread-safety analysis conflates
> two distinct concepts.  There is a difference between the mutex
> object, (i.e. an object with Lock() and Unlock() methods), and the
> region of memory that a mutex controls.  The mutex object should be
> private, because you don't want clients of a class to be able to lock
> and unlock it arbitrarily.  However, the memory region (or at least,
> the name of the region) should be public.  Regions are not an
> implementation detail.  If a class has more than one region, and does
> not do locking internally, then clients of the class need to know
> which regions are being read or modified by a particular method.  (For
> a good paper on regions, threading, and type-and-effect systems, I
> recommend "A Type and Effect System for Deterministic Parallel Java",
> OOPSLA 2009.  As that paper shows, you can prove thread safety using
> regions alone, without any mutexes at all.)
>
> That being said, conflating the region name and the mutex name is very
> convenient from a practical programming standpoint.  Ideally, we want
> the best of both worlds.  We want to publicly expose the mutex *name*,
> while keeping the actual mutex *object* private.  Going back to the
> private type analogy, this would correspond to a privacy model where
> the name of an internal type was visible, but the implementation of
> the type was not visible.  (Recall that C++ does the opposite -- the
> name is private, but the implementation is public.)  There are several
> academic languages, most notably the ML module system, that do exactly
> this, and it works well.  Unfortunately, C++ provides no such
> mechanism.
>
> Given the limitations of C++, public getter methods are actually a
> decent way of enforcing the mutex/region distinction.  The getter
> method names a region, while the mutex provides a private object which
> can be used to gain exclusive access to the region.
>
> On the other hand, turning off access control would also make the
> mutex name visible, at least in those places where it matters, without
> actually exposing the mutex object.  It has the unfortunate
> side-effect of making everything visible within attributes (not just
> the mutex names), and it does not make the mutex name visible in
> documentation, but perhaps those are reasonable compromises to get
> around limitations in the C++ privacy model.
>
> What does everyone else think?
>
>   -DeLesley
>
> --
> DeLesley Hutchins | Software Engineer | delesley at google.com | 505-206-0315
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev