[cfe-dev] Crisp: Coding rule checking using clang and LLVM

Guillem Marpons gmarpons at babel.ls.fi.upm.es
Wed May 16 17:14:15 PDT 2012


Hi,

I've been working for the last few months on a coding rule validation
add-on for clang/LLVM, called Crisp:

    https://github.com/gmarpons/Crisp

Coding Rules constrain admissible constructs of a language to help
produce better code (improving reliability, portability,
maintainability, etc.). Some well-known coding rule sets are:

- MISRA-C/C++ (no public access available)
- High Integrity C++ Coding Standard (HICPP): http://www.codingstandard.com/
- CERT's Secure Coding Standards: http://www.cert.org/secure-coding/

Coding rule sets can include style conventions, but they typically go
further. Rules range from purely syntactic properties (e.g. "Do not
use the ‘inline’ keyword for member functions") to those that need
deep static analyses to be automated (e.g. "Do not return non-const
handles to class data from const member functions"). Both examples are
from HICPP.

There are some tools that can be used to define and enforce coding
rules on C/C++ code. Some distinctive features of our tool are:

- Rules (i.e., user checks) are going to be defined using a high-level
declarative Domain Specific Language. This language, called CRISP, is
not implemented yet. CRISP is based on first order logic, and rule
definitions are expected to be very concise and easy to read (see
below). The use of CRISP to formally define rules should avoid the
ambiguity and imprecision problems that arise with current standard
rule sets (they use plain English to define rules), and make the tool
highly and easily extensible (which is important, as almost every
project establishes its own set of rules). E.g., part of
http://llvm.org/docs/CodingStandards.html could probably be formalized
and automatically enforced.
- It uses clang as front-end, taking advantage of its rich AST. The
full clang API is available to write new rules. Rules can be checked
during ordinary execution.
- It can integrate information from static analyses to implement
rules. For the time being, the only interfaced analysis is alias
analysis as implemented in LLVM.
- It's free software.

Example
=======

Take as an example rule HICPP 3.3.13: "Do not invoke virtual methods of
the declared class in a constructor or destructor".

This rule was discussed on this mailing list some months ago:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2011-September/thread.html#17024.
A justification of the rule can be found here:
http://www.codingstandard.com/HICPPCM/High_Integrity_CPP_Rule_3.3.13.html.
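
To make the rule concrete, here is a minimal C++ sketch of the kind of
code it is meant to flag. The classes Base and Derived and the helper
nameSeenDuringConstruction are hypothetical names chosen for this
illustration; they do not come from HICPP or from Crisp. The
constructor reaches the virtual method only indirectly, through a
non-virtual helper, which is the case a purely syntactic check would
miss.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes for illustration only.
struct Base {
    std::string seen;                 // records which name() the ctor observed

    Base() { init(); }                // ctor calls init(), which calls virtual name()
    virtual ~Base() = default;

    void init() { seen = name(); }    // non-virtual helper invoked on 'this'
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// Returns the name observed while Base's constructor was running.
inline std::string nameSeenDuringConstruction() {
    Derived d;
    assert(d.name() == "Derived");    // after construction, dispatch is as expected
    return d.seen;                    // but the ctor saw Base::name()
}
```

While Base's constructor runs, the object's dynamic type is still
Base, so the call dispatches to Base::name() even though a Derived
object is being built.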

A tentative formalization in CRISP could be the following (many
details of CRISP are not defined yet):

rule    HICPP 3.3.13
warn    "ctor/dtor %0 calls (maybe indirectly) virtual method %1"
vars    Caller is CXXMethodDecl, note "caller %0 declared here"
        Callee is CXXMethodDecl, note "callee %0 declared here"
def     Record is CXXRecordDecl
        Record has ctor or destructor Caller
        Record has method Callee
        Callee is virtual
        Caller calls+ Callee where ( CallExpr is CXXMemberCallExpr
                                   CallExpr has implicitObjectArgument MemberExpr
                                   MemberExpr is CXXThisExpr )

Words beginning with a capital letter are either CRISP variables or clang types.


Implementation
==============

The tool is implemented as a clang plug-in plus an LLVM module pass
that has access to alias analysis information.

CRISP is meant to be automatically translated into Prolog, and then
the rule validation machinery is executed in Prolog. In its current
state, our tool can be extended with rules written directly in
Prolog. For example, rule HICPP 3.3.13 has been defined as follows
(which is quite difficult to read for people not acquainted with
Prolog, but not that complex, and far more concise than a manual check
written in C++):

violation('HICPP 3.3.13',
          'ctor/dtor %0 calls (maybe indirectly) virtual method %1',
          [ 'NamedDecl'(Caller, 'caller %0 declared here')
          , 'NamedDecl'(Callee, 'callee %0 declared here')]) :-
        isA(Record, 'CXXRecordDecl'),
        ( 'CXXRecordDecl::ctor'(Record, Caller)
        ; 'CXXRecordDecl::destructor'(Record, Caller)
        ),
        'CXXRecordDecl::method'(Record, Callee),
        'CXXMethodDecl::is_virtual'(Callee),  % implies Caller \= Callee
        'calls_to_this+'(Caller, Callee).

The diagnostic reporting machinery of clang is used to inform the user
about rule violations (they are reported as warnings, and for every
code entity involved a "note" message is generated).
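
As an illustration of how the %0 and %1 placeholders in the rule's
messages turn into readable text, here is a small stand-alone C++
sketch. fillPlaceholders is a hypothetical helper written for this
mail; it is not clang's diagnostic engine, which supports a much
richer format and is what the tool actually relies on.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replaces %0..%9 in tmpl with the corresponding
// entry of args. Placeholders without a matching argument are left as-is.
inline std::string fillPlaceholders(const std::string& tmpl,
                                    const std::vector<std::string>& args) {
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i) {
        if (tmpl[i] == '%' && i + 1 < tmpl.size() &&
            std::isdigit(static_cast<unsigned char>(tmpl[i + 1]))) {
            std::size_t idx = static_cast<std::size_t>(tmpl[i + 1] - '0');
            if (idx < args.size()) {
                out += args[idx];
                ++i;  // skip the digit that followed '%'
                continue;
            }
        }
        out += tmpl[i];
    }
    return out;
}
```

With the warning text of HICPP 3.3.13 and two made-up entity names,
say "Base::Base" and "Base::name", the result would read "ctor/dtor
Base::Base calls (maybe indirectly) virtual method Base::name".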

A number of const methods and iterators from the C++ API are available
from Prolog code to write rules. There are more than a thousand
functions (or Prolog predicates) available so far: those of classes
inheriting from Decl, Type or Stmt, so an enormous number of rules can
be easily written. Examples of Prolog predicates available in the rule
above are 'CXXRecordDecl::ctor' (an iterator in clang) and
'CXXMethodDecl::is_virtual' (a method in clang, with a slightly
different name). Methods and iterators of classes derived from
llvm::Value can also be used for rules that need alias analysis.
'calls_to_this+' is implemented in Prolog.
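
For readers who prefer C++ to Prolog, the idea behind a transitive
"calls+" relation can be sketched as plain reachability over a call
graph. This is only an illustration under simplifying assumptions:
CallGraph and callsTransitively are hypothetical names, the graph is
keyed by method name, and the sketch ignores the restriction to calls
made through 'this' that the rule above requires. It is not how the
Prolog predicate is written.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: caller name -> names of directly called methods.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// True if 'callee' is reachable from 'caller' through one or more calls.
inline bool callsTransitively(const CallGraph& graph,
                              const std::string& caller,
                              const std::string& callee) {
    std::set<std::string> visited;               // methods already queued
    std::vector<std::string> worklist{caller};   // methods still to expand
    while (!worklist.empty()) {
        std::string current = worklist.back();
        worklist.pop_back();
        auto it = graph.find(current);
        if (it == graph.end()) continue;         // no outgoing calls recorded
        for (const std::string& next : it->second) {
            if (next == callee) return true;
            if (visited.insert(next).second) worklist.push_back(next);
        }
    }
    return false;
}
```

The visited set keeps the search from looping on recursive call
chains, and because at least one edge must be followed, a method only
"calls+" itself if it is part of a cycle.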

All this Prolog binding of clang/LLVM is automatically generated
during the build process of the tool. In fact, this binding is useful
in itself, as it could be used to, e.g., implement:

- A documentation tool
- A refactoring tool
- An API analysis tool to automatically generate bindings for other languages
- More ideas??

Well, I think that's enough information for a single mail. Detailed
installation instructions are given here:
https://github.com/gmarpons/Crisp. Any comments, criticisms, or ideas
are very welcome.

--
Guillem Marpons
Universidad Politécnica de Madrid - Babel Group



