[cfe-dev] RFC: Clang Automatic Bug Reporting

Mon Jul 26 16:16:16 PDT 2010

Hi Collin,

As I said, there really isn't anything coherent worth checking in that
isn't already there. I can check in a basic skeleton, but we are
talking about a trivial amount of code.

I'll assume you have seen:
 1. include/llvm/ADT/DeltaAlgorithm.h and include/llvm/ADT/DAGDeltaAlgorithm.h
 2. clang/utils/token-delta.py

The first cut at a clang based delta I was planning on was:
 1. Run the clang lexer to dice the input source(s) into raw tokens.
All minimization is done based on tokens.
    - This is already what token-delta.py is doing, although it is a hack.
    - By itself, this will be unusable slow, compared to the standard
'multidelta' tool.
 2. Build an approximate location based DAG:
    - Use the libclang APIs to traverse the AST and extract all range
information.
    - Construct the DAG by treating any token/range as depending on
the other ranges it overlaps. Ignoring overlapped ranges, this will
give us a tree structure over all the input tokens.
 3. Minimize using DAGDeltaAlgorithm.

The basic idea here is that this subsumes multidelta by generating
minimization items ("changes", in the parlance of the original paper)
for any "interesting structure" with has an AST element with a correct
range. Multidelta effectively only could make minimization items for
compound blocks.

All in all this shouldn't be too hard, it is just various bits of
gluing. The current code I have doesn't do anything interesting, the
token-delta.py is already more sophisticated than it.

The other "piece" I have lying around is an CIL based minimization
tool (in OCaML) which constructed a dependency graph using actually
source dependencies. For example, functions would get edges for the
types they depended on. This has the potential to minimize much more
efficiently, but it is also more fragile. The code is horrible, and I
don't feel like sharing it, I'd rather reimplement it on top of Clang
one day. :)

 - Daniel

On Wed, Jul 21, 2010 at 10:03 AM, Collin Winter <collinwinter at google.com> wrote:
> Hey Daniel,
>
> On Tue, Jul 20, 2010 at 12:51 AM, Daniel Dunbar <daniel at zuster.org> wrote:
>> On Mon, Jul 19, 2010 at 8:36 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
>>> On Mon, Jul 19, 2010 at 11:44 AM, Daniel Dunbar <daniel at zuster.org> wrote:
>>>> and eventually could use a Clang based delta tool to minimize the input source.
>>>
>>> This is something I am very interested in working on and have thought
>>> of before. The basic plan was to first figure out which part of clang
>>> was crashing, and then use the information in the layers above that to
>>> guide the delta algorithm. This part could actually be implemented now
>>> as either part of the driver, or as a separate tool and be useful
>>> right away.
>>
>> Yes, definitely, this is a very worthy project. I have an
>> implementation of the delta algorithm ready and waiting in llvm/ADT,
>> and have bits and pieces of a Clang based delta lying around on my
>> hard drive, but nothing concrete enough to be worth checking in. If
>> someone else wanted to work on this it would make me very happy. :)
>
> Could you put these bits and pieces in source control? I've been
> planning to work on a similar tool for Google's compiler teams, and it
> would be great if we could collaborate on this.
>
> Thanks,
> Collin Winter
>