[llvm-dev] How to ask MustAlias queries from DSA results

Sun Dec 18 09:38:04 PST 2016

On 12/17/16 9:55 PM, 杨至轩(Zhixuan Yang) wrote:
> Dear Josh,
>
>
>     > If I understand correctly, if you find a memory leak, you want to
>     find the corresponding call(s) to malloc() that allocated the
>     memory object, correct?  Can you more completely explain what you
>     are trying to accomplish?
>
>
> Thanks for your reply. In my task, I use data-flow analysis to locate
> a program point where a malloc()ed object must be leaked (by "must be
> leaked" I mean that (a) it must have been allocated, (b) it must not
> have been free()d, and (c) it is never used afterwards). I want to fix
> this leak by finding a pointer that must point to the object allocated
> by that malloc(), so I want to perform a must-alias query.

When you say "must be allocated," you mean it must have been allocated
via a call to a heap allocator (e.g., malloc() or calloc()), correct?

Technically, global variables and stack variables are also allocated;
they just aren't allocated on the heap.

Also, are you performing intra-procedural or inter-procedural data-flow 
analysis?

>
>     > However, DSA is a unification-based analysis, so I would think
>     that the accuracy of a must-alias feature would be pretty weak.
>     Also, DSA loses precision as it performs more inter-procedural
>     analysis (the local analysis will be the most precise but will
>     have many Incomplete DSNodes; the Bottom-Up and Top-Down passes
>     propagate information up and down the call graph but will cause
>     further DSNode merging).
>
>
> Thanks for your clarification. I agree with you. Even if we
> implemented a MustAlias interface in DSA, it would be too weak.
>
>
>     > It may be that you will need a more accurate points-to analysis
>     algorithm for your work.
>
>
> In fact, my task can be solved in a simpler (though less elegant) way.
> If I want to find pointers that must alias the result of a malloc()
> call, I can create a new variable that stores the value returned by
> malloc() when it is called.
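A minimal C sketch of that shadow-variable idea, written as a function might look after such a transformation. The function `sum_tail` and the variable `p_shadow` are hypothetical; the only point is that the inserted variable still holds malloc()'s return value after the program's own pointer has moved, so an inserted leak fix has a pointer that must alias the object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical program after instrumentation: p_shadow is the inserted
   variable holding malloc()'s return value, so it must alias the object
   even after p has been advanced past the base address. */
int sum_tail(size_t n) {
    int *p = malloc(n * sizeof *p);
    void *p_shadow = p;              /* inserted: copy of the malloc() result */
    if (p == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
        p[i] = (int)i;

    p += n / 2;                      /* p now points into the middle */
    int sum = 0;
    for (size_t i = 0; i < n - n / 2; i++)
        sum += p[i];

    /* The original program leaked the object here; the inserted fix frees
       it through the shadow copy, since free(p) would be undefined. */
    free(p_shadow);
    return sum;
}
```

This only works as long as the shadow stays in the same function as the allocation; once the pointer is stored to memory or passed to another function, the shadow has to travel with it.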

This is essentially a fat pointer; you're extending the pointer that 
you're checking to contain the base address of the memory object to 
which it points as well as the memory address to which it points. Since 
you're not adding the base address to the pointer but passing it around 
with the pointer, you must transform the code so that the base address 
"follows" the pointer value wherever it goes (into memory, passed to 
functions as arguments, etc).
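A minimal C sketch of what "the base address follows the pointer" means, assuming a hand-written pair type. The names `fat_ptr`, `fat_malloc`, `fat_add`, `holder`, and `fat_free` are hypothetical and not part of any existing tool; a real transformation would rewrite every pointer-typed value, memory slot, and function signature this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fat pointer: the value the program uses plus the base
   address of the heap object it points into. */
typedef struct {
    char *ptr;   /* current address, may be an interior pointer */
    char *base;  /* first address of the object, as returned by malloc() */
} fat_ptr;

fat_ptr fat_malloc(size_t n) {
    char *p = malloc(n);
    fat_ptr f = { p, p };
    return f;
}

/* Pointer arithmetic moves ptr; base must travel along unchanged. */
fat_ptr fat_add(fat_ptr f, ptrdiff_t off) {
    f.ptr += off;
    return f;
}

/* A memory slot that used to hold a plain char * must now hold the whole
   pair; otherwise the base is lost when the pointer is stored and reloaded. */
struct holder {
    fat_ptr field;
};

/* Functions that used to take a char * now take the pair, so a callee can
   still recover the base and pass it to free(). */
void fat_free(fat_ptr f) {
    free(f.base);
}
```

Any unmodified library function that expects a plain `char *` can only be handed `f.ptr`, and anything it stores or returns comes back without a base.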

Fat pointers are relatively easy for local variables but are much more 
of a pain for pointers that are stored to/read from memory or passed to 
functions as arguments.  I'm also of the opinion that every fat pointer 
approach suffers from some degree of compatibility problems with 
third-party library code (the infamous "external code" problem).

If you're going to transform the program, I would recommend that you use 
SAFECode's new BBAC feature to track the base address.  BBAC has a 
run-time library which can take a pointer to a memory object and 
calculate, in constant time, the first address of the memory object into 
which the pointer is pointing.  You could use this to find the base 
address of the memory object so that you can pass it to the free() 
function.  As BBAC is a referent object approach, it does not have the
compatibility problems that affect fat pointer approaches.
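To make "constant-time base lookup" concrete, here is a toy C sketch of one way a referent-object runtime can do it, assuming a baggy-bounds-style layout: every object is padded to a power-of-two size and aligned to that size, and a side table records the size for each fixed-size slot. This is not SAFECode's runtime API; the arena, `toy_alloc()`, and `toy_base()` are hypothetical, and a real runtime handles actual heap, stack, and global objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_SHIFT 4                      /* 16-byte slots (hypothetical) */
#define SLOT_SIZE ((size_t)1 << SLOT_SHIFT)
#define ARENA_SIZE ((size_t)1 << 12)      /* 4 KiB toy heap */

static unsigned char arena[ARENA_SIZE];
/* log2 of the padded object size for every slot; 0 means "not allocated". */
static unsigned char size_log[ARENA_SIZE / SLOT_SIZE];
static size_t next_free = 0;

static unsigned ceil_log2(size_t n) {
    unsigned lg = 0;
    while (((size_t)1 << lg) < n)
        lg++;
    return lg;
}

/* Pad the request to a power of two (at least one slot) and align the
   object, relative to the arena start, to its own padded size. */
void *toy_alloc(size_t n) {
    if (n > ARENA_SIZE)
        return NULL;
    unsigned lg = ceil_log2(n < SLOT_SIZE ? SLOT_SIZE : n);
    size_t size = (size_t)1 << lg;
    size_t off = (next_free + size - 1) & ~(size - 1);
    if (off + size > ARENA_SIZE)
        return NULL;
    for (size_t s = off / SLOT_SIZE; s < (off + size) / SLOT_SIZE; s++)
        size_log[s] = (unsigned char)lg;
    next_free = off + size;
    return arena + off;
}

/* Constant time: one table read, then clear the low lg bits of the offset. */
void *toy_base(const void *p) {
    uintptr_t start = (uintptr_t)arena;
    uintptr_t addr = (uintptr_t)p;
    assert(addr - start < ARENA_SIZE);    /* toy: only arena pointers */
    size_t off = (size_t)(addr - start);
    unsigned lg = size_log[off / SLOT_SIZE];
    if (lg == 0)
        return NULL;                      /* not inside an allocated object */
    return arena + (off & ~(((size_t)1 << lg) - 1));
}
```

In the leak-fixing scenario, the inserted fix would look up the base of whatever pointer is live at the leak point and free that. The pointer's representation never changes, which is why external code is unaffected.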

My Google Summer of Code student, Zhengyang Liu, worked on BBAC this 
summer and created an updated and robust implementation of it that you 
could modify for your project.  If you're interested, please email me so 
that I can put you in touch with him.

Regards,

John Criswell

> Thanks for your help.
>
> Best regards, Zhixuan Yang

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell
