[llvm-dev] Problem of array index manipulation collection of LLVM IR

Qingkun Meng via llvm-dev llvm-dev at lists.llvm.org
Fri Jul 22 02:38:04 PDT 2016


> It depends what you expect exactly. What would be the ideal output for you
> on the example you provided before?
> Also what is the use-case? (I.e. *why* do you want this information).

I want to count how often array indices are manipulated inside a loop of a
function. I recently read a paper titled "Dowsing for Overflows: A Guided
Fuzzer to Find Buffer Boundary Violations". It says that array index
manipulations are related to buffer boundary violations, so I want to
implement the analysis myself, since I can't get the source code from the
authors. The paper analyses LLVM bitcode, which is why I am asking here. Is
there any solution?
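
For concreteness, here is a minimal sketch of the lexical count described
above. It is only an illustration under several assumptions: it reads the
textual IR of a single function (as printed by `clang -emit-llvm -S`), it
expects local variables to already be SSA values (for example after
mem2reg), every GEP base pointer is a local %value, no named struct types
appear in the GEP, and it ignores which loop an instruction belongs to.
The name index_op_counts and the sample IR are hypothetical. A real pass
would presumably walk LLVM's own def-use chains instead of using regular
expressions.

```python
import re
from collections import Counter

# "%name = opcode rest-of-line" (an SSA definition in textual LLVM IR).
DEF_RE = re.compile(r'^\s*(%[\w.]+)\s*=\s*([a-z]+)\b(.*)$')
# Any %-prefixed identifier: SSA values, but also block labels in phi nodes.
VAL_RE = re.compile(r'%[\w.]+')


def index_op_counts(ir_text, ops=("add", "sub", "mul", "shl")):
    """Count arithmetic instructions that feed getelementptr indices."""
    defs = {}      # SSA name -> (opcode, list of %-operands)
    worklist = []  # SSA names used as GEP indices
    for line in ir_text.splitlines():
        m = DEF_RE.match(line.split(";", 1)[0])  # drop trailing comments
        if not m:
            continue
        name, opcode, rest = m.groups()
        operands = VAL_RE.findall(rest)
        defs[name] = (opcode, operands)
        if opcode == "getelementptr":
            # Assumption: operands[0] is the base pointer, the rest are indices.
            worklist.extend(operands[1:])

    counts, seen = Counter(), set()
    while worklist:  # backward walk over definitions, each visited once
        name = worklist.pop()
        if name in seen or name not in defs:
            continue  # already visited, or an argument / block label
        seen.add(name)
        opcode, operands = defs[name]
        if opcode in ops:
            counts[opcode] += 1
        worklist.extend(operands)
    return counts


# Hypothetical hand-written IR for a loop that stores to b[(i + 3) * 2].
SAMPLE_IR = """
define void @F(i32 %n) {
entry:
  %b = alloca [100 x i32]
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %inc, %loop ]
  %t1 = add nsw i32 %i, 3
  %t2 = mul nsw i32 %t1, 2
  %idx = sext i32 %t2 to i64
  %p = getelementptr inbounds [100 x i32], [100 x i32]* %b, i64 0, i64 %idx
  store i32 7, i32* %p
  %inc = add nsw i32 %i, 1
  %c = icmp slt i32 %inc, %n
  br i1 %c, label %loop, label %exit
exit:
  ret void
}
"""

print(sorted(index_op_counts(SAMPLE_IR).items()))  # [('add', 2), ('mul', 1)]
```

Note that the increment of i itself is counted as well, because it reaches
the index through the phi node.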

2016-07-22 11:48 GMT+08:00 Mehdi Amini <mehdi.amini at apple.com>:

>
> On Jul 21, 2016, at 8:28 PM, Qingkun Meng <mengqingkun1988 at gmail.com>
> wrote:
>
> >If you are interested in what actually gets *executed*, some of these
> >computations will be folded into the addressing mode, depending on the
> >architecture.
>
> If I just want to collect array index manipulations lexically, is there any
> reliable solution?
>
>
> It depends what you expect exactly. What would be the ideal output for you
> on the example you provided before?
> Also what is the use-case? (I.e. *why* do you want this information).
>
>
>
> When you wrote
> >Some people do this kind of analysis using debug info to map back
> >to the source code
> did you mean recovering the source code from LLVM IR? Is there any open
> source project that does this? I would appreciate it if you could point me
> to one.
>
>
> I meant debug information, as generated by clang with -g.
> For instance, try with a simple example:
>
> $ cat test.c
> int foo(int a, int b) {
>   return a + b;
> }
>
> And look at the difference in the output when compiled with or without -g
> (i.e. `clang -emit-llvm -S test.c -O3 -o -` and
> `clang -emit-llvm -S test.c -O3 -o - -g`).
> In the first case you’ll get something like:
>
> define i32 @foo(i32, i32) #0 {
>   %3 = add nsw i32 %1, %0
>   ret i32 %3
> }
>
> while in the second case it will look like this (stripped to keep only the
> relevant information):
>
> define i32 @foo(i32, i32) #0 !dbg !7 {
>   tail call void @llvm.dbg.value(metadata i32 %0, i64 0, metadata !12, metadata !14), !dbg !15
>   tail call void @llvm.dbg.value(metadata i32 %1, i64 0, metadata !13, metadata !14), !dbg !16
>   %3 = add nsw i32 %1, %0, !dbg !17
>   ret i32 %3, !dbg !18
> }
> […]
> !1 = !DIFile(filename: "test.c", directory: "…")
> […]
> !7 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !8, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, variables: !11)
> […]
> !12 = !DILocalVariable(name: "a", arg: 1, scope: !7, file: !1, line: 1, type: !10)
> !13 = !DILocalVariable(name: "b", arg: 2, scope: !7, file: !1, line: 1, type: !10)
> !14 = !DIExpression()
> !15 = !DILocation(line: 1, column: 13, scope: !7)
> !16 = !DILocation(line: 1, column: 20, scope: !7)
> !17 = !DILocation(line: 2, column: 12, scope: !7)
> !18 = !DILocation(line: 2, column: 3, scope: !7)
>
>
> Now from there you can analyze the IR and see that there is an addition
> of two values (%0 and %1), and the calls to llvm.dbg.value point you to
> some information about these variables (name, type, source location).
>
> Mehdi
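
The mapping described above can be sketched roughly as follows. This is
only an illustration, assuming the intrinsic-call form of llvm.dbg.value
shown in the example, one statement per line of textual IR, and simple
value types without spaces (such as i32). The helper name source_names is
hypothetical, and the sample is the IR from the example with the elided
parts left out.

```python
import re

# "@llvm.dbg.value(metadata <type> %value, ..., metadata !N": the first
# metadata reference after the value is the !DILocalVariable node.
DBG_VALUE_RE = re.compile(
    r'@llvm\.dbg\.value\(metadata \S+ (%[\w.]+),.*?metadata (!\d+)')
# '!N = !DILocalVariable(name: "x", ..., line: L, ...)'
LOCAL_VAR_RE = re.compile(
    r'^(!\d+) = !DILocalVariable\(name: "([^"]+)".*?\bline: (\d+)')


def source_names(ir_text):
    """Map SSA values to (source variable name, declaration line)."""
    variables = {}  # metadata id -> (name, line)
    bindings = []   # (SSA value, metadata id) pairs from dbg.value calls
    for line in ir_text.splitlines():
        line = line.strip()
        m = LOCAL_VAR_RE.match(line)
        if m:
            variables[m.group(1)] = (m.group(2), int(m.group(3)))
            continue
        m = DBG_VALUE_RE.search(line)
        if m:
            bindings.append(m.groups())
    return {value: variables[md] for value, md in bindings if md in variables}


SAMPLE_IR = '''
define i32 @foo(i32, i32) #0 !dbg !7 {
  tail call void @llvm.dbg.value(metadata i32 %0, i64 0, metadata !12, metadata !14), !dbg !15
  tail call void @llvm.dbg.value(metadata i32 %1, i64 0, metadata !13, metadata !14), !dbg !16
  %3 = add nsw i32 %1, %0, !dbg !17
  ret i32 %3, !dbg !18
}
!12 = !DILocalVariable(name: "a", arg: 1, scope: !7, file: !1, line: 1, type: !10)
!13 = !DILocalVariable(name: "b", arg: 2, scope: !7, file: !1, line: 1, type: !10)
!14 = !DIExpression()
'''

print(source_names(SAMPLE_IR))  # {'%0': ('a', 1), '%1': ('b', 1)}
```

With that table, the instruction `%3 = add nsw i32 %1, %0` can be read as
an addition of the source variables b and a.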
>
> 2016-07-22 6:38 GMT+08:00 Mehdi Amini <mehdi.amini at apple.com>:
>
>>
>> > On Jul 21, 2016, at 5:07 AM, Qingkun Meng via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> >
>> >
>> > Hi there,
>> >
>> > I am new to LLVM, and here is my question. Assume there is a function F
>> > that contains a loop L and an array b[100]. I want to collect statistics
>> > on the operations op(i) applied to the array index i inside L (take add
>> > and mul as simple examples). Pseudocode is listed below.
>> >
>> > void F(arg1, arg2){
>> >     int b[100];
>> >     for(int i=0; i<n; i++){
>> >         op1(i);
>> >         op2(i);
>> >         ......
>> >         b[op1(i)]=n1;
>> >         b[op2(i)]=n2;    // n1 and n2 are just common constants
>> >     }
>> > }
>> >
>> > The code fragment is compiled to LLVM IR, and I want to count how many
>> > operations (like add and mul) are applied to i. However, these operations
>> > are not easy to collect, because many temporary variables obscure the
>> > trail back to i. Does anyone have an idea how to solve this, or know of
>> > an open source project that does this job?
>>
>> In short: there is no fully reliable way. The optimizer will make
>> transformations that completely lose any relationship with the source
>> code. Also, if you are interested in what actually gets *executed*, some
>> of these computations will be folded into the addressing mode, depending
>> on the architecture.
>>
>> Some people do this kind of analysis using debug info to map back to the
>> source code. That may be enough if you don’t need precise results, or
>> results that are accurate with respect to the final optimized binary
>> instruction stream.
>>
>> Mehdi
>>
>>
>
>