[LLVMdev] Handling of pointer difference in llvm-gcc and clang

Florian Merz florian.merz at kit.edu
Wed Aug 10 09:43:03 PDT 2011


Hi Duncan,

Am Wednesday, 10. August 2011, 18:13:16 schrieb Duncan Sands:
> Hi Stephan,
> 
> > We are developing a bounded model checker for C/C++ programs
> > (http://baldur.iti.kit.edu/llbmc/) that operates on LLVM's intermediate
> > representation.  While checking a C++ program that uses STL containers
> > we noticed that llvm-gcc and clang handle pointer differences in
> > disagreeing ways.
> > 
> > Consider the following C function:
> > int f(int *p, int *q)
> > {
> > 
> >       return q - p;
> > 
> > }
> > 
> > Here's the LLVM code generated by llvm-gcc (2.9):
> > define i32 @f(i32* %p, i32* %q) nounwind readnone {
> > 
> > entry:
> >     %0 = ptrtoint i32* %q to i32
> >     %1 = ptrtoint i32* %p to i32
> >     %2 = sub nsw i32 %0, %1
> >     %3 = ashr exact i32 %2, 2
> >     ret i32 %3
> > 
> > }
> > 
> > And here is what clang (2.9) produces:
> > define i32 @f(i32* %p, i32* %q) nounwind readnone {
> > 
> >     %1 = ptrtoint i32* %q to i32
> >     %2 = ptrtoint i32* %p to i32
> >     %3 = sub i32 %1, %2
> >     %4 = ashr exact i32 %3, 2
> >     ret i32 %4
> > 
> > }
> > 
> > Thus, llvm-gcc added the nsw flag to the sub, whereas clang didn't.
> > 
> > We think that clang is right and llvm-gcc is wrong:  it could be the
> > case that p and q point into the same array, that q is 0x80000000, and
> > that p is 0x7FFFFFFE.  Then the sub results in a signed overflow, i.e.,
> > sub with nsw is a trap value.
> > 
> > Is this a bug in llvm-gcc?
> 
> in llvm-gcc (and dragonegg) this is coming directly from GCC's gimple:
> 
> f (int * p, int * q)
> {
>    long int D.2718;
>    long int D.2717;
>    long int p.1;
>    long int q.0;
>    int D.2714;
> 
> <bb 2>:
>    q.0_2 = (long int) q_1(D);
>    p.1_4 = (long int) p_3(D);
>    D.2717_5 = q.0_2 - p.1_4;
>    D.2718_6 = D.2717_5 /[ex] 4;
>    D.2714_7 = (int) D.2718_6;
>    return D.2714_7;
> 
> }
> 
> Signed overflow in the difference of two long int (ptrdiff_t) values
> results in undefined behaviour according to the GCC type system, which is
> where the nsw flag comes from.
> 
> The C front-end generates this gimple in the pointer_diff routine.  The
> above is basically a direct transcription of what pointer_diff does.
> 
> In short, I don't know if this is right or wrong; but if it is wrong it
> seems to be a bug in GCC's C frontend.
> 
> Ciao, Duncan.

In section 6.5.6 (Additive operators) the C standard says that the result of 
subtracting two pointers should have type ptrdiff_t, which is signed. I could 
not find any clear statement about this particular case in the standard, but to 
me it does not make much sense to treat the subtraction as signed just because 
the result is. As Stephan mentioned above treating the subtraction as signed 
declares the result of subtracting pointers smaller than 0x7F...F from 
pointers larger than 0x80...0 as undefined. This can hardly be intendend.

Regards,
 Florian



More information about the llvm-dev mailing list