[LLVMdev] How to infer sign information from LLVM IR?

Duncan Sands baldrick at free.fr
Tue Oct 18 01:11:38 PDT 2011


Hi Jorge,

> I'm implementing an integer bounds analysis (i.e., for each variable
> compute its lower and upper bound) as a LLVM pass. My objective is to
> identify potential overflows in both signed and unsigned C integers.
>
> To do that I need to know the sign of each integer.

there is no well-defined concept of the sign of an integer type in LLVM
IR.  Thus I think this approach is misguided.  Consider for example the
following C:

   void foo(int x, int y) {
     if ((x < y) || ((unsigned)x < (unsigned)y))
       abort();
   }

The casts simply don't appear at all in the LLVM IR.  You get something
like this:

   %cmp = icmp slt i32 %x, %y
   %cmp1 = icmp ult i32 %x, %y
   %or.cond = or i1 %cmp, %cmp1
   br i1 %or.cond, label %if.then, label %if.end

if.then:                                          ; preds = %entry
   tail call void @abort() noreturn nounwind
   unreachable

if.end:                                           ; preds = %entry
   ret void

The general point I'm trying to make is that in C you change something
from signed to unsigned by casting.  In LLVM IR none of those casts are
visible, it's always the same register representing both the signed and
unsigned C values.  Thus you can't say that the register is signed or
unsigned because it represents both signed and unsigned values simultaneously.
You may well see both signed and unsigned operations on it (like the signed and
unsigned icmp instructions above).

Ciao, Duncan.


  However, we know
> that LLVM IR does not keep sign information explicitly.
>
> My understanding is that the only instructions with explicit sign
> information is ICmp (e.g., icmp ugt ....) for the case of
> integers. Thus, sign information may be reconstructred from icmp
> instructions.  On the other hand, arithmetic instructions as Add, Sub,
> and Mul do not have this information. (I'm aware of the "nsw" and
> "nuw" flags but they are optional.) As a result, it's possible to have
> a straight-line piece of code (without icmp instructions) where no
> sign information can be inferred but potential overflows can happen
> without knowing whether the integer is signed or not.
>
> Recently, somebody posted:
>
>> Some time ago I posted here a couple times about integer overflow
>> checking.  Since then we (at Utah) joined forces with Vikram Adve and
>> his student Will Dietz who are also looking at integer issues.  Between
>> the four of us we've done a lot of looking at and thinking about integer
>> overflow in C/C++, and have written up a paper containing some data and
>> observations.  Perhaps it'll be interesting to people here:
>
>>     http://www.cs.utah.edu/~regehr/papers/overflow12.pdf
>
> In this work, they implement a dynamic analysis for detecting integer
> overflows in the front-end (using clang). I belive they moved to the
> front-end rather than implementing a LLVM pass precisely due to, among
> others, the sign problem I mention.
>
> Is there a quick solution to infer sign information without moving to
> the front-end? Maybe encoding some metadata?
>
> Thanks!
> Jorge
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev




More information about the llvm-dev mailing list