[LLVMdev] Pointers in Load and Store

John Criswell criswell at illinois.edu
Mon Jan 24 09:03:43 PST 2011


On 1/22/11 7:19 PM, Surinder wrote:
> John,
>
> I have looked at the real code (instead of the obsolete one) and it
> appears to be easy to find if an operand is a getelementptr
> instruction.
>
>    if (ConstantExpr *CE = dyn_cast<ConstantExpr>(I.getOperand(0))) {
>      Out << "*** operand 0 is a constant Expr******";
>      if (CE->getOpcode() == Instruction::GetElementPtr) {
>        Out << "*** operand 0 is a gep instruction ********";
>        if (const ArrayType *ar = dyn_cast<ArrayType>(
>                CE->getPointerOperandType()->getElementType()))
>          hi = ar->getNumElements();
>      }
>    }
>
> Thank you for that.

You're welcome.

> I would like to use safecode programs rather than write my own code.
> However, the website of safecode says that it works only with version
> 2.6 or 2.7 of llvm whereas I use version 2.8 of llvm.

SAFECode already does all the things you mention below.  Unless you have 
a pressing need to use LLVM 2.8, I recommend switching to LLVM 2.7 so 
that you can re-use the SAFECode passes unmodified.

If you must use LLVM 2.8 or mainline LLVM, then I have the following 
suggestions:

> To get around the problem, I plan to do as follows :
>
> (1)  Do not install safecode with llvm 2.8 (as it may or may not work)
>
> (2)  Create a new pass named "unGep", "Breaks Constant GEPs"

The BreakConstantGEP pass is self-contained and should be trivial to 
update to work with LLVM 2.8.  There is no need for you to go through 
the effort to rewrite it.

> (3) The new pass derives from FunctionPass (because safecode does so,
> if I had to write it, ModulePass would have been good enough.)
>
> (4) The runOnFunction method of the unGep pass invokes
> addPoolChecks(F) passing it the function F.  I will modify
> addGetElementPtrChecks so that it produces array bounds in the way I
> need. (I need a check that array bounds are being violated for my
> research to detect overflows.)

First, passes should do just one thing.  The BreakConstantGEP pass 
should convert ConstantExpr GEPs into GEP instructions and not do 
anything else.  A separate pass should insert bounds checks.  Calling 
addPoolChecks() from the BreakConstantGEP pass is a bad idea; it 
prevents the BreakConstantGEP pass from being reusable.

I'm currently in the process of making SAFECode follow this philosophy.  
For example, for LLVM 2.6, the InsertPoolChecks pass added load/store 
checks, array bounds checks, and indirect function call checks.  I've 
moved the code that inserts load/store checks into a separate pass in 
mainline SAFECode and intend to do the same for indirect function call 
checks.  We have also moved various array bounds check optimizations 
into separate passes.

Second, the code in InsertPoolChecks that inserts checks on GEP 
instructions is pretty straightforward.  If you take the mainline 
version and remove the call to isGEPSafe(), it should not have any other 
pass dependencies, and you should be able to easily update it to LLVM 2.8.

As for the implementation of the run-time check, the interface is pretty 
generic: it takes a pool handle, the source of the GEP, and the result 
of the GEP and does the run-time check.  The only extraneous parameter 
is the pool handle, and your run-time check can just ignore it if it 
doesn't need it.  You only need to specialize the code for your run-time 
array bounds check implementation if you require parameters other than 
these.
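
To make the shape of that interface concrete, here is a minimal sketch of 
a run-time check with those three parameters.  It is not SAFECode's actual 
run-time: the names boundsCheck and registerObject and the address-to-size 
map are hypothetical, chosen only for illustration, and the pool handle is 
accepted but ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical registry mapping each object's start address to its size.
// Illustration only; this is not how SAFECode tracks objects.
static std::map<std::uintptr_t, std::size_t> Objects;

// Record an object so that later checks can find its bounds.
void registerObject(const void *Start, std::size_t Size) {
  assert(Start != nullptr && Size > 0);
  Objects[reinterpret_cast<std::uintptr_t>(Start)] = Size;
}

// Return true if Dest (the GEP result) lies inside the registered object
// that contains Source (the GEP's pointer operand).
// PoolHandle matches the generic interface but is ignored in this sketch.
bool boundsCheck(void *PoolHandle, const void *Source, const void *Dest) {
  (void)PoolHandle;
  std::uintptr_t Src = reinterpret_cast<std::uintptr_t>(Source);
  std::uintptr_t Dst = reinterpret_cast<std::uintptr_t>(Dest);

  // Find the last registered object that starts at or before Src.
  auto It = Objects.upper_bound(Src);
  if (It == Objects.begin())
    return false;  // no registered object starts at or below Src
  --It;

  std::uintptr_t Start = It->first;
  std::uintptr_t End = Start + It->second;  // one past the last byte
  if (Src >= End)
    return false;  // Source is not inside any registered object

  // Simplification: the one-past-the-end address is treated as a violation.
  return Dst >= Start && Dst < End;
}
```

Under these assumptions, the instrumentation would register each array 
when it is allocated and call the check after each GEP.  A real 
implementation would need to decide how to treat unregistered objects and 
one-past-the-end pointers; this sketch simply rejects both.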


> I will then run opt as
>
> opt -load ../unGep.so

Yes, this is how you could run the passes.  We built the SAFECode tool 
(sc) because SAFECode uses several different libraries; creating a 
separate tool was easier than trying to load all the libraries into opt.

-- John T.

> to produce llvm code without geps as operands.
>
> Please advise if this will work or if there is an easier way.
>
> Thanks.
>
> Surinder Kumar Jain
>
>
> On Sat, Jan 22, 2011 at 4:08 PM, John Criswell<criswell at illinois.edu>  wrote:
>> On 1/21/2011 10:46 PM, Surinder wrote:
>>> John,
>>>
>>> I have looked at SAFECode and thought the following should work
>>>
>>>         if (isa<Constant>(I.getOperand(0))) {
>>>           Out << "*** operand 0 is a constant ******";
>>>           if (Instruction *operandI = dyn_cast<Instruction>(I.getOperand(0))) {
>>>             Out << "********operand is an instruction ****";
>>>             if (GetElementPtrInst *gepI = dyn_cast<GetElementPtrInst>(operandI)) {
>>>               Out << "*** operand is a gep instruction ********";
>>>               if (const ArrayType *ar = dyn_cast<ArrayType>(
>>>                       gepI->getPointerOperandType()->getElementType()))
>>>                 hi = ar->getNumElements();
>>>             }
>>>           }
>>>         }
>>>
>>> But this does not recognize that operand(0) of instruction I is even
>>> an instruction, let alone a get element pointer instruction.  I have
>>> taken the code from line 632 and line 757 of
>>> safecode/lib/ArrayBoundsChecks/ArrayBoundCheck.cpp
>>>
>>> I must be doing something wrong, what is it?
>> The problem is simple: you're looking at the wrong source file.
>> :)
>>
>> More specifically, you're looking at the very antiquated static array bounds
>> checking pass (it hasn't compiled in several years now).  The file you want
>> to look at is in lib/InsertPoolChecks/insert.cpp.  This file contains the
>> InsertPoolChecks pass which, in mainline SAFECode, is responsible for
>> inserting array bounds checks and indirect function call checks.  In
>> particular, you want to look at the addGetElementPtrChecks() method.
>>
>> As for Constant Expression GEPs, you want to look at the BreakConstantGEP
>> pass, located in lib/ArrayBoundsChecks/BreakConstantGEPs.cpp.
>>
>> The BreakConstantGEP pass is run first.  All it does is find instructions
>> that use constant expression GEPs and replace each Constant Expression GEP
>> with a GEP instruction.  All of the other SAFECode passes that worry about
>> array bounds checks (i.e., the static array bounds checking passes in
>> lib/ArrayBoundsChecks and the run-time instrumentation pass in
>> lib/InsertPoolChecks/insert.cpp) only look for GEP instructions.
>>
>> -- John T.
>>
>>
>>> Surinder Kumar Jain
>>>
>>>
>>> PS: Yes, I will be using safecode but still I want to know why the above
>>> code does not work.  I am posting a separate mail with the title "OPT
>>> optimizations"
>>
>>> On Fri, Jan 21, 2011 at 3:12 PM, John Criswell<criswell at illinois.edu>
>>>   wrote:
>>>> On 1/20/2011 10:02 PM, Surinder wrote:
>>>>> When I compile C programs into llvm, it produces load instructions in
>>>>> two different flavours.
>>>>>
>>>>> (1)    %8 = load i8** %bp, align 8
>>>>>
>>>>> (2)    %1 = load i8* getelementptr inbounds ([4 x i8]* @.str, i64 0,
>>>>> i64 0), align 1
>>>>>
>>>>> I know that %bp in first case and the entire "getelementptr inbounds
>>>>> ([4 x i8]* @.str, i64 0, i64 0)" in second case can be obtained by
>>>>> dump'ing I.getOperand(0)
>>>>>
>>>>> However, I want to find out which of the two forms of load have been
>>>>> produced because in the second case, I want to insert checks for array
>>>>> bounds.
>>>>>
>>>>> How can I find out when I am in Instruction object I and I.getOpcode()
>>>>> == 29 whether I am dealing with type (1) or type (2) above.
>>>> The second load instruction is not really a "special" load instruction.
>>>> Rather, its pointer argument is an LLVM constant expression (class
>>>> llvm::ConstantExpr).  The getelementptr (GEP), which would normally be a
>>>> GEP instruction, is instead a constant expression that will be converted
>>>> into a constant numeric value at code generation time.
>>>>
>>>> So, what you need to do is to examine the LoadInst's operand and see if
>>>> it's a ConstantExpr, and then see whether the ConstantExpr's opcode is a
>>>> GEP opcode.
>>>>
>>>> However, there's an easier way to handle this.  SAFECode
>>>> (http://safecode.cs.illinois.edu) has an LLVM pass which converts
>>>> constant expression GEPs into GEP instructions.  If you run it on your
>>>> code first, you'll get the following instruction sequence:
>>>>
>>>> %tmp = getelementptr inbounds [4 x i8]* @.str, i64 0, i64 0
>>>> %1 = load i8* %tmp, align 1
>>>>
>>>> You then merely search for GEP instructions and put run-time checks on
>>>> those (which you have to do anyway if you're adding array bounds
>>>> checking).  The only ConstantExpr GEPs that aren't converted, I think,
>>>> are those in global variable initializers.
>>>>
>>>> Now, regarding the insertion of array bounds checks, SAFECode does that,
>>>> too (it is a memory safety compiler for C code).  It also provides a
>>>> simple static array bounds checker and some array bounds check
>>>> optimization passes.
>>>>
>>>> I can direct you to the relevant portions of the source code if you're
>>>> interested.
>>>>
>>>> -- John T.
>>>>
>>>>> Thanks.
>>>>>
>>>>> Surinder Kumar Jain
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>



