[LLVMdev] Re:RE: Question about inserting instructions

Qiuyu Zhang qiuyu at ucla.edu
Wed May 11 13:30:29 PDT 2005


Hi, 

Thanks Volodya, Misha and Chris,

> > For example,
> >  Correct way:
> >     Instruction *NewInst = new LoadInst(...);
> >     NewBB->getInstList().push_back(NewInst);
> > 
> >  What I need is just to put some junk data in the BB, not instructions. At
> >  the assembly level, it looks like the following.
> > 
> >  A piece of correct code, obtained by disassembling the object code:
> >    
> > :00000009 0533709283              add eax, 83927033
> > :0000000E 05A2B78135              add eax, 3581B7A2
> > :00000013 C1C819                  ror eax, 19
> > :00000016 05E5167711              add eax, 117716E5
> > :0000001B 0542F7A8DC              add eax, DCA8F742
> > 
> > 
> > :00000009 0533709283              add eax, 83927033
> > :0000000E 7878787878          ???                                    <<<<<<  here is the illegal instruction.
> > :00000013 23232                    ???                                    <<<<<<
> > :00000016 05E5167711              add eax, 117716E5
> > :0000001B 0542F7A8DC              add eax, DCA8F742
> > 
> >     What I tried was to make *NewInst point to random memory (cast to
> >     an Instruction pointer) and push_back it onto the instruction list,
> >     but that failed.
> >     
> >             Instruction *NewInst  =  ;   
> >             NewBB->getInstList().push_back(NewInst);
> > 
> > So I was wondering if it is allowed in LLVM or not, if so, how to do that?
> 
> LLVM code must not have any dangling pointers, and hence, this is not
> valid LLVM.
> 
> If you want to generate "invalid native code", the way I would suggest
> doing it is to create some LLVM instruction in the dead basic block that
> you can easily identify, such as:
> 
> * create a new external function, do not define it
> * call it from the dead basic block
> * then, modify the native code generator for your chosen platform to
>   look for the call(s) to the fake external function and create some
>   "new instruction", i.e. one that's invalid for the real target but one
>   that gives you the bit pattern you want
> * you will want to add a new instruction definition to the .td file,
>   and then generate it in the instruction selector
> 
> However, the question is what is your bigger goal?  What you're doing
> here is hacking around the optimizers, trying to trick them to not
> delete the dead code.  Perhaps there is another way to achieve your end
> goal, if you could tell us what the big picture is.

At the IR level, the following code

        %tmp.0 = getelementptr [10 x sbyte]* %str1, int 0, int 0    ; <sbyte*> [#uses=1]
        store sbyte 116, sbyte* %tmp.0
        %tmp.1 = getelementptr [10 x sbyte]* %str1, int 0, int 1    ; <sbyte*> [#uses=1]
        store sbyte 101, sbyte* %tmp.1
        %tmp.2 = getelementptr [10 x sbyte]* %str1, int 0, int 2    ; <sbyte*> [#uses=1]
        store sbyte 115, sbyte* %tmp.2
        %tmp.3 = getelementptr [10 x sbyte]* %str1, int 0, int 3    ; <sbyte*> [#uses=1]
        store sbyte 116, sbyte* %tmp.3

is normally compiled to

        movb $116, 18(%esp)
        movb $101, 19(%esp)
        movb $115, 20(%esp)
        movb $116, 21(%esp)

But in a dummy BB, we would like to put some meaningless or illegal code. At the machine-code level, it would look like

                push %eax
                push %ecx
                pop %edx
                pusha
                sahf
                cltd
                das
                clc
 
All of them are legal one-byte x86 machine instructions. Since those instructions can never be executed, they will not affect the original code. I think the machine code above cannot be inserted with new Instruction(....), because that works at the IR level. So maybe we can make the machine instruction generator emit the code above in the dummy BB. By the way, the names of those dummy BBs include the string " dummy ", so we can identify which BBs are dummy at the IR level.
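
For reference, here is a minimal sketch of how the one-byte instructions listed above could be written as raw bytes for the assembler. It assumes 32-bit x86 and GNU as syntax. The names OneByteInst, kFiller and fillerDirective are made up for this illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// A mnemonic paired with its one-byte encoding on 32-bit x86.
struct OneByteInst {
    const char *mnemonic;
    std::uint8_t opcode;
};

// The filler instructions from the message ("sahf" is the correct
// spelling of the flags-store instruction).
static const OneByteInst kFiller[] = {
    {"push %eax", 0x50},
    {"push %ecx", 0x51},
    {"pop %edx",  0x5A},
    {"pusha",     0x60},
    {"sahf",      0x9E},
    {"cltd",      0x99},
    {"das",       0x2F},
    {"clc",       0xF8},
};

// Build a single ".byte" directive that encodes the whole sequence.
std::string fillerDirective() {
    std::string out = "\t.byte ";
    const std::size_t count = sizeof(kFiller) / sizeof(kFiller[0]);
    for (std::size_t i = 0; i < count; ++i) {
        char buf[8];
        int written = std::snprintf(buf, sizeof buf, "0x%02X",
                                    static_cast<unsigned>(kFiller[i].opcode));
        assert(written == 4);  // "0x" plus two hex digits
        if (i != 0) out += ", ";
        out += buf;
    }
    return out;
}
```

Because every entry is a complete one-byte instruction, a disassembler will decode the resulting bytes as the eight instructions in the table, and the assembler in step 3 accepts the directive without complaint.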

If there is a way to do that, I suppose the flow would look like the following:

1. generate some dummy BBs at the IR level (working on *.bc by writing a pass)
2. llc *.bc (generate native assembly code)
3. as -o *.o *.s (generate object file, or use gcc)
4. ld -o *.out *.o (generate executable file)

During step 2, we read the *.bc code, find the dummy BBs, and put some meaningless machine code into them. Here we cannot put illegal machine code, otherwise step 3 will fail. So is it possible to insert arbitrary machine code into a BB? If so, how could we change llc? I took a look at MachineInstr.c, CodeGenerator.c, etc., but I still don't know how to do it.
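
One way to picture the "marker call" idea without touching llc is to rewrite the assembly text between steps 2 and 3. This is a different technique from modifying the code generator, and only a rough sketch: it assumes the dummy BB contains a call to an undefined external function, here given the hypothetical name junk_marker, and that the call survives to the .s file as a line containing "call" and that symbol. The function name expandMarkers is also made up for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Replace every line that calls markerSymbol with replacementLine.
// All other lines are copied through unchanged.
std::string expandMarkers(const std::string &asmText,
                          const std::string &markerSymbol,
                          const std::string &replacementLine) {
    // An empty marker would match every call instruction.
    assert(!markerSymbol.empty());

    std::istringstream in(asmText);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        bool isMarkerCall =
            line.find("call") != std::string::npos &&
            line.find(markerSymbol) != std::string::npos;
        out << (isMarkerCall ? replacementLine : line) << '\n';
    }
    return out.str();
}
```

A possible use is to pass the text of the .s file, the marker name, and a line such as "\t.byte 0x50, 0xF8", then hand the result to as. The sketch only swaps the call line itself; any argument setup emitted around the call is left in place, and the optimizers may still delete the dummy BB before llc ever sees it.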

Here is something that may help explain what I want to do. Some virus writers code a virus in assembly and insert meaningless code into it; because they work at the assembly level, that is easy for them. I don't know whether I could do the same thing another way.

   
> -- 
> This isn't going to work.  The LLVM code always has to be well-defined. 
> The way to get the machine code to contain garbage like this is to add an 
> intrinsic, then have the code generator expand it to the garbage you want.
    
    So we cannot do this in LLVM code itself, but I am not clear about the approach you mentioned.

Thanks 



