<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#ffffff">

    John, Ether,<br>

    <br>

    Thank you for the reply.<br>

    <br>

    With John's help and suggestions from Ether, I can now identify the

    full list of dependent instructions that I need to make a copy and

    insert somewhere else (before their original location).<br>

    <br>

    Here is the list:<br>

    <br>

    <pre wrap="">0    %90 = load i32* @strstart, align 4

1    %91 = add i32 %90, 2

2    %88 = load i32* @ins_h, align 4

3    %92 = getelementptr inbounds [65536 x i8]* @window, i32 0, i32 %91

4    %89 = shl i32 %88, 5

5    %93 = load i8* %92, align 1

6    %.masked = and i32 %89, 32736

7    %94 = zext i8 %93 to i32

8    %95 = xor i32 %94, %.masked

9    %.sum73 = or i32 %95, 32768

10    %104 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum73

11    %take_addr2 = getelementptr i16* %104

12    %105 = bitcast i16* %take_addr2 to i8*

13    call void @bkp_memory(i8* %105, i32 2)

</pre>

    But, when I try to insert them in the above order into a PREVIOUS

    location, the "Instruction does not dominate all uses!" still shows

    up:<br>

    <br>

    Here is the code I was trying to do the clone and insert:<br>

    <br>

    std::vector<Instruction *>::iterator p;<br>

    Instruction * Pos = INSERT_POSITION;<br>

    for(p = coll.begin(); p != coll.end(); ++p){<br>

      Instruction * CurI = *p;<br>

      Instruction * CloneI = CurI->clone();<br>

      CloneI->InsertAfter(Pos);<br>

      Pos = CloneI;<br>

    }<br>

    <br>

    i am not sure if this it the right approach to insert a list of

    dependent instructions. (For delete instructions, people need to

    collect them 1st before removing. Are there similar process for

    insertion?)<br>

    There doesn't seem to have Phi node in the list.<br>

    <br>

    The instructions seem to be inside a single BasicBlock. Would it

    help?<br>

    <br>

    Thank you very much<br>

    <br>

    Chuck<br>

    <br>

    P.S.<br>

    Here is the full list of errors:<br>

    <br>

    <pre wrap="">Instruction does not dominate all uses!

  %88 = bitcast i16* %take_addr2 to i8*

  call void @bkp_memory(i8* %88, i32 2)

Instruction does not dominate all uses!

  %take_addr2 = getelementptr i16* %89

  %88 = bitcast i16* %take_addr2 to i8*

Instruction does not dominate all uses!

  %89 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum73

  %take_addr2 = getelementptr i16* %89

Instruction does not dominate all uses!

  %.sum73 = or i32 %90, 32768

  %89 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum73

Instruction does not dominate all uses!

  %90 = xor i32 %91, %.masked

  %.sum73 = or i32 %90, 32768

Instruction does not dominate all uses!

  %91 = zext i8 %92 to i32

  %90 = xor i32 %91, %.masked

Instruction does not dominate all uses!

  %92 = load i8* %94, align 1

  %91 = zext i8 %92 to i32

Instruction does not dominate all uses!

  %93 = shl i32 %95, 5

  %.masked = and i32 %93, 32736

Instruction does not dominate all uses!

  %94 = getelementptr inbounds [65536 x i8]* @window, i32 0, i32 %96

  %92 = load i8* %94, align 1

Instruction does not dominate all uses!

  %95 = load i32* @ins_h, align 4

  %93 = shl i32 %95, 5

Instruction does not dominate all uses!

  %96 = add i32 %97, 2

  %94 = getelementptr inbounds [65536 x i8]* @window, i32 0, i32 %96

Instruction does not dominate all uses!

  %97 = load i32* @strstart, align 4

  %96 = add i32 %97, 2

Instruction does not dominate all uses!

  %90 = xor i32 %91, %.masked

  store i32 %90, i32* @ins_h, align 4

Instruction does not dominate all uses!

  %90 = xor i32 %91, %.masked

  %.sum39 = or i32 %90, 32768

Instruction does not dominate all uses!

  %.sum39 = or i32 %90, 32768

  %100 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum39

Instruction does not dominate all uses!

  %100 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum39

  %101 = load i16* %100, align 2

Instruction does not dominate all uses!

  %101 = load i16* %100, align 2

  %102 = zext i16 %101 to i32

Instruction does not dominate all uses!

  %101 = load i16* %100, align 2

  store i16 %101, i16* %103, align 2

Instruction does not dominate all uses!

  %89 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum73

  store i16 %105, i16* %89, align 2

Instruction does not dominate all uses!

  %101 = load i16* %100, align 2

  %108 = icmp ne i16 %101, 0

Instruction does not dominate all uses!

  %108 = icmp ne i16 %101, 0

  %or.cond47 = and i1 %108, %110

Instruction does not dominate all uses!

  %or.cond47 = and i1 %108, %110

  br i1 %or.cond47, label %bb3, label %bb9

Broken module found, compilation aborted!

</pre>

    <br>

    <br>

    On 5/4/2011 8:20 PM, John Criswell wrote:

    <blockquote cite="mid:4DC1ED30.9060101@illinois.edu" type="cite">

      <meta content="text/html; charset=ISO-8859-1"

        http-equiv="Content-Type">

      Dear Chuck,<br>

      <br>

      I haven't read all of the details, but it seems that what you need

      to do is to clone defs before you clone any uses of the def.  To

      do that, you want to iterate over the instructions in

      dominator-tree order.<br>

      <br>

      To do that, you first construct the dominator tree (there is an

      LLVM analysis pass that does that).  Then, you iterate over the

      basic blocks in the dominator tree from the root of the tree to

      the bottom; make sure you process each basic block before

      processing any of its children.  When you process a basic block,

      you clone the instructions in that basic block.<br>

      <br>

      This ensures that you process all defs before uses in a def-use

      chain for everything except phi-nodes.  Phi-nodes are special

      because they are not dominated by their inputs.  If your phi-node

      copying is simple, you want to create a dummy phi-node whose

      inputs are all undef.  After you've finished copying all

      instructions, you go back and change the operands of the phi-nodes

      to be what you want them to be.<br>

      <br>

      The above works well for simple cases in which knowing where to

      insert phi-nodes is easy (e.g., you are inserting instrumentation

      in such a way that you are adding a phi-node that mirrors an

      existing phi-node within the program).  Knowing where to put

      phi-nodes can be difficult in general.  For such cases, you should

      make alloca'ed memory locations for each variable you are adding

      in the program.  You then use load/store instructions to

      read/write the data into/out of SSA variables, eliminating the

      need to add phi-nodes.  Then, when your pass is finished, run the

      mem2reg pass; it will convert your alloca'ed variables into SSA

      virtual registers and insert phi-nodes for you.<br>

      <br>

      As an aside, knowing how the SSA construction algorithm works is

      what led me to understand how to visit defs before uses in LLVM

      IR.  I recommend reading Cytron et. al.'s paper (<a

        moz-do-not-send="true" class="moz-txt-link-freetext"

        href="http://portal.acm.org/citation.cfm?id=115320">http://portal.acm.org/citation.cfm?id=115320</a>).<br>

      <br>

      -- John T.<br>

      <br>

      <br>

      On 5/4/11 6:16 PM, Chuck Zhao wrote:

      <blockquote cite="mid:4DC1DE33.2000408@eecg.toronto.edu"

        type="cite">

        <meta http-equiv="Content-Type" content="text/html;

          charset=ISO-8859-1">

        <font size="+1">While working on my optimization pass (a

          Function Pass), I try to replicate a call instruction and

          insert it at some earlier location (similar to LICM).<br>

          However, the instruction I am trying to replicate has

          dependencies on an uncertain number of instructions that are

          used to generate an address.<br>

          <br>

          A simple example (IR segment):<br>

          <br>

          define void @foo() nounwind {<br>

          entry:<br>

            <font color="#cc0000">%a = alloca i32, align 4;           

                   ; [0]</font><br>

            %b = alloca i32, align 4<br>

            %c = alloca i32, align 4<br>

            %d = alloca i32, align 4<br>

            %e = alloca i32, align 4<br>

            %f = alloca i32, align 4<br>

            %A = alloca [100 x i32], align 4<br>

            call void @start_ckpt() nounwind<br>

            %0 = call i32 @puts(i8* getelementptr inbounds ([8 x i8]*

          @.str, i32 0, i32 0)) nounwind<br>

           <font color="#990000"> </font><font color="#990000">%a1 =

            bitcast i32* %a to i8*          ;  [1]</font><br>

           <font color="#6600cc"> call void @bkp_memory(i8* %a1, i32 4)

            nounwind            ; [2] the function call instruction that

            I am about to replicate</font><br>

          <br>

          I need to make a copy/clone of the call instruction [2].

          However, I->clone() failed on [2], because [2] has

          dependency chained instructions that all will need to be

          cloned before itself can.<br>

          The back-trace of dependent instructions go to: [1] and [0].

          Thus both [1] and [2] need to be cloned in order before

          replicate [2].<br>

          <br>

          <br>

          A more interesting example is given below (IR Segment):<br>

          <br>

          bb1:                                              ; preds =

          %bb29, %bb30.preheader<br>

            %match_length.354 = phi i32 [ %match_length.2.ph, %bb29 ], [

          2, %bb30.preheader ]<br>

            %match_available.153 = phi i32 [ %match_available.0.ph,

          %bb29 ], [ 0, %bb30.preheader ]<br>

            tail call void @start_ckpt() nounwind noinline<br>

          <font color="#000099">  %88 = load i32* @ins_h, align 4       

                   ; [0]<br>

              %89 = shl i32 %88, 5                                  

            ;[1]<br>

              %90 = load i32* @strstart, align 4           ; [2]<br>

              %91 = add i32 %90, 2                                ; [3]<br>

              %92 = getelementptr inbounds [65536 x i8]* @window, i32 0,

            i32 %91   ; [4]<br>

              %93 = load i8* %92, align 1                      ; [5]<br>

              %94 = zext i8 %93 to i32                          ; [6]<br>

          </font><font color="#000099">  %.masked = and i32 %89,

            32736                ;[7]</font><br>

          <font color="#000099">  %95 = xor i32 %94, %.masked           

                    ;[8]</font><br>

            %take_addr = getelementptr i32* @ins_h<br>

            %96 = bitcast i32* %take_addr to i8*<br>

            call void @bkp_memory(i8* %96, i32 4)<br>

            store i32 %95, i32* @ins_h, align 4<br>

            %97 = and i32 %90, 32767<br>

            %.sum39 = or i32 %95, 32768<br>

            %98 = getelementptr inbounds [65536 x i16]* @prev, i32 0,

          i32 %.sum39<br>

            %99 = load i16* %98, align 2<br>

            %100 = zext i16 %99 to i32<br>

            %101 = getelementptr inbounds [65536 x i16]* @prev, i32 0,

          i32 %97<br>

            %take_addr1 = getelementptr i16* %101<br>

            %102 = bitcast i16* %take_addr1 to i8*<br>

            call void @bkp_memory(i8* %102, i32 2)<br>

            store i16 %99, i16* %101, align 2<br>

            %103 = trunc i32 %90 to i16<br>

          <font color="#000099">  %.sum73 = or i32 %95, 32768           

                    ;[9]<br>

              %104 = getelementptr inbounds [65536 x i16]* @prev, i32 0,

            i32 %.sum73    ;[10]<br>

              %take_addr2 = getelementptr i16* %104 ; [11]<br>

              %105 = bitcast i16* %take_addr2 to i8*   ;[12]</font><br>

          <font color="#6600cc">  call void @bkp_memory(i8* %105, i32

            2); [13]</font><br>

          <br>

          Again, attempting to replicate a call instruction [13]. But,

          this time the dependency chain is much longer and goes way

          further back, which includes [0] .. [12].<br>

          <br>

          Knowing this dependency chain could be arbitrarily long, with

          multiple levels of use-def and def-use relationships. The

          termination conditions could be one of:<br>

          (1). local variable declaration (Alloca)<br>

          (2). global variable <br>

          (3). const<br>

          <br>

          I think it will need a depth-first or breath-first search to

          identify all related instructions before the I->clone() can

          possibly happen.<br>

          <br>

          I am asking if there is a known-good way of doing this, or

          some LLVM API provided, since making a copy of an existing

          LLVM instruction seems to be a very common task in code

          transformations.<br>

          <br>

          <br>

          Thank you very much<br>

          <br>

          Chuck<br>

          <br>

        </font>

        <pre wrap=""><fieldset class="mimeAttachmentHeader"></fieldset>

_______________________________________________

LLVM Developers mailing list

<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a>

<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a>

</pre>

      </blockquote>

      <br>

    </blockquote>

    <br>

  </body>

</html>