<html>

    <head>

      <base href="http://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Add support for DWARF path discriminators"

   href="http://llvm.org/bugs/show_bug.cgi?id=18270">18270</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Add support for DWARF path discriminators

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>new-bugs

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>unspecified

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Keywords</th>

          <td>missing-feature

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>new bugs

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>dnovillo@google.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>dblaikie@gmail.com, echristo@gmail.com, llvmbugs@cs.uiuc.edu

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>This feature in DWARF is mainly used for sample-based PGO

(<a href="http://wiki.dwarfstd.org/index.php?title=Path_Discriminators">http://wiki.dwarfstd.org/index.php?title=Path_Discriminators</a>). When the

profiler samples an instruction, it determines the associated source LOC for

that instruction using debug information.

This allows the compiler to map that source LOC back to the IR, compute

block/edge frequencies and use them to guide the optimizer. This mapping is,

naturally, imprecise, but in some cases it can be actively detrimental to the

optimizer.

For instance, take the following program:

$ cat -n a.cc

     1  long bar(long x, long y) {

     2    while (x < y * 30000)

     3      x++;

     4    return x + y;

     5  }

     6

     7  int main() { return bar(0, 100000) ? 0 : -1; }

When executed under a runtime profiler, the instruction-to-SLOC mapping

provides the following profile:

$ cat a.prof 

_Z3barll:3146580:0

2: 87405

3: 87405

so, line 2 (the while header) executes with frequency 87,405. The body (line 3)

executes with frequency 87,405. And main() does not have enough samples to even

register. That seems straightforward enough.

The problem begins when this is converted into IR. We get the following code

for function bar():

define i64 @_Z3barll(i64 %x, i64 %y) #0 {

entry:

  %x.addr = alloca i64, align 8

  %y.addr = alloca i64, align 8

  store i64 %x, i64* %x.addr, align 8

  store i64 %y, i64* %y.addr, align 8

  br label %while.cond, !dbg !11

while.cond:                                       ; preds = %while.body, %entry

  %0 = load i64* %x.addr, align 8, !dbg !11

  %1 = load i64* %y.addr, align 8, !dbg !11

  %mul = mul nsw i64 %1, 30000, !dbg !11

  %cmp = icmp slt i64 %0, %mul, !dbg !11

  br i1 %cmp, label %while.body, label %while.end, !dbg !11

while.body:                                       ; preds = %while.cond

  %2 = load i64* %x.addr, align 8, !dbg !12

  %inc = add nsw i64 %2, 1, !dbg !12

  store i64 %inc, i64* %x.addr, align 8, !dbg !12

  br label %while.cond, !dbg !12

while.end:                                        ; preds = %while.cond

  %3 = load i64* %x.addr, align 8, !dbg !13

  %4 = load i64* %y.addr, align 8, !dbg !13

  %add = add nsw i64 %3, %4, !dbg !13

  ret i64 %add, !dbg !13

} 

Notice that block 'entry' has a jump to block 'while.cond'. This branch

instruction has exactly the same debug info as the instructions in block

'while.cond'.

                     E1

              +-------------+

              |             |

              v             |

entry -> while.cond -> while.body     while.end

              |                            ^

              |             E2             |

              +----------------------------+

The assignment of block weights from instruction weights is done such that a

block's weight is defined to be

the maximum of all sample counts in the instructions for that basic block.

Since block 'entry' has the branch 'br label %while.cond', its weight is

computed as 87,405.

During propagation, we find that block 'entry' dominates block 'while.end'. By

the property of dominance, the weight of 'while.end' will be the same as the

weight for 'entry'.

This gives us:

weight[entry]: 87405

weight[while.cond]: 87405

weight[while.body]: 87405

weight[while.end]: 87405

After computing block weights, we propagate those weights into edge weights.

Since while.end has a single incoming edge, and the weight of the block is

known, we conclude that the weight of that edge (E2) must be 87,405. Similarly,

the weight of the edge entry->while.cond must be 87,405.

This leaves edge E1 as the only unassigned edge. Since the weight of all

incoming edges cannot add up to more than the weight of the block, we conclude

that the weight of edge E1 must be 0.

This is the opposite of what we should've concluded. The real conclusion should

have been:

weight(E2) = 0

weight(E1) = 87405

This causes us to mispredict the backedge, leading into bad block placement.

=========================================================================================

This can be avoided with the use of path discriminators

(<a href="http://wiki.dwarfstd.org/index.php?title=Path_Discriminators">http://wiki.dwarfstd.org/index.php?title=Path_Discriminators</a>).

Notice that the problem starts with the mis-computation of 'entry's weight. If

the compiler

emits path discriminators, the branch instruction at the end of entry would be

emitted with a different discriminator as the instructions in block while.cond

(essentially, they are on the same line but on different basic blocks).

In this case, we could have something like:

$ cat a.prof 

_Z3barll:3146580:0

2: 0

2.1: 87405

3: 87405

All the instructions in block 'entry' belonging to line 2 would get no

discriminator, all the instructions in block 'while.cond' belonging to line 2

would get discriminator '1'.

This way, the compiler can now correctly compute 'entry's and 'while.end's

weight as 0. Leaving edge E2 with weight 0 and E1 with weight 87405.</pre>

        </div>

      </p>

      <hr>

      <span>You are receiving this mail because:</span>

      <ul>

          <li>You are on the CC list for the bug.</li>

      </ul>

    </body>

</html>