<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 07/07/2016 07:10 PM, Daniel Berlin wrote:<br>

    <blockquote

cite="mid:CAF4BwTU+bfVC+kKvjjBHOLraQ_YEwzk0-R7S7BCBqzqLR5mH-w@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>Two thoughts after staring at it:<br>

              <br>

            </div>

            <div>1. If you were to form SESE/etc regions and process

              them in topo order, you could actually do even better than

              you do now.   You would be able to validly say "the first

              X regions of this function are the same". You currently do

              this in some cases, but you could do it in more :)</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Yep, that's a good idea. You could then also outline identical

    regions from functions... :) <br>
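    <br>

    If the regions are visited in a fixed topological order, comparing two functions region by region comes down to finding the longest common prefix of their region sequences. Here is a minimal Python sketch. It assumes each SESE region has already been reduced to a hashable fingerprint and that both region lists are in a canonical topological order. Region formation is not shown, and the fingerprints and the name `common_region_prefix` are hypothetical:<br>

```python
from typing import Hashable, Sequence


def common_region_prefix(regions_a: Sequence[Hashable],
                         regions_b: Sequence[Hashable]) -> int:
    """Count how many leading regions two functions have in common.

    Hypothetical inputs: one fingerprint per SESE region (for example a
    tuple of opcodes), listed in a canonical topological order.
    """
    count = 0
    for region_a, region_b in zip(regions_a, regions_b):
        if region_a != region_b:
            break  # the first mismatching region ends the shared prefix
        count += 1
    return count


# Usage: the first two regions match, the third one differs.
f = [("load", "add"), ("icmp", "br"), ("call", "ret")]
g = [("load", "add"), ("icmp", "br"), ("mul", "ret")]
print(common_region_prefix(f, g))  # 2
```

    The shared prefix (here, the first two regions) is the part that could be merged or outlined as-is. Only the remaining regions would need the more expensive similarity handling.<br>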

    <br>

    An easy improvement to the current traversal algorithm would be to

    enqueue successor blocks in the order of their instruction count as

    opposed to the order they appear in the terminator instruction. That

    way, you could handle easy cases where the successor blocks are

    swapped but otherwise equivalent or similar.<br>
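    <br>

    The successor-ordering tweak can be sketched with a breadth-first worklist. This is an assumption; the pass's actual traversal may differ in its details. `Block` is a hypothetical stand-in for a basic block that has an instruction count and a list of successors in terminator order:<br>

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    # Hypothetical stand-in for a basic block.
    name: str
    num_insts: int
    succs: List["Block"] = field(default_factory=list)  # terminator order


def traversal_order(entry: Block) -> List[str]:
    """Breadth-first traversal that enqueues successors sorted by
    instruction count rather than by terminator operand order."""
    seen = {entry.name}
    order = []
    queue = deque([entry])
    while queue:
        block = queue.popleft()
        order.append(block.name)
        # sorted() is stable, so equally sized successors keep their
        # terminator order.
        for succ in sorted(block.succs, key=lambda b: b.num_insts):
            if succ.name not in seen:
                seen.add(succ.name)
                queue.append(succ)
    return order


def make_function(swapped: bool) -> Block:
    # Two versions of the same conditional branch that differ only in
    # the order of the branch's successors.
    small = Block("small", 3)
    big = Block("big", 5)
    succs = [big, small] if swapped else [small, big]
    return Block("entry", 2, succs)


print(traversal_order(make_function(False)))  # ['entry', 'small', 'big']
print(traversal_order(make_function(True)))   # ['entry', 'small', 'big']
```

    When two successors have the same instruction count, they still fall back to terminator order, so swapped successors of equal size would not line up. A secondary sort key, such as a per-block hash, could break those ties.<br>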

    <br>

    <blockquote

cite="mid:CAF4BwTU+bfVC+kKvjjBHOLraQ_YEwzk0-R7S7BCBqzqLR5mH-w@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>2. You could integrate most of what you are doing with

              mergefuncs with a bit of work.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Depends on how you define "a bit of work" ;) See below.<br>

    <br>

    <blockquote

cite="mid:CAF4BwTU+bfVC+kKvjjBHOLraQ_YEwzk0-R7S7BCBqzqLR5mH-w@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              As to whether this could be integrated into the in-tree

              MergeFunctions... Well, this started off as a patch to

              MergeFunctions back in 2013. However, the in-tree

              MergeFunctions has undergone significant architectural

              changes since then; it now uses a total ordering of

              functions to speed up merging. This is great if you only

              want to merge identical functions, but it doesn't work for

              merging of similar functions.<br>

            </blockquote>

            <div><br>

            </div>

            <div>This is not correct.</div>

            <div>You could use simhash, for example, or anything that

              will give you actual similarity metrics in addition to

              total ordering.</div>

          </div>

        </div>

      </div>

    </blockquote>
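    <br>

    Here is one way to make the simhash suggestion concrete. SimHash (due to Charikar) maps a weighted feature set to a fixed-width fingerprint so that similar sets tend to get fingerprints with a small Hamming distance. Below is a minimal Python sketch over opcode bigrams. The feature choice, the 64-bit width and the SHA-256-based feature hash are all assumptions, and none of this has been evaluated on real IR:<br>

```python
import hashlib
from collections import Counter
from typing import Sequence

BITS = 64


def _feature_hash(feature: str) -> int:
    # Stable 64-bit hash of one feature string.
    digest = hashlib.sha256(feature.encode()).digest()
    return int.from_bytes(digest[:8], "little")


def simhash(opcodes: Sequence[str]) -> int:
    """SimHash over opcode bigrams, each weighted by its count."""
    features = Counter(f"{a}>{b}" for a, b in zip(opcodes, opcodes[1:]))
    weights = [0] * BITS
    for feature, count in features.items():
        h = _feature_hash(feature)
        for bit in range(BITS):
            # Each feature votes +count or -count on every bit position.
            weights[bit] += count if (h >> bit) & 1 else -count
    # Keep the bits that received a positive total vote.
    return sum(1 << bit for bit in range(BITS) if weights[bit] > 0)


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")


f = ["load", "add", "store", "load", "mul", "store", "ret"]
g = ["load", "add", "store", "load", "sub", "store", "ret"]
print(hamming(simhash(f), simhash(g)))
```

    Identical opcode sequences always get distance 0. A near-identical pair like `f` and `g` tends to land much closer than unrelated code, but that is a statistical tendency, not a guarantee. Also, fingerprints that are close in Hamming distance are not necessarily adjacent in numeric order. Using SimHash as a sorted total order would therefore need something extra, such as sorting under several bit permutations.<br>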

    <br>

    What I meant to say is that it doesn't make sense to use the

    *current* hash function and the *current* total ordering criterion

    in the in-tree MergeFunctions if you want to merge similar

    functions.<br>

    <br>

    Why? <br>

    <br>

    1. The hash function includes the instruction opcodes. Similar, but

    non-identical, functions will end up in different buckets. We'd have

    to replace the hash function with something less precise for similar

    merging. Hashing the CFG structure is a neat idea, though; I should

    be doing that too.<br>

    <br>

    2. The total ordering criterion (used inside the buckets) in

    MergeFunctions is based on equality. A very large chunk of the

    MergeFunctions code deals with those comparisons (see all the cmpXYZ

    functions). Similar functions won't be near each other if you use

    that specific ordering criterion, so it's not much use. We'd need to

    replace the criterion with something else, and yes, similarity

    hashing is a great option for that (although it isn't implemented

    yet, so it's unclear how well it would work for IR).<br>
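    <br>

    Point 1 can be illustrated on a toy function representation. Each function is a list of blocks, and each block is a tuple of opcodes plus the indices of its successor blocks. Both hash functions and the representation are hypothetical; this is not the in-tree MergeFunctions hash:<br>

```python
import hashlib
from typing import List, Tuple

# Hypothetical toy IR: each block is (opcodes, successor block indices).
Function = List[Tuple[Tuple[str, ...], Tuple[int, ...]]]


def _digest(value) -> int:
    # Stable 64-bit digest (the built-in hash() of strings is randomized
    # per process).
    data = repr(value).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


def opcode_hash(fn: Function) -> int:
    # Includes every opcode, so any opcode difference changes the bucket.
    return _digest([ops for ops, _ in fn])


def cfg_shape_hash(fn: Function) -> int:
    # Includes only each block's successor list, i.e. the CFG structure.
    return _digest([succs for _, succs in fn])


# Same CFG, with one differing opcode in the second block.
f = [(("icmp", "br"), (1, 2)), (("add", "br"), (2,)), (("ret",), ())]
g = [(("icmp", "br"), (1, 2)), (("sub", "br"), (2,)), (("ret",), ())]
print(opcode_hash(f) == opcode_hash(g))        # False
print(cfg_shape_hash(f) == cfg_shape_hash(g))  # True
```

    The shape hash depends on how blocks are numbered. It therefore assumes both functions list their blocks in a comparable order, for example the traversal order used for comparison.<br>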

    <br>

    So, to make this work in MergeFunctions, we'd have to rip out two

    of its key advantages (which are great if you only want to merge

    identical functions)... and you'd essentially end up with this pass

    :)<br>

    <br>

    Am I missing something?<br>

    <br>

    Since the two MergeFunctions passes share a common ancestor, they

    have some code in common, basically a shared skeleton. So it should

    be possible to factor that out into a "MergeFunctionsUtils" sort of

    module. Perhaps that's a way to go, if people are happy with it?<br>
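    <br>

    Here is one way such a shared skeleton could be organized. This is purely hypothetical: neither the name `find_merge_candidates` nor this split reflects the actual code of either pass. The driver buckets functions by a pluggable key and then checks pairs within each bucket with a pluggable predicate. Identical merging would plug in an exact key and an equality check, while similar merging would plug in a coarser key and a similarity test:<br>

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple, TypeVar

F = TypeVar("F")


def find_merge_candidates(
    functions: Sequence[F],
    bucket_key: Callable[[F], Hashable],
    should_merge: Callable[[F, F], bool],
) -> List[Tuple[F, F]]:
    """Bucket functions by a pluggable key, then test every pair inside
    a bucket with a pluggable predicate (hypothetical skeleton)."""
    buckets: Dict[Hashable, List[F]] = defaultdict(list)
    for fn in functions:
        buckets[bucket_key(fn)].append(fn)
    pairs: List[Tuple[F, F]] = []
    for members in buckets.values():
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                if should_merge(a, b):
                    pairs.append((a, b))
    return pairs


def at_most_one_difference(a, b) -> bool:
    # Toy similarity test: same length, at most one differing opcode.
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


funcs = [("add", "ret"), ("add", "ret"), ("sub", "ret"), ("mul", "add", "ret")]
identical = find_merge_candidates(funcs, lambda fn: fn, lambda a, b: a == b)
similar = find_merge_candidates(funcs, len, at_most_one_difference)
print(len(identical), len(similar))  # 1 3
```

    This sketch compares all pairs within a bucket, which is quadratic in the bucket size. The in-tree pass instead relies on its total ordering inside the buckets, and that ordering is exactly the part that would stay pass-specific.<br>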

    <br>

    Thanks,<br>

    Tobias<br>

    <pre class="moz-signature" cols="72">-- 

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,

a Linux Foundation Collaborative Project.

</pre>

  </body>

</html>