<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <div class="moz-cite-prefix">On 12/02/2014 11:45 AM, Reid Kleckner

      wrote:<br>

    </div>

    <blockquote

cite="mid:CACs=tyKyUSu-8CCddzNzo9ndk1_a1uOXkiS_+bDzUmsRX68VBw@mail.gmail.com"

      type="cite">

      <div dir="ltr">What if we had a pragma or attribute that lowered
        down to metadata indicating that the variable trip count
        was small?

        <div><br>

        </div>

        <div>Then backends could choose to lower short memcpys to an
          inlined, slightly widened loop, for example 'rep movsq' on
          x86_64.</div>

        <div><br>

        </div>

        <div>That seems nice from the compiler's perspective, since it
          preserves the canonical form and we get the same kind of
          information from profiling.  Then again, I can imagine most
          game dev users just want control and don't want to change
          their code.</div>

      </div>

    </blockquote>

    I like this general idea.  Here's another possibility...<br>

    <br>

    We actually already have such a construct in the form of the expect

    builtins. 

<a class="moz-txt-link-freetext" href="http://llvm.org/docs/BranchWeightMetadata.html#built-in-expect-instructions">http://llvm.org/docs/BranchWeightMetadata.html#built-in-expect-instructions</a><br>

    <br>

    One way to structure this would be:<br>

    <pre wrap="">if (__builtin_expect(N &lt; SmallSize, 1)) {
  // small loop here
} else {
  // memcpy here
  // or unreachable if you're really brave
}</pre>

    <br>

    I could see us failing to exploit this, of course.  :)<br>
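    For concreteness, the structure above could be written as compilable C roughly as follows. This is a sketch under stated assumptions: SMALL_SIZE and copy_hinted are hypothetical names, the threshold of 32 is arbitrary, and restrict is added so the buffers may be assumed not to overlap. __builtin_expect and __builtin_memcpy are the GCC/Clang builtins. Note that the small loop may itself still be rewritten into memcpy by the loop-idiom recognizer, which is exactly the failure mode mentioned above.<br>
    <pre wrap="">
```c
/* Hypothetical threshold; a real value would have to be tuned. */
#define SMALL_SIZE 32

/* Copy n bytes, telling the compiler that n is expected to be small. */
void copy_hinted(unsigned char *restrict dst,
                 const unsigned char *restrict src, unsigned long n)
{
    if (__builtin_expect(SMALL_SIZE > n, 1)) {
        /* Expected path: short copy done with a simple byte loop.
           The loop-idiom recognizer may still turn this into memcpy. */
        for (unsigned long i = 0; i != n; ++i)
            dst[i] = src[i];
    } else {
        /* Rare path: large copy handed to the library memcpy. */
        __builtin_memcpy(dst, src, n);
    }
}
```
</pre>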

    <br>

    <br>

    <blockquote

cite="mid:CACs=tyKyUSu-8CCddzNzo9ndk1_a1uOXkiS_+bDzUmsRX68VBw@mail.gmail.com"

      type="cite">

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Tue, Dec 2, 2014 at 11:23 AM, Robert

          Lougher <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:rob.lougher@gmail.com" target="_blank">rob.lougher@gmail.com</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>

            <br>

            In feedback from game studios, a common issue is the
            replacement of loops with calls to memcpy/memset.  These
            loops are often hand-optimised and highly efficient, and the
            developers strongly want a way to control the compiler (i.e.
            "leave my loop alone").<br>
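            For reference, here is a minimal sketch of the kind of loop in question: a byte copy whose trip count is only known at run time. The function name is illustrative only, and the restrict qualifiers are an assumption added so the compiler may treat the buffers as non-overlapping. With optimisation enabled (e.g. -O2), Clang's loop-idiom recognizer typically rewrites a loop of this shape into a call to memcpy.<br>
            <pre wrap="">
```c
/* Variable trip count: n is only known at run time. Because of restrict,
   the compiler may assume dst and src do not overlap, which is what lets
   the loop be rewritten as memcpy(dst, src, n). */
void copy_bytes(unsigned char *restrict dst,
                const unsigned char *restrict src, unsigned long n)
{
    for (unsigned long i = 0; i != n; ++i)
        dst[i] = src[i];
}
```
</pre>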

            <br>

            The culprit is, of course, the loop-idiom recognizer.  It
            replaces any loop that looks like a memset/memcpy with a
            call.  This affects loops with both variable and constant
            trip counts.  The question is: does this make sense in all
            cases?  Also, should the compiler provide a way to turn it
            off for certain types of loop, or for individual loops?<br>
            The standard answer is to use -fno-builtin, but this does not
            provide fine-grained control (e.g. we may want the loop-idiom
            recognizer to handle loops with a constant trip count but not
            loops with a variable one).<br>

            <br>

            As an example, it could be argued that replacing loops with a
            constant trip count always makes sense.  Here the compiler
            knows how big the memset/memcpy is and can make an accurate
            decision.  For small sizes the memcpy/memset will be expanded
            inline, while larger ones will remain a call, but given the
            size the call overhead will be negligible.<br>
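            To illustrate the constant case, consider this sketch. It assumes a hypothetical 16-byte struct, and both the type and function names are invented for illustration. Because the trip count is sizeof(struct vec4), any memcpy the recognizer produces has a constant size, so the backend can usually expand it into a few loads and stores rather than emit a library call.<br>
            <pre wrap="">
```c
/* Hypothetical 16-byte type used only for illustration. */
struct vec4 { float x, y, z, w; };

/* Constant trip count: the loop always copies sizeof(struct vec4) bytes,
   so the size of any resulting memcpy is known at compile time. */
void copy_vec4(struct vec4 *restrict dst, const struct vec4 *restrict src)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    for (unsigned long i = 0; i != sizeof(struct vec4); ++i)
        d[i] = s[i];
}
```
</pre>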

            <br>

            On the other hand, the compiler knows very little about loops
            with a variable trip count (the loop could be used primarily
            for copying 10 bytes or 10 MB; the compiler doesn't know).
            The compiler will replace it with a call, but because the size
            is variable, the call will not be expanded inline.  In this
            case, small sizes may see significant overhead compared to the
            original loop.  The game studio examples all fall into this
            category.<br>

            <br>

            The loop-idiom recognizer also has no notion of "quality": it
            always assumes that replacing the loop makes sense.  That may
            be true for a naive byte copy, but some of the examples we've
            seen have been carefully tuned.<br>
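            As a purely hypothetical example of a hand-tuned loop (copy_small is an invented name, and the code does not come from any of the studio cases), the following sketch copies 8 bytes per iteration and then finishes the tail one byte at a time, on the assumption that most copies are short. It uses the GCC/Clang __builtin_memcpy with a constant size of 8, which with optimisation is typically compiled to a single 64-bit load and store on x86_64 rather than a library call. Whether the loop-idiom recognizer leaves a loop like this alone depends on the compiler version.<br>
            <pre wrap="">
```c
/* Hypothetical tuned copy, assuming copies are usually short. */
void copy_small(unsigned char *restrict dst,
                const unsigned char *restrict src, unsigned long n)
{
    unsigned long i = 0;

    /* Main part: 8 bytes per iteration. A constant-size memcpy avoids
       alignment and strict-aliasing problems and is typically compiled
       to one 64-bit load and one 64-bit store. */
    for (; n - i >= 8; i += 8)
        __builtin_memcpy(dst + i, src + i, 8);

    /* Tail: the remaining 0 to 7 bytes, one at a time. */
    for (; i != n; ++i)
        dst[i] = src[i];
}
```
</pre>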

            <br>

            So, to summarise, we feel that there's sufficient
            justification to add some sort of user control.  However,
            rather than suggest a solution, we would prefer to start a
            discussion and obtain opinions.  To start, how do people feel
            about:<br>

            <br>

            - A switch to disable the loop-idiom recognizer completely?<br>
            <br>
            - A switch to disable the loop-idiom recognizer for loops with
            a variable trip count?<br>
            <br>
            - A switch to disable the loop-idiom recognizer for loops with
            a constant trip count (can't see this being much use)?<br>
            <br>
            - Per-function control of the loop-idiom recognizer (which
            must work with LTO)?<br>

            <br>

            Thanks for any feedback!<br>

            Rob.<br>

            <br>

            --<br>

            Robert Lougher<br>

            SN Systems - Sony Computer Entertainment Group<br>

            _______________________________________________<br>

            LLVM Developers mailing list<br>

            <a moz-do-not-send="true" href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> 

                   <a moz-do-not-send="true"

              href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

            <a moz-do-not-send="true"

              href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev"

              target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

          </blockquote>

        </div>

        <br>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>