<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 04/26/2015 12:49 PM, James Molloy

      wrote:<br>

    </div>

    <blockquote

cite="mid:CALCTSA3R_qeS-nuTPyf7HTZEgeEhyPbMJR=DCCS0reO_vW4SqQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">Hi Philip,<br>

        <br>

        <div>Thanks very much for reviewing my proposal. I should say

          that I generally agree with your points, and am still in

          multiple minds about what the best approach would look like.</div>

        <div><br>

        </div>

        <div><span

            style="font-size:13.1999998092651px;line-height:19.7999992370605px">>

            I don't see the challenge here.  Matching a compare+select

            as a min/max for the purpose of the cost model under a

            target specific hook seems quite straightforward.  What am I

            missing?</span><br>

        </div>

        <div><span

            style="font-size:13.1999998092651px;line-height:19.7999992370605px">>

            ...</span></div>

        <div><span

            style="font-size:13.1999998092651px;line-height:19.7999992370605px">>

            Er, not sure I get your point here.  Not having to match two

            distinct families of representation is an advantage, not a

            disadvantage.  The branch form should be getting converted

            into the select form much earlier in the optimizer.  Which

            cases are you worried about here?</span><span

            style="font-size:13.1999998092651px;line-height:19.7999992370605px"><br>

          </span></div>

        <div><span

            style="font-size:13.1999998092651px;line-height:19.7999992370605px"><br>

          </span></div>

        <div><span

            style="font-size:13.1999998092651px;line-height:19.7999992370605px">The

            awkward part is twofold. Firstly, the example I gave in my

            previous message where InstCombine mangles the pattern by

            pushing an fptoui between the icmp and the select. There's

            two approaches to this I see - first, match early and deal

            with the pattern as an intrinsic so it can't be broken up.

            Second, have a (flexible, deals with edge cases) way of

            matching at any stage in the pipeline - this might be

            ripping some of the matching logic out of InstCombine and

            exposing it as a utility. This might be a lose in compile

            time though.</span></div>

      </div>

    </blockquote>

    First, you're not going to catch every possible case in existence

    with late matching, but you'll hopefully catch enough.  There will

    always be cases you'll miss by matching late, but there are also

    cases you'd miss by matching early.  <br>

    <br>

    For your specific example, I sort of wonder where the transformation

    is actually a profitable one at all.  If you're doing a floating

    point compare, isn't that generally using a different set of flags

    than an integer compare?  I wonder if the domain crossing is more

    expensive for the flags than the floating point value.  If you could

    convert both the select and the compare to use the integer value

    that would be a clear win.  It's not clear to me that the current

    transform is.  <br>

    <br>

    Regardless, reversing that particular transform in CGP is a bit

    ugly, but reasonable.  Is it just a couple of forms you're worried

    about?  Or are there a large number of such patterns showing up in

    the real world examples you're looking at?  <br>

    <blockquote

cite="mid:CALCTSA3R_qeS-nuTPyf7HTZEgeEhyPbMJR=DCCS0reO_vW4SqQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><span

            style="font-size:13.1999998092651px;line-height:19.7999992370605px"><br>

          </span></div>

        <div><span

            style="font-size:13.1999998092651px;line-height:19.7999992370605px">Secondly

            branches. You're right that we speculate branches to

            selects, but only to a small threshold. I upped the

            threshold so at least we'll get "min+min" or "max+min" to be

            speculated, but larger sequences such as "min+min+min"

            (folding four-input minimum) will still end up with a

            branch.</span></div>

      </div>

    </blockquote>

    I think it would be very reasonable to have a TTI call for whether

    the target supported a min/max instruction and then special casing

    the branch->select transformation when it does.  To say this

    differently, a min/max on a platform with such a (cheap!)

    instruction should always be expressed as a cmp/select.  There's no

    reason not to perform the transformation provided that the min/max

    has approximately the same cost as a mov.  (I say mov only because

    you might still need the cmp/br sequence even after handing the

    min/max operations.  If you didn't, it's even more clearly okay.)  <br>

    <br>

    Although, there's a catch in what I just said.  Given movs are

    generally handled via renaming, it's not clear that any min/max

    instruction can be that cheap.  How does that tradeoff work out for

    the architecture you care about?  <br>

    <blockquote

cite="mid:CALCTSA3R_qeS-nuTPyf7HTZEgeEhyPbMJR=DCCS0reO_vW4SqQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><span

            style="font-size:13.1999998092651px;line-height:19.7999992370605px"><br>

          </span></div>

        <div><span

            style="font-size:13.1999998092651px;line-height:19.7999992370605px">I've

            been doing experiments with how much of the compiler I need

            to touch when inserting these intrinsics, and yes I do agree

            with you - random testing shows the compiler does need quite

            a bit of teaching, in InstCombine at least, to not regress

            randomly generated simple testcases. So I'm starting to come

            around more to your way of thinking.</span></div>

        <div><span

            style="font-size:13.1999998092651px;line-height:19.7999992370605px"><br>

          </span></div>

        <div><span

            style="font-size:13.1999998092651px;line-height:19.7999992370605px">The

            only thing I haven't solved in my mind yet if we don't go

            down the intrinsics route is d</span><span

            style="font-size:13.1999998092651px;line-height:19.7999992370605px">ealing

            with branches - we can't just keep upping the speculation

            threshold. Perhaps identify these (and only these) and up

            the speculation threshold in these cases only?</span></div>

        <div><br>

        </div>

        <div>Matching late in CGP would solve the sharing code across

          backends problem, so after your feedback I'm leaning further

          towards this approach.</div>

        <div><br>

        </div>

        <div>Do you have any further comments?</div>

        <div><br>

        </div>

        <div>Cheers,</div>

        <div><br>

        </div>

        <div>James</div>

      </div>

      <br>

      <div class="gmail_quote">On Sun, 26 Apr 2015 at 20:04 Philip

        Reames <<a moz-do-not-send="true"

          href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>>

        wrote:<br>

        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">

          <div bgcolor="#FFFFFF" text="#000000">

            <div>On 04/23/2015 07:42 AM, James Molloy wrote:<br>

            </div>

            <blockquote type="cite">

              <div dir="ltr">Hi all,

                <div><br>

                </div>

                <div>I've just started again on trying to solve the

                  problem of getting decent code generation for min, max

                  and abs idioms as used by the programmer and as

                  emitted by the loop vectorizer.</div>

                <div><br>

                </div>

                <div>I've been looking at doing this as a target

                  DAGCombine, but actually I think:</div>

                <div>  1. it's incredibly complex to do at that stage

                  and it limits all the work I do to just one target.</div>

                <div>  2. It's also much more difficult to test.</div>

                <div>  3. The loop and SLP vectorizers still don't have

                  a cost model for them - they're just seen as

                  compare+selects.</div>

              </div>

            </blockquote>

          </div>

          <div bgcolor="#FFFFFF" text="#000000"> I don't see the

            challenge here.  Matching a compare+select as a min/max for

            the purpose of the cost model under a target specific hook

            seems quite straightforward.  What am I missing?</div>

          <div bgcolor="#FFFFFF" text="#000000"><br>

            <br>

            <blockquote type="cite">

              <div dir="ltr">

                <div>So my proposal is:</div>

                <div>  * To add new intrinsics for minimum, maximum and

                  absolute value. These would have signed int/unsigned

                  int/float variants and be valid across all numeric

                  types.</div>

                <div>  * To add a pass fairly early in the pipeline to

                  idiom recognize and create intrinsics. This would be

                  controllable per-backend - if a backend doesn't have

                  efficient lowering for these operations, perhaps it's

                  best not to do the idiom recognition.</div>

              </div>

            </blockquote>

          </div>

          <div bgcolor="#FFFFFF" text="#000000"> I am strongly opposed

            to this part of the proposal.  I have no problem* adding

            such intrinsics and matching them late (i.e. CodeGenPrep),

            but I am deeply concerned about the negative impacts of

            matching early.  Unless you are volunteering to add support

            for these intrinsics to *every* pass, I believe doing this

            is a non-starter.  As a good example, consider what happened

            recently with the x.with.overflow intrinsics where we were

            missing important simplifications on induction variable

            dependent checks due to early canonicalization to a form

            that the rest of the optimizer didn't understand.  <br>

            <br>

            More generally, I'm not even sure matching these early would

            be the right answer even if you were volunteering to update

            the entire optimizer.  Being able to fold the condition

            (CSE, etc..) independently of the select and then being able

            to exploit a dominating branch is extremely powerful at

            eliminating the min/max operation entirely.  I would be

            deeply concerned about giving up that power without an

            incredible compelling reason. <br>

            <br>

            * By "no problem", I really mean that I have no opinion

            here.  I am neither endorsing nor opposing.  <br>

          </div>

          <div bgcolor="#FFFFFF" text="#000000">

            <blockquote type="cite">

              <div dir="ltr">

                <div><br>

                </div>

                <div>The cost model would then fall out in the wash,

                  because we already have a cost model for intrinsics,

                  it would be as simple as adding new instructions. </div>

              </div>

            </blockquote>

            <br>

            <blockquote type="cite">

              <div dir="ltr">

                <div>Because we idiom recognize at the IR stage instead

                  of the SDAG stage, we also wouldn't have to rely on

                  the min/max idioms being in canonical "select" form;

                  we could match a branch sequence also.</div>

              </div>

            </blockquote>

          </div>

          <div bgcolor="#FFFFFF" text="#000000"> Er, not sure I get your

            point here.  Not having to match two distinct families of

            representation is an advantage, not a disadvantage.  The

            branch form should be getting converted into the select form

            much earlier in the optimizer.  Which cases are you worried

            about here?<br>

          </div>

          <div bgcolor="#FFFFFF" text="#000000">

            <blockquote type="cite">

              <div dir="ltr">

                <div><br>

                </div>

                <div>What do you think? Is this an acceptable proposal?</div>

                <div><br>

                </div>

                <div>Cheers,</div>

                <div><br>

                </div>

                <div>James</div>

              </div>

              <br>

              <fieldset></fieldset>

              <br>

            </blockquote>

          </div>

          <div bgcolor="#FFFFFF" text="#000000">

            <blockquote type="cite">

              <pre>_______________________________________________

llvm-commits mailing list

<a moz-do-not-send="true" href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a>

<a moz-do-not-send="true" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a>

</pre>

            </blockquote>

          </div>

        </blockquote>

      </div>

    </blockquote>

    <br>

  </body>

</html>