<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 04/26/2015 12:49 PM, James Molloy
wrote:<br>
</div>
<blockquote
cite="mid:CALCTSA3R_qeS-nuTPyf7HTZEgeEhyPbMJR=DCCS0reO_vW4SqQ@mail.gmail.com"
type="cite">
<div dir="ltr">Hi Philip,<br>
<br>
<div>Thanks very much for reviewing my proposal. I should say
that I generally agree with your points, and am still in
multiple minds about what the best approach would look like.</div>
<div><br>
</div>
<div><span
style="font-size:13.1999998092651px;line-height:19.7999992370605px">>
I don't see the challenge here. Matching a compare+select
as a min/max for the purpose of the cost model under a
target specific hook seems quite straightforward. What am I
missing?</span><br>
</div>
<div><span
style="font-size:13.1999998092651px;line-height:19.7999992370605px">>
...</span></div>
<div><span
style="font-size:13.1999998092651px;line-height:19.7999992370605px">>
Er, not sure I get your point here. Not having to match two
distinct families of representation is an advantage, not a
disadvantage. The branch form should be getting converted
into the select form much earlier in the optimizer. Which
cases are you worried about here?</span><span
style="font-size:13.1999998092651px;line-height:19.7999992370605px"><br>
</span></div>
<div><span
style="font-size:13.1999998092651px;line-height:19.7999992370605px"><br>
</span></div>
<div><span
style="font-size:13.1999998092651px;line-height:19.7999992370605px">The
awkward part is twofold. Firstly, the example I gave in my
previous message where InstCombine mangles the pattern by
pushing an fptoui between the icmp and the select. There's
two approaches to this I see - first, match early and deal
with the pattern as an intrinsic so it can't be broken up.
Second, have a (flexible, deals with edge cases) way of
matching at any stage in the pipeline - this might be
ripping some of the matching logic out of InstCombine and
exposing it as a utility. This might be a lose in compile
time though.</span></div>
</div>
</blockquote>
First, you're not going to catch every possible case in existence
with late matching, but you'll hopefully catch enough. There will
always be cases you'll miss by matching late, but there are also
cases you'd miss by matching early. <br>
<br>
For your specific example, I sort of wonder where the transformation
is actually a profitable one at all. If you're doing a floating
point compare, isn't that generally using a different set of flags
than an integer compare? I wonder if the domain crossing is more
expensive for the flags than the floating point value. If you could
convert both the select and the compare to use the integer value
that would be a clear win. It's not clear to me that the current
transform is. <br>
<br>
Regardless, reversing that particular transform in CGP is a bit
ugly, but reasonable. Is it just a couple of forms you're worried
about? Or are there a large number of such patterns showing up in
the real world examples you're looking at? <br>
<blockquote
cite="mid:CALCTSA3R_qeS-nuTPyf7HTZEgeEhyPbMJR=DCCS0reO_vW4SqQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><span
style="font-size:13.1999998092651px;line-height:19.7999992370605px"><br>
</span></div>
<div><span
style="font-size:13.1999998092651px;line-height:19.7999992370605px">Secondly
branches. You're right that we speculate branches to
selects, but only to a small threshold. I upped the
threshold so at least we'll get "min+min" or "max+min" to be
speculated, but larger sequences such as "min+min+min"
(folding four-input minimum) will still end up with a
branch.</span></div>
</div>
</blockquote>
I think it would be very reasonable to have a TTI call for whether
the target supported a min/max instruction and then special casing
the branch->select transformation when it does. To say this
differently, a min/max on a platform with such a (cheap!)
instruction should always be expressed as a cmp/select. There's no
reason not to perform the transformation provided that the min/max
has approximately the same cost as a mov. (I say mov only because
you might still need the cmp/br sequence even after handing the
min/max operations. If you didn't, it's even more clearly okay.) <br>
<br>
Although, there's a catch in what I just said. Given movs are
generally handled via renaming, it's not clear that any min/max
instruction can be that cheap. How does that tradeoff work out for
the architecture you care about? <br>
<blockquote
cite="mid:CALCTSA3R_qeS-nuTPyf7HTZEgeEhyPbMJR=DCCS0reO_vW4SqQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><span
style="font-size:13.1999998092651px;line-height:19.7999992370605px"><br>
</span></div>
<div><span
style="font-size:13.1999998092651px;line-height:19.7999992370605px">I've
been doing experiments with how much of the compiler I need
to touch when inserting these intrinsics, and yes I do agree
with you - random testing shows the compiler does need quite
a bit of teaching, in InstCombine at least, to not regress
randomly generated simple testcases. So I'm starting to come
around more to your way of thinking.</span></div>
<div><span
style="font-size:13.1999998092651px;line-height:19.7999992370605px"><br>
</span></div>
<div><span
style="font-size:13.1999998092651px;line-height:19.7999992370605px">The
only thing I haven't solved in my mind yet if we don't go
down the intrinsics route is d</span><span
style="font-size:13.1999998092651px;line-height:19.7999992370605px">ealing
with branches - we can't just keep upping the speculation
threshold. Perhaps identify these (and only these) and up
the speculation threshold in these cases only?</span></div>
<div><br>
</div>
<div>Matching late in CGP would solve the sharing code across
backends problem, so after your feedback I'm leaning further
towards this approach.</div>
<div><br>
</div>
<div>Do you have any further comments?</div>
<div><br>
</div>
<div>Cheers,</div>
<div><br>
</div>
<div>James</div>
</div>
<br>
<div class="gmail_quote">On Sun, 26 Apr 2015 at 20:04 Philip
Reames <<a moz-do-not-send="true"
href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>On 04/23/2015 07:42 AM, James Molloy wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi all,
<div><br>
</div>
<div>I've just started again on trying to solve the
problem of getting decent code generation for min, max
and abs idioms as used by the programmer and as
emitted by the loop vectorizer.</div>
<div><br>
</div>
<div>I've been looking at doing this as a target
DAGCombine, but actually I think:</div>
<div> 1. it's incredibly complex to do at that stage
and it limits all the work I do to just one target.</div>
<div> 2. It's also much more difficult to test.</div>
<div> 3. The loop and SLP vectorizers still don't have
a cost model for them - they're just seen as
compare+selects.</div>
</div>
</blockquote>
</div>
<div bgcolor="#FFFFFF" text="#000000"> I don't see the
challenge here. Matching a compare+select as a min/max for
the purpose of the cost model under a target specific hook
seems quite straightforward. What am I missing?</div>
<div bgcolor="#FFFFFF" text="#000000"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>So my proposal is:</div>
<div> * To add new intrinsics for minimum, maximum and
absolute value. These would have signed int/unsigned
int/float variants and be valid across all numeric
types.</div>
<div> * To add a pass fairly early in the pipeline to
idiom recognize and create intrinsics. This would be
controllable per-backend - if a backend doesn't have
efficient lowering for these operations, perhaps it's
best not to do the idiom recognition.</div>
</div>
</blockquote>
</div>
<div bgcolor="#FFFFFF" text="#000000"> I am strongly opposed
to this part of the proposal. I have no problem* adding
such intrinsics and matching them late (i.e. CodeGenPrep),
but I am deeply concerned about the negative impacts of
matching early. Unless you are volunteering to add support
for these intrinsics to *every* pass, I believe doing this
is a non-starter. As a good example, consider what happened
recently with the x.with.overflow intrinsics where we were
missing important simplifications on induction variable
dependent checks due to early canonicalization to a form
that the rest of the optimizer didn't understand. <br>
<br>
More generally, I'm not even sure matching these early would
be the right answer even if you were volunteering to update
the entire optimizer. Being able to fold the condition
(CSE, etc..) independently of the select and then being able
to exploit a dominating branch is extremely powerful at
eliminating the min/max operation entirely. I would be
deeply concerned about giving up that power without an
incredible compelling reason. <br>
<br>
* By "no problem", I really mean that I have no opinion
here. I am neither endorsing nor opposing. <br>
</div>
<div bgcolor="#FFFFFF" text="#000000">
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>The cost model would then fall out in the wash,
because we already have a cost model for intrinsics,
it would be as simple as adding new instructions. </div>
</div>
</blockquote>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>Because we idiom recognize at the IR stage instead
of the SDAG stage, we also wouldn't have to rely on
the min/max idioms being in canonical "select" form;
we could match a branch sequence also.</div>
</div>
</blockquote>
</div>
<div bgcolor="#FFFFFF" text="#000000"> Er, not sure I get your
point here. Not having to match two distinct families of
representation is an advantage, not a disadvantage. The
branch form should be getting converted into the select form
much earlier in the optimizer. Which cases are you worried
about here?<br>
</div>
<div bgcolor="#FFFFFF" text="#000000">
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>What do you think? Is this an acceptable proposal?</div>
<div><br>
</div>
<div>Cheers,</div>
<div><br>
</div>
<div>James</div>
</div>
<br>
<fieldset></fieldset>
<br>
</blockquote>
</div>
<div bgcolor="#FFFFFF" text="#000000">
<blockquote type="cite">
<pre>_______________________________________________
llvm-commits mailing list
<a moz-do-not-send="true" href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a>
<a moz-do-not-send="true" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a>
</pre>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
<br>
</body>
</html>