[PATCH] D17859: [InstCombine] convert 'isPositive' and 'isNegative' vector comparisons to shifts (PR26701, PR26819)

Thu May 12 15:18:56 PDT 2016

chandlerc added inline comments.

================
Comment at: lib/Transforms/InstCombine/InstCombineCasts.cpp:985
@@ +984,3 @@
+        // because vector ISAs may not have a full range of comparison
+        // operators. This cmp+sext transform, however, will simplify the IR, so
+        // we always do it.
----------------
spatel wrote:
> chandlerc wrote:
> > hfinkel wrote:
> > > spatel wrote:
> > > > joker.eph wrote:
> > > > > I usually see canonicalization as something that is target independent, and then after optimization the lowering can adjust toward something that the target will "like" (ultimately the SDAG legalizer is doing this). Any comment?
> > > > That sounds right to me in general. But are you suggesting that we should do the comparison canonicalizations for vector types too? Ie, (x <=s -1) --> (x <s 0).
> > > > 
> > > > My code comment here is my assumption about why we *don't* do that - vector ISAs may have limited functionality, so some backends (x86 at least) would have to then undo that canonicalization.
> > > > 
> > > > I see 2 reasons to justify this patch as written:
> > > > 1. The shift transform is an actual IR optimization, not just a canonicalization - it reduces the instruction count and/or complexity of the IR.
> > > > 2. We're already doing this (shift) transform for some comparisons, so we might as well include the other, related comparison operators where it can apply.
> > > > That sounds right to me in general. But are you suggesting that we should do the comparison canonicalizations for vector types too? Ie, (x <=s -1) --> (x <s 0).
> > > 
> > > Yes. Having different canonicalization rules for vectors and scalars is confusing, and leads to problems like this. Keeping track of what we do, in general, is hard. Keeping track of different rules for scalars and vectors should be avoided if possible.
> > > 
> > > > My code comment here is my assumption about why we *don't* do that - vector ISAs may have limited functionality, so some backends (x86 at least) would have to then undo that canonicalization.
> > > 
> > > Often times these things are just oversights, or were done a long time ago when the backend infrastructure was not as mature. I don't know how we ended up in this state on this particular issue, but given that matching this in the backend seems fairly easy (in either form), we should do that.
> > > 
> > +1 to what Hal said.
> > 
> > We should canonicalize vectors and scalars aggressively and solve problems that arise in the backend.
> > 
> > We shouldn't add more code to match non-canonical patterns in instcombine, we should fix the canonicalization.
> Some day I'll get canonicalization right. :)
> 
> I've drafted a patch to do as suggested here, but now I'm wondering if there's another reason not to treat vectors and scalars the same for the purposes of icmp with constant: there's a fundamental difference between vector and scalar icmp ops.
> 
> Example:
> icmp uge i32 %x, 0  --> true
> icmp uge <2 x i32> %x, <i32 0, i32 42> --> ?
> 
> We want to subtract 1 from the constant operand and canonicalize to 'ugt' here. We don't have to worry about the first case in InstCombiner::visitICmpInst() because InstSimplify takes care of those kinds of edge cases. That doesn't exist for vectors (although it probably should for a splatted constant vector). 
> 
> Is it still worth doing this canonicalization if it doesn't work in general?
This is a great question.

I'm not sure. I can see at least three options:

1) Canonicalize to ugt when we can, but not always, much as your code does here. Teach passes that assume we canonicalized on ugt to consider the uge case for vectors.

2) Drop all canonicalizing to ugt, and audit the code that matches on ugt to also match on uge.

3) Canonicalize the vector case to ugt as well by replacing the boundary elements with undef, and then inserting a 'true' over the resulting i1 lane coming out of the icmp.

While #1 is clearly the simplest and safest approach, I'm not sure what the right long term solution is here. #3 seems really appealing if it works, but I'm worried that it the insertion of the true will end up causing problems downstream. Might be worth some experimentation though.

http://reviews.llvm.org/D17859