[Patch]New InstCombine pattern for Icmp

Gerolf Hoflehner ghoflehner at apple.com
Wed Aug 13 15:49:57 PDT 2014


LGTM-24hr. If you don’t get a review within 24hr make mine count :-)

My disclosure should make people nervous and get you a higher quality review, though:
1) I’m not an expert in the InstCombine code
2) Even better, it is the first time I have ever I looked into that code
3) Yi’s original logic looked OK to me already until Nuno and his magic tool uncovered
all the subtle conditions under which his transformation holds.

Cheers
Gerolf



Nits:

- there should be a space between ranges and if
- typo: sep*e*rated -> separated
- “It implies that values in these two ranges are just one bit difference.”: would the following sound better?
“This implies all values in the two ranges differ by exactly one bit”?

+  // We can fold these two rangesif:
+  // 1) C1 and C2 is unsigned greater than C3
+  // 2) The two ranges are seperated.
+  // 3) C1 ^ C2 is one-bit mask.
+  // 4) LowRange1 ^ LowRange2 and HighRange1 ^ HighRange2 are one-bit mask. It
+  // implies that values in these two ranges are just one bit difference.
+

- It might make the code more logical/readable if the order of the conditions in the code matched the comment
order above.

+        if (LowRangeDiff.isPowerOf2() && LowRangeDiff == HighRangeDiff &&
+            DiffCst.isPowerOf2() && RangeDiff.ugt(LHSCst->getValue())) {
+          Value *MaskCst = ConstantInt::get(LAddCst->getType(), ~DiffCst);
+



On Aug 13, 2014, at 12:59 PM, Yi Jiang <yjiang at apple.com> wrote:

> Ping ...
> On Aug 11, 2014, at 10:19 AM, Yi Jiang <yjiang at apple.com> wrote:
> 
>> Hi Nuno, 
>> Yes. it does. We have already extracted the test case to reflect the pattern we want to catch most and will go with the patch. We believe the most useful case is to optimize:
>> ((c) >= 'A' && (c) <= 'Z') || ((c) >= 'a' && (c) <= 'z’).  Thank you.
>> 
>> -Yi
>> 
>> On Aug 11, 2014, at 9:17 AM, Nuno Lopes <nuno.lopes at ist.utl.pt> wrote:
>> 
>>> AFAICT, correctness wise the patch looks good. I've translated your optimization into Alive by hand and now it seems good.
>>> Now I wonder whether the optimization still kicks in on your benchmarks? (given that now the constraints are much thigher)
>>> 
>>> Nuno
>>> 
>>> ----- Original Message ----- From: "Yi Jiang" <yjiang at apple.com>
>>> Sent: Thursday, August 07, 2014 8:37 AM
>>> Subject: Re: [Patch]New InstCombine pattern for Icmp
>>> 
>>> 
>>>> With the help of Nuno and the Alive tool, I figured out the new condition. The original condition actually refers to the following two ranges:
>>>> (icmp ult/ule (A + C1), C3) | (icmp ult/ule (A + C2), C3) == [MAX_UINT-C1+1, MAX_UINT-C1+1+C3] and [MAX_UINT-C2+1, MAX_UINT-C2+1+C3]
>>>> We can fold these two range sif:
>>>> 1) C1 and C2 is unsigned greater than C3
>>>> 2) The two ranges are separated.  abs(LowRange1-LowRange2) > C3
>>>> 3) C1 ^ C2 is one-bit mask.
>>>> 4) LowRange1 ^ LowRange2 and HighRange1 ^ HighRange2 are one-bit mask.
>>>> 
>>>> Here is the new patch:
>>>> 
>>>> 
>>>> On Jul 30, 2014, at 2:24 PM, Nuno Lopes <nuno.lopes at ist.utl.pt> wrote:
>>>> 
>>>>> Hi Yi,
>>>>> 
>>>>> ALIVe still complains, I'm afraid:
>>>>> 
>>>>> Precondition: isPowerOf2(C1 ^ C2) && ugt(C1, C3) && ugt(C2, C3)
>>>>> %x = add %A, C1
>>>>> %i = icmp ult %x, C3
>>>>> %y = add %A, C2
>>>>> %j = icmp ult %y, C3
>>>>> %r = or %i, %j
>>>>> =>
>>>>> %and = and %A, ~(C1 ^ C2)
>>>>> %lhs = add %and, umax(C1, C2)
>>>>> %r = icmp ult %lhs, C3
>>>>> 
>>>>> Done: 1
>>>>> ERROR: Mismatch in values of i1 %r
>>>>> 
>>>>> Example:
>>>>> %A i2 = 2 (0x2)
>>>>> C1 i2 = 3 (0x3)
>>>>> %x i2 = 1 (0x1)
>>>>> C3 i2 = 1 (0x1)
>>>>> %i i1 = 0 (0x0)
>>>>> C2 i2 = 2 (0x2)
>>>>> %y i2 = 0 (0x0)
>>>>> %j i1 = 1 (0x1)
>>>>> %and i2 = 2 (0x2)
>>>>> %lhs i2 = 1 (0x1)
>>>>> Source value: 1 (0x1)
>>>>> Target value: 0 (0x0)
>>>>> 
>>>>> 
>>>>> Nuno
>>>>> 
>>>>> ----- Original Message ----- From: "Yi Jiang" <yjiang at apple.com>
>>>>> To: "Nuno Lopes" <nuno.lopes at ist.utl.pt>
>>>>> Cc: "LLVM Commits" <llvm-commits at cs.uiuc.edu>
>>>>> Sent: Wednesday, July 30, 2014 8:28 PM
>>>>> Subject: Re: [Patch]New InstCombine pattern for Icmp
>>>>> 
>>>>> 
>>>>> Hi Nuno,
>>>>> 
>>>>> Thank you for pointing out this! One condition is missing: C1 and C2 should be unsigned greater than C3.
>>>>> Here is the new patch.
>>>>> 
>>>>> -Yi
>>>>> 
>>>>> On Jul 29, 2014, at 11:34 AM, Nuno Lopes <nuno.lopes at ist.utl.pt> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I sent you an email the other day, but it was rejected by Apple's email server.  Tyring now through another email account.
>>>>>> Please take a look here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140721/227575.html
>>>>>> 
>>>>>> Nuno
>>>>>> 
>>>>>> ----- Original Message -----
>>>>>> 
>>>>>> Ping...
>>>>>> On Jul 25, 2014, at 4:11 PM, Yi Jiang <yjiang at apple.com> wrote:
>>>>>> 
>>>>>>> Hi Ben,
>>>>>>> 
>>>>>>> Thank you for your comments! Here is a new version.  In this version, I still kept some "if" thinking that some operand has been already extracted so I would like to use it directly.  Please let me know if any other comments.
>>>>>>> <combinecmpV2.patch>
>>>>>>> -Yi
>>>>>>> On Jul 25, 2014, at 2:12 AM, Benjamin Kramer <benny.kra at gmail.com> wrote:
>>>>>>> 
>>>>>>>> On Fri, Jul 25, 2014 at 1:24 AM, Yi Jiang <yjiang at apple.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> his patch is trying to fold (icmp ult/ule (A + C1), C3) | (icmp ult/ule (A +
>>>>>>>>> C2), C3)  to (icmp ult/ule ((A & ~(C1 ^ C2)) + max(C1, C2)), C3)  .
>>>>>>>>> This transformation is legal if C1 ^ C2 is one-bit mask, in other word, C1
>>>>>>>>> is only one bit different from C2. In this case, we can "mask" that bit and
>>>>>>>>> do just one comparison.
>>>>>>>>> A typical example is:
>>>>>>>>> isALPHA(c)    ((c) >= 'A' && (c) <= 'Z') || ((c) >= 'a' && (c) <= 'z')
>>>>>>>>> 
>>>>>>>>> Now llvm will optimize it to ((c + 191) <=25) || ((c + 159) <=25)
>>>>>>>>> 
>>>>>>>>> With this patch, we can optimize it further to:
>>>>>>>>> (c & 223) + 191 <= 25
>>>>>>>>> 
>>>>>>>>> The binary format of the constants are:
>>>>>>>>> 191   10111111
>>>>>>>>> 159   10011111
>>>>>>>>> 223   11011111
>>>>>>>>> 
>>>>>>>>> Here is some experiment result on arm64:
>>>>>>>>> The patch shows no regression and improve spec2000 perlbmk 3.8% in
>>>>>>>>> test-suite under -O3.
>>>>>>>>> We also test the spec2006 400.perlbench with ref size input, it will improve
>>>>>>>>> 1% under -O3 and 1.2% under -O3+lto+pgo.
>>>>>>>> 
>>>>>>>> Wow, very nice.
>>>>>>>> 
>>>>>>>>> Any comments are appreciated.
>>>>>>>> 
>>>>>>>> The patch could be significantly simplified with the PatternMatch
>>>>>>>> tool. Something like match(LHS, m_Add(m_OneUse(m_Value(A),
>>>>>>>> m_ConstantInt(LAddCst)... could replace your if chain.
>>>>>>>> 
>>>>>>>> - Ben 
>>> 
>> 
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits





More information about the llvm-commits mailing list