[Patch]New InstCombine pattern for Icmp

Mon Aug 11 09:17:44 PDT 2014

AFAICT, correctness wise the patch looks good. I've translated your 
optimization into Alive by hand and now it seems good.
Now I wonder whether the optimization still kicks in on your benchmarks? 
(given that now the constraints are much thigher)

Nuno

----- Original Message ----- 
From: "Yi Jiang" <yjiang at apple.com>
Sent: Thursday, August 07, 2014 8:37 AM
Subject: Re: [Patch]New InstCombine pattern for Icmp

> With the help of Nuno and the Alive tool, I figured out the new condition. 
> The original condition actually refers to the following two ranges:
> (icmp ult/ule (A + C1), C3) | (icmp ult/ule (A + C2), C3) == 
> [MAX_UINT-C1+1, MAX_UINT-C1+1+C3] and [MAX_UINT-C2+1, MAX_UINT-C2+1+C3]
> We can fold these two range sif:
> 1) C1 and C2 is unsigned greater than C3
> 2) The two ranges are separated.  abs(LowRange1-LowRange2) > C3
> 3) C1 ^ C2 is one-bit mask.
> 4) LowRange1 ^ LowRange2 and HighRange1 ^ HighRange2 are one-bit mask.
>
> Here is the new patch:
>
>
> On Jul 30, 2014, at 2:24 PM, Nuno Lopes <nuno.lopes at ist.utl.pt> wrote:
>
>> Hi Yi,
>>
>> ALIVe still complains, I'm afraid:
>>
>> Precondition: isPowerOf2(C1 ^ C2) && ugt(C1, C3) && ugt(C2, C3)
>> %x = add %A, C1
>> %i = icmp ult %x, C3
>> %y = add %A, C2
>> %j = icmp ult %y, C3
>> %r = or %i, %j
>> =>
>> %and = and %A, ~(C1 ^ C2)
>> %lhs = add %and, umax(C1, C2)
>> %r = icmp ult %lhs, C3
>>
>> Done: 1
>> ERROR: Mismatch in values of i1 %r
>>
>> Example:
>> %A i2 = 2 (0x2)
>> C1 i2 = 3 (0x3)
>> %x i2 = 1 (0x1)
>> C3 i2 = 1 (0x1)
>> %i i1 = 0 (0x0)
>> C2 i2 = 2 (0x2)
>> %y i2 = 0 (0x0)
>> %j i1 = 1 (0x1)
>> %and i2 = 2 (0x2)
>> %lhs i2 = 1 (0x1)
>> Source value: 1 (0x1)
>> Target value: 0 (0x0)
>>
>>
>> Nuno
>>
>> ----- Original Message ----- From: "Yi Jiang" <yjiang at apple.com>
>> To: "Nuno Lopes" <nuno.lopes at ist.utl.pt>
>> Cc: "LLVM Commits" <llvm-commits at cs.uiuc.edu>
>> Sent: Wednesday, July 30, 2014 8:28 PM
>> Subject: Re: [Patch]New InstCombine pattern for Icmp
>>
>>
>> Hi Nuno,
>>
>> Thank you for pointing out this! One condition is missing: C1 and C2 
>> should be unsigned greater than C3.
>> Here is the new patch.
>>
>> -Yi
>>
>> On Jul 29, 2014, at 11:34 AM, Nuno Lopes <nuno.lopes at ist.utl.pt> wrote:
>>
>>> Hi,
>>>
>>> I sent you an email the other day, but it was rejected by Apple's email 
>>> server.  Tyring now through another email account.
>>> Please take a look here: 
>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140721/227575.html
>>>
>>> Nuno
>>>
>>> ----- Original Message -----
>>>
>>> Ping...
>>> On Jul 25, 2014, at 4:11 PM, Yi Jiang <yjiang at apple.com> wrote:
>>>
>>>> Hi Ben,
>>>>
>>>> Thank you for your comments! Here is a new version.  In this version, I 
>>>> still kept some "if" thinking that some operand has been already 
>>>> extracted so I would like to use it directly.  Please let me know if 
>>>> any other comments.
>>>> <combinecmpV2.patch>
>>>> -Yi
>>>> On Jul 25, 2014, at 2:12 AM, Benjamin Kramer <benny.kra at gmail.com> 
>>>> wrote:
>>>>
>>>>> On Fri, Jul 25, 2014 at 1:24 AM, Yi Jiang <yjiang at apple.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> his patch is trying to fold (icmp ult/ule (A + C1), C3) | (icmp 
>>>>>> ult/ule (A +
>>>>>> C2), C3)  to (icmp ult/ule ((A & ~(C1 ^ C2)) + max(C1, C2)), C3)  .
>>>>>> This transformation is legal if C1 ^ C2 is one-bit mask, in other 
>>>>>> word, C1
>>>>>> is only one bit different from C2. In this case, we can "mask" that 
>>>>>> bit and
>>>>>> do just one comparison.
>>>>>> A typical example is:
>>>>>> isALPHA(c)    ((c) >= 'A' && (c) <= 'Z') || ((c) >= 'a' && (c) <= 
>>>>>> 'z')
>>>>>>
>>>>>> Now llvm will optimize it to ((c + 191) <=25) || ((c + 159) <=25)
>>>>>>
>>>>>> With this patch, we can optimize it further to:
>>>>>> (c & 223) + 191 <= 25
>>>>>>
>>>>>> The binary format of the constants are:
>>>>>> 191   10111111
>>>>>> 159   10011111
>>>>>> 223   11011111
>>>>>>
>>>>>> Here is some experiment result on arm64:
>>>>>> The patch shows no regression and improve spec2000 perlbmk 3.8% in
>>>>>> test-suite under -O3.
>>>>>> We also test the spec2006 400.perlbench with ref size input, it will 
>>>>>> improve
>>>>>> 1% under -O3 and 1.2% under -O3+lto+pgo.
>>>>>
>>>>> Wow, very nice.
>>>>>
>>>>>> Any comments are appreciated.
>>>>>
>>>>> The patch could be significantly simplified with the PatternMatch
>>>>> tool. Something like match(LHS, m_Add(m_OneUse(m_Value(A),
>>>>> m_ConstantInt(LAddCst)... could replace your if chain.
>>>>>
>>>>> - Ben