[llvm-commits] [PATCH] fold umax(zext A, zext B) -> zext (umax(A, B))

Nick Lewycky nicholas at mxc.ca
Thu Jun 18 19:08:21 PDT 2009


Dan Gohman wrote:
> On Jun 18, 2009, at 12:37 AM, Török Edwin wrote:
> 
>> On 2009-06-18 01:12, Dan Gohman wrote:
>>> On Jun 17, 2009, at 1:44 PM, Török Edwin wrote:
>>>> On 2009-06-17 23:25, Dan Gohman wrote:
>>>>> On Jun 17, 2009, at 11:35 AM, Török Edwin wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I noticed that umax (zext t1 %X to %t0, zext t1 %Y to %t0) isn't
>>>>>> folded; the attached patch folds this into
>>>>>> zext t1 (umax(%X, %Y)) to %t0.
>>>>>> It also folds umax (sext t1 %X to %t0, sext t1 %Y to %t0) ->
>>>>>> sext t1 (umax(%X, %Y)) to %t0.
>>>>>>
>>>>>> zext is very often encountered in SCEV expressions on x86-64, since
>>>>>> pointer indexes for GEP are i64.
>>>>>>
>>>>>> Thoughts?
>>>>>>
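A minimal sketch of the fold being proposed, assuming const SCEV * handles
and ScalarEvolution's usual expression classes (SCEVZeroExtendExpr,
getUMaxExpr, getZeroExtendExpr); this is illustrative only, not the
attached patch itself:

#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
using namespace llvm;

// If both operands of a umax are zero extensions from the same narrower
// type, hoist the extension out: umax(zext A, zext B) == zext(umax(A, B)),
// because zext is monotone with respect to unsigned comparison.
static const SCEV *getUMaxOfZExts(ScalarEvolution &SE,
                                  const SCEV *LHS, const SCEV *RHS) {
  const SCEVZeroExtendExpr *LZ = dyn_cast<SCEVZeroExtendExpr>(LHS);
  const SCEVZeroExtendExpr *RZ = dyn_cast<SCEVZeroExtendExpr>(RHS);
  if (LZ && RZ &&
      LZ->getOperand()->getType() == RZ->getOperand()->getType())
    return SE.getZeroExtendExpr(
        SE.getUMaxExpr(LZ->getOperand(), RZ->getOperand()), LZ->getType());
  // Otherwise fall back to the unfolded form.
  return SE.getUMaxExpr(LHS, RHS);
}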
>>>>> Another question to ask is whether this kind of thing belongs in
>>>>> ScalarEvolution, or if it would be more appropriate for
>>>>> instcombine. Instcombine looks at all instructions in a program,
>>>>> while ScalarEvolution typically only looks at those related to
>>>>> loop iteration. Also, instcombine could more easily handle more
>>>>> generalized cases of this optimization, for example with umin.
>>>>>
>>>>> On the other hand, there are cases where it makes sense to do
>>>>> such simplifications in ScalarEvolution. Can you give an
>>>>> example where you're seeing this kind of code?
>>>>>
>>>> It doesn't have much to do with code generation/optimization, but
>>>> rather with analysis.
>>>> I have a pass that tries to find buffer overflow bugs; doing that
>>>> involves lots of umax() expressions.
>>>> If I can move the zext out of umax I can decide quite early that an
>>>> access is valid:
>>>>
>>>> #include <stdlib.h>
>>>>
>>>> int foo(unsigned n, unsigned i)
>>>> {
>>>>   char *a = malloc(n);
>>>>   if (!a)
>>>>     return -1;
>>>>   if (i < n)
>>>>     a[i] = 0;
>>>>   return 0;
>>>> }
>>>>
>>>
>>> Where does the umax come from in this code?  I guess that you're
>>> transforming it in some way; could you show what the code looks
>>> like after the transformation?
>>>
>> %0 = malloc i8, i32 %n          ; <i8*> [#uses=2]
>> %2 = zext i32 %i to i64         ; <i64> [#uses=1]
>> %3 = getelementptr i8* %0, i64 %2               ; <i8*> [#uses=1]
>>
>> %3 is '((zext i32 %i to i64) + %0)' according to SCEV,
>> %0 has size %n =>
>>
>> offset is (zext i32 %i to i64), hence one of the safety conditions
>> is i >= 0 && i < n. We already have i >= 0 since I treat all
>> values as unsigned, so that leaves us with i < n:
>>
>> if ((zext i32 %n to i64) umax (zext i32 %i to i64)) == (zext i32 %i to
>> i64) then access is always valid
>> if ((zext i32 %n to i64) umax (zext i32 %i to i64)) == (zext i32 %n to
>> i64) then access is always invalid
> 
> Thanks, though it's still not clear why umax is needed.
> Wouldn't the code be a lot simpler expanded like this:
> 
> if (zext %i to i64) < (zext %n to i64) then access is always valid
> else access is always invalid
> 
> Stripping out the zexts from comparisons is fairly trivial.
> 
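A minimal sketch of the zext-stripping Dan mentions, again assuming
const SCEV * handles and the usual SCEV expression classes; the helper
name and signature are made up for illustration:

#include "llvm/Analysis/ScalarEvolutionExpressions.h"
using namespace llvm;

// If both sides of an unsigned comparison are zero extensions from the
// same narrower type, drop the extensions and compare the narrow operands
// instead; zext preserves unsigned ordering, so the result is unchanged.
static void stripMatchingZExts(const SCEV *&LHS, const SCEV *&RHS) {
  const SCEVZeroExtendExpr *LZ = dyn_cast<SCEVZeroExtendExpr>(LHS);
  const SCEVZeroExtendExpr *RZ = dyn_cast<SCEVZeroExtendExpr>(RHS);
  if (LZ && RZ &&
      LZ->getOperand()->getType() == RZ->getOperand()->getType()) {
    LHS = LZ->getOperand();
    RHS = RZ->getOperand();
  }
}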
>> Otherwise I use a solver to decide.
>>
>> There are 2 more conditions (access_size < size, offset + access_size <
>> size) with similar expressions.
>> As I said, I could do the zext simplification in my pass only, but I
>> thought this is something that would benefit SCEV(Expander) in general.
> 
> I agree that the transformation sounds more generally applicable,
> though umax is relatively uncommon in typical SCEV usage, so I was
> curious where you were seeing this kind of expression. Having
> SCEVExpander do clever things is fine, though it's often easier
> to just let it expand expressions in a simple way and let a
> subsequent instcombine run tidy things up.

I've gotten in trouble with Chris for that in the past; he says SCEV 
should not leave the IR any more complicated than when it saw it. A good 
way to achieve that is to always make changes to createSCEV and the SCEV 
expander in matching pairs.

Nick


