[llvm] r179925 - Add CodeGen support for functions that always return arguments via a new parameter attribute 'returned', which is taken advantage of in target-independent tail call opportunity detection and in ARM call lowering (when placed on an integral first parameter).

Stephen Lin stephenwlin at gmail.com
Wed Apr 24 06:16:09 PDT 2013


To be clear, the only cases supportable would be:

1) no extension at all (i.e the type is the same size as the register
used to pass it)
2) both the 'returned' parameter and return value are zero-extended
3) both the 'returned' parameter and return value are sign-extended

-Stephen

On Wed, Apr 24, 2013 at 9:07 AM, Stephen Lin <stephenwlin at gmail.com> wrote:
> Hi Evan, Jakob,
>
> Actually, I did some more testing and it turns out that the case of
> mismatching zero extension and/or sign extension can't be supported at
> all (even when using ISD::ANY_EXTEND), since ISD::ANY_EXTEND doesn't
> necessarily imply that the extended-into bits are invalidated.
>
> i.e. the following
>
>     declare i16 @identity16(i16 returned %x)
>     declare i32 @identity32(i32 returned %x)
>
>     define i16 @test_identity2(i16 zeroext %x) {
>     entry:
>      %call = tail call i16 @identity16(i16 %x)
>      %b = zext i16 %x to i32
>      %call2 = tail call i32 @identity32(i32 %b)
>      ret i16 %call
>     }
>
> yields:
>
>     test_identity2:                         @ @test_identity2
>     @ BB#0:                                 @ %entry
>            push    {r11, lr}
>            bl      identity16
>            bl      identity32
>            pop     {r11, pc}
>
> I suppose the easiest fix there is to disallow any case where the size
> of the 'returned' parameter is not equal to the size of the register
> used to pass it through the lowered call, even though that will
> disallow optimization in some cases that are currently supported by
> accident (but probably not that useful in practice.)
>
> Alternatively, I can find some way to invalidate only the high bits
> before passing them to LowerCall...is there a way to do that with SD
> nodes? i.e. something like AssertSext/AssertZext that does the
> opposite: asserts that the zero extension or sign extension of the
> node is unknowable without actually modifying the value? (I'm not
> suggesting this is worth the implementation effort to handle this
> corner case, it's just what I thought might work.)
>
> Let me know if you think is best and I will revise the patch.
>
> Thanks,
> Stephen
>
> On Wed, Apr 24, 2013 at 3:27 AM, Stephen Lin <stephenwlin at gmail.com> wrote:
>> Hi Evan, Jakob,
>>
>> By writing some more tests, I found a corner case bug in the
>> implementation below that I've resolved, but I wanted to check with
>> you two as to whether you think my resolution is the most appropriate
>> one.
>>
>> Basically, the issue is that the following:
>>
>>   declare i32 @identity32(i32 returned %x)
>>   declare i16 @paramzext16(i16 zeroext returned %x)
>>
>>   define i16 @test_matched_param(i16 %x) {
>>   entry:
>>     %call = tail call i16 @paramzext16(i16 %x)
>>     %b = zext i16 %call to i32
>>     %call2 = tail call i32 @identity32(i32 %b)
>>     %call3 = tail call i16 @paramzext16(i16 %call)
>>     ret i16 %call3
>>   }
>>
>> gets optimized to:
>>
>>     test_matched_param:
>>     @ BB#0:
>>            push    {r11, lr}
>>            uxth    r0, r0
>>            bl      paramzext16
>>            bl      identity32
>>            pop     {r11, lr}
>>            b       paramzext16
>>
>> whereas the correct output and (as far as I can tell, optimal) output is:
>>
>>    test_matched_param:
>>     @ BB#0:
>>            push    {r11, lr}
>>            uxth    r0, r0
>>            bl      paramzext16
>>            uxth    r0, r0
>>            bl      identity32
>>            pop     {r11, lr}
>>            b       paramzext16
>>
>> In other words, the code generator incorrectly assumes that r0 is
>> completely preserved across the call to paramzext16, which is not
>> guaranteed (only the lower 16-bits are preserved; at least, that's the
>> most defensible interpretation of 'returned' here)
>>
>> As far as I can tell this cannot be fixed in
>> ARMTargetLowering::LowerCall (with its current interface), since by
>> that point the register has already been extended to i32 and there's
>> no information that the original value was i16. So I resolved this by
>> changing the target-independent code in TargetLowering::LowerCallTo to
>> not pass 'returned' to the target-specific LowerCall implementations
>> unless either 1) the size of the register holding the passed argument
>> is equal to the size of the actual argument type or 2) the register is
>> extended using any-extend, in which case the high bits will be treated
>> as unknown even if the register is preserved across the call or 3) the
>> return value and 'returned' parameter have compatible extension
>> methods.
>>
>> This is sufficient to fix the particular example and I believe is an
>> adequate solution for the general case: basically, it means that
>> 'returned' being passed to the target-specific LowerCall means that
>> the actual full register value is returned unchanged, which is a
>> subset of the cases where there's a semantic 'returned' at the IR
>> level. The major limitation of this is that, in the case that the
>> parameter is zero/sign extended but the return value is not, it treats
>> the entire value as clobbered by the call when it's only the extended
>> bits that are possibly clobbered. This would be fixable with some
>> refactoring, but (as far as I can tell) would require changing the
>> interface to the target implementations of LowerCall to either pass
>> more information forward or back.
>>
>> The issue can be seen by changing the line:
>>     %b = zext i16 %call to i32
>> to
>>     %b = zext i16 %x to i32
>>
>> which results in significantly worse code, even though technically the
>> result should be the same because %x and %call should always be equal
>> in their lower 16 bits. Since it doesn't seem worth it to fix this for
>> now, I put a FIXME note about it in the code and in a test.
>>
>> I considered a few other ways to resolve this, but all of them seemed
>> to have some issues (I can go into them in more detail, though, if
>> you'd like more details.)
>>
>> Please let me know if you have any suggestions or concerns.
>>
>> -Stephen
>>
>> P.S. I also removed checks for sign and zero extension in
>> "sameNoopInput" that I added in r180118; they turned out to be
>> superfluous, which makes sense in retrospect now that I better
>> understand the semantics of the two.



More information about the llvm-commits mailing list