[llvm] r179925 - Add CodeGen support for functions that always return arguments via a new parameter attribute 'returned', which is taken advantage of in target-independent tail call opportunity detection and in ARM call lowering (when placed on an integral first parameter).

Stephen Lin stephenwlin at gmail.com
Wed Apr 24 06:07:51 PDT 2013


Hi Evan, Jakob,

Actually, I did some more testing and it turns out that the case of
mismatching zero extension and/or sign extension can't be supported at
all (even when using ISD::ANY_EXTEND), since ISD::ANY_EXTEND doesn't
necessarily imply that the extended-into bits are invalidated.

i.e. the following

    declare i16 @identity16(i16 returned %x)
    declare i32 @identity32(i32 returned %x)

    define i16 @test_identity2(i16 zeroext %x) {
    entry:
     %call = tail call i16 @identity16(i16 %x)
     %b = zext i16 %x to i32
     %call2 = tail call i32 @identity32(i32 %b)
     ret i16 %call
    }

yields:

    test_identity2:                         @ @test_identity2
    @ BB#0:                                 @ %entry
           push    {r11, lr}
           bl      identity16
           bl      identity32
           pop     {r11, pc}

I suppose the easiest fix there is to disallow any case where the size
of the 'returned' parameter is not equal to the size of the register
used to pass it through the lowered call, even though that will
disallow optimization in some cases that are currently supported by
accident (but probably not that useful in practice.)

Alternatively, I can find some way to invalidate only the high bits
before passing them to LowerCall...is there a way to do that with SD
nodes? i.e. something like AssertSext/AssertZext that does the
opposite: asserts that the zero extension or sign extension of the
node is unknowable without actually modifying the value? (I'm not
suggesting this is worth the implementation effort to handle this
corner case, it's just what I thought might work.)

Let me know if you think is best and I will revise the patch.

Thanks,
Stephen

On Wed, Apr 24, 2013 at 3:27 AM, Stephen Lin <stephenwlin at gmail.com> wrote:
> Hi Evan, Jakob,
>
> By writing some more tests, I found a corner case bug in the
> implementation below that I've resolved, but I wanted to check with
> you two as to whether you think my resolution is the most appropriate
> one.
>
> Basically, the issue is that the following:
>
>   declare i32 @identity32(i32 returned %x)
>   declare i16 @paramzext16(i16 zeroext returned %x)
>
>   define i16 @test_matched_param(i16 %x) {
>   entry:
>     %call = tail call i16 @paramzext16(i16 %x)
>     %b = zext i16 %call to i32
>     %call2 = tail call i32 @identity32(i32 %b)
>     %call3 = tail call i16 @paramzext16(i16 %call)
>     ret i16 %call3
>   }
>
> gets optimized to:
>
>     test_matched_param:
>     @ BB#0:
>            push    {r11, lr}
>            uxth    r0, r0
>            bl      paramzext16
>            bl      identity32
>            pop     {r11, lr}
>            b       paramzext16
>
> whereas the correct output and (as far as I can tell, optimal) output is:
>
>    test_matched_param:
>     @ BB#0:
>            push    {r11, lr}
>            uxth    r0, r0
>            bl      paramzext16
>            uxth    r0, r0
>            bl      identity32
>            pop     {r11, lr}
>            b       paramzext16
>
> In other words, the code generator incorrectly assumes that r0 is
> completely preserved across the call to paramzext16, which is not
> guaranteed (only the lower 16-bits are preserved; at least, that's the
> most defensible interpretation of 'returned' here)
>
> As far as I can tell this cannot be fixed in
> ARMTargetLowering::LowerCall (with its current interface), since by
> that point the register has already been extended to i32 and there's
> no information that the original value was i16. So I resolved this by
> changing the target-independent code in TargetLowering::LowerCallTo to
> not pass 'returned' to the target-specific LowerCall implementations
> unless either 1) the size of the register holding the passed argument
> is equal to the size of the actual argument type or 2) the register is
> extended using any-extend, in which case the high bits will be treated
> as unknown even if the register is preserved across the call or 3) the
> return value and 'returned' parameter have compatible extension
> methods.
>
> This is sufficient to fix the particular example and I believe is an
> adequate solution for the general case: basically, it means that
> 'returned' being passed to the target-specific LowerCall means that
> the actual full register value is returned unchanged, which is a
> subset of the cases where there's a semantic 'returned' at the IR
> level. The major limitation of this is that, in the case that the
> parameter is zero/sign extended but the return value is not, it treats
> the entire value as clobbered by the call when it's only the extended
> bits that are possibly clobbered. This would be fixable with some
> refactoring, but (as far as I can tell) would require changing the
> interface to the target implementations of LowerCall to either pass
> more information forward or back.
>
> The issue can be seen by changing the line:
>     %b = zext i16 %call to i32
> to
>     %b = zext i16 %x to i32
>
> which results in significantly worse code, even though technically the
> result should be the same because %x and %call should always be equal
> in their lower 16 bits. Since it doesn't seem worth it to fix this for
> now, I put a FIXME note about it in the code and in a test.
>
> I considered a few other ways to resolve this, but all of them seemed
> to have some issues (I can go into them in more detail, though, if
> you'd like more details.)
>
> Please let me know if you have any suggestions or concerns.
>
> -Stephen
>
> P.S. I also removed checks for sign and zero extension in
> "sameNoopInput" that I added in r180118; they turned out to be
> superfluous, which makes sense in retrospect now that I better
> understand the semantics of the two.



More information about the llvm-commits mailing list