[llvm] r197466 - Add -mcpu=z10 to SystemZ tests.

Richard Sandiford rsandifo at linux.vnet.ibm.com
Thu Dec 19 05:45:02 PST 2013


Andrew Trick <atrick at apple.com> writes:
> The problem was, for example, this test case:
>
> define i32 @f4(i32 %dummy, i32 signext %a, i32 %b) {
> ; CHECK-LABEL: f4:
> ; CHECK-NOT: {{%r[234]}}
> ; CHECK: dsgfr %r2, %r4
> ; CHECK-NOT: dsgfr
> ; CHECK: or %r2, %r3
> ; CHECK: br %r14
>   %div = sdiv i32 %a, %b
>   %rem = srem i32 %a, %b
>   %or = or i32 %rem, %div
>   ret i32 %or
> }
>
> Before my MachineCSE fix the coalescer sees:
>
> BB#0: derived from LLVM BB %0
>     Live Ins: %R3D %R4L
> 	%vreg2<def> = COPY %R4L; GR32Bit:%vreg2
> 	%vreg1<def> = COPY %R3D; GR64Bit:%vreg1
> 	%vreg8<def> = IMPLICIT_DEF; GR128Bit:%vreg8
> 	%vreg3<def,tied1> = INSERT_SUBREG %vreg8<tied0>, %vreg1, subreg_l64; GR128Bit:%vreg3,%vreg8 GR64Bit:%vreg1
> 	%vreg4<def,tied1> = DSGFR %vreg3<tied0>, %vreg2; GR128Bit:%vreg4,%vreg3 GR32Bit:%vreg2
> 	%vreg5<def> = COPY %vreg4:subreg_l32; GR32Bit:%vreg5 GR128Bit:%vreg4
> 	%vreg6<def> = COPY %vreg4:subreg_hl32; GR32Bit:%vreg6 GR128Bit:%vreg4
> 	%vreg7<def,tied1> = OR %vreg6<tied0>, %vreg5<kill>, %CC<imp-def,dead>; GR32Bit:%vreg7,%vreg6,%vreg5
> 	%R2L<def> = COPY %vreg7; GR32Bit:%vreg7
> 	Return %R2L<imp-use>
>
> After my MachineCSE fix the coalescer sees:
>
> BB#0: derived from LLVM BB %0
>     Live Ins: %R3D %R4L
> 	%vreg2<def> = COPY %R4L; GR32Bit:%vreg2
> 	%vreg1<def> = COPY %R3D; GR64Bit:%vreg1
> 	%vreg8<def> = IMPLICIT_DEF; GR128Bit:%vreg8
> 	%vreg3<def,tied1> = INSERT_SUBREG %vreg8<tied0>, %vreg1, subreg_l64; GR128Bit:%vreg3,%vreg8 GR64Bit:%vreg1
> 	%vreg4<def,tied1> = DSGFR %vreg3<tied0>, %vreg2; GR128Bit:%vreg4,%vreg3 GR32Bit:%vreg2
> 	%vreg7<def,tied1> = OR %vreg4:subreg_hl32<tied0>, %vreg4:subreg_l32, %CC<imp-def,dead>; GR32Bit:%vreg7 GR128Bit:%vreg4
> 	%R2L<def> = COPY %vreg7; GR32Bit:%vreg7
> 	Return %R2L<imp-use>
>
> The 2-addr pass ties the DSGFR register operands first. When it tries
> to tie the OR operands it looks past copies and sees that it actually
> is tying R3D with R2L. Since these registers don't overlap, instead of
> creating a copy, it converts to ORK.
>
> When I saw that happening I figured this test really isn't right for
> z196. But I was wrong.
>
> The interesting thing is that with -mcpu=z10, where we can't convert to
> ORK, the copy eventually goes away through a subregister trick. It
> turns out that R3D is actually the "low" bits of R2Q...
>
> 80B		%R2Q<def,tied1> = DSGFR %R2Q<kill,tied0>, %R4L<kill>
> 144B		%R2L<def,tied1> = OR %R2L<tied0>, %R3L, %CC<imp-def,dead>, %R2Q<imp-use,kill>, %R2Q<imp-def>
> 160B		%R2L<def> = KILL %R2L, %R2Q<imp-use,kill>
>
> To make the TwoAddress pass smart enough to recognize this, we need
> more book-keeping. The SrcRegMap needs to include subregister indices
> (coming from an INSERT_SUBREG) so we can recognize such overlaps.

Ah, thanks for the explanation.  It looks like the tying in this case is
a bit more tricky than I'd realised.
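
A toy model of that extra book-keeping, as I understand it, might look
like the sketch below.  It is not LLVM code: SubIdx, SubRegEntry,
TrackedSource and the helper functions are all made-up names, and the
table only covers the one register pair from the dumps above (R3D as
subreg_l64 of R2Q, R2L as subreg_hl32, R3L as subreg_l32).  The idea is
just that remembering the INSERT_SUBREG index lets the source be traced
to R2Q rather than to R3D alone.

```cpp
#include <string>

// Toy model only: none of these names are LLVM's real API.
enum class SubIdx { l64, hl32, l32 };

// The one register pair from the dumps above: R3D, R2L and R3L are all
// pieces of the 128-bit register R2Q.
struct SubRegEntry {
  const char *superReg;
  SubIdx idx;
  const char *subReg;
};

static const SubRegEntry kSubRegs[] = {
    {"R2Q", SubIdx::l64, "R3D"},
    {"R2Q", SubIdx::hl32, "R2L"},
    {"R2Q", SubIdx::l32, "R3L"},
};

// Which register has `subReg` at index `idx`?  Empty string if unknown.
std::string superRegFor(const std::string &subReg, SubIdx idx) {
  for (const SubRegEntry &e : kSubRegs)
    if (e.idx == idx && subReg == e.subReg)
      return e.superReg;
  return "";
}

// Which register is `superReg:idx`?  Empty string if unknown.
std::string subRegOf(const std::string &superReg, SubIdx idx) {
  for (const SubRegEntry &e : kSubRegs)
    if (e.idx == idx && superReg == e.superReg)
      return e.subReg;
  return "";
}

// What a source map entry could remember about a virtual register: the
// physical register it was copied from and, if the value went through an
// INSERT_SUBREG, the index it was inserted at.
struct TrackedSource {
  std::string physReg;
  bool viaInsertSubreg;
  SubIdx insertIdx;
};

// Without subregister indices: only an exact register match looks free.
bool tieIsFreeNaive(const TrackedSource &src, const std::string &dst) {
  return src.physReg == dst;
}

// With subregister indices: map the source back to its containing
// register, then ask which piece of it the tied use actually reads.
bool tieIsFreeWithSubIdx(const TrackedSource &src, SubIdx useIdx,
                         const std::string &dst) {
  if (!src.viaInsertSubreg)
    return tieIsFreeNaive(src, dst);
  const std::string superReg = superRegFor(src.physReg, src.insertIdx);
  return !superReg.empty() && subRegOf(superReg, useIdx) == dst;
}
```

For a source recorded as R3D inserted at subreg_l64, the naive check
says tying to R2L is not free, while the subregister-aware check for a
use through subreg_hl32 says it is.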

So please let me know if this gets too difficult to handle (or if the
cost of handling it outweighs the pay-off).  I can try to rework the
test if so.

Thanks,
Richard



