[llvm] r197466 - Add -mcpu=z10 to SystemZ tests.

Andrew Trick atrick at apple.com
Wed Dec 18 15:11:58 PST 2013


On Dec 18, 2013, at 3:32 AM, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote:

> Hi Andrew,
> 
> Andrew Trick <atrick at apple.com> writes:
>> URL: http://llvm.org/viewvc/llvm-project?rev=197466&view=rev
>> Log:
>> Add -mcpu=z10 to SystemZ tests.
>> 
>> Modified:
>>    llvm/trunk/test/CodeGen/SystemZ/int-div-01.ll
>>    llvm/trunk/test/CodeGen/SystemZ/int-div-03.ll
>>    llvm/trunk/test/CodeGen/SystemZ/int-div-04.ll
>>    llvm/trunk/test/CodeGen/SystemZ/int-mul-08.ll
> 
> I tried winding back to r197465 and it looks like you added these because
> on z196 you got:
> 
>        ogrk %r2, %r2, %r3      instead of      ogr %r2, %r3
>  and   ork %r2, %r2, %r3       instead of      or %r2, %r3
> 
> Is that right?  We don't really want to use the three-address forms
> (OGRK and ORK) when the two-address form would do.  In these cases
> I think the destination and first source are natural ties so we really
> should be using OGR and OR for z196 too.
> 
> It looks like current trunk (r197599) works again for -mcpu=z196.
> Would it be OK to revert this for now?

You are correct, and I reverted the test case changes.

TLDR;

The problem was, for example, this test case:

define i32 @f4(i32 %dummy, i32 signext %a, i32 %b) {
; CHECK-LABEL: f4:
; CHECK-NOT: {{%r[234]}}
; CHECK: dsgfr %r2, %r4
; CHECK-NOT: dsgfr
; CHECK: or %r2, %r3
; CHECK: br %r14
  %div = sdiv i32 %a, %b
  %rem = srem i32 %a, %b
  %or = or i32 %rem, %div
  ret i32 %or
}

Before my MachineCSE fix the coalscer sees:

BB#0: derived from LLVM BB %0
    Live Ins: %R3D %R4L
	%vreg2<def> = COPY %R4L; GR32Bit:%vreg2
	%vreg1<def> = COPY %R3D; GR64Bit:%vreg1
	%vreg8<def> = IMPLICIT_DEF; GR128Bit:%vreg8
	%vreg3<def,tied1> = INSERT_SUBREG %vreg8<tied0>, %vreg1, subreg_l64; GR128Bit:%vreg3,%vreg8 GR64Bit:%vreg1
	%vreg4<def,tied1> = DSGFR %vreg3<tied0>, %vreg2; GR128Bit:%vreg4,%vreg3 GR32Bit:%vreg2
	%vreg5<def> = COPY %vreg4:subreg_l32; GR32Bit:%vreg5 GR128Bit:%vreg4
	%vreg6<def> = COPY %vreg4:subreg_hl32; GR32Bit:%vreg6 GR128Bit:%vreg4
	%vreg7<def,tied1> = OR %vreg6<tied0>, %vreg5<kill>, %CC<imp-def,dead>; GR32Bit:%vreg7,%vreg6,%vreg5
	%R2L<def> = COPY %vreg7; GR32Bit:%vreg7
	Return %R2L<imp-use>

After my MachineCSE fix the coalescer sees:

BB#0: derived from LLVM BB %0
    Live Ins: %R3D %R4L
	%vreg2<def> = COPY %R4L; GR32Bit:%vreg2
	%vreg1<def> = COPY %R3D; GR64Bit:%vreg1
	%vreg8<def> = IMPLICIT_DEF; GR128Bit:%vreg8
	%vreg3<def,tied1> = INSERT_SUBREG %vreg8<tied0>, %vreg1, subreg_l64; GR128Bit:%vreg3,%vreg8 GR64Bit:%vreg1
	%vreg4<def,tied1> = DSGFR %vreg3<tied0>, %vreg2; GR128Bit:%vreg4,%vreg3 GR32Bit:%vreg2
	%vreg7<def,tied1> = OR %vreg4:subreg_hl32<tied0>, %vreg4:subreg_l32, %CC<imp-def,dead>; GR32Bit:%vreg7 GR128Bit:%vreg4
	%R2L<def> = COPY %vreg7; GR32Bit:%vreg7
	Return %R2L<imp-use>

The 2-addr pass ties the DSGFR register operands first. When it tries to tie the OR operands it looks past copies and sees that it actually is tying R3D with R2L. Since these registers don't overlap, instead of creating a copy, it converts to ORK.

When I saw that happening I figured this test really isn't right for z196. But I was wrong.

The interesting thing is that with -z10, where we can't convert to ORK, the copy eventually goes away through a subregister trick. It turns out that R3D is actually the "low" bits of R2Q...

80B		%R2Q<def,tied1> = DSGFR %R2Q<kill,tied0>, %R4L<kill>
144B		%R2L<def,tied1> = OR %R2L<tied0>, %R3L, %CC<imp-def,dead>, %R2Q<imp-use,kill>, %R2Q<imp-def>
160B		%R2L<def> = KILL %R2L, %R2Q<imp-use,kill>

To make the TwoAddress pass smart enough to recognize this, we need more book-keeping. The SrcRegMap needs to include subregister indices (coming from an INSERT_SUBREG) so we can recognize such overlaps.

I do think the MachineCSE fix is the right direction, but it looks like there will be a fair amount of work before I can reenable it with no regressions.

-Andy



More information about the llvm-commits mailing list