[llvm] r181927 - Implement PPC counter loops as a late IR-level pass

Ulrich Weigand Ulrich.Weigand at de.ibm.com
Thu May 16 10:34:27 PDT 2013


Hal Finkel <hfinkel at anl.gov> wrote on 16.05.2013 18:55:06:

> This should be fixed by r182023, please verify.

No, it isn't fixed, unfortunately.

Here's the reduced test case I'm using:

target datalayout = "E-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v128:128:128-n32"
target triple = "powerpc-unknown-linux-gnu"

@init_value = global double 1.000000e+00, align 8
@data64 = global [8000 x i64] zeroinitializer, align 8

define i32 @main(i32 %argc, i8** nocapture %argv) {
entry:
  %0 = load double* @init_value, align 8
  %conv = fptosi double %0 to i64
  %broadcast.splatinsert.i = insertelement <2 x i64> undef, i64 %conv, i32 0
  %broadcast.splat.i = shufflevector <2 x i64> %broadcast.splatinsert.i, <2 x i64> undef, <2 x i32> zeroinitializer
  br label %vector.body.i

vector.body.i:                                    ; preds = %vector.body.i, %entry
  %index.i = phi i32 [ 0, %entry ], [ %index.next.i, %vector.body.i ]
  %next.gep.i = getelementptr [8000 x i64]* @data64, i32 0, i32 %index.i
  %1 = bitcast i64* %next.gep.i to <2 x i64>*
  store <2 x i64> %broadcast.splat.i, <2 x i64>* %1, align 8
  %next.gep.sum24.i = or i32 %index.i, 2
  %2 = getelementptr [8000 x i64]* @data64, i32 0, i32 %next.gep.sum24.i
  %3 = bitcast i64* %2 to <2 x i64>*
  store <2 x i64> %broadcast.splat.i, <2 x i64>* %3, align 8
  %index.next.i = add i32 %index.i, 4
  %4 = icmp eq i32 %index.next.i, 8000
  br i1 %4, label %_Z4fillIPxxEvT_S1_T0_.exit, label %vector.body.i

_Z4fillIPxxEvT_S1_T0_.exit:                       ; preds = %vector.body.i
  ret i32 0
}

You'll need to use -mcpu=ppc to actually get the helper routine call.

Note that in this instance, the fptosi isn't even in the loop in the first
place, but in the block before it!

After the PPCCtr pass, we have:

entry:
  %0 = load double* @init_value, align 8
  %conv = fptosi double %0 to i64
  %broadcast.splatinsert.i = insertelement <2 x i64> undef, i64 %conv, i32 0
  %broadcast.splat.i = shufflevector <2 x i64> %broadcast.splatinsert.i, <2 x i64> undef, <2 x i32> zeroinitializer
  call void @llvm.ppc.mtctr.i32(i32 2000)
  br label %vector.body.i

so the load of CTR is *after* the fptosi, but apparently this doesn't count
as a dependency for SelectionDAG, as after selection we get:

        ADJCALLSTACKDOWN 8, %R1<imp-def,dead>, %R1<imp-use>
        MTCTRse %vreg10<kill>, %CTR<imp-def,dead>; GPRC:%vreg10
        %F1<def> = COPY %vreg11; F8RC:%vreg11
        BL <es:__fixdfdi>, <regmask>, %LR<imp-def,dead>, %RM<imp-use>, %F1<imp-use>, %R1<imp-def>, %R3<imp-def>, %R4<imp-def>
        ADJCALLSTACKUP 8, 0, %R1<imp-def,dead>, %R1<imp-use>
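
To make the ordering problem concrete, here is a small toy model of the
block above. It is only a sketch and not LLVM code: the Instr class and the
helper function are made up for illustration, and the clobbers_ctr flag on
the call is an assumption (CTR is volatile across calls in the PowerPC ELF
ABIs, so the callee is free to overwrite it). It reports whether anything
may clobber CTR after it has been loaded:

```python
# Toy model of the selected block; hypothetical names, not LLVM code.
# Assumption: a call (BL) may clobber CTR, since CTR is volatile across calls.
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    opcode: str
    defs_ctr: bool = False      # writes CTR on purpose (MTCTR)
    clobbers_ctr: bool = False  # may overwrite CTR as a side effect (calls)


def first_ctr_clobber_after_load(block):
    """Return the index of the first instruction that may clobber CTR after
    it was loaded, or None if the loaded value survives to the block end."""
    loaded = False
    for i, ins in enumerate(block):
        if ins.defs_ctr:
            loaded = True
        elif ins.clobbers_ctr and loaded:
            return i
    return None


# Order seen after instruction selection (MTCTR inside the call sequence).
selected = [
    Instr("ADJCALLSTACKDOWN"),
    Instr("MTCTRse", defs_ctr=True),
    Instr("COPY"),
    Instr("BL __fixdfdi", clobbers_ctr=True),
    Instr("ADJCALLSTACKUP"),
]

# Order implied by the IR: the fptosi helper call first, then the CTR load.
ir_order = [selected[0], selected[2], selected[3], selected[4], selected[1]]

print(first_ctr_clobber_after_load(selected))
print(first_ctr_clobber_after_load(ir_order))
```

With the selected order the check flags the BL, while the order implied by
the IR passes, which is the dependency that seems to get lost.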

Bye,
Ulrich



