[llvm] r181927 - Implement PPC counter loops as a late IR-level pass
Ulrich Weigand
Ulrich.Weigand at de.ibm.com
Thu May 16 10:34:27 PDT 2013
Hal Finkel <hfinkel at anl.gov> wrote on 16.05.2013 18:55:06:
> This should be fixed by r182023, please verify.
No, it isn't fixed, unfortunately.
Here's the reduced test case I'm using:
target datalayout = "E-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v128:128:128-n32"
target triple = "powerpc-unknown-linux-gnu"
@init_value = global double 1.000000e+00, align 8
@data64 = global [8000 x i64] zeroinitializer, align 8
define i32 @main(i32 %argc, i8** nocapture %argv) {
entry:
%0 = load double* @init_value, align 8
%conv = fptosi double %0 to i64
%broadcast.splatinsert.i = insertelement <2 x i64> undef, i64 %conv, i32 0
%broadcast.splat.i = shufflevector <2 x i64> %broadcast.splatinsert.i, <2 x i64> undef, <2 x i32> zeroinitializer
br label %vector.body.i
vector.body.i: ; preds = %vector.body.i, %entry
%index.i = phi i32 [ 0, %entry ], [ %index.next.i, %vector.body.i ]
%next.gep.i = getelementptr [8000 x i64]* @data64, i32 0, i32 %index.i
%1 = bitcast i64* %next.gep.i to <2 x i64>*
store <2 x i64> %broadcast.splat.i, <2 x i64>* %1, align 8
%next.gep.sum24.i = or i32 %index.i, 2
%2 = getelementptr [8000 x i64]* @data64, i32 0, i32 %next.gep.sum24.i
%3 = bitcast i64* %2 to <2 x i64>*
store <2 x i64> %broadcast.splat.i, <2 x i64>* %3, align 8
%index.next.i = add i32 %index.i, 4
%4 = icmp eq i32 %index.next.i, 8000
br i1 %4, label %_Z4fillIPxxEvT_S1_T0_.exit, label %vector.body.i
_Z4fillIPxxEvT_S1_T0_.exit: ; preds = %vector.body.i
ret i32 0
}
You'll need to use -mcpu=ppc to actually get the helper routine call.
Note that in this instance, the fptosi isn't even in the loop in the first
place, but in the block before it!
After the PPCCtr pass, we have:
entry:
%0 = load double* @init_value, align 8
%conv = fptosi double %0 to i64
%broadcast.splatinsert.i = insertelement <2 x i64> undef, i64 %conv, i32 0
%broadcast.splat.i = shufflevector <2 x i64> %broadcast.splatinsert.i, <2 x i64> undef, <2 x i32> zeroinitializer
call void @llvm.ppc.mtctr.i32(i32 2000)
br label %vector.body.i
so the load of CTR is *after* the fptosi, but apparently this doesn't count
as a dependency for SelectionDAG, as after selection we get:
ADJCALLSTACKDOWN 8, %R1<imp-def,dead>, %R1<imp-use>
MTCTRse %vreg10<kill>, %CTR<imp-def,dead>; GPRC:%vreg10
%F1<def> = COPY %vreg11; F8RC:%vreg11
BL <es:__fixdfdi>, <regmask>, %LR<imp-def,dead>, %RM<imp-use>, %F1<imp-use>, %R1<imp-def>, %R3<imp-def>, %R4<imp-def>
ADJCALLSTACKUP 8, 0, %R1<imp-def,dead>, %R1<imp-use>
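To make the ordering problem concrete: the MTCTR now sits before the call to __fixdfdi, and a call is free to clobber CTR (assuming, as in the usual PowerPC ABIs, that CTR is not preserved across calls). Below is a minimal Python sketch of a check for that pattern on a textual instruction dump. The helper names (opcode_of, calls_after_mtctr) are hypothetical and not part of LLVM; the sketch only looks at a single straight-line instruction list, not at the whole loop.

```python
# Hypothetical helpers, not part of LLVM: they scan a textual dump of
# machine instructions like the one shown above.

def opcode_of(line):
    """Return the opcode of one dumped instruction.

    "%F1<def> = COPY %vreg11; ..." -> "COPY"
    "BL <es:__fixdfdi>, ..."       -> "BL"
    """
    rhs = line.split(" = ", 1)[-1]  # drop a leading "<def> = " if present
    return rhs.split()[0]


def calls_after_mtctr(instrs):
    """Return every BL that appears after an MTCTR* in the given list.

    Assumption: any call may clobber CTR, so such a BL invalidates the
    loop count that was just loaded.
    """
    seen_mtctr = False
    offending = []
    for line in instrs:
        opcode = opcode_of(line)
        if opcode.startswith("MTCTR"):
            seen_mtctr = True
        elif opcode == "BL" and seen_mtctr:
            offending.append(line)
    return offending


# The sequence from the dump above (operand lists abbreviated on the BL).
selected = [
    "ADJCALLSTACKDOWN 8, %R1<imp-def,dead>, %R1<imp-use>",
    "MTCTRse %vreg10<kill>, %CTR<imp-def,dead>; GPRC:%vreg10",
    "%F1<def> = COPY %vreg11; F8RC:%vreg11",
    "BL <es:__fixdfdi>, <regmask>, %LR<imp-def,dead>, %RM<imp-use>",
    "ADJCALLSTACKUP 8, 0, %R1<imp-def,dead>, %R1<imp-use>",
]

for bad in calls_after_mtctr(selected):
    print("call after CTR load:", bad)
```

On the sequence above this flags the BL to __fixdfdi; if the MTCTRse were placed after the ADJCALLSTACKUP instead, nothing would be reported.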
Bye,
Ulrich