[LLVMdev] opt -std-compile-opts breaks tail calls

Jon Harrop jon at ffconsultancy.com
Fri Nov 13 09:07:08 PST 2009


On Friday 13 November 2009 16:26:01 Chris Lattner wrote:
> On Nov 13, 2009, at 3:34 AM, Jon Harrop wrote:
> >> Point 4 is the one that caused me trouble for some time.
> >> Unfortunately
> >> it causes a bad interaction with the optimiser, specifically the
> >> 'simplifycfg' pass. What seems to happen is that since the function
> >> you are calling is marked with 'noreturn', the simplifycfg pass will
> >> then put a 'unreachable' statement after the tail call to that
> >> function. This stuffs up the tail call though as the llc compiler
> >> requires tail calls be followed by a 'return void' statement. In my
> >> case this caused my compiled programs to segfault if the llvm
> >> optimiser was run on them.
> >
> > Isn't that a bug in LLVM?
>
> Yes, please file a bugzilla with a small example.

I think I may be having a similar problem, albeit with with noalias rather 
than noreturn. This is my tail recursive function:

define fastcc noalias i8* @fold_aux(%2*, i32, i8* (%2*, %2, double)*, %2, %0) 
{
entry:
  %5 = load i32* @shadow_stack_depth              ; <i32> [#uses=4]
  %6 = alloca %2, align 8                         ; <%2*> [#uses=2]
  %7 = load %0* @shadow_stack                     ; <%0> [#uses=1]
  %8 = extractvalue %0 %7, 2                      ; <i8*> [#uses=1]
  %9 = bitcast i8* %8 to %0*                      ; <%0*> [#uses=1]
  %10 = sext i32 %5 to i64                        ; <i64> [#uses=1]
  %11 = getelementptr %0* %9, i64 %10             ; <%0*> [#uses=1]
  store %0 %4, %0* %11
  %12 = add i32 %5, 1                             ; <i32> [#uses=1]
  store i32 %12, i32* @shadow_stack_depth
  %13 = load i32* @n_allocated                    ; <i32> [#uses=1]
  %14 = load i32* @quota                          ; <i32> [#uses=1]
  %15 = icmp sgt i32 %13, %14                     ; <i1> [#uses=1]
  br i1 %15, label %fail, label %cont

fail:                                             ; preds = %entry
  %16 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* 
@buf16, i64 0, i64 0)) ; <i32> [#uses=0]
  %17 = load double ()** @hlvm_time               ; <double ()*> [#uses=1]
  %18 = call double %17()                         ; <double> [#uses=1]
  %19 = call fastcc i8* @gc_mark(i32 0)           ; <i8*> [#uses=0]
  %20 = load double* @mark_time                   ; <double> [#uses=1]
  %21 = load double ()** @hlvm_time               ; <double ()*> [#uses=1]
  %22 = call double %21()                         ; <double> [#uses=1]
  %23 = fadd double %20, %22                      ; <double> [#uses=1]
  %24 = fsub double %23, %18                      ; <double> [#uses=1]
  store double %24, double* @mark_time
  %25 = load double ()** @hlvm_time               ; <double ()*> [#uses=1]
  %26 = call double %25()                         ; <double> [#uses=1]
  %27 = call fastcc i8* @gc_sweep(i32 0)          ; <i8*> [#uses=0]
  %28 = load double* @sweep_time                  ; <double> [#uses=1]
  %29 = load double ()** @hlvm_time               ; <double ()*> [#uses=1]
  %30 = call double %29()                         ; <double> [#uses=1]
  %31 = fadd double %28, %30                      ; <double> [#uses=1]
  %32 = fsub double %31, %26                      ; <double> [#uses=1]
  store double %32, double* @sweep_time
  %33 = load i32* @n_allocated                    ; <i32> [#uses=1]
  %34 = shl i32 %33, 2                            ; <i32> [#uses=1]
  %35 = add i32 %34, 256                          ; <i32> [#uses=1]
  store i32 %35, i32* @quota
  br label %cont

cont:                                             ; preds = %fail, %entry
  %36 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([10 x i8]* 
@buf17, i64 0, i64 0)) ; <i32> [#uses=0]
  %37 = extractvalue %0 %4, 1                     ; <i32> [#uses=1]
  %38 = icmp sgt i32 %37, %1                      ; <i1> [#uses=1]
  br i1 %38, label %pass2, label %fail1

fail1:                                            ; preds = %cont
  store i32 %5, i32* @shadow_stack_depth
  store %2 %3, %2* %0
  ret i8* undef

pass2:                                            ; preds = %cont
  %39 = add i32 %1, 1                             ; <i32> [#uses=1]
  %40 = icmp slt i32 %1, 0                        ; <i1> [#uses=1]
  br i1 %40, label %fail6, label %pass4

pass4:                                            ; preds = %pass2
  %41 = extractvalue %0 %4, 1                     ; <i32> [#uses=1]
  %42 = icmp sgt i32 %41, %1                      ; <i1> [#uses=1]
  br i1 %42, label %cont8, label %fail6

fail6:                                            ; preds = %pass4, %pass2
  %43 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([27 x i8]* 
@buf5, i64 0, i64 0)) ; <i32> [#uses=0]
  call void @exit(i32 1)
  br label %cont8

cont8:                                            ; preds = %fail6, %pass4
  %44 = extractvalue %0 %4, 2                     ; <i8*> [#uses=1]
  %45 = bitcast i8* %44 to double*                ; <double*> [#uses=1]
  %46 = sext i32 %1 to i64                        ; <i64> [#uses=1]
  %47 = getelementptr double* %45, i64 %46        ; <double*> [#uses=1]
  %48 = load double* %47                          ; <double> [#uses=1]
  %49 = call fastcc i8* %2(%2* %6, %2 %3, double %48) ; <i8*> [#uses=0]
  %50 = load %2* %6                               ; <%2> [#uses=1]
  store i32 %5, i32* @shadow_stack_depth
  %51 = tail call fastcc i8* @fold_aux(%2* %0, i32 %39, i8* (%2*, %2, 
double)* %2, %2 %50, %0 %4) ; <i8*> [#uses=1]
  ret i8* %51
}

That last call is the critical tail call and AFAICT it is stack overflowing. 
Anyone seen any problems like this before?

I'll try removing the noalias and seeing if that fixes it. If so, I'll file a 
bug report. Which part of LLVM would this be a bug in if things like noalias 
and noreturn inhibit TCO when they shouldn't?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



More information about the llvm-dev mailing list