<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body><div style="margin:0px"><div style="margin:0px">Hi,</div><div style="margin:0px"><br></div><div style="margin:0px">I’m experimenting with LLVM coroutines, and am wondering about a particular case where a seemingly irrelevant IR change prevents the coroutine elision optimization. Any insight into why this happens would be greatly appreciated. I'm using LLVM 7.0.1.</div><div style="margin:0px"><br></div><div style="margin:0px">The code I'm working with is basically equivalent to the following Python example:</div><div style="margin:0px"><br></div><div style="margin:0px"> def my_coro(n: int):</div><div style="margin:0px"> yield n</div><div style="margin:0px"><br></div><div style="margin:0px"> my_var = <some extern global></div><div style="margin:0px"> if my_var > 0:</div><div style="margin:0px"> for a in my_coro(my_var):</div><div style="margin:0px"> print(a)</div><div style="margin:0px"><br></div><div style="margin:0px"><br></div><div style="margin:0px">Here’s my_coro in LLVM IR (note that there is an initial suspend, then a suspend to yield the value, then the final suspend):</div><div style="margin:0px"><br></div><div style="margin:0px"> define private i8* @my_coro(i64) {</div><div style="margin:0px"> entry:</div><div style="margin:0px"> %promise = alloca i64, i64 1</div><div style="margin:0px"> %1 = bitcast i64* %promise to i8*</div><div style="margin:0px"> %id = call token @llvm.coro.id(i32 0, i8* %1, i8* null, i8* null)</div><div style="margin:0px"> %2 = alloca i64, i64 1</div><div style="margin:0px"> store i64 %0, i64* %2</div><div style="margin:0px"> %3 = call i1 @llvm.coro.alloc(token %id)</div><div style="margin:0px"> br i1 %3, label %alloc, label %begin</div><div style="margin:0px"><br></div><div style="margin:0px"> alloc: ; preds = %entry</div><div style="margin:0px"> %4 = call i64 @llvm.coro.size.i64()</div><div style="margin:0px"> %5 = call i8* 
@my_alloc(i64 %4)</div><div style="margin:0px"> br label %begin</div><div style="margin:0px"><br></div><div style="margin:0px"> begin: ; preds = %entry, %alloc</div><div style="margin:0px"> %6 = phi i8* [ null, %entry ], [ %5, %alloc ]</div><div style="margin:0px"> %hdl = call i8* @llvm.coro.begin(token %id, i8* %6)</div><div style="margin:0px"> %7 = call i8 @llvm.coro.suspend(token none, i1 false)</div><div style="margin:0px"> switch i8 %7, label %suspend [</div><div style="margin:0px"> i8 0, label %9</div><div style="margin:0px"> i8 1, label %cleanup</div><div style="margin:0px"> ]</div><div style="margin:0px"><br></div><div style="margin:0px"> final: ; preds = %12</div><div style="margin:0px"> %8 = call i8 @llvm.coro.suspend(token none, i1 true)</div><div style="margin:0px"> switch i8 %8, label %suspend [</div><div style="margin:0px"> i8 0, label %13</div><div style="margin:0px"> i8 1, label %cleanup</div><div style="margin:0px"> ]</div><div style="margin:0px"><br></div><div style="margin:0px"> ; <label>:9: ; preds = %begin</div><div style="margin:0px"> %10 = load i64, i64* %2</div><div style="margin:0px"> store i64 %10, i64* %promise</div><div style="margin:0px"> %11 = call i8 @llvm.coro.suspend(token none, i1 false)</div><div style="margin:0px"> switch i8 %11, label %suspend [</div><div style="margin:0px"> i8 0, label %12</div><div style="margin:0px"> i8 1, label %cleanup</div><div style="margin:0px"> ]</div><div style="margin:0px"><br></div><div style="margin:0px"> ; <label>:12: ; preds = %9</div><div style="margin:0px"> br label %final</div><div style="margin:0px"><br></div><div style="margin:0px"> ; <label>:13: ; preds = %final</div><div style="margin:0px"> unreachable</div><div style="margin:0px"><br></div><div style="margin:0px"> cleanup: ; preds = %final, %9, %begin</div><div style="margin:0px"> %14 = call i8* @llvm.coro.free(token %id, i8* %hdl)</div><div style="margin:0px"> br label %suspend</div><div style="margin:0px"><br></div><div 
style="margin:0px"> suspend: ; preds = %final, %9, %begin, %cleanup</div><div style="margin:0px"> %15 = call i1 @llvm.coro.end(i8* %hdl, i1 false)</div><div style="margin:0px"> ret i8* %hdl</div><div style="margin:0px"> }</div><div style="margin:0px"><br></div><div style="margin:0px">And here is how it's called (i.e. the for-loop above):</div><div style="margin:0px"><br></div><div style="margin:0px"> define external void @main() {</div><div style="margin:0px"> entry:</div><div style="margin:0px"> %0 = load i64, i64* @my.var</div><div style="margin:0px"> %1 = icmp sgt i64 %0, 0</div><div style="margin:0px"> br i1 %1, label %if, label %exit</div><div style="margin:0px"><br></div><div style="margin:0px"> if: ; preds = %entry</div><div style="margin:0px"> %2 = load i64, i64* @my.var</div><div style="margin:0px"> %3 = call i8* @my_coro(i64 %2)</div><div style="margin:0px"> br label %for</div><div style="margin:0px"><br></div><div style="margin:0px"> for: ; preds = %body, %if</div><div style="margin:0px"> call void @llvm.coro.resume(i8* %3)</div><div style="margin:0px"> %4 = call i1 @llvm.coro.done(i8* %3)</div><div style="margin:0px"> br i1 %4, label %cleanup, label %body</div><div style="margin:0px"><br></div><div style="margin:0px"> body: ; preds = %for</div><div style="margin:0px"> %5 = call i8* @llvm.coro.promise(i8* %3, i32 8, i1 false)</div><div style="margin:0px"> %6 = bitcast i8* %5 to i64*</div><div style="margin:0px"> %7 = load i64, i64* %6</div><div style="margin:0px"> call void @my_print(i64 %7)</div><div style="margin:0px"> br label %for</div><div style="margin:0px"><br></div><div style="margin:0px"> cleanup: ; preds = %for</div><div style="margin:0px"> call void @llvm.coro.destroy(i8* %3)</div><div style="margin:0px"> br label %exit</div><div style="margin:0px"><br></div><div style="margin:0px"> exit: ; preds = %entry, %cleanup</div><div style="margin:0px"> ret void</div><div style="margin:0px"> }</div><div style="margin:0px"><br></div><div 
style="margin:0px"><br></div><div style="margin:0px">Now if I optimize this with "opt -S -enable-coroutines -O3", the coroutine allocation is _not_ elided. But if I remove the if-statement (e.g. change the condition to 1 > 0, so that the first branch in main() has a constant i1 true condition), then elision does take place.</div><div style="margin:0px"><br></div><div style="margin:0px">However, I see no reason why elision can't be applied to the first version -- why does the presence of a branch outside the blocks where the coroutine is used change anything? Am I perhaps using opt incorrectly? Thanks in advance for any help.</div><div style="margin:0px"><br></div><div style="margin:0px">Ariya</div><div style="font-family:Helvetica,Arial;font-size:13px;margin:0px"><br></div></div><br></body></html>