<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">My O3 performance runs look fine on x86 and ARM for the llvm tests. <div>Note that in LTO the order is already IPSCCPP and then global opt.</div><div>The change then establishes the same order at lower optimization levels.</div><div><br></div><div>There is still one twist with the indirect-goto.c test in clang: After the changes the test case</div><div>passes in 64b mode, but not in 32b. LLVM generates different initialization code for the block address array</div><div>the two modes. The effect is that the reference</div><div>count of block labels is zero only in 64b mode. But in 32b some optimizations</div><div>like jump threading, inlining etc. that check for the address taken bit don’t kick in. This requires a</div><div>deeper and separate investigation into implementation and (possibly) performance differences</div><div>between 32b and 64b. To recover the test case regression a simple change to the triple will do:</div><div><br></div><div><div style="margin: 0px; font-family: Menlo;">Index: test/CodeGen/indirect-goto.c</div><div style="margin: 0px; font-family: Menlo;">===================================================================</div><div style="margin: 0px; font-family: Menlo;">--- test/CodeGen/indirect-goto.c<span class="Apple-tab-span" style="white-space:pre"> </span>(revision 212082)</div><div style="margin: 0px; font-family: Menlo;">+++ test/CodeGen/indirect-goto.c<span class="Apple-tab-span" style="white-space:pre"> </span>(working copy)</div><div style="margin: 0px; font-family: Menlo;">@@ -1,4 +1,4 @@</div><div style="margin: 0px; font-family: Menlo;">-// RUN: %clang_cc1 -triple i386-unknown-unknown -O3 -emit-llvm -o - %s | grep "ret i32 2520"</div><div style="margin: 0px; font-family: Menlo;">+// RUN: %clang_cc1 -triple x86_64-unknown-unknown -O3 -emit-llvm -o - %s | grep "ret i32 2520"</div><p style="margin: 0px; font-family: Menlo; min-height: 21px;"> <br class="webkit-block-placeholder"></p><div style="margin: 0px; font-family: Menlo;"> static int foo(unsigned i) {</div><div style="margin: 0px; font-family: Menlo;"> void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };</div></div><div><br></div><div><br></div><div>Do the changes in PassManagerBuilder.cpp and the test case LGTY?</div><div><br></div><div>Thanks</div><div>Gerolf</div><div><br></div><div>PS: Change to PassManagerBuilder.cpp:</div><div><br></div><div><blockquote type="cite"><blockquote type="cite">Index: lib/Transforms/IPO/PassManagerBuilder.cpp<br>===================================================================<br>--- lib/Transforms/IPO/PassManagerBuilder.cpp (revision 211729)<br>+++ lib/Transforms/IPO/PassManagerBuilder.cpp (working copy)<br>@@ -156,9 +156,9 @@<br> if (!DisableUnitAtATime) {<br> addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);<br><br>- MPM.add(createGlobalOptimizerPass()); // Optimize out global vars<br><br> MPM.add(createIPSCCPPass()); // IP SCCP<br>+ MPM.add(createGlobalOptimizerPass()); // Optimize out global vars<br> MPM.add(createDeadArgEliminationPass()); // Dead argument elimination<br><br> MPM.add(createInstructionCombiningPass());// Clean up after IPCP & DAE</blockquote></blockquote></div><div><br></div><div><div><div><div>On Jun 27, 2014, at 4:50 PM, Eric Christopher <<a href="mailto:echristo@gmail.com">echristo@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">That does seem reasonably straightforward and makes sense on the surface.<br><br>Did you want to benchmark it and see what you get? If you'd like I can<br>do the same here.<br><br>-eric<br><br>On Fri, Jun 27, 2014 at 4:26 PM, Gerolf Hoflehner <<a href="mailto:ghoflehner@apple.com">ghoflehner@apple.com</a>> wrote:<br><blockquote type="cite">I revised the comments and commit message of the Ruby fix.<br><br><br><br><br>I continue to work on the indirect-goto.c case. The fix that would make it<br>happen is inverting the<br>order of invocation of global const prop and global optimizer:<br><br>Index: lib/Transforms/IPO/PassManagerBuilder.cpp<br>===================================================================<br>--- lib/Transforms/IPO/PassManagerBuilder.cpp (revision 211729)<br>+++ lib/Transforms/IPO/PassManagerBuilder.cpp (working copy)<br>@@ -156,9 +156,9 @@<br> if (!DisableUnitAtATime) {<br> addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);<br><br>- MPM.add(createGlobalOptimizerPass()); // Optimize out global vars<br><br> MPM.add(createIPSCCPPass()); // IP SCCP<br>+ MPM.add(createGlobalOptimizerPass()); // Optimize out global vars<br> MPM.add(createDeadArgEliminationPass()); // Dead argument elimination<br><br> MPM.add(createInstructionCombiningPass());// Clean up after IPCP & DAE<br><br><br>When the global optimizer is invoked first, the indirect branch using the<br>global array has not<br>been eliminated. So the global can’t be eliminated. That is pretty straight<br>forward and the pass swap<br>looks reasonable but requires more scrutiny and careful testing. As<br>additional benefit the global<br>array would be removed from the IR.<br><br><br>-Gerolf<br><br>On Jun 25, 2014, at 6:01 PM, Gerolf Hoflehner <<a href="mailto:ghoflehner@apple.com">ghoflehner@apple.com</a>> wrote:<br><br>Bob and I discussed this a bit more. Interprocedural const prop eliminates<br>the indirect branch. Then the labels. So when the<br>inliner kicks in it does not “see" any of the issues that could block it.<br><br>However, with the patch there must still be some information about block<br>label address taken kept around when a<br>global (static) array contains the label addresses. I verified that the<br>test case would work when foo2<br>also declared just a local array const void *addrs[] = { &&L1, &&L2, &&L3,<br>&&L4, &&L5 }.<br>Optimistically it seems like we might catch the global case also without<br>escape analysis,<br>but I have not looked into that code yet.<br><br>-Gerolf<br><br><br><br>On Jun 25, 2014, at 4:00 PM, Bob Wilson <<a href="mailto:bob.wilson@apple.com">bob.wilson@apple.com</a>> wrote:<br><br>Oh! It’s a front-end test. I didn’t even think to look there….<br><br>The test predates the introduction of indirect branches and has only had<br>superficial changes since then. We used to translate computed gotos into<br>switch statements. I’m still trying to figure out how this ever worked in<br>the past.<br><br>On Jun 25, 2014, at 3:50 PM, Gerolf Hoflehner <<a href="mailto:ghoflehner@apple.com">ghoflehner@apple.com</a>> wrote:<br><br>Looking at the history it looks like Chris L. committed the test case in<br>tools/clang/test/CodeGen/indirect-goto.c to test<br>block address labels.<br><br>I’m fine with the train of reasoning:<br>there is an indirect branch in the source and we<br>never attempt to inline such functions. Furthermore for inlining functions<br>with block<br>label address taken we would need escape analysis. We don’t have it. So<br>let’s not inline.<br><br>It looks like everyone is signing up for<br>“Expect no optimization when block address labels are used”.<br><br>Cheers<br>Gerolf<br><br><br><br><br>On Jun 25, 2014, at 2:49 PM, Eric Christopher <<a href="mailto:echristo@gmail.com">echristo@gmail.com</a>> wrote:<br><br>On Wed, Jun 25, 2014 at 2:46 PM, Bob Wilson <<a href="mailto:bob.wilson@apple.com">bob.wilson@apple.com</a>> wrote:<br><br>On Jun 25, 2014, at 2:30 PM, Gerolf Hoflehner <<a href="mailto:ghoflehner@apple.com">ghoflehner@apple.com</a>> wrote:<br><br>I noticed one side-effect of this patch: we no longer optimize fully cases<br>like indirect-goto.c test in CodeGen below.<br>Since foo and foo2 are no longer inlined constant folding of the result in<br>main no longer happens.<br><br>Does anyone have an issue with this?<br><br><br>I don’t. We’ve never attempted to inline functions containing indirect<br>branches. I’m surprised that your testcase works now, but to really do it<br>correctly, we would need to check that there were no remaining uses of the<br>blockaddress that might escape the function. I’m not aware of any code that<br>does that check, and without it, it’s not a safe optimization.<br><br><br><br>Oh, wait, I see what the testcase is doing now. I'd thought it was<br>inlining into something that had blockaddr rather than the other way<br>around.<br><br>What does the commit for the testcase say about it?<br><br>Assuming it doesn't say anything enlightening then no, no issue here either.<br><br>-eric<br><br>Thanks<br>Gerolf<br><br>// RUN: %clang_cc1 -triple i386-unknown-unknown -O3 -emit-llvm -o - %s |<br>grep "ret i32 2520"<br><br>static int foo(unsigned i) {<br> void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };<br> int res = 1;<br><br> goto *addrs[i];<br>L5: res *= 11;<br>L4: res *= 7;<br>L3: res *= 5;<br>L2: res *= 3;<br>L1: res *= 2;<br> return res;<br>}<br><br>static int foo2(unsigned i) {<br> static const void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };<br> int res = 1;<br><br> goto *addrs[i];<br>L5: res *= 11;<br>L4: res *= 7;<br>L3: res *= 5;<br>L2: res *= 3;<br>L1: res *= 2;<br> return res;<br>}<br><br>int main() {<br> return foo(3)+foo2(4);<br>}<br><br><br>IR before:<br><br>; Function Attrs: nounwind readnone<br>define i32 @main() #0 {<br>entry:<br> ret i32 2520<br>}<br><br><br>IR now:<br><br>; Function Attrs: nounwind readnone<br>define i32 @main() #0 {<br>entry:<br> %call = tail call fastcc i32 @foo(i32 3)<br> %call1 = tail call fastcc i32 @foo2(i32 4)<br> %add = add nsw i32 %call1, %call<br> ret i32 %add<br>}<br><br>; Function Attrs: nounwind readnone<br>define internal fastcc i32 @foo(i32 %i) #0 {<br>entry:<br> br label %L1<br><br>L1: ; preds = %entry<br> ret i32 210<br>}<br><br>; Function Attrs: nounwind readnone<br>define internal fastcc i32 @foo2(i32 %i) #0 {<br>entry:<br> br label %L1<br><br>L1: ; preds = %entry<br> ret i32 2310<br>}<br><br><br>On Jun 25, 2014, at 11:32 AM, Bob Wilson <<a href="mailto:bob.wilson@apple.com">bob.wilson@apple.com</a>> wrote:<br><br><br>On Jun 25, 2014, at 10:47 AM, Gerolf Hoflehner <<a href="mailto:ghoflehner@apple.com">ghoflehner@apple.com</a>> wrote:<br><br>Hi<br><br>there have been reports on Ruby segfaults triggered by inlining functions<br>with block label address taken like<br><br><br>//Ruby code snippet<br>vm_exec_core() {<br> finish_insn_seq_0 = &&INSN_LABEL_finish;<br>INSN_LABEL_finish:<br> ;<br>}<br><br>This kind of scenario can also happen when LLVM picks a subset of blocks for<br>inlining,<br>which is the case with the actual code in the Ruby environment.<br><br>Background:<br>LLVM suppresses inlining for such functions when there is an indirect<br>branch. The attached<br>patch does so even when there is no indirect branch. Note that user code<br>like above would not<br>make much sense: using the global for jumping across function boundaries<br>would be illegal.<br><br>Why is there a segfault:<br><br>In the snipped above the block with the label is recognized as dead. So it<br>is eliminated. Instead<br>of a block address the cloner stores a constant (sic!) into the global<br>resulting in the segfault (when<br>the global is used in a goto).<br><br>Why did it work in the past then:<br><br>By luck. In older versions vm_exec_core was also inlined but the label<br>address used<br>was the block label address in vm_exec_core. So the global jump ended up in<br>the original function<br>rather than in the caller which accidentally happened to work.<br><br><br>-Gerolf<br><br><0001-Suppress-inlining-when-the-block-address-is-taken.patch><br><br><br>The change is fine, but I still have more comments about comments….<br><br>Inlining function with block label address taken can cause<br>many problem and requires a rich infrastructure to support<br>including escape analysis. At this point the safest approach<br>to address these problems is by blocking inlining from<br>happening.<br><br><br>Can you please put more of the context into the commit message? You’ve got<br>some good background in the email above, and it would be useful to have that<br>in the commit.<br><br><br><a href="rdar://17245966">rdar://17245966</a><br>---<br>lib/Analysis/IPA/InlineCost.cpp | 21 ++++++++++++++++++++-<br>test/Transforms/Inline/blockaddress.ll | 5 +++--<br>2 files changed, 23 insertions(+), 3 deletions(-)<br><br>diff --git a/lib/Analysis/IPA/InlineCost.cpp<br>b/lib/Analysis/IPA/InlineCost.cpp<br>index 4536583..b079f46 100644<br>--- a/lib/Analysis/IPA/InlineCost.cpp<br>+++ b/lib/Analysis/IPA/InlineCost.cpp<br>@@ -1097,6 +1097,16 @@ bool CallAnalyzer::analyzeCall(CallSite CS) {<br> BasicBlock *BB = BBWorklist[Idx];<br> if (BB->empty())<br> continue;<br>+ // Disallow inlining of functions when the address of a label is<br>+ // taken. Usually when a block address is taken there is also<br>+ // an indirect branch which effectively suppresses inlining.<br>+ // But when we partially inline and only take a subset of the blocks<br>+ // into account there may not be an indirect branch. As cautionary<br>+ // measure we suppress inlining when there is a block label address<br>taken.<br>+ // See also the comments to visitIndirectBrInst() and isInlineViable().<br>+<br>+ if (BB->hasAddressTaken())<br>+ return false;<br><br><br>Please remove the blank line between the comment and the code.<br><br>You should remove the comment sentence beginning with “As cautionary<br>measure…”. It is not just a cautionary measure — it fixes a serious<br>miscompile of Ruby.<br><br>I’m not sure it is necessary to cross-reference the comments in the other<br>functions, but regardless, you should update the visitIndirectBrInst comment<br>to remove this sentence: "And as a QOI issue, if someone is using a<br>blockaddress without an indirectbr, and that reference somehow ends up in<br>another function or global, we probably don't want to inline this function.”<br>With your change here, that comment is no longer relevant.<br><br><br> // Analyze the cost of this block. If we blow through the threshold,<br>this<br> // returns false, and we can bail on out.<br><br>@@ -1279,7 +1289,16 @@ bool InlineCostAnalysis::isInlineViable(Function &F)<br>{<br> Attribute::ReturnsTwice);<br> for (Function::iterator BI = F.begin(), BE = F.end(); BI != BE; ++BI) {<br> // Disallow inlining of functions which contain an indirect branch.<br>- if (isa<IndirectBrInst>(BI->getTerminator()))<br>+ // Also disallow inlining of functions when the address of a label is<br>+ // taken. Usually when a block address is taken there is also<br>+ // an indirect branch which effectively suppresses inlining.<br>+ // But when there is no indirect branch the user wrote a function<br>+ // that takes the label address but does not jump to it within<br>+ // the function. This does not make much sense and could point<br>+ // to code that is not legal (eg. resulting in jump across function<br>+ // boundaries). The check here at least prevents inlining of such code.<br>+ // See also the comments to visitIndirectBrInst() and in analyzeCall().<br>+ if (isa<IndirectBrInst>(BI->getTerminator()) || BI->hasAddressTaken())<br> return false;<br><br> for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;<br><br><br>This comment seems a little verbose to me. I would at least remove the<br>cross-references to comments in other functions.<br><br><br><br><br><br><br><br><br></blockquote></blockquote></div><br></div></div></body></html>