<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div style="margin: 0px; font-family: Menlo;">We had some discussion on this mailing list about the always_inline attribute and converged</div><div style="margin: 0px; font-family: Menlo;">at the semantics “always inline when it safe to do so”. In the test</div><div style="margin: 0px; font-family: Menlo;">case below it is safe to do, and it would expose the bug this patch is taking care</div><div style="margin: 0px; font-family: Menlo;">of, but escape analysis is not ready yet to let inline happen. Until</div><div style="margin: 0px; font-family: Menlo;">then the test case is sitting on the bench.</div><div style="margin: 0px; font-family: Menlo;"><br></div><div style="margin: 0px; font-family: Menlo;">At one point I had suggested enabling it with an user assertion (option!) like “I tell you always</div><div style="margin: 0px; font-family: Menlo;">inline is safe”, but got hammered for it. :-) If there is a change of mind and</div><div style="margin: 0px; font-family: Menlo;">the urge for an immediate test case now overrules the concern about an extra option,</div><div style="margin: 0px; font-family: Menlo;">I will pursue that route again.</div><div style="margin: 0px; font-family: Menlo;"><br></div><div style="margin: 0px; font-family: Menlo;">Cheers</div><div style="margin: 0px; font-family: Menlo;">Gerolf</div><div style="margin: 0px; font-family: Menlo; color: rgb(213, 59, 211);"><br></div><div style="margin: 0px; font-family: Menlo; color: rgb(213, 59, 211);">Test case:</div><div style="margin: 0px; font-family: Menlo; color: rgb(213, 59, 211);"><br></div><div style="margin: 0px; font-family: Menlo; color: rgb(213, 59, 211);">#include <span style="color: rgb(195, 55, 32);"><stdio.h></span></div><div style="margin: 0px; font-family: Menlo; color: rgb(52, 189, 38);">extern<span style="color: rgb(0, 0, 0);"> </span>int<span style="color: rgb(0, 0, 0);"> foo(</span>int<span style="color: rgb(0, 0, 0);">);</span></div><div style="margin: 0px; font-family: Menlo; color: rgb(52, 189, 38);">extern<span style="color: rgb(0, 0, 0);"> </span>int<span style="color: rgb(0, 0, 0);"> gv;</span></div><div style="margin: 0px; font-family: Menlo; min-height: 21px;"><br></div><div style="margin: 0px; font-family: Menlo;">__attribute__((always_inline)) <span style="color: rgb(52, 189, 38);">int</span> <span style="color: rgb(52, 189, 38);">static</span> bar(<span style="color: rgb(52, 189, 38);">int</span> v)</div><div style="margin: 0px; font-family: Menlo;">{</div><div style="margin: 0px; font-family: Menlo;"> <span style="color: rgb(52, 189, 38);">void</span> *goToNext;</div><div style="margin: 0px; font-family: Menlo; min-height: 21px;"><br></div><div style="margin: 0px; font-family: Menlo;"> goToNext = &&DecrementV; //local</div><div style="margin: 0px; font-family: Menlo;"> <span style="color: rgb(206, 121, 36);">goto</span> DecrementV;</div><div style="margin: 0px; font-family: Menlo; min-height: 21px;"><br></div><div style="margin: 0px; font-family: Menlo; color: rgb(206, 121, 36);">DecrementV<span style="color: rgb(0, 0, 0);">:</span></div><div style="margin: 0px; font-family: Menlo;"> <span style="color: rgb(206, 121, 36);">if</span> (v == <span style="color: rgb(195, 55, 32);">1</span>) {</div><div style="margin: 0px; font-family: Menlo;"> goToNext = &&ByeBye; //local</div><div style="margin: 0px; font-family: Menlo;"> }</div><div style="margin: 0px; font-family: Menlo;"> v--;</div><div style="margin: 0px; font-family: Menlo;"> <span style="color: rgb(206, 121, 36);">goto</span> *goToNext; // can only jump to local label.</div><div style="margin: 0px; font-family: Menlo; min-height: 21px;"><br></div><div style="margin: 0px; font-family: Menlo; color: rgb(206, 121, 36);">ByeBye<span style="color: rgb(0, 0, 0);">:</span></div><div style="margin: 0px; font-family: Menlo;"> <span style="color: rgb(206, 121, 36);">return</span> foo(v);</div><div style="margin: 0px; font-family: Menlo;">}</div><div style="margin: 0px; font-family: Menlo;"><br></div><div style="margin: 0px; font-family: Menlo;"><br></div><div style="margin: 0px; font-family: Menlo; min-height: 21px;"><br></div><div style="margin: 0px; font-family: Menlo;"><span style="color: rgb(52, 189, 38);">int</span> main() { <span style="color: rgb(52, 189, 38);">int</span> res; res=bar(gv); <span style="color: rgb(206, 121, 36);">return</span> res; }</div><div style="margin: 0px; font-family: Menlo;"><br></div><div style="margin: 0px; font-family: Menlo;"><br></div><div style="margin: 0px;"><div style="margin: 0px;"><font face="Menlo">--- test/Transforms/Inline/inline-all-address-taken-blocks.ll<span class="Apple-tab-span" style="white-space: pre;"> </span>(revision 0)</font></div><div style="margin: 0px;"><font face="Menlo">+++ test/Transforms/Inline/inline-all-address-taken-blocks.ll<span class="Apple-tab-span" style="white-space: pre;"> </span>(working copy)</font></div><div style="margin: 0px;"><font face="Menlo">@@ -0,0 +1,39 @@</font></div><div style="margin: 0px;"><font face="Menlo">+; RUN: opt < %s -O3 -S | FileCheck %s</font></div><div style="margin: 0px;"><font face="Menlo">+@gv = external global i32</font></div><div style="margin: 0px;"><font face="Menlo">+</font></div><div style="margin: 0px;"><font face="Menlo">+define i32 @main() nounwind {</font></div><div style="margin: 0px;"><font face="Menlo">+entry:</font></div><div style="margin: 0px;"><font face="Menlo">+; CHECK-NOT: call i32 @bar</font></div><div style="margin: 0px;"><font face="Menlo">+; CHECK: call i32 @foo</font></div><div style="margin: 0px;"><font face="Menlo">+ %0 = load i32* @gv, align 4</font></div><div style="margin: 0px;"><font face="Menlo">+ %call = tail call i32 @bar(i32 %0)</font></div><div style="margin: 0px;"><font face="Menlo">+ ret i32 %call</font></div><div style="margin: 0px;"><font face="Menlo">+}</font></div><div style="margin: 0px;"><font face="Menlo">+</font></div><div style="margin: 0px;"><font face="Menlo">+define internal i32 @bar(i32 %v) alwaysinline {</font></div><div style="margin: 0px;"><font face="Menlo">+entry:</font></div><div style="margin: 0px;"><font face="Menlo">+ br label %DecrementV</font></div><div style="margin: 0px;"><font face="Menlo">+</font></div><div style="margin: 0px;"><font face="Menlo">+DecrementV: ; preds = %entry, %DecrementV</font></div><div style="margin: 0px;"><font face="Menlo">+ %v.addr.0 = phi i32 [ %v, %entry ], [ %dec, %DecrementV ]</font></div><div style="margin: 0px;"><font face="Menlo">+ %goToNext.0 = phi i8* [ blockaddress(@bar, %DecrementV), %entry ], [ %.goToNext.0, %DecrementV ]</font></div><div style="margin: 0px;"><font face="Menlo">+ %cmp = icmp eq i32 %v.addr.0, 1</font></div><div style="margin: 0px;"><font face="Menlo">+ %.goToNext.0 = select i1 %cmp, i8* blockaddress(@bar, %ByeBye), i8* %goToNext.0</font></div><div style="margin: 0px;"><font face="Menlo">+ %dec = add nsw i32 %v.addr.0, -1</font></div><div style="margin: 0px;"><font face="Menlo">+ indirectbr i8* %.goToNext.0, [label %DecrementV, label %ByeBye]</font></div><div style="margin: 0px;"><font face="Menlo">+</font></div><div style="margin: 0px;"><font face="Menlo">+ByeBye: ; preds = %DecrementV</font></div><div style="margin: 0px;"><font face="Menlo">+ %call = tail call i32 @foo(i32 %dec) #3</font></div><div style="margin: 0px;"><font face="Menlo">+ ret i32 %call</font></div><div style="margin: 0px;"><font face="Menlo">+}</font></div><div style="margin: 0px;"><font face="Menlo">+</font></div><div style="margin: 0px;"><font face="Menlo">+declare i32 @foo(i32) nounwind</font></div><div><font face="Menlo"><br></font></div></div><div><div>On May 1, 2014, at 11:03 AM, Eric Christopher <<a href="mailto:echristo@gmail.com">echristo@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">It's not particularly possible to test it. I had the discussion with<br>Gerolf there a bit. It mostly relies on being able to inline things<br>that right now we wouldn't be able to without better escape analysis.<br>I think the bug is real which is why I finally said ok. :\<br><br>-eric<br><br>On Wed, Apr 30, 2014 at 9:11 PM, Rafael Espíndola<br><<a href="mailto:rafael.espindola@gmail.com">rafael.espindola@gmail.com</a>> wrote:<br><blockquote type="cite">is it possible to test this?<br><br>On 30 April 2014 18:05, Gerolf Hoflehner <<a href="mailto:ghoflehner@apple.com">ghoflehner@apple.com</a>> wrote:<br><blockquote type="cite">Author: ghoflehner<br>Date: Wed Apr 30 17:05:02 2014<br>New Revision: 207713<br><br>URL: <a href="http://llvm.org/viewvc/llvm-project?rev=207713&view=rev">http://llvm.org/viewvc/llvm-project?rev=207713&view=rev</a><br>Log:<br>Patch for function cloning to inline all blocks whose address is taken<br><br>Not all address taken blocks get inlined. The reason is<br>that a blocks new address is known only when it is cloned. But e.g.<br>a branch instruction in a different block could need that address earlier<br>while it gets cloned. The solution is to collect the set of all<br>blocks that can potentially get inlined and compute a new block address<br>up front. Then clone and cleanup.<br><br><a href="rdar://16427209">rdar://16427209</a><br><br>Modified:<br> llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp<br><br>Modified: llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp<br>URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp?rev=207713&r1=207712&r2=207713&view=diff<br>==============================================================================<br>--- llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp (original)<br>+++ llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp Wed Apr 30 17:05:02 2014<br>@@ -32,6 +32,7 @@<br> #include "llvm/Transforms/Utils/Local.h"<br> #include "llvm/Transforms/Utils/ValueMapper.h"<br> #include <map><br>+#include <set><br> using namespace llvm;<br><br> // CloneBasicBlock - See comments in Cloning.h<br>@@ -272,42 +273,36 @@ namespace {<br> NameSuffix(nameSuffix), CodeInfo(codeInfo), DL(DL) {<br> }<br><br>- /// CloneBlock - The specified block is found to be reachable, clone it and<br>- /// anything that it can reach.<br>+ /// CloneBlock - The specified block is found to be reachable, so clone it<br>+ /// into NewBB.<br>+ /// ToClone is the vector of actually cloned blocks.<br>+ /// OrigBBs is the set of all blocks reacheable from the entry block.<br>+ /// It contains the block candidates and makes sure each block<br>+ /// is cloned at most once.<br> void CloneBlock(const BasicBlock *BB,<br>- std::vector<const BasicBlock*> &ToClone);<br>+ BasicBlock *NewBB,<br>+ std::vector<const BasicBlock *> &ToClone,<br>+ std::set<const BasicBlock *> &OrigBBs);<br> };<br> }<br><br>-/// CloneBlock - The specified block is found to be reachable, clone it and<br>-/// anything that it can reach.<br>+/// CloneBlock - The specified block is found to be reachable, so clone it<br>+/// into NewBB.<br>+/// ToClone is the vector of actually cloned blocks.<br>+/// OrigBBs is the set of all blocks reacheable from the entry block.<br>+/// It contains the block candidates and makes sure each block<br>+/// is cloned at most once.<br> void PruningFunctionCloner::CloneBlock(const BasicBlock *BB,<br>- std::vector<const BasicBlock*> &ToClone){<br>- WeakVH &BBEntry = VMap[BB];<br>-<br>- // Have we already cloned this block?<br>- if (BBEntry) return;<br>+ BasicBlock *NewBB,<br>+ std::vector<const BasicBlock *> &ToClone,<br>+ std::set<const BasicBlock *> &OrigBBs) {<br><br>- // Nope, clone it now.<br>- BasicBlock *NewBB;<br>- BBEntry = NewBB = BasicBlock::Create(BB->getContext());<br>- if (BB->hasName()) NewBB->setName(BB->getName()+NameSuffix);<br>+ // Remove BB from list of blocks to clone.<br>+ // When it was not in the list, it has been cloned already, so<br>+ // don't clone again.<br>+ if (!OrigBBs.erase(BB)) return;<br><br>- // It is only legal to clone a function if a block address within that<br>- // function is never referenced outside of the function. Given that, we<br>- // want to map block addresses from the old function to block addresses in<br>- // the clone. (This is different from the generic ValueMapper<br>- // implementation, which generates an invalid blockaddress when<br>- // cloning a function.)<br>- //<br>- // Note that we don't need to fix the mapping for unreachable blocks;<br>- // the default mapping there is safe.<br>- if (BB->hasAddressTaken()) {<br>- Constant *OldBBAddr = BlockAddress::get(const_cast<Function*>(OldFunc),<br>- const_cast<BasicBlock*>(BB));<br>- VMap[OldBBAddr] = BlockAddress::get(NewFunc, NewBB);<br>- }<br>-<br>+ // Nope, clone it now.<br><br> bool hasCalls = false, hasDynamicAllocas = false, hasStaticAllocas = false;<br><br>@@ -425,7 +420,7 @@ void llvm::CloneAndPruneFunctionInto(Fun<br> const DataLayout *DL,<br> Instruction *TheCall) {<br> assert(NameSuffix && "NameSuffix cannot be null!");<br>-<br>+<br> #ifndef NDEBUG<br> for (Function::const_arg_iterator II = OldFunc->arg_begin(),<br> E = OldFunc->arg_end(); II != E; ++II)<br>@@ -435,15 +430,91 @@ void llvm::CloneAndPruneFunctionInto(Fun<br> PruningFunctionCloner PFC(NewFunc, OldFunc, VMap, ModuleLevelChanges,<br> NameSuffix, CodeInfo, DL);<br><br>- // Clone the entry block, and anything recursively reachable from it.<br>+ // Since all BB address references need to be known before block-by-block<br>+ // processing, we need to create all reachable blocks before processing<br>+ // them for instruction cloning and pruning. Some of these blocks may<br>+ // be removed due to later pruning.<br> std::vector<const BasicBlock*> CloneWorklist;<br>+ //<br>+ // OrigBBs consists of all blocks reachable from the entry<br>+ // block.<br>+ // This list will be pruned down by the CloneFunction() due to two<br>+ // two optimizations:<br>+ // First, when a conditional branch target is known at compile-time,<br>+ // only the actual branch destination block needs to be cloned.<br>+ // Second, when a switch statement target is known at compile-time,<br>+ // only the actual case statement needs to be cloned.<br>+ std::set<const BasicBlock *> OrigBBs;<br>+<br>+ CloneWorklist.push_back(&OldFunc->getEntryBlock());<br>+ while (!CloneWorklist.empty()) {<br>+ const BasicBlock *BB = CloneWorklist.back();<br>+ CloneWorklist.pop_back();<br>+<br>+ // Don't revisit blocks.<br>+ if (VMap.count(BB))<br>+ continue;<br>+<br>+ BasicBlock *NewBB = BasicBlock::Create(BB->getContext());<br>+ if (BB->hasName())<br>+ NewBB->setName(BB->getName() + NameSuffix);<br>+<br>+ // It is legal to clone a function when a block address within that<br>+ // function is never escapes outside of the function. Given that, we<br>+ // want to map block addresses from the old function to block addresses in<br>+ // the clone. (This is different from the generic ValueMapper<br>+ // implementation, which generates an invalid block address when<br>+ // cloning a function.)<br>+ // Note the current escape address does not catch all legal cases: even<br>+ // when all block addresses taken are local and the function has the<br>+ // always_inline attribute due to the indirect branch inlining is<br>+ // suppressed.<br>+ // Note that we don't need to fix the mapping for unreachable blocks;<br>+ // the default mapping there is safe.<br>+ if (BB->hasAddressTaken()) {<br>+ Constant *OldBBAddr = BlockAddress::get(const_cast<Function*>(OldFunc),<br>+ const_cast<BasicBlock*>(BB));<br>+ VMap[OldBBAddr] = BlockAddress::get(NewFunc, NewBB);<br>+ }<br>+<br>+ OrigBBs.insert(BB);<br>+ VMap[BB] = NewBB;<br>+ // Iterate over all possible successors and add them to the CloneWorklist.<br>+ const TerminatorInst *Term = BB->getTerminator();<br>+ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {<br>+ BasicBlock *Succ = Term->getSuccessor(i);<br>+ CloneWorklist.push_back(Succ);<br>+ }<br>+ }<br>+<br>+ // Now, fill only the reachable blocks with the cloned contents<br>+ // of the originals.<br>+ assert(CloneWorklist.empty() && "Dirty worklist before re-use\n");<br> CloneWorklist.push_back(&OldFunc->getEntryBlock());<br> while (!CloneWorklist.empty()) {<br> const BasicBlock *BB = CloneWorklist.back();<br> CloneWorklist.pop_back();<br>- PFC.CloneBlock(BB, CloneWorklist);<br>+ PFC.CloneBlock(BB, cast<BasicBlock>(VMap[BB]), CloneWorklist,<br>+ OrigBBs);<br> }<br>-<br>+<br>+ // FIXME: Delete BB's that were created but have been pruned.<br>+ // Actual cloning may have found pruning opportunities since<br>+ // branch or switch statement target may have been known at compile-time.<br>+ // Alternatively we could write a routine CloneFunction and add a) a<br>+ // parameter to actually do the cloning and b) a return parameter that<br>+ // gives a list of blocks that need to be cloned also. Then we could<br>+ // call CloneFunction when we collect the blocks to clone, but suppress<br>+ // cloning. And then actually *do* the cloning in the while loop above. Then<br>+ // the cleanup here would become redundant, and so would be the OrigBBs.<br>+ for (std::set<const BasicBlock *>::iterator Oi = OrigBBs.begin(),<br>+ Oe = OrigBBs.end(); Oi != Oe; ++Oi) {<br>+ const BasicBlock *Orig = *Oi;<br>+ BasicBlock *NewBB = cast<BasicBlock>(VMap[Orig]);<br>+ delete NewBB;<br>+ VMap[Orig] = 0;<br>+ }<br>+<br> // Loop over all of the basic blocks in the old function. If the block was<br> // reachable, we have cloned it and the old block is now in the value map:<br> // insert it into the new function in the right order. If not, ignore it.<br>@@ -454,7 +525,8 @@ void llvm::CloneAndPruneFunctionInto(Fun<br> BI != BE; ++BI) {<br> Value *V = VMap[BI];<br> BasicBlock *NewBB = cast_or_null<BasicBlock>(V);<br>- if (!NewBB) continue; // Dead block.<br>+ if (!NewBB)<br>+ continue; // Dead block.<br><br> // Add the new block to the new function.<br> NewFunc->getBasicBlockList().push_back(NewBB);<br><br><br>_______________________________________________<br>llvm-commits mailing list<br>llvm-commits@cs.uiuc.edu<br>http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits<br></blockquote>_______________________________________________<br>llvm-commits mailing list<br><a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits<br></blockquote></blockquote></div><br></body></html>