<div dir="ltr">Can we transform indirectbr into a chain of comparisons and branches against the old BB labels on inlining? This is probably not profitable, but at least it maintains the correctness of the program while honoring alwaysinline.</div>
<div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, May 1, 2014 at 11:58 AM, Gerolf Hoflehner <span dir="ltr"><<a href="mailto:ghoflehner@apple.com" target="_blank">ghoflehner@apple.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div style="margin:0px;font-family:Menlo">We had some discussion on this mailing list about the always_inline attribute and converged</div>
<div style="margin:0px;font-family:Menlo">at the semantics “always inline when it safe to do so”. In the test</div><div style="margin:0px;font-family:Menlo">case below it is safe to do, and it would expose the bug this patch is taking care</div>
<div style="margin:0px;font-family:Menlo">of, but escape analysis is not ready yet to let inline happen. Until</div><div style="margin:0px;font-family:Menlo">then the test case is sitting on the bench.</div><div style="margin:0px;font-family:Menlo">
<br></div><div style="margin:0px;font-family:Menlo">At one point I had suggested enabling it with an user assertion (option!) like “I tell you always</div><div style="margin:0px;font-family:Menlo">inline is safe”, but got hammered for it. :-) If there is a change of mind and</div>
<div style="margin:0px;font-family:Menlo">the urge for an immediate test case now overrules the concern about an extra option,</div><div style="margin:0px;font-family:Menlo">I will pursue that route again.</div><div style="margin:0px;font-family:Menlo">
<br></div><div style="margin:0px;font-family:Menlo">Cheers</div><div style="margin:0px;font-family:Menlo">Gerolf</div><div style="margin:0px;font-family:Menlo;color:rgb(213,59,211)"><br></div><div style="margin:0px;font-family:Menlo;color:rgb(213,59,211)">
Test case:</div><div style="margin:0px;font-family:Menlo;color:rgb(213,59,211)"><br></div><div style="margin:0px;font-family:Menlo;color:rgb(213,59,211)">#include <span style="color:rgb(195,55,32)"><stdio.h></span></div>
<div style="margin:0px;font-family:Menlo;color:rgb(52,189,38)">extern<span style="color:rgb(0,0,0)"> </span>int<span style="color:rgb(0,0,0)"> foo(</span>int<span style="color:rgb(0,0,0)">);</span></div><div style="margin:0px;font-family:Menlo;color:rgb(52,189,38)">
extern<span style="color:rgb(0,0,0)"> </span>int<span style="color:rgb(0,0,0)"> gv;</span></div><div style="margin:0px;font-family:Menlo;min-height:21px"><br></div><div style="margin:0px;font-family:Menlo">__attribute__((always_inline)) <span style="color:rgb(52,189,38)">int</span> <span style="color:rgb(52,189,38)">static</span> bar(<span style="color:rgb(52,189,38)">int</span> v)</div>
<div style="margin:0px;font-family:Menlo">{</div><div style="margin:0px;font-family:Menlo"> <span style="color:rgb(52,189,38)">void</span> *goToNext;</div><div style="margin:0px;font-family:Menlo;min-height:21px"><br></div>
<div style="margin:0px;font-family:Menlo"> goToNext = &&DecrementV; //local</div><div style="margin:0px;font-family:Menlo"> <span style="color:rgb(206,121,36)">goto</span> DecrementV;</div><div style="margin:0px;font-family:Menlo;min-height:21px">
<br></div><div style="margin:0px;font-family:Menlo;color:rgb(206,121,36)">DecrementV<span style="color:rgb(0,0,0)">:</span></div><div style="margin:0px;font-family:Menlo"> <span style="color:rgb(206,121,36)">if</span> (v == <span style="color:rgb(195,55,32)">1</span>) {</div>
<div style="margin:0px;font-family:Menlo"> goToNext = &&ByeBye; //local</div><div style="margin:0px;font-family:Menlo"> }</div><div style="margin:0px;font-family:Menlo"> v--;</div><div style="margin:0px;font-family:Menlo">
<span style="color:rgb(206,121,36)">goto</span> *goToNext; // can only jump to local label.</div><div style="margin:0px;font-family:Menlo;min-height:21px"><br></div><div style="margin:0px;font-family:Menlo;color:rgb(206,121,36)">
ByeBye<span style="color:rgb(0,0,0)">:</span></div><div style="margin:0px;font-family:Menlo"> <span style="color:rgb(206,121,36)">return</span> foo(v);</div><div style="margin:0px;font-family:Menlo">}</div><div style="margin:0px;font-family:Menlo">
<br></div><div style="margin:0px;font-family:Menlo"><br></div><div style="margin:0px;font-family:Menlo;min-height:21px"><br></div><div style="margin:0px;font-family:Menlo"><span style="color:rgb(52,189,38)">int</span> main() { <span style="color:rgb(52,189,38)">int</span> res; res=bar(gv); <span style="color:rgb(206,121,36)">return</span> res; }</div>
<div style="margin:0px;font-family:Menlo"><br></div><div style="margin:0px;font-family:Menlo"><br></div><div style="margin:0px"><div style="margin:0px"><font face="Menlo">--- test/Transforms/Inline/inline-all-address-taken-blocks.ll<span style="white-space:pre-wrap"> </span>(revision 0)</font></div>
<div style="margin:0px"><font face="Menlo">+++ test/Transforms/Inline/inline-all-address-taken-blocks.ll<span style="white-space:pre-wrap"> </span>(working copy)</font></div><div style="margin:0px"><font face="Menlo">@@ -0,0 +1,39 @@</font></div>
<div style="margin:0px"><font face="Menlo">+; RUN: opt < %s -O3 -S | FileCheck %s</font></div><div style="margin:0px"><font face="Menlo">+@gv = external global i32</font></div><div style="margin:0px"><font face="Menlo">+</font></div>
<div style="margin:0px"><font face="Menlo">+define i32 @main() nounwind {</font></div><div style="margin:0px"><font face="Menlo">+entry:</font></div><div style="margin:0px"><font face="Menlo">+; CHECK-NOT: call i32 @bar</font></div>
<div style="margin:0px"><font face="Menlo">+; CHECK: call i32 @foo</font></div><div style="margin:0px"><font face="Menlo">+ %0 = load i32* @gv, align 4</font></div><div style="margin:0px"><font face="Menlo">+ %call = tail call i32 @bar(i32 %0)</font></div>
<div style="margin:0px"><font face="Menlo">+ ret i32 %call</font></div><div style="margin:0px"><font face="Menlo">+}</font></div><div style="margin:0px"><font face="Menlo">+</font></div><div style="margin:0px"><font face="Menlo">+define internal i32 @bar(i32 %v) alwaysinline {</font></div>
<div style="margin:0px"><font face="Menlo">+entry:</font></div><div style="margin:0px"><font face="Menlo">+ br label %DecrementV</font></div><div style="margin:0px"><font face="Menlo">+</font></div><div style="margin:0px">
<font face="Menlo">+DecrementV: ; preds = %entry, %DecrementV</font></div><div style="margin:0px"><font face="Menlo">+ %v.addr.0 = phi i32 [ %v, %entry ], [ %dec, %DecrementV ]</font></div>
<div style="margin:0px"><font face="Menlo">+ %goToNext.0 = phi i8* [ blockaddress(@bar, %DecrementV), %entry ], [ %.goToNext.0, %DecrementV ]</font></div><div style="margin:0px"><font face="Menlo">+ %cmp = icmp eq i32 %v.addr.0, 1</font></div>
<div style="margin:0px"><font face="Menlo">+ %.goToNext.0 = select i1 %cmp, i8* blockaddress(@bar, %ByeBye), i8* %goToNext.0</font></div><div style="margin:0px"><font face="Menlo">+ %dec = add nsw i32 %v.addr.0, -1</font></div>
<div style="margin:0px"><font face="Menlo">+ indirectbr i8* %.goToNext.0, [label %DecrementV, label %ByeBye]</font></div><div style="margin:0px"><font face="Menlo">+</font></div><div style="margin:0px"><font face="Menlo">+ByeBye: ; preds = %DecrementV</font></div>
<div style="margin:0px"><font face="Menlo">+ %call = tail call i32 @foo(i32 %dec) #3</font></div><div style="margin:0px"><font face="Menlo">+ ret i32 %call</font></div><div style="margin:0px"><font face="Menlo">+}</font></div>
<div style="margin:0px"><font face="Menlo">+</font></div><div style="margin:0px"><font face="Menlo">+declare i32 @foo(i32) nounwind</font></div><div><font face="Menlo"><br></font></div></div><div><div class="h5"><div><div>
On May 1, 2014, at 11:03 AM, Eric Christopher <<a href="mailto:echristo@gmail.com" target="_blank">echristo@gmail.com</a>> wrote:</div><br><blockquote type="cite">It's not particularly possible to test it. I had the discussion with<br>
Gerolf there a bit. It mostly relies on being able to inline things<br>that right now we wouldn't be able to without better escape analysis.<br>I think the bug is real which is why I finally said ok. :\<br><br>-eric<br>
<br>On Wed, Apr 30, 2014 at 9:11 PM, Rafael Espíndola<br><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>> wrote:<br><blockquote type="cite">is it possible to test this?<br>
<br>On 30 April 2014 18:05, Gerolf Hoflehner <<a href="mailto:ghoflehner@apple.com" target="_blank">ghoflehner@apple.com</a>> wrote:<br><blockquote type="cite">Author: ghoflehner<br>Date: Wed Apr 30 17:05:02 2014<br>
New Revision: 207713<br><br>URL: <a href="http://llvm.org/viewvc/llvm-project?rev=207713&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=207713&view=rev</a><br>Log:<br>Patch for function cloning to inline all blocks whose address is taken<br>
<br>Not all address taken blocks get inlined. The reason is<br>that a blocks new address is known only when it is cloned. But e.g.<br>a branch instruction in a different block could need that address earlier<br>while it gets cloned. The solution is to collect the set of all<br>
blocks that can potentially get inlined and compute a new block address<br>up front. Then clone and cleanup.<br><br><a>rdar://16427209</a><br><br>Modified:<br> llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp<br><br>Modified: llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp?rev=207713&r1=207712&r2=207713&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp?rev=207713&r1=207712&r2=207713&view=diff</a><br>
==============================================================================<br>--- llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp (original)<br>+++ llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp Wed Apr 30 17:05:02 2014<br>
@@ -32,6 +32,7 @@<br> #include "llvm/Transforms/Utils/Local.h"<br> #include "llvm/Transforms/Utils/ValueMapper.h"<br> #include <map><br>+#include <set><br> using namespace llvm;<br><br> // CloneBasicBlock - See comments in Cloning.h<br>
@@ -272,42 +273,36 @@ namespace {<br> NameSuffix(nameSuffix), CodeInfo(codeInfo), DL(DL) {<br> }<br><br>- /// CloneBlock - The specified block is found to be reachable, clone it and<br>- /// anything that it can reach.<br>
+ /// CloneBlock - The specified block is found to be reachable, so clone it<br>+ /// into NewBB.<br>+ /// ToClone is the vector of actually cloned blocks.<br>+ /// OrigBBs is the set of all blocks reacheable from the entry block.<br>
+ /// It contains the block candidates and makes sure each block<br>+ /// is cloned at most once.<br> void CloneBlock(const BasicBlock *BB,<br>- std::vector<const BasicBlock*> &ToClone);<br>
+ BasicBlock *NewBB,<br>+ std::vector<const BasicBlock *> &ToClone,<br>+ std::set<const BasicBlock *> &OrigBBs);<br> };<br> }<br><br>-/// CloneBlock - The specified block is found to be reachable, clone it and<br>
-/// anything that it can reach.<br>+/// CloneBlock - The specified block is found to be reachable, so clone it<br>+/// into NewBB.<br>+/// ToClone is the vector of actually cloned blocks.<br>+/// OrigBBs is the set of all blocks reacheable from the entry block.<br>
+/// It contains the block candidates and makes sure each block<br>+/// is cloned at most once.<br> void PruningFunctionCloner::CloneBlock(const BasicBlock *BB,<br>- std::vector<const BasicBlock*> &ToClone){<br>
- WeakVH &BBEntry = VMap[BB];<br>-<br>- // Have we already cloned this block?<br>- if (BBEntry) return;<br>+ BasicBlock *NewBB,<br>+ std::vector<const BasicBlock *> &ToClone,<br>
+ std::set<const BasicBlock *> &OrigBBs) {<br><br>- // Nope, clone it now.<br>- BasicBlock *NewBB;<br>- BBEntry = NewBB = BasicBlock::Create(BB->getContext());<br>- if (BB->hasName()) NewBB->setName(BB->getName()+NameSuffix);<br>
+ // Remove BB from list of blocks to clone.<br>+ // When it was not in the list, it has been cloned already, so<br>+ // don't clone again.<br>+ if (!OrigBBs.erase(BB)) return;<br><br>- // It is only legal to clone a function if a block address within that<br>
- // function is never referenced outside of the function. Given that, we<br>- // want to map block addresses from the old function to block addresses in<br>- // the clone. (This is different from the generic ValueMapper<br>
- // implementation, which generates an invalid blockaddress when<br>- // cloning a function.)<br>- //<br>- // Note that we don't need to fix the mapping for unreachable blocks;<br>- // the default mapping there is safe.<br>
- if (BB->hasAddressTaken()) {<br>- Constant *OldBBAddr = BlockAddress::get(const_cast<Function*>(OldFunc),<br>- const_cast<BasicBlock*>(BB));<br>- VMap[OldBBAddr] = BlockAddress::get(NewFunc, NewBB);<br>
- }<br>-<br>+ // Nope, clone it now.<br><br> bool hasCalls = false, hasDynamicAllocas = false, hasStaticAllocas = false;<br><br>@@ -425,7 +420,7 @@ void llvm::CloneAndPruneFunctionInto(Fun<br> const DataLayout *DL,<br>
Instruction *TheCall) {<br> assert(NameSuffix && "NameSuffix cannot be null!");<br>-<br>+<br> #ifndef NDEBUG<br> for (Function::const_arg_iterator II = OldFunc->arg_begin(),<br>
E = OldFunc->arg_end(); II != E; ++II)<br>@@ -435,15 +430,91 @@ void llvm::CloneAndPruneFunctionInto(Fun<br> PruningFunctionCloner PFC(NewFunc, OldFunc, VMap, ModuleLevelChanges,<br> NameSuffix, CodeInfo, DL);<br>
<br>- // Clone the entry block, and anything recursively reachable from it.<br>+ // Since all BB address references need to be known before block-by-block<br>+ // processing, we need to create all reachable blocks before processing<br>
+ // them for instruction cloning and pruning. Some of these blocks may<br>+ // be removed due to later pruning.<br> std::vector<const BasicBlock*> CloneWorklist;<br>+ //<br>+ // OrigBBs consists of all blocks reachable from the entry<br>
+ // block.<br>+ // This list will be pruned down by the CloneFunction() due to two<br>+ // two optimizations:<br>+ // First, when a conditional branch target is known at compile-time,<br>+ // only the actual branch destination block needs to be cloned.<br>
+ // Second, when a switch statement target is known at compile-time,<br>+ // only the actual case statement needs to be cloned.<br>+ std::set<const BasicBlock *> OrigBBs;<br>+<br>+ CloneWorklist.push_back(&OldFunc->getEntryBlock());<br>
+ while (!CloneWorklist.empty()) {<br>+ const BasicBlock *BB = CloneWorklist.back();<br>+ CloneWorklist.pop_back();<br>+<br>+ // Don't revisit blocks.<br>+ if (VMap.count(BB))<br>+ continue;<br>+<br>
+ BasicBlock *NewBB = BasicBlock::Create(BB->getContext());<br>+ if (BB->hasName())<br>+ NewBB->setName(BB->getName() + NameSuffix);<br>+<br>+ // It is legal to clone a function when a block address within that<br>
+ // function is never escapes outside of the function. Given that, we<br>+ // want to map block addresses from the old function to block addresses in<br>+ // the clone. (This is different from the generic ValueMapper<br>
+ // implementation, which generates an invalid block address when<br>+ // cloning a function.)<br>+ // Note the current escape address does not catch all legal cases: even<br>+ // when all block addresses taken are local and the function has the<br>
+ // always_inline attribute due to the indirect branch inlining is<br>+ // suppressed.<br>+ // Note that we don't need to fix the mapping for unreachable blocks;<br>+ // the default mapping there is safe.<br>
+ if (BB->hasAddressTaken()) {<br>+ Constant *OldBBAddr = BlockAddress::get(const_cast<Function*>(OldFunc),<br>+ const_cast<BasicBlock*>(BB));<br>+ VMap[OldBBAddr] = BlockAddress::get(NewFunc, NewBB);<br>
+ }<br>+<br>+ OrigBBs.insert(BB);<br>+ VMap[BB] = NewBB;<br>+ // Iterate over all possible successors and add them to the CloneWorklist.<br>+ const TerminatorInst *Term = BB->getTerminator();<br>+ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {<br>
+ BasicBlock *Succ = Term->getSuccessor(i);<br>+ CloneWorklist.push_back(Succ);<br>+ }<br>+ }<br>+<br>+ // Now, fill only the reachable blocks with the cloned contents<br>+ // of the originals.<br>+ assert(CloneWorklist.empty() && "Dirty worklist before re-use\n");<br>
CloneWorklist.push_back(&OldFunc->getEntryBlock());<br> while (!CloneWorklist.empty()) {<br> const BasicBlock *BB = CloneWorklist.back();<br> CloneWorklist.pop_back();<br>- PFC.CloneBlock(BB, CloneWorklist);<br>
+ PFC.CloneBlock(BB, cast<BasicBlock>(VMap[BB]), CloneWorklist,<br>+ OrigBBs);<br> }<br>-<br>+<br>+ // FIXME: Delete BB's that were created but have been pruned.<br>+ // Actual cloning may have found pruning opportunities since<br>
+ // branch or switch statement target may have been known at compile-time.<br>+ // Alternatively we could write a routine CloneFunction and add a) a<br>+ // parameter to actually do the cloning and b) a return parameter that<br>
+ // gives a list of blocks that need to be cloned also. Then we could<br>+ // call CloneFunction when we collect the blocks to clone, but suppress<br>+ // cloning. And then actually *do* the cloning in the while loop above. Then<br>
+ // the cleanup here would become redundant, and so would be the OrigBBs.<br>+ for (std::set<const BasicBlock *>::iterator Oi = OrigBBs.begin(),<br>+ Oe = OrigBBs.end(); Oi != Oe; ++Oi) {<br>+ const BasicBlock *Orig = *Oi;<br>
+ BasicBlock *NewBB = cast<BasicBlock>(VMap[Orig]);<br>+ delete NewBB;<br>+ VMap[Orig] = 0;<br>+ }<br>+<br> // Loop over all of the basic blocks in the old function. If the block was<br> // reachable, we have cloned it and the old block is now in the value map:<br>
// insert it into the new function in the right order. If not, ignore it.<br>@@ -454,7 +525,8 @@ void llvm::CloneAndPruneFunctionInto(Fun<br> BI != BE; ++BI) {<br> Value *V = VMap[BI];<br> BasicBlock *NewBB = cast_or_null<BasicBlock>(V);<br>
- if (!NewBB) continue; // Dead block.<br>+ if (!NewBB)<br>+ continue; // Dead block.<br><br> // Add the new block to the new function.<br> NewFunc->getBasicBlockList().push_back(NewBB);<br><br><br>
_______________________________________________<br>llvm-commits mailing list<br><a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>
</blockquote>_______________________________________________<br>llvm-commits mailing list<br><a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>
</blockquote></blockquote></div><br></div></div></div><br>_______________________________________________<br>
llvm-commits mailing list<br>
<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>
<br></blockquote></div><br></div>