[PATCH] D41860: [CallSiteSplitting] Support splitting of blocks with instrs before call.

Mon Feb 5 03:35:23 PST 2018

fhahn marked 2 inline comments as done.
fhahn added inline comments.

================
Comment at: lib/Transforms/Scalar/CallSiteSplitting.cpp:192-193
   BasicBlock *CallSiteBB = Instr->getParent();
-  if (Instr != CallSiteBB->getFirstNonPHIOrDbg())
+  if (std::distance(CallSiteBB->getFirstNonPHIOrDbg()->getIterator(),
+                    Instr->getIterator()) >= DuplicationThreshold)
     return false;
----------------
junbuml wrote:
> Here we assume that the duplication costs of instructions are all same. However, I believe this assumption is not always true because each instruction here will end up differently after lowering. For example, if there is another call before the call we are trying to split, the cost for such call should be more expensive than other simple instructions (e.g., add or trunc ).  If such call duplicated is inlined, the code size could be significantly impacted. I think we should consider using getUserCost(), instead of number of instructions ? 
I've updated the code to use getInstructionCost with TCK_CodeSize. Per default it uses getUserCost, which seems OK, although in some cases, the cost it returns might be too high for our case (e.g. `sdiv` has a high cost although it might be lowered to a single instruction)

================
Comment at: lib/Transforms/Scalar/CallSiteSplitting.cpp:313
+        NewPN->addIncoming(Mapping[CurrentI],
+													 cast<Instruction>(Mapping[CurrentI])->getParent());
+      NewPN->insertBefore(&*TailBB->begin());
----------------
junbuml wrote:
> If CurrnetI is PHI, "Mapping[CurrentI])->getParent()" may not point the new SplitBlock becasue DuplicateInstructionsInSplitBetween map a PHI with its original incoming value. Isn't it okay to skip doing this if CurrnetI is PHI? 
> 
> Based on your test_remove_unused_phi(), we should add a test case something like this : 
> 
> ```
> define void @test_no_remove_used_phi(i32* %ptrarg, i32* %ptrarg2, i32 %i, i32 %i2) {
> Header:
>   %l1 = load i32, i32* undef, align 16
>   %tobool = icmp ne i32* %ptrarg, null
>   br i1 %tobool, label %TBB, label %CallSite
> 
> TBB:                                    ; preds = %Header
>   %arrayidx = getelementptr inbounds i32, i32* %ptrarg, i64 42
>   %0 = load i32, i32* %arrayidx, align 4
>   %l2 = load i32, i32* undef, align 16
>   %tobool1 = icmp ne i32 %0, 0
>   br i1 %tobool1, label %CallSite, label %End
> 
> CallSite:                                          ; preds = %TBB, %Header
>   %l = phi i32 [ %l1, %Header ], [ %l2, %TBB ]
>   call void @bar(i32* %ptrarg, i32 %l)
>   call void @bari(i32 %l)                   <------------ %l is used here, so it will hit the case I mentioned.
>   br label %End
> 
> End:                                           ; preds = %CallSite, %TBB
>   ret void
> }
> 
> ```
Thanks, there is indeed no need to re-create the PHI node there. I've added your test case. The incoming basic blocks in the PHI node are updated to use the split blocks. I assume this is done when splitting. Not sure if it would be better to use the original basic blocks. But I think that should be follow up patch if we want to change it ;)

https://reviews.llvm.org/D41860