fix for Ruby build fails due to inlining functions with block label address taken

Gerolf Hoflehner ghoflehner at apple.com
Fri Jun 27 17:00:05 PDT 2014


Yes, I’ll kick of tests on x86 and ARM. Would be great if you could run some benchmarks on x86.

Cheers
Gerolf

On Jun 27, 2014, at 4:50 PM, Eric Christopher <echristo at gmail.com> wrote:

> That does seem reasonably straightforward and makes sense on the surface.
> 
> Did you want to benchmark it and see what you get? If you'd like I can
> do the same here.
> 
> -eric
> 
> On Fri, Jun 27, 2014 at 4:26 PM, Gerolf Hoflehner <ghoflehner at apple.com> wrote:
>> I revised the comments and commit message of the Ruby fix.
>> 
>> 
>> 
>> 
>> I continue to work on the indirect-goto.c case. The fix that would make it
>> happen is inverting the
>> order of invocation of global const prop and global optimizer:
>> 
>> Index: lib/Transforms/IPO/PassManagerBuilder.cpp
>> ===================================================================
>> --- lib/Transforms/IPO/PassManagerBuilder.cpp (revision 211729)
>> +++ lib/Transforms/IPO/PassManagerBuilder.cpp (working copy)
>> @@ -156,9 +156,9 @@
>>   if (!DisableUnitAtATime) {
>>     addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);
>> 
>> -    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>> 
>>     MPM.add(createIPSCCPPass());              // IP SCCP
>> +    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>>     MPM.add(createDeadArgEliminationPass());  // Dead argument elimination
>> 
>>     MPM.add(createInstructionCombiningPass());// Clean up after IPCP & DAE
>> 
>> 
>> When the global optimizer is invoked first, the indirect branch using the
>> global array has not
>> been eliminated. So the global can’t be eliminated. That is pretty straight
>> forward and the pass swap
>> looks reasonable but requires more scrutiny and careful testing. As
>> additional benefit the global
>> array would be removed from the IR.
>> 
>> 
>> -Gerolf
>> 
>> On Jun 25, 2014, at 6:01 PM, Gerolf Hoflehner <ghoflehner at apple.com> wrote:
>> 
>> Bob and I discussed this a bit more. Interprocedural const prop eliminates
>> the indirect branch. Then the labels. So when the
>> inliner kicks in it does not “see" any of the issues that could block it.
>> 
>> However, with the patch there must still be some information about block
>> label address taken kept around when a
>> global  (static) array contains the label addresses. I verified that the
>> test case would work when foo2
>> also declared just a local array  const void *addrs[] = { &&L1, &&L2, &&L3,
>> &&L4, &&L5 }.
>> Optimistically it seems like we might catch the global case also without
>> escape analysis,
>> but I have not looked into that code yet.
>> 
>> -Gerolf
>> 
>> 
>> 
>> On Jun 25, 2014, at 4:00 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>> 
>> Oh! It’s a front-end test. I didn’t even think to look there….
>> 
>> The test predates the introduction of indirect branches and has only had
>> superficial changes since then. We used to translate computed gotos into
>> switch statements. I’m still trying to figure out how this ever worked in
>> the past.
>> 
>> On Jun 25, 2014, at 3:50 PM, Gerolf Hoflehner <ghoflehner at apple.com> wrote:
>> 
>> Looking at the history it looks like Chris L. committed the test case in
>> tools/clang/test/CodeGen/indirect-goto.c to test
>> block address labels.
>> 
>> I’m fine with the train of reasoning:
>> there is an indirect branch in the source and we
>> never attempt to inline such functions. Furthermore for inlining functions
>> with block
>> label address taken we would need escape analysis. We don’t have it. So
>> let’s not inline.
>> 
>> It looks like everyone is signing up for
>> “Expect no optimization when block address labels are used”.
>> 
>> Cheers
>> Gerolf
>> 
>> 
>> 
>> 
>> On Jun 25, 2014, at 2:49 PM, Eric Christopher <echristo at gmail.com> wrote:
>> 
>> On Wed, Jun 25, 2014 at 2:46 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>> 
>> On Jun 25, 2014, at 2:30 PM, Gerolf Hoflehner <ghoflehner at apple.com> wrote:
>> 
>> I noticed one side-effect of this patch: we no longer optimize fully cases
>> like indirect-goto.c test in CodeGen below.
>> Since foo and foo2 are no longer inlined constant folding of the result in
>> main no longer happens.
>> 
>> Does anyone have an issue with this?
>> 
>> 
>> I don’t. We’ve never attempted to inline functions containing indirect
>> branches. I’m surprised that your testcase works now, but to really do it
>> correctly, we would need to check that there were no remaining uses of the
>> blockaddress that might escape the function. I’m not aware of any code that
>> does that check, and without it, it’s not a safe optimization.
>> 
>> 
>> 
>> Oh, wait, I see what the testcase is doing now. I'd thought it was
>> inlining into something that had blockaddr rather than the other way
>> around.
>> 
>> What does the commit for the testcase say about it?
>> 
>> Assuming it doesn't say anything enlightening then no, no issue here either.
>> 
>> -eric
>> 
>> Thanks
>> Gerolf
>> 
>> // RUN: %clang_cc1 -triple i386-unknown-unknown -O3 -emit-llvm -o - %s |
>> grep "ret i32 2520"
>> 
>> static int foo(unsigned i) {
>> void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };
>> int res = 1;
>> 
>> goto *addrs[i];
>> L5: res *= 11;
>> L4: res *= 7;
>> L3: res *= 5;
>> L2: res *= 3;
>> L1: res *= 2;
>> return res;
>> }
>> 
>> static int foo2(unsigned i) {
>> static const void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };
>> int res = 1;
>> 
>> goto *addrs[i];
>> L5: res *= 11;
>> L4: res *= 7;
>> L3: res *= 5;
>> L2: res *= 3;
>> L1: res *= 2;
>> return res;
>> }
>> 
>> int main() {
>> return foo(3)+foo2(4);
>> }
>> 
>> 
>> IR before:
>> 
>> ; Function Attrs: nounwind readnone
>> define i32 @main() #0 {
>> entry:
>> ret i32 2520
>> }
>> 
>> 
>> IR now:
>> 
>> ; Function Attrs: nounwind readnone
>> define i32 @main() #0 {
>> entry:
>> %call = tail call fastcc i32 @foo(i32 3)
>> %call1 = tail call fastcc i32 @foo2(i32 4)
>> %add = add nsw i32 %call1, %call
>> ret i32 %add
>> }
>> 
>> ; Function Attrs: nounwind readnone
>> define internal fastcc i32 @foo(i32 %i) #0 {
>> entry:
>> br label %L1
>> 
>> L1:                                               ; preds = %entry
>> ret i32 210
>> }
>> 
>> ; Function Attrs: nounwind readnone
>> define internal fastcc i32 @foo2(i32 %i) #0 {
>> entry:
>> br label %L1
>> 
>> L1:                                               ; preds = %entry
>> ret i32 2310
>> }
>> 
>> 
>> On Jun 25, 2014, at 11:32 AM, Bob Wilson <bob.wilson at apple.com> wrote:
>> 
>> 
>> On Jun 25, 2014, at 10:47 AM, Gerolf Hoflehner <ghoflehner at apple.com> wrote:
>> 
>> Hi
>> 
>> there have been reports on Ruby segfaults triggered by inlining functions
>> with block label address taken like
>> 
>> 
>> //Ruby code snippet
>> vm_exec_core() {
>> finish_insn_seq_0 = &&INSN_LABEL_finish;
>> INSN_LABEL_finish:
>> ;
>> }
>> 
>> This kind of scenario can also happen when LLVM picks a subset of blocks for
>> inlining,
>> which is the case with the actual code in the Ruby environment.
>> 
>> Background:
>> LLVM suppresses inlining for such functions when there is an indirect
>> branch. The attached
>> patch does so even when there is no indirect branch. Note that user code
>> like above would not
>> make much sense: using the global for jumping across function boundaries
>> would be illegal.
>> 
>> Why is there a segfault:
>> 
>> In the snipped above the block with the label is recognized as dead. So it
>> is eliminated. Instead
>> of a block address the cloner stores a constant (sic!) into the global
>> resulting in the segfault (when
>> the global is used in a goto).
>> 
>> Why did it work in the past then:
>> 
>> By luck. In older versions vm_exec_core was also inlined but the label
>> address used
>> was the block label address in vm_exec_core. So the global jump ended up in
>> the original function
>> rather than in the caller which accidentally happened to work.
>> 
>> 
>> -Gerolf
>> 
>> <0001-Suppress-inlining-when-the-block-address-is-taken.patch>
>> 
>> 
>> The change is fine, but I still have more comments about comments….
>> 
>> Inlining function with block label address taken can cause
>> many problem and requires a rich infrastructure to support
>> including escape analysis. At this point the safest approach
>> to address these problems is by blocking inlining from
>> happening.
>> 
>> 
>> Can you please put more of the context into the commit message? You’ve got
>> some good background in the email above, and it would be useful to have that
>> in the commit.
>> 
>> 
>> rdar://17245966
>> ---
>> lib/Analysis/IPA/InlineCost.cpp        | 21 ++++++++++++++++++++-
>> test/Transforms/Inline/blockaddress.ll |  5 +++--
>> 2 files changed, 23 insertions(+), 3 deletions(-)
>> 
>> diff --git a/lib/Analysis/IPA/InlineCost.cpp
>> b/lib/Analysis/IPA/InlineCost.cpp
>> index 4536583..b079f46 100644
>> --- a/lib/Analysis/IPA/InlineCost.cpp
>> +++ b/lib/Analysis/IPA/InlineCost.cpp
>> @@ -1097,6 +1097,16 @@ bool CallAnalyzer::analyzeCall(CallSite CS) {
>>    BasicBlock *BB = BBWorklist[Idx];
>>    if (BB->empty())
>>      continue;
>> +    // Disallow inlining of functions when the address of a label is
>> +    // taken. Usually when a block address is taken there is also
>> +    // an indirect branch which effectively suppresses inlining.
>> +    // But when we partially inline and only take a subset of the blocks
>> +    // into account there may not be an indirect branch. As cautionary
>> +    // measure we suppress inlining when there is a block label address
>> taken.
>> +    // See also the comments to visitIndirectBrInst() and isInlineViable().
>> +
>> +    if (BB->hasAddressTaken())
>> +      return false;
>> 
>> 
>> Please remove the blank line between the comment and the code.
>> 
>> You should remove the comment sentence beginning with “As cautionary
>> measure…”. It is not just a cautionary measure — it fixes a serious
>> miscompile of Ruby.
>> 
>> I’m not sure it is necessary to cross-reference the comments in the other
>> functions, but regardless, you should update the visitIndirectBrInst comment
>> to remove this sentence: "And as a QOI issue, if someone is using a
>> blockaddress without an indirectbr, and that reference somehow ends up in
>> another function or global, we probably don't want to inline this function.”
>> With your change here, that comment is no longer relevant.
>> 
>> 
>>    // Analyze the cost of this block. If we blow through the threshold,
>> this
>>    // returns false, and we can bail on out.
>> 
>> @@ -1279,7 +1289,16 @@ bool InlineCostAnalysis::isInlineViable(Function &F)
>> {
>>                                   Attribute::ReturnsTwice);
>>  for (Function::iterator BI = F.begin(), BE = F.end(); BI != BE; ++BI) {
>>    // Disallow inlining of functions which contain an indirect branch.
>> -    if (isa<IndirectBrInst>(BI->getTerminator()))
>> +    // Also disallow inlining of functions when the address of a label is
>> +    // taken. Usually when a block address is taken there is also
>> +    // an indirect branch which effectively suppresses inlining.
>> +    // But when there is no indirect branch the user wrote a function
>> +    // that takes the label address but does not jump to it within
>> +    // the function. This does not make much sense and could point
>> +    // to code that is not legal (eg. resulting in jump across function
>> +    // boundaries). The check here at least prevents inlining of such code.
>> +    // See also the comments to visitIndirectBrInst() and in analyzeCall().
>> +    if (isa<IndirectBrInst>(BI->getTerminator()) || BI->hasAddressTaken())
>>      return false;
>> 
>>    for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;
>> 
>> 
>> This comment seems a little verbose to me. I would at least remove the
>> cross-references to comments in other functions.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 





More information about the llvm-commits mailing list