fix for Ruby build fails due to inlining functions with block label address taken

Gerolf Hoflehner ghoflehner at apple.com
Thu Jul 3 12:43:28 PDT 2014


Thank you, Eric!

Cheers
Gerolf


On Jul 1, 2014, at 6:32 PM, Eric Christopher <echristo at gmail.com> wrote:

> Having read some tea leaves it looks like the change is in the noise
> on this end.
> 
> Thanks!
> 
> -eric
> 
> On Tue, Jul 1, 2014 at 12:03 PM, Eric Christopher <echristo at gmail.com> wrote:
>> Had some instability yesterday. I'll run it today.
>> 
>> On Jul 1, 2014 10:39 AM, "Bob Wilson" <bob.wilson at apple.com> wrote:
>>> 
>>> The pass order change looks fine. Eric, you had offered to do some
>>> benchmarking of that change, too. How does it look for you?
>>> 
>>> On Jun 30, 2014, at 9:15 PM, Gerolf Hoflehner <ghoflehner at apple.com>
>>> wrote:
>>> 
>>> My O3 performance runs look fine on x86 and ARM for the llvm tests.
>>> Note that in LTO the order is already IPSCCPP  and then global opt.
>>> The change  then establishes the same order at lower optimization levels.
>>> 
>>> There is still one twist with the indirect-goto.c test in clang: After the
>>> changes the test case
>>> passes in 64b mode, but not in 32b.  LLVM generates different
>>> initialization code for the block address array
>>> the two modes.  The effect is that the reference
>>> count of block labels is zero only in 64b mode. But in 32b  some
>>> optimizations
>>> like jump threading, inlining etc. that check for the address taken bit
>>> don’t kick in. This requires a
>>> deeper and separate investigation into implementation and (possibly)
>>> performance differences
>>> between 32b and 64b. To recover the test case regression a simple change
>>> to the triple will do:
>>> 
>>> Index: test/CodeGen/indirect-goto.c
>>> ===================================================================
>>> --- test/CodeGen/indirect-goto.c (revision 212082)
>>> +++ test/CodeGen/indirect-goto.c (working copy)
>>> @@ -1,4 +1,4 @@
>>> -// RUN: %clang_cc1 -triple i386-unknown-unknown -O3 -emit-llvm -o - %s |
>>> grep "ret i32 2520"
>>> +// RUN: %clang_cc1 -triple x86_64-unknown-unknown -O3 -emit-llvm -o - %s
>>> | grep "ret i32 2520"
>>> 
>>> static int foo(unsigned i) {
>>>  void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };
>>> 
>>> 
>>> Do the changes in PassManagerBuilder.cpp and the test case LGTY?
>>> 
>>> Thanks
>>> Gerolf
>>> 
>>> PS: Change to PassManagerBuilder.cpp:
>>> 
>>> Index: lib/Transforms/IPO/PassManagerBuilder.cpp
>>> ===================================================================
>>> --- lib/Transforms/IPO/PassManagerBuilder.cpp (revision 211729)
>>> +++ lib/Transforms/IPO/PassManagerBuilder.cpp (working copy)
>>> @@ -156,9 +156,9 @@
>>> if (!DisableUnitAtATime) {
>>>   addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);
>>> 
>>> -    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>>> 
>>>   MPM.add(createIPSCCPPass());              // IP SCCP
>>> +    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>>>   MPM.add(createDeadArgEliminationPass());  // Dead argument elimination
>>> 
>>>   MPM.add(createInstructionCombiningPass());// Clean up after IPCP & DAE
>>> 
>>> 
>>> On Jun 27, 2014, at 4:50 PM, Eric Christopher <echristo at gmail.com> wrote:
>>> 
>>> That does seem reasonably straightforward and makes sense on the surface.
>>> 
>>> Did you want to benchmark it and see what you get? If you'd like I can
>>> do the same here.
>>> 
>>> -eric
>>> 
>>> On Fri, Jun 27, 2014 at 4:26 PM, Gerolf Hoflehner <ghoflehner at apple.com>
>>> wrote:
>>> 
>>> I revised the comments and commit message of the Ruby fix.
>>> 
>>> 
>>> 
>>> 
>>> I continue to work on the indirect-goto.c case. The fix that would make it
>>> happen is inverting the
>>> order of invocation of global const prop and global optimizer:
>>> 
>>> Index: lib/Transforms/IPO/PassManagerBuilder.cpp
>>> ===================================================================
>>> --- lib/Transforms/IPO/PassManagerBuilder.cpp (revision 211729)
>>> +++ lib/Transforms/IPO/PassManagerBuilder.cpp (working copy)
>>> @@ -156,9 +156,9 @@
>>> if (!DisableUnitAtATime) {
>>>   addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);
>>> 
>>> -    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>>> 
>>>   MPM.add(createIPSCCPPass());              // IP SCCP
>>> +    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>>>   MPM.add(createDeadArgEliminationPass());  // Dead argument elimination
>>> 
>>>   MPM.add(createInstructionCombiningPass());// Clean up after IPCP & DAE
>>> 
>>> 
>>> When the global optimizer is invoked first, the indirect branch using the
>>> global array has not
>>> been eliminated. So the global can’t be eliminated. That is pretty
>>> straight
>>> forward and the pass swap
>>> looks reasonable but requires more scrutiny and careful testing. As
>>> additional benefit the global
>>> array would be removed from the IR.
>>> 
>>> 
>>> -Gerolf
>>> 
>>> On Jun 25, 2014, at 6:01 PM, Gerolf Hoflehner <ghoflehner at apple.com>
>>> wrote:
>>> 
>>> Bob and I discussed this a bit more. Interprocedural const prop eliminates
>>> the indirect branch. Then the labels. So when the
>>> inliner kicks in it does not “see" any of the issues that could block it.
>>> 
>>> However, with the patch there must still be some information about block
>>> label address taken kept around when a
>>> global  (static) array contains the label addresses. I verified that the
>>> test case would work when foo2
>>> also declared just a local array  const void *addrs[] = { &&L1, &&L2,
>>> &&L3,
>>> &&L4, &&L5 }.
>>> Optimistically it seems like we might catch the global case also without
>>> escape analysis,
>>> but I have not looked into that code yet.
>>> 
>>> -Gerolf
>>> 
>>> 
>>> 
>>> On Jun 25, 2014, at 4:00 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>>> 
>>> Oh! It’s a front-end test. I didn’t even think to look there….
>>> 
>>> The test predates the introduction of indirect branches and has only had
>>> superficial changes since then. We used to translate computed gotos into
>>> switch statements. I’m still trying to figure out how this ever worked in
>>> the past.
>>> 
>>> On Jun 25, 2014, at 3:50 PM, Gerolf Hoflehner <ghoflehner at apple.com>
>>> wrote:
>>> 
>>> Looking at the history it looks like Chris L. committed the test case in
>>> tools/clang/test/CodeGen/indirect-goto.c to test
>>> block address labels.
>>> 
>>> I’m fine with the train of reasoning:
>>> there is an indirect branch in the source and we
>>> never attempt to inline such functions. Furthermore for inlining functions
>>> with block
>>> label address taken we would need escape analysis. We don’t have it. So
>>> let’s not inline.
>>> 
>>> It looks like everyone is signing up for
>>> “Expect no optimization when block address labels are used”.
>>> 
>>> Cheers
>>> Gerolf
>>> 
>>> 
>>> 
>>> 
>>> On Jun 25, 2014, at 2:49 PM, Eric Christopher <echristo at gmail.com> wrote:
>>> 
>>> On Wed, Jun 25, 2014 at 2:46 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>>> 
>>> On Jun 25, 2014, at 2:30 PM, Gerolf Hoflehner <ghoflehner at apple.com>
>>> wrote:
>>> 
>>> I noticed one side-effect of this patch: we no longer optimize fully cases
>>> like indirect-goto.c test in CodeGen below.
>>> Since foo and foo2 are no longer inlined constant folding of the result in
>>> main no longer happens.
>>> 
>>> Does anyone have an issue with this?
>>> 
>>> 
>>> I don’t. We’ve never attempted to inline functions containing indirect
>>> branches. I’m surprised that your testcase works now, but to really do it
>>> correctly, we would need to check that there were no remaining uses of the
>>> blockaddress that might escape the function. I’m not aware of any code
>>> that
>>> does that check, and without it, it’s not a safe optimization.
>>> 
>>> 
>>> 
>>> Oh, wait, I see what the testcase is doing now. I'd thought it was
>>> inlining into something that had blockaddr rather than the other way
>>> around.
>>> 
>>> What does the commit for the testcase say about it?
>>> 
>>> Assuming it doesn't say anything enlightening then no, no issue here
>>> either.
>>> 
>>> -eric
>>> 
>>> Thanks
>>> Gerolf
>>> 
>>> // RUN: %clang_cc1 -triple i386-unknown-unknown -O3 -emit-llvm -o - %s |
>>> grep "ret i32 2520"
>>> 
>>> static int foo(unsigned i) {
>>> void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };
>>> int res = 1;
>>> 
>>> goto *addrs[i];
>>> L5: res *= 11;
>>> L4: res *= 7;
>>> L3: res *= 5;
>>> L2: res *= 3;
>>> L1: res *= 2;
>>> return res;
>>> }
>>> 
>>> static int foo2(unsigned i) {
>>> static const void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };
>>> int res = 1;
>>> 
>>> goto *addrs[i];
>>> L5: res *= 11;
>>> L4: res *= 7;
>>> L3: res *= 5;
>>> L2: res *= 3;
>>> L1: res *= 2;
>>> return res;
>>> }
>>> 
>>> int main() {
>>> return foo(3)+foo2(4);
>>> }
>>> 
>>> 
>>> IR before:
>>> 
>>> ; Function Attrs: nounwind readnone
>>> define i32 @main() #0 {
>>> entry:
>>> ret i32 2520
>>> }
>>> 
>>> 
>>> IR now:
>>> 
>>> ; Function Attrs: nounwind readnone
>>> define i32 @main() #0 {
>>> entry:
>>> %call = tail call fastcc i32 @foo(i32 3)
>>> %call1 = tail call fastcc i32 @foo2(i32 4)
>>> %add = add nsw i32 %call1, %call
>>> ret i32 %add
>>> }
>>> 
>>> ; Function Attrs: nounwind readnone
>>> define internal fastcc i32 @foo(i32 %i) #0 {
>>> entry:
>>> br label %L1
>>> 
>>> L1:                                               ; preds = %entry
>>> ret i32 210
>>> }
>>> 
>>> ; Function Attrs: nounwind readnone
>>> define internal fastcc i32 @foo2(i32 %i) #0 {
>>> entry:
>>> br label %L1
>>> 
>>> L1:                                               ; preds = %entry
>>> ret i32 2310
>>> }
>>> 
>>> 
>>> On Jun 25, 2014, at 11:32 AM, Bob Wilson <bob.wilson at apple.com> wrote:
>>> 
>>> 
>>> On Jun 25, 2014, at 10:47 AM, Gerolf Hoflehner <ghoflehner at apple.com>
>>> wrote:
>>> 
>>> Hi
>>> 
>>> there have been reports on Ruby segfaults triggered by inlining functions
>>> with block label address taken like
>>> 
>>> 
>>> //Ruby code snippet
>>> vm_exec_core() {
>>> finish_insn_seq_0 = &&INSN_LABEL_finish;
>>> INSN_LABEL_finish:
>>> ;
>>> }
>>> 
>>> This kind of scenario can also happen when LLVM picks a subset of blocks
>>> for
>>> inlining,
>>> which is the case with the actual code in the Ruby environment.
>>> 
>>> Background:
>>> LLVM suppresses inlining for such functions when there is an indirect
>>> branch. The attached
>>> patch does so even when there is no indirect branch. Note that user code
>>> like above would not
>>> make much sense: using the global for jumping across function boundaries
>>> would be illegal.
>>> 
>>> Why is there a segfault:
>>> 
>>> In the snipped above the block with the label is recognized as dead. So it
>>> is eliminated. Instead
>>> of a block address the cloner stores a constant (sic!) into the global
>>> resulting in the segfault (when
>>> the global is used in a goto).
>>> 
>>> Why did it work in the past then:
>>> 
>>> By luck. In older versions vm_exec_core was also inlined but the label
>>> address used
>>> was the block label address in vm_exec_core. So the global jump ended up
>>> in
>>> the original function
>>> rather than in the caller which accidentally happened to work.
>>> 
>>> 
>>> -Gerolf
>>> 
>>> <0001-Suppress-inlining-when-the-block-address-is-taken.patch>
>>> 
>>> 
>>> The change is fine, but I still have more comments about comments….
>>> 
>>> Inlining function with block label address taken can cause
>>> many problem and requires a rich infrastructure to support
>>> including escape analysis. At this point the safest approach
>>> to address these problems is by blocking inlining from
>>> happening.
>>> 
>>> 
>>> Can you please put more of the context into the commit message? You’ve got
>>> some good background in the email above, and it would be useful to have
>>> that
>>> in the commit.
>>> 
>>> 
>>> rdar://17245966
>>> ---
>>> lib/Analysis/IPA/InlineCost.cpp        | 21 ++++++++++++++++++++-
>>> test/Transforms/Inline/blockaddress.ll |  5 +++--
>>> 2 files changed, 23 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/lib/Analysis/IPA/InlineCost.cpp
>>> b/lib/Analysis/IPA/InlineCost.cpp
>>> index 4536583..b079f46 100644
>>> --- a/lib/Analysis/IPA/InlineCost.cpp
>>> +++ b/lib/Analysis/IPA/InlineCost.cpp
>>> @@ -1097,6 +1097,16 @@ bool CallAnalyzer::analyzeCall(CallSite CS) {
>>>  BasicBlock *BB = BBWorklist[Idx];
>>>  if (BB->empty())
>>>    continue;
>>> +    // Disallow inlining of functions when the address of a label is
>>> +    // taken. Usually when a block address is taken there is also
>>> +    // an indirect branch which effectively suppresses inlining.
>>> +    // But when we partially inline and only take a subset of the blocks
>>> +    // into account there may not be an indirect branch. As cautionary
>>> +    // measure we suppress inlining when there is a block label address
>>> taken.
>>> +    // See also the comments to visitIndirectBrInst() and
>>> isInlineViable().
>>> +
>>> +    if (BB->hasAddressTaken())
>>> +      return false;
>>> 
>>> 
>>> Please remove the blank line between the comment and the code.
>>> 
>>> You should remove the comment sentence beginning with “As cautionary
>>> measure…”. It is not just a cautionary measure — it fixes a serious
>>> miscompile of Ruby.
>>> 
>>> I’m not sure it is necessary to cross-reference the comments in the other
>>> functions, but regardless, you should update the visitIndirectBrInst
>>> comment
>>> to remove this sentence: "And as a QOI issue, if someone is using a
>>> blockaddress without an indirectbr, and that reference somehow ends up in
>>> another function or global, we probably don't want to inline this
>>> function.”
>>> With your change here, that comment is no longer relevant.
>>> 
>>> 
>>>  // Analyze the cost of this block. If we blow through the threshold,
>>> this
>>>  // returns false, and we can bail on out.
>>> 
>>> @@ -1279,7 +1289,16 @@ bool InlineCostAnalysis::isInlineViable(Function
>>> &F)
>>> {
>>>                                 Attribute::ReturnsTwice);
>>> for (Function::iterator BI = F.begin(), BE = F.end(); BI != BE; ++BI) {
>>>  // Disallow inlining of functions which contain an indirect branch.
>>> -    if (isa<IndirectBrInst>(BI->getTerminator()))
>>> +    // Also disallow inlining of functions when the address of a label is
>>> +    // taken. Usually when a block address is taken there is also
>>> +    // an indirect branch which effectively suppresses inlining.
>>> +    // But when there is no indirect branch the user wrote a function
>>> +    // that takes the label address but does not jump to it within
>>> +    // the function. This does not make much sense and could point
>>> +    // to code that is not legal (eg. resulting in jump across function
>>> +    // boundaries). The check here at least prevents inlining of such
>>> code.
>>> +    // See also the comments to visitIndirectBrInst() and in
>>> analyzeCall().
>>> +    if (isa<IndirectBrInst>(BI->getTerminator()) ||
>>> BI->hasAddressTaken())
>>>    return false;
>>> 
>>>  for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;
>>> 
>>> 
>>> This comment seems a little verbose to me. I would at least remove the
>>> cross-references to comments in other functions.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 





More information about the llvm-commits mailing list