fix for Ruby build fails due to inlining functions with block label address taken

Eric Christopher echristo at gmail.com
Tue Jul 1 12:03:02 PDT 2014


You can also file an external bug and clone it internally. That way we have
public visibility into the issue.

Thanks.
On Jul 1, 2014 12:01 PM, "Gerolf Hoflehner" <ghoflehner at apple.com> wrote:

> I filed internal rdar://17523868 and assigned it to the clang team to
> check why the initialization for block address arrays
> is different in 32b and 64b mode. Bob had also offered to take a stab at
> it.
>
> -Gerolf
>
> On Jun 30, 2014, at 9:15 PM, Gerolf Hoflehner <ghoflehner at apple.com>
> wrote:
>
> My O3 performance runs look fine on x86 and ARM for the llvm tests.
> Note that in LTO the order is already IPSCCPP  and then global opt.
> The change  then establishes the same order at lower optimization levels.
>
> There is still one twist with the indirect-goto.c test in clang: After the
> changes the test case
> passes in 64b mode, but not in 32b.  LLVM generates different
> initialization code for the block address array
> the two modes.  The effect is that the reference
> count of block labels is zero only in 64b mode. But in 32b  some
> optimizations
> like jump threading, inlining etc. that check for the address taken bit
> don’t kick in. This requires a
> deeper and separate investigation into implementation and (possibly)
> performance differences
> between 32b and 64b. To recover the test case regression a simple change
> to the triple will do:
>
> Index: test/CodeGen/indirect-goto.c
> ===================================================================
> --- test/CodeGen/indirect-goto.c (revision 212082)
> +++ test/CodeGen/indirect-goto.c (working copy)
> @@ -1,4 +1,4 @@
> -// RUN: %clang_cc1 -triple i386-unknown-unknown -O3 -emit-llvm -o - %s |
> grep "ret i32 2520"
> +// RUN: %clang_cc1 -triple x86_64-unknown-unknown -O3 -emit-llvm -o - %s
> | grep "ret i32 2520"
>
>  static int foo(unsigned i) {
>    void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };
>
>
> Do the changes in PassManagerBuilder.cpp and the test case LGTY?
>
> Thanks
> Gerolf
>
> PS: Change to PassManagerBuilder.cpp:
>
> Index: lib/Transforms/IPO/PassManagerBuilder.cpp
> ===================================================================
> --- lib/Transforms/IPO/PassManagerBuilder.cpp (revision 211729)
> +++ lib/Transforms/IPO/PassManagerBuilder.cpp (working copy)
> @@ -156,9 +156,9 @@
>   if (!DisableUnitAtATime) {
>     addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);
>
> -    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>
>     MPM.add(createIPSCCPPass());              // IP SCCP
> +    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>     MPM.add(createDeadArgEliminationPass());  // Dead argument elimination
>
>     MPM.add(createInstructionCombiningPass());// Clean up after IPCP & DAE
>
>
> On Jun 27, 2014, at 4:50 PM, Eric Christopher <echristo at gmail.com> wrote:
>
> That does seem reasonably straightforward and makes sense on the surface.
>
> Did you want to benchmark it and see what you get? If you'd like I can
> do the same here.
>
> -eric
>
> On Fri, Jun 27, 2014 at 4:26 PM, Gerolf Hoflehner <ghoflehner at apple.com>
> wrote:
>
> I revised the comments and commit message of the Ruby fix.
>
>
>
>
> I continue to work on the indirect-goto.c case. The fix that would make it
> happen is inverting the
> order of invocation of global const prop and global optimizer:
>
> Index: lib/Transforms/IPO/PassManagerBuilder.cpp
> ===================================================================
> --- lib/Transforms/IPO/PassManagerBuilder.cpp (revision 211729)
> +++ lib/Transforms/IPO/PassManagerBuilder.cpp (working copy)
> @@ -156,9 +156,9 @@
>   if (!DisableUnitAtATime) {
>     addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);
>
> -    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>
>     MPM.add(createIPSCCPPass());              // IP SCCP
> +    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>     MPM.add(createDeadArgEliminationPass());  // Dead argument elimination
>
>     MPM.add(createInstructionCombiningPass());// Clean up after IPCP & DAE
>
>
> When the global optimizer is invoked first, the indirect branch using the
> global array has not
> been eliminated. So the global can’t be eliminated. That is pretty straight
> forward and the pass swap
> looks reasonable but requires more scrutiny and careful testing. As
> additional benefit the global
> array would be removed from the IR.
>
>
> -Gerolf
>
> On Jun 25, 2014, at 6:01 PM, Gerolf Hoflehner <ghoflehner at apple.com>
> wrote:
>
> Bob and I discussed this a bit more. Interprocedural const prop eliminates
> the indirect branch. Then the labels. So when the
> inliner kicks in it does not “see" any of the issues that could block it.
>
> However, with the patch there must still be some information about block
> label address taken kept around when a
> global  (static) array contains the label addresses. I verified that the
> test case would work when foo2
> also declared just a local array  const void *addrs[] = { &&L1, &&L2, &&L3,
> &&L4, &&L5 }.
> Optimistically it seems like we might catch the global case also without
> escape analysis,
> but I have not looked into that code yet.
>
> -Gerolf
>
>
>
> On Jun 25, 2014, at 4:00 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>
> Oh! It’s a front-end test. I didn’t even think to look there….
>
> The test predates the introduction of indirect branches and has only had
> superficial changes since then. We used to translate computed gotos into
> switch statements. I’m still trying to figure out how this ever worked in
> the past.
>
> On Jun 25, 2014, at 3:50 PM, Gerolf Hoflehner <ghoflehner at apple.com>
> wrote:
>
> Looking at the history it looks like Chris L. committed the test case in
> tools/clang/test/CodeGen/indirect-goto.c to test
> block address labels.
>
> I’m fine with the train of reasoning:
> there is an indirect branch in the source and we
> never attempt to inline such functions. Furthermore for inlining functions
> with block
> label address taken we would need escape analysis. We don’t have it. So
> let’s not inline.
>
> It looks like everyone is signing up for
> “Expect no optimization when block address labels are used”.
>
> Cheers
> Gerolf
>
>
>
>
> On Jun 25, 2014, at 2:49 PM, Eric Christopher <echristo at gmail.com> wrote:
>
> On Wed, Jun 25, 2014 at 2:46 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>
> On Jun 25, 2014, at 2:30 PM, Gerolf Hoflehner <ghoflehner at apple.com>
> wrote:
>
> I noticed one side-effect of this patch: we no longer optimize fully cases
> like indirect-goto.c test in CodeGen below.
> Since foo and foo2 are no longer inlined constant folding of the result in
> main no longer happens.
>
> Does anyone have an issue with this?
>
>
> I don’t. We’ve never attempted to inline functions containing indirect
> branches. I’m surprised that your testcase works now, but to really do it
> correctly, we would need to check that there were no remaining uses of the
> blockaddress that might escape the function. I’m not aware of any code that
> does that check, and without it, it’s not a safe optimization.
>
>
>
> Oh, wait, I see what the testcase is doing now. I'd thought it was
> inlining into something that had blockaddr rather than the other way
> around.
>
> What does the commit for the testcase say about it?
>
> Assuming it doesn't say anything enlightening then no, no issue here
> either.
>
> -eric
>
> Thanks
> Gerolf
>
> // RUN: %clang_cc1 -triple i386-unknown-unknown -O3 -emit-llvm -o - %s |
> grep "ret i32 2520"
>
> static int foo(unsigned i) {
> void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };
> int res = 1;
>
> goto *addrs[i];
> L5: res *= 11;
> L4: res *= 7;
> L3: res *= 5;
> L2: res *= 3;
> L1: res *= 2;
> return res;
> }
>
> static int foo2(unsigned i) {
> static const void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };
> int res = 1;
>
> goto *addrs[i];
> L5: res *= 11;
> L4: res *= 7;
> L3: res *= 5;
> L2: res *= 3;
> L1: res *= 2;
> return res;
> }
>
> int main() {
> return foo(3)+foo2(4);
> }
>
>
> IR before:
>
> ; Function Attrs: nounwind readnone
> define i32 @main() #0 {
> entry:
> ret i32 2520
> }
>
>
> IR now:
>
> ; Function Attrs: nounwind readnone
> define i32 @main() #0 {
> entry:
> %call = tail call fastcc i32 @foo(i32 3)
> %call1 = tail call fastcc i32 @foo2(i32 4)
> %add = add nsw i32 %call1, %call
> ret i32 %add
> }
>
> ; Function Attrs: nounwind readnone
> define internal fastcc i32 @foo(i32 %i) #0 {
> entry:
> br label %L1
>
> L1:                                               ; preds = %entry
> ret i32 210
> }
>
> ; Function Attrs: nounwind readnone
> define internal fastcc i32 @foo2(i32 %i) #0 {
> entry:
> br label %L1
>
> L1:                                               ; preds = %entry
> ret i32 2310
> }
>
>
> On Jun 25, 2014, at 11:32 AM, Bob Wilson <bob.wilson at apple.com> wrote:
>
>
> On Jun 25, 2014, at 10:47 AM, Gerolf Hoflehner <ghoflehner at apple.com>
> wrote:
>
> Hi
>
> there have been reports on Ruby segfaults triggered by inlining functions
> with block label address taken like
>
>
> //Ruby code snippet
> vm_exec_core() {
> finish_insn_seq_0 = &&INSN_LABEL_finish;
> INSN_LABEL_finish:
> ;
> }
>
> This kind of scenario can also happen when LLVM picks a subset of blocks
> for
> inlining,
> which is the case with the actual code in the Ruby environment.
>
> Background:
> LLVM suppresses inlining for such functions when there is an indirect
> branch. The attached
> patch does so even when there is no indirect branch. Note that user code
> like above would not
> make much sense: using the global for jumping across function boundaries
> would be illegal.
>
> Why is there a segfault:
>
> In the snipped above the block with the label is recognized as dead. So it
> is eliminated. Instead
> of a block address the cloner stores a constant (sic!) into the global
> resulting in the segfault (when
> the global is used in a goto).
>
> Why did it work in the past then:
>
> By luck. In older versions vm_exec_core was also inlined but the label
> address used
> was the block label address in vm_exec_core. So the global jump ended up in
> the original function
> rather than in the caller which accidentally happened to work.
>
>
> -Gerolf
>
> <0001-Suppress-inlining-when-the-block-address-is-taken.patch>
>
>
> The change is fine, but I still have more comments about comments….
>
> Inlining function with block label address taken can cause
> many problem and requires a rich infrastructure to support
> including escape analysis. At this point the safest approach
> to address these problems is by blocking inlining from
> happening.
>
>
> Can you please put more of the context into the commit message? You’ve got
> some good background in the email above, and it would be useful to have
> that
> in the commit.
>
>
> rdar://17245966
> ---
> lib/Analysis/IPA/InlineCost.cpp        | 21 ++++++++++++++++++++-
> test/Transforms/Inline/blockaddress.ll |  5 +++--
> 2 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/lib/Analysis/IPA/InlineCost.cpp
> b/lib/Analysis/IPA/InlineCost.cpp
> index 4536583..b079f46 100644
> --- a/lib/Analysis/IPA/InlineCost.cpp
> +++ b/lib/Analysis/IPA/InlineCost.cpp
> @@ -1097,6 +1097,16 @@ bool CallAnalyzer::analyzeCall(CallSite CS) {
>    BasicBlock *BB = BBWorklist[Idx];
>    if (BB->empty())
>      continue;
> +    // Disallow inlining of functions when the address of a label is
> +    // taken. Usually when a block address is taken there is also
> +    // an indirect branch which effectively suppresses inlining.
> +    // But when we partially inline and only take a subset of the blocks
> +    // into account there may not be an indirect branch. As cautionary
> +    // measure we suppress inlining when there is a block label address
> taken.
> +    // See also the comments to visitIndirectBrInst() and
> isInlineViable().
> +
> +    if (BB->hasAddressTaken())
> +      return false;
>
>
> Please remove the blank line between the comment and the code.
>
> You should remove the comment sentence beginning with “As cautionary
> measure…”. It is not just a cautionary measure — it fixes a serious
> miscompile of Ruby.
>
> I’m not sure it is necessary to cross-reference the comments in the other
> functions, but regardless, you should update the visitIndirectBrInst
> comment
> to remove this sentence: "And as a QOI issue, if someone is using a
> blockaddress without an indirectbr, and that reference somehow ends up in
> another function or global, we probably don't want to inline this
> function.”
> With your change here, that comment is no longer relevant.
>
>
>    // Analyze the cost of this block. If we blow through the threshold,
> this
>    // returns false, and we can bail on out.
>
> @@ -1279,7 +1289,16 @@ bool InlineCostAnalysis::isInlineViable(Function &F)
> {
>                                   Attribute::ReturnsTwice);
>  for (Function::iterator BI = F.begin(), BE = F.end(); BI != BE; ++BI) {
>    // Disallow inlining of functions which contain an indirect branch.
> -    if (isa<IndirectBrInst>(BI->getTerminator()))
> +    // Also disallow inlining of functions when the address of a label is
> +    // taken. Usually when a block address is taken there is also
> +    // an indirect branch which effectively suppresses inlining.
> +    // But when there is no indirect branch the user wrote a function
> +    // that takes the label address but does not jump to it within
> +    // the function. This does not make much sense and could point
> +    // to code that is not legal (eg. resulting in jump across function
> +    // boundaries). The check here at least prevents inlining of such
> code.
> +    // See also the comments to visitIndirectBrInst() and in
> analyzeCall().
> +    if (isa<IndirectBrInst>(BI->getTerminator()) || BI->hasAddressTaken())
>      return false;
>
>    for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;
>
>
> This comment seems a little verbose to me. I would at least remove the
> cross-references to comments in other functions.
>
>
>
>
>
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140701/87cabec5/attachment.html>


More information about the llvm-commits mailing list