fix for Ruby build fails due to inlining functions with block label address taken

Tue Jul 1 18:32:19 PDT 2014

Having read some tea leaves it looks like the change is in the noise
on this end.

Thanks!

-eric

On Tue, Jul 1, 2014 at 12:03 PM, Eric Christopher <echristo at gmail.com> wrote:
> Had some instability yesterday. I'll run it today.
>
> On Jul 1, 2014 10:39 AM, "Bob Wilson" <bob.wilson at apple.com> wrote:
>>
>> The pass order change looks fine. Eric, you had offered to do some
>> benchmarking of that change, too. How does it look for you?
>>
>> On Jun 30, 2014, at 9:15 PM, Gerolf Hoflehner <ghoflehner at apple.com>
>> wrote:
>>
>> My O3 performance runs look fine on x86 and ARM for the llvm tests.
>> Note that in LTO the order is already IPSCCPP  and then global opt.
>> The change  then establishes the same order at lower optimization levels.
>>
>> There is still one twist with the indirect-goto.c test in clang: After the
>> changes the test case
>> passes in 64b mode, but not in 32b.  LLVM generates different
>> initialization code for the block address array
>> the two modes.  The effect is that the reference
>> count of block labels is zero only in 64b mode. But in 32b  some
>> optimizations
>> like jump threading, inlining etc. that check for the address taken bit
>> don’t kick in. This requires a
>> deeper and separate investigation into implementation and (possibly)
>> performance differences
>> between 32b and 64b. To recover the test case regression a simple change
>> to the triple will do:
>>
>> Index: test/CodeGen/indirect-goto.c
>> ===================================================================
>> --- test/CodeGen/indirect-goto.c (revision 212082)
>> +++ test/CodeGen/indirect-goto.c (working copy)
>> @@ -1,4 +1,4 @@
>> -// RUN: %clang_cc1 -triple i386-unknown-unknown -O3 -emit-llvm -o - %s |
>> grep "ret i32 2520"
>> +// RUN: %clang_cc1 -triple x86_64-unknown-unknown -O3 -emit-llvm -o - %s
>> | grep "ret i32 2520"
>>
>>  static int foo(unsigned i) {
>>    void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };
>>
>>
>> Do the changes in PassManagerBuilder.cpp and the test case LGTY?
>>
>> Thanks
>> Gerolf
>>
>> PS: Change to PassManagerBuilder.cpp:
>>
>> Index: lib/Transforms/IPO/PassManagerBuilder.cpp
>> ===================================================================
>> --- lib/Transforms/IPO/PassManagerBuilder.cpp (revision 211729)
>> +++ lib/Transforms/IPO/PassManagerBuilder.cpp (working copy)
>> @@ -156,9 +156,9 @@
>>   if (!DisableUnitAtATime) {
>>     addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);
>>
>> -    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>>
>>     MPM.add(createIPSCCPPass());              // IP SCCP
>> +    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>>     MPM.add(createDeadArgEliminationPass());  // Dead argument elimination
>>
>>     MPM.add(createInstructionCombiningPass());// Clean up after IPCP & DAE
>>
>>
>> On Jun 27, 2014, at 4:50 PM, Eric Christopher <echristo at gmail.com> wrote:
>>
>> That does seem reasonably straightforward and makes sense on the surface.
>>
>> Did you want to benchmark it and see what you get? If you'd like I can
>> do the same here.
>>
>> -eric
>>
>> On Fri, Jun 27, 2014 at 4:26 PM, Gerolf Hoflehner <ghoflehner at apple.com>
>> wrote:
>>
>> I revised the comments and commit message of the Ruby fix.
>>
>>
>>
>>
>> I continue to work on the indirect-goto.c case. The fix that would make it
>> happen is inverting the
>> order of invocation of global const prop and global optimizer:
>>
>> Index: lib/Transforms/IPO/PassManagerBuilder.cpp
>> ===================================================================
>> --- lib/Transforms/IPO/PassManagerBuilder.cpp (revision 211729)
>> +++ lib/Transforms/IPO/PassManagerBuilder.cpp (working copy)
>> @@ -156,9 +156,9 @@
>>   if (!DisableUnitAtATime) {
>>     addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);
>>
>> -    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>>
>>     MPM.add(createIPSCCPPass());              // IP SCCP
>> +    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
>>     MPM.add(createDeadArgEliminationPass());  // Dead argument elimination
>>
>>     MPM.add(createInstructionCombiningPass());// Clean up after IPCP & DAE
>>
>>
>> When the global optimizer is invoked first, the indirect branch using the
>> global array has not
>> been eliminated. So the global can’t be eliminated. That is pretty
>> straight
>> forward and the pass swap
>> looks reasonable but requires more scrutiny and careful testing. As
>> additional benefit the global
>> array would be removed from the IR.
>>
>>
>> -Gerolf
>>
>> On Jun 25, 2014, at 6:01 PM, Gerolf Hoflehner <ghoflehner at apple.com>
>> wrote:
>>
>> Bob and I discussed this a bit more. Interprocedural const prop eliminates
>> the indirect branch. Then the labels. So when the
>> inliner kicks in it does not “see" any of the issues that could block it.
>>
>> However, with the patch there must still be some information about block
>> label address taken kept around when a
>> global  (static) array contains the label addresses. I verified that the
>> test case would work when foo2
>> also declared just a local array  const void *addrs[] = { &&L1, &&L2,
>> &&L3,
>> &&L4, &&L5 }.
>> Optimistically it seems like we might catch the global case also without
>> escape analysis,
>> but I have not looked into that code yet.
>>
>> -Gerolf
>>
>>
>>
>> On Jun 25, 2014, at 4:00 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>>
>> Oh! It’s a front-end test. I didn’t even think to look there….
>>
>> The test predates the introduction of indirect branches and has only had
>> superficial changes since then. We used to translate computed gotos into
>> switch statements. I’m still trying to figure out how this ever worked in
>> the past.
>>
>> On Jun 25, 2014, at 3:50 PM, Gerolf Hoflehner <ghoflehner at apple.com>
>> wrote:
>>
>> Looking at the history it looks like Chris L. committed the test case in
>> tools/clang/test/CodeGen/indirect-goto.c to test
>> block address labels.
>>
>> I’m fine with the train of reasoning:
>> there is an indirect branch in the source and we
>> never attempt to inline such functions. Furthermore for inlining functions
>> with block
>> label address taken we would need escape analysis. We don’t have it. So
>> let’s not inline.
>>
>> It looks like everyone is signing up for
>> “Expect no optimization when block address labels are used”.
>>
>> Cheers
>> Gerolf
>>
>>
>>
>>
>> On Jun 25, 2014, at 2:49 PM, Eric Christopher <echristo at gmail.com> wrote:
>>
>> On Wed, Jun 25, 2014 at 2:46 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>>
>> On Jun 25, 2014, at 2:30 PM, Gerolf Hoflehner <ghoflehner at apple.com>
>> wrote:
>>
>> I noticed one side-effect of this patch: we no longer optimize fully cases
>> like indirect-goto.c test in CodeGen below.
>> Since foo and foo2 are no longer inlined constant folding of the result in
>> main no longer happens.
>>
>> Does anyone have an issue with this?
>>
>>
>> I don’t. We’ve never attempted to inline functions containing indirect
>> branches. I’m surprised that your testcase works now, but to really do it
>> correctly, we would need to check that there were no remaining uses of the
>> blockaddress that might escape the function. I’m not aware of any code
>> that
>> does that check, and without it, it’s not a safe optimization.
>>
>>
>>
>> Oh, wait, I see what the testcase is doing now. I'd thought it was
>> inlining into something that had blockaddr rather than the other way
>> around.
>>
>> What does the commit for the testcase say about it?
>>
>> Assuming it doesn't say anything enlightening then no, no issue here
>> either.
>>
>> -eric
>>
>> Thanks
>> Gerolf
>>
>> // RUN: %clang_cc1 -triple i386-unknown-unknown -O3 -emit-llvm -o - %s |
>> grep "ret i32 2520"
>>
>> static int foo(unsigned i) {
>> void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };
>> int res = 1;
>>
>> goto *addrs[i];
>> L5: res *= 11;
>> L4: res *= 7;
>> L3: res *= 5;
>> L2: res *= 3;
>> L1: res *= 2;
>> return res;
>> }
>>
>> static int foo2(unsigned i) {
>> static const void *addrs[] = { &&L1, &&L2, &&L3, &&L4, &&L5 };
>> int res = 1;
>>
>> goto *addrs[i];
>> L5: res *= 11;
>> L4: res *= 7;
>> L3: res *= 5;
>> L2: res *= 3;
>> L1: res *= 2;
>> return res;
>> }
>>
>> int main() {
>> return foo(3)+foo2(4);
>> }
>>
>>
>> IR before:
>>
>> ; Function Attrs: nounwind readnone
>> define i32 @main() #0 {
>> entry:
>> ret i32 2520
>> }
>>
>>
>> IR now:
>>
>> ; Function Attrs: nounwind readnone
>> define i32 @main() #0 {
>> entry:
>> %call = tail call fastcc i32 @foo(i32 3)
>> %call1 = tail call fastcc i32 @foo2(i32 4)
>> %add = add nsw i32 %call1, %call
>> ret i32 %add
>> }
>>
>> ; Function Attrs: nounwind readnone
>> define internal fastcc i32 @foo(i32 %i) #0 {
>> entry:
>> br label %L1
>>
>> L1:                                               ; preds = %entry
>> ret i32 210
>> }
>>
>> ; Function Attrs: nounwind readnone
>> define internal fastcc i32 @foo2(i32 %i) #0 {
>> entry:
>> br label %L1
>>
>> L1:                                               ; preds = %entry
>> ret i32 2310
>> }
>>
>>
>> On Jun 25, 2014, at 11:32 AM, Bob Wilson <bob.wilson at apple.com> wrote:
>>
>>
>> On Jun 25, 2014, at 10:47 AM, Gerolf Hoflehner <ghoflehner at apple.com>
>> wrote:
>>
>> Hi
>>
>> there have been reports on Ruby segfaults triggered by inlining functions
>> with block label address taken like
>>
>>
>> //Ruby code snippet
>> vm_exec_core() {
>> finish_insn_seq_0 = &&INSN_LABEL_finish;
>> INSN_LABEL_finish:
>> ;
>> }
>>
>> This kind of scenario can also happen when LLVM picks a subset of blocks
>> for
>> inlining,
>> which is the case with the actual code in the Ruby environment.
>>
>> Background:
>> LLVM suppresses inlining for such functions when there is an indirect
>> branch. The attached
>> patch does so even when there is no indirect branch. Note that user code
>> like above would not
>> make much sense: using the global for jumping across function boundaries
>> would be illegal.
>>
>> Why is there a segfault:
>>
>> In the snipped above the block with the label is recognized as dead. So it
>> is eliminated. Instead
>> of a block address the cloner stores a constant (sic!) into the global
>> resulting in the segfault (when
>> the global is used in a goto).
>>
>> Why did it work in the past then:
>>
>> By luck. In older versions vm_exec_core was also inlined but the label
>> address used
>> was the block label address in vm_exec_core. So the global jump ended up
>> in
>> the original function
>> rather than in the caller which accidentally happened to work.
>>
>>
>> -Gerolf
>>
>> <0001-Suppress-inlining-when-the-block-address-is-taken.patch>
>>
>>
>> The change is fine, but I still have more comments about comments….
>>
>> Inlining function with block label address taken can cause
>> many problem and requires a rich infrastructure to support
>> including escape analysis. At this point the safest approach
>> to address these problems is by blocking inlining from
>> happening.
>>
>>
>> Can you please put more of the context into the commit message? You’ve got
>> some good background in the email above, and it would be useful to have
>> that
>> in the commit.
>>
>>
>> rdar://17245966
>> ---
>> lib/Analysis/IPA/InlineCost.cpp        | 21 ++++++++++++++++++++-
>> test/Transforms/Inline/blockaddress.ll |  5 +++--
>> 2 files changed, 23 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/Analysis/IPA/InlineCost.cpp
>> b/lib/Analysis/IPA/InlineCost.cpp
>> index 4536583..b079f46 100644
>> --- a/lib/Analysis/IPA/InlineCost.cpp
>> +++ b/lib/Analysis/IPA/InlineCost.cpp
>> @@ -1097,6 +1097,16 @@ bool CallAnalyzer::analyzeCall(CallSite CS) {
>>    BasicBlock *BB = BBWorklist[Idx];
>>    if (BB->empty())
>>      continue;
>> +    // Disallow inlining of functions when the address of a label is
>> +    // taken. Usually when a block address is taken there is also
>> +    // an indirect branch which effectively suppresses inlining.
>> +    // But when we partially inline and only take a subset of the blocks
>> +    // into account there may not be an indirect branch. As cautionary
>> +    // measure we suppress inlining when there is a block label address
>> taken.
>> +    // See also the comments to visitIndirectBrInst() and
>> isInlineViable().
>> +
>> +    if (BB->hasAddressTaken())
>> +      return false;
>>
>>
>> Please remove the blank line between the comment and the code.
>>
>> You should remove the comment sentence beginning with “As cautionary
>> measure…”. It is not just a cautionary measure — it fixes a serious
>> miscompile of Ruby.
>>
>> I’m not sure it is necessary to cross-reference the comments in the other
>> functions, but regardless, you should update the visitIndirectBrInst
>> comment
>> to remove this sentence: "And as a QOI issue, if someone is using a
>> blockaddress without an indirectbr, and that reference somehow ends up in
>> another function or global, we probably don't want to inline this
>> function.”
>> With your change here, that comment is no longer relevant.
>>
>>
>>    // Analyze the cost of this block. If we blow through the threshold,
>> this
>>    // returns false, and we can bail on out.
>>
>> @@ -1279,7 +1289,16 @@ bool InlineCostAnalysis::isInlineViable(Function
>> &F)
>> {
>>                                   Attribute::ReturnsTwice);
>>  for (Function::iterator BI = F.begin(), BE = F.end(); BI != BE; ++BI) {
>>    // Disallow inlining of functions which contain an indirect branch.
>> -    if (isa<IndirectBrInst>(BI->getTerminator()))
>> +    // Also disallow inlining of functions when the address of a label is
>> +    // taken. Usually when a block address is taken there is also
>> +    // an indirect branch which effectively suppresses inlining.
>> +    // But when there is no indirect branch the user wrote a function
>> +    // that takes the label address but does not jump to it within
>> +    // the function. This does not make much sense and could point
>> +    // to code that is not legal (eg. resulting in jump across function
>> +    // boundaries). The check here at least prevents inlining of such
>> code.
>> +    // See also the comments to visitIndirectBrInst() and in
>> analyzeCall().
>> +    if (isa<IndirectBrInst>(BI->getTerminator()) ||
>> BI->hasAddressTaken())
>>      return false;
>>
>>    for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;
>>
>>
>> This comment seems a little verbose to me. I would at least remove the
>> cross-references to comments in other functions.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>