[llvm-dev] ThinLTO + CFI
Peter Collingbourne via llvm-dev
llvm-dev at lists.llvm.org
Tue May 1 11:46:11 PDT 2018
On Tue, May 1, 2018 at 11:29 AM, <dmitry.mikulin at sony.com> wrote:
> It’s not the functions themselves, it’s the jump tables that end up in the
> same section of the thin-link temporary object. If I have, say, foo and bar
> and both of them need a jump table entry, both entries will be in the same
> section of the temp object. The real functions will be renamed to foo.cfi
> and bar.cfi. If I list “foo.cfi, foo, bar.cfi, bar” in the link order file,
> the linker (I use lld) will place foo.cfi followed by the entire jump
> table, followed by bar. In our case the jump table section is about 80K, so
> the expectation of foo being on the same page as bar breaks down.
>
I see what the problem is, it's that we use a custom section name for jump
tables. Normally, each function (the jump table for each type is a separate
function) gets a separate section, but that isn't the case for functions
with custom section names.
If you comment out this line of LowerTypeTests.cpp:
http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1274
does that help reduce the size of the individual sections?
Peter
>
>
> > On May 1, 2018, at 11:16 AM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
> >
> >
> >
> > On Tue, May 1, 2018 at 11:10 AM, <dmitry.mikulin at sony.com> wrote:
> >
> >
> > > On Apr 30, 2018, at 6:04 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
> > >
> > > On Mon, Apr 30, 2018 at 1:05 PM, <dmitry.mikulin at sony.com> wrote:
> > > Replacing direct calls to jump table entries with calls to real
> targets recovers most of the performance loss in my benchmark. Dealing with
> link order files is a bit cumbersome though: I can’t keep both, say “foo”
> and “foo.cfi” next to each other in link order, because if both exist, the
> linker reorders the jump table next to the real function. This is not what
> we want when the goal is to get rid of calls through jump tables.
> > >
> > > I would have thought that it could be beneficial to place the jump
> table next to one of the target functions, as it may reduce the chance of
> an additional page fault when performing an indirect call via a jump table.
> Do your benchmarks show otherwise?
> >
> > The problem is that the jump table entries end up in a large section of
> <name>.lto.o and the entire section is placed right next to one of the real
> functions. It breaks all locality in our case.
> >
> > Is <name>.lto.o being compiled with function sections? That should allow
> the linker to move just the jump table. I know that lld and the gold plugin
> do that by default, but I don't know what your linker does.
> >
> > >
> > > I had to manually pick the functions which were renamed to “foo.cfi”
> and replace them in the link order file. The process can be automated, but
> it’s a bit flaky if we have to do it to make CFI work.
> > >
> > > I attached 2 patches here: one is the direct call replacement and the
> other is moving type test lowering pass to run later in the pipeline.
> Interestingly, running the pass later seems to help with performance as
> well. Though the comment in PassManagerBuilder implies that this pass needs
> to run early. Why is it different from full LTO?
> > >
> > > The issue with running the passes early is that the
> llvm.assume(llvm.type.test) pattern that WholeProgramDevirt looks for is
> somewhat fragile, so it needs to see the IR first before one of the other
> passes can break it. (This is probably something that we want to fix by
> using a different pattern.) I don't think that the same issue exists with
> any of the patterns that LowerTypeTests looks for, so it would probably be
> fine to move it later.
> >
> > Moving WholeProgramDevirt breaks some tests. LowerTypeTests seems OK,
> not sure how much we can trust the tests though ;)
> > I’ll create a Phab review for this as well.
> >
> > Thanks, I'll take a look.
> >
> > Peter
> >
> >
> > >
> > > Peter
> > >
> > >
> > >
> > >
> > >
> > >
> > > > On Apr 27, 2018, at 11:04 AM, via llvm-dev <llvm-dev at lists.llvm.org>
> wrote:
> > > >
> > > > For the test case below, I get the following IR for main() on entry
> to ThinLTO backend invocation of LowerTypeTestsModule::lower(). This is
> after I moved this pass down in the pipeline so it’s invoked after inlining.
> > > >
> > > > The declarations for foo() and bar() are read in at the time of
> module import, Importer.importFunctions() in lto::thinBackend(). They do
> not have type metadata attached to them.
> > > > In lowerTypeTestCall() we check if the pointer in the type test is
> of a known type, so we look at bitcast and then select operands. foo and
> bar in select are global objects with no type metadata, so the type check
> cannot be guaranteed to be true and it can’t be eliminated. In full LTO
> case this works as expected: both foo and bar are the same known type and
> type check is gone.
> > > >
> > > > Maybe the problem is not with renaming but with the missing type
> metadata in this particular case.
> > > > Though having so many direct calls routed through the jump table
> still seems problematic. Is there a feasible solution?
> > > >
> > > > --------------------------------
> > > > define hidden i32 @main(i32, i8** nocapture readnone)
> local_unnamed_addr #0 !type !3 !type !4 {
> > > > %3 = add nsw i32 %0, -2
> > > > store i32 %3, i32* @i, align 4, !tbaa !5
> > > > %4 = icmp sgt i32 %0, 1
> > > > %5 = select i1 %4, i32 ()* @foo, i32 ()* @bar
> > > > %6 = bitcast i32 ()* %5 to i8*, !nosanitize !9
> > > > %7 = tail call i1 @llvm.type.test(i8* nonnull %6, metadata
> !"_ZTSFivE"), !nosanitize !9
> > > > br i1 %7, label %10, label %8, !prof !10, !nosanitize !9
> > > >
> > > > ; <label>:8: ; preds = %2
> > > > %9 = ptrtoint i32 ()* %5 to i64
> > > > tail call void @__ubsan_handle_cfi_check_fail_abort(i8*
> getelementptr inbounds ({ i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x
> i8] }* }, { i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }* @anon.
> fad58de7366495db4650cfefac2fcd61.1, i64 0, i32 0), i64 %9, i64 undef) #4,
> !nosanitize !9
> > > > unreachable, !nosanitize !9
> > > >
> > > > ; <label>:10: ; preds = %2
> > > > %11 = tail call i32 %5() #5
> > > > ret i32 %11
> > > > }
> > > >
> > > > . . .
> > > > declare hidden i32 @foo() #3
> > > > declare hidden i32 @bar() #3
> > > >
> > > >
> > > > —————————test case---------------
> > > > b.c
> > > > =================
> > > > typedef int (*fptr_t) (void);
> > > > fptr_t get_fptr();
> > > > extern int i;
> > > >
> > > > int main(int argc, char *argv[])
> > > > {
> > > > i = argc - 2;
> > > > fptr_t fp = get_fptr();
> > > > return fp();
> > > > }
> > > >
> > > > v.c
> > > > ================
> > > > int i;
> > > > typedef int (*fptr_t) (void);
> > > > int foo(void) { return 11; }
> > > > int bar(void) { return 22; }
> > > > fptr_t get_fptr(void) { return (i >= 0) ? foo : bar; }
> > > >
> > > >
> > > >> On Apr 26, 2018, at 5:29 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
> > > >>
> > > >>> We could probably tolerate a certain amount of unused jump table
> entries. However, I just realized that all non-inline imported calls end up
> going through a jump table entry. Is that correct?
> > > >>
> > > >> In fact it is all calls that go through a function pointer type
> that is used anywhere in the program for an indirect call, but depending on
> your program that could be very close to "yes".
> > > >>
> > > >>> Moving the type check lowering pass further down the pipeline
> (after inlining) still does not solve the problem because CFI renaming
> happens early and symbols attached to the jump table do not have a matching
> type.
> > > >>
> > > >> As far as I know, renaming happens during the LowerTypeTests pass,
> after the type checks are lowered.
> > > >> Lowering: http://llvm-cs.pcc.me.uk/lib/
> Transforms/IPO/LowerTypeTests.cpp#1620
> > > >> Renaming: http://llvm-cs.pcc.me.uk/lib/
> Transforms/IPO/LowerTypeTests.cpp#1642
> > > >> Do you have an example of what you are seeing?
> > > >>
> > > >> Peter
> > > >>
> > > >> On Thu, Apr 26, 2018 at 4:54 PM, <dmitry.mikulin at sony.com> wrote:
> > > >> Hi Peter,
> > > >>
> > > >> We could probably tolerate a certain amount of unused jump table
> entries. However, I just realized that all non-inline imported calls end up
> going through a jump table entry. Is that correct? Initially I thought you
> meant calls promoted from indirect. While this can be fixed by replacing
> direct calls to jump tables with direct calls to real targets, I found
> other cases where ThinLTO+CFI has issues.
> > > >>
> > > >> In ThinLTO backend, type test lowering happens very early in the
> pipeline, before inlining. When the type check after the call to get_fptr()
> is lowered (in my original example, below), the compiler cannot see that
> both targets belong to the same type and that the type check will always
> return ‘true’ and can be eliminated. Moving the type check lowering pass
> further down the pipeline (after inlining) still does not solve the problem
> because CFI renaming happens early and symbols attached to the jump table
> do not have a matching type.
> > > >>
> > > >> I’m trying to think if there’s a way to delay renaming until
> ThinLTO backend type check lowering pass. It would help with solving both
> problems.
> > > >>
> > > >> Thanks.
> > > >> Dmitry.
> > > >>
> > > >>
> > > >>
> > > >>>>>>
> > > >>>>>> a.c
> > > >>>>>> =============================
> > > >>>>>> typedef int (*fptr_t) (void);
> > > >>>>>> fptr_t get_fptr();
> > > >>>>>> int main(int argc, char *argv[])
> > > >>>>>> {
> > > >>>>>> fptr_t fp = get_fptr();
> > > >>>>>> return fp();
> > > >>>>>> }
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> b.c
> > > >>>>>> =============================
> > > >>>>>> typedef int (*fptr_t) (void);
> > > >>>>>> int foo(void) { return 11; }
> > > >>>>>> int bar(void) { return 22; }
> > > >>>>>>
> > > >>>>>> static fptr_t fptr = bar;
> > > >>>>>> static int i = 53;
> > > >>>>>>
> > > >>>>>> fptr_t get_fptr(void)
> > > >>>>>> {
> > > >>>>>> if (i >= 0)
> > > >>>>>> fptr = foo;
> > > >>>>>> else
> > > >>>>>> fptr = bar;
> > > >>>>>>
> > > >>>>>> return fptr;
> > > >>>>>> }
> > > >>>>>>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>> On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
> > > >>>
> > > >>> Regarding the orderfile, yes, I was thinking more about ordering
> the real functions.
> > > >>>
> > > >>> In that case it sounds like your best option may be to implement
> the optimization pass to make direct calls go directly to the real
> function. From a performance perspective I don't think it would make much
> difference if there are unused jump table entries.
> > > >>>
> > > >>> Peter
> > > >>>
> > > >>> On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> > > >>> Teresa, Peter,
> > > >>>
> > > >>> Thanks for your help!
> > > >>> I need to re-run my experiments as the compiler I used did not
> have the latest changes like r327254.
> > > >>> The fact that the decision about routing calls through jump table
> entries is made early may be problematic. In my experiments with FreeBSD
> kernel, ThinLTO produced thousands jump table entries compared to only
> dozens with full LTO. As for re-ordering jump table entries, I don’t think
> it’s going to work as they are placed in the same section. Including *.cfi
> names into a link order file will take care of re-ordering real functions
> routed through jump table entries, but in our case we need to force some
> functions to be on the same page. So not having jump table entries for the
> functions that don't really need them would be ideal.
> > > >>>
> > > >>> Thanks.
> > > >>> Dmitry.
> > > >>>
> > > >>>
> > > >>>> On Apr 18, 2018, at 6:11 PM, Teresa Johnson <tejohnson at google.com>
> wrote:
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Wed, Apr 18, 2018 at 4:49 PM, <dmitry.mikulin at sony.com>
> wrote:
> > > >>>> Hi Teresa,
> > > >>>>
> > > >>>> Thanks for the info!
> > > >>>> This example is my attempt to reduce FreeBSD kernel to something
> more manageable :)
> > > >>>>
> > > >>>> I will take a look at why globals are not being imported in this
> case. What’s the best tool to look into ThinLTO objects and their
> summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…
> > > >>>>
> > > >>>> Sadly there isn't a really great way to dump the summaries. =(
> There was a patch awhile back by a GSOC student to dump in YAML format, but
> there was resistance from some who preferred dumping to llvm assembly via
> llvm-dis and support reading in the summary from llvm assembly. It's been
> on my list of things to do, hasn't yet risen high enough in priority to
> work on that. For now, you have to use llvm-bcanalyzer -dump and look at
> the raw format.
> > > >>>>
> > > >>>> Teresa
> > > >>>>
> > > >>>>
> > > >>>> Hopefully Peter can chime in regarding CFI related issues.
> > > >>>>
> > > >>>> Thanks.
> > > >>>> Dmitry.
> > > >>>>
> > > >>>>
> > > >>>>> On Apr 17, 2018, at 9:37 AM, Teresa Johnson <
> tejohnson at google.com> wrote:
> > > >>>>>
> > > >>>>> Hi Dmitry,
> > > >>>>>
> > > >>>>> Sorry for the late reply. For CFI specific code generation, pcc
> is a better person to answer. But on the issue of global variables being
> optimized, that hasn't happened yet. That would be great if you wanted to
> pick that up!
> > > >>>>>
> > > >>>>> In your original email example, it seems like the file static
> i=53 could be constant propagated since there are no other defs, and the
> code in get_fptr simplified during the compile step, but I assume this is
> part of a more complex example where it is not possible to do this? Also
> note that with r327254 we started importing global variables. Do you know
> why we don't import in your case? I wonder if it has to do with it being
> CFI inserted code?
> > > >>>>>
> > > >>>>> Teresa
> > > >>>>>
> > > >>>>> On Tue, Apr 17, 2018 at 9:17 AM <dmitry.mikulin at sony.com> wrote:
> > > >>>>> I watched Teresa’s talk on ThinLTO from last year’s CppCon, and
> it sounded like adding global variable information to the summaries was in
> the works, or at least in planning. Can someone (Teresa?) please share the
> current status? If it’s part of future plans, are there any specific
> proposals that can be picked up and worked on?
> > > >>>>>
> > > >>>>> Thanks!
> > > >>>>>
> > > >>>>>
> > > >>>>>> On Apr 9, 2018, at 6:51 PM, via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> > > >>>>>>
> > > >>>>>> Hi,
> > > >>>>>>
> > > >>>>>> I’m working on setting up ThinLTO+CFI for a C application which
> uses a lot of function pointers. While functionally it appears stable, it’s
> performance is significantly degraded, to the tune of double digit
> percentage points compared to regular LTO+CFI.
> > > >>>>>>
> > > >>>>>> Looking into possible causes I see that under ThinLTO+CFI iCall
> type checks almost always generate jump table entries for indirect calls,
> which creates another level of indirection for every such call. On top of
> that it breaks the link order layout because real function names point to
> jump table entries. It appears that I’m hitting a limitation in ThinLTO on
> how much information it can propagate across modules, particularly
> information about constants. In the example below, the fact that “i” is
> effectively a constant, is lost under ThinLTO, and the inlined copy of
> b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI
> purposes requires to generate a type check/jump table.
> > > >>>>>>
> > > >>>>>> I was wondering if there was a way to mitigate this limitation.
> > > >>>>>>
> > > >>>>>> a.c
> > > >>>>>> =============================
> > > >>>>>> typedef int (*fptr_t) (void);
> > > >>>>>> fptr_t get_fptr();
> > > >>>>>> int main(int argc, char *argv[])
> > > >>>>>> {
> > > >>>>>> fptr_t fp = get_fptr();
> > > >>>>>> return fp();
> > > >>>>>> }
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> b.c
> > > >>>>>> =============================
> > > >>>>>> typedef int (*fptr_t) (void);
> > > >>>>>> int foo(void) { return 11; }
> > > >>>>>> int bar(void) { return 22; }
> > > >>>>>>
> > > >>>>>> static fptr_t fptr = bar;
> > > >>>>>> static int i = 53;
> > > >>>>>>
> > > >>>>>> fptr_t get_fptr(void)
> > > >>>>>> {
> > > >>>>>> if (i >= 0)
> > > >>>>>> fptr = foo;
> > > >>>>>> else
> > > >>>>>> fptr = bar;
> > > >>>>>>
> > > >>>>>> return fptr;
> > > >>>>>> }
> > > >>>>>>
> > > >>>>>> _______________________________________________
> > > >>>>>> LLVM Developers mailing list
> > > >>>>>> llvm-dev at lists.llvm.org
> > > >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Teresa Johnson | Software Engineer | tejohnson at google.com
> | 408-460-2413
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Teresa Johnson | Software Engineer | tejohnson at google.com
> | 408-460-2413
> > > >>>
> > > >>>
> > > >>> _______________________________________________
> > > >>> LLVM Developers mailing list
> > > >>> llvm-dev at lists.llvm.org
> > > >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> --
> > > >>> Peter
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> --
> > > >> Peter
> > > >
> > > > _______________________________________________
> > > > LLVM Developers mailing list
> > > > llvm-dev at lists.llvm.org
> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >
> > >
> > >
> > >
> > > --
> > > --
> > > Peter
> >
> >
> >
> >
> > --
> > --
> > Peter
>
>
--
--
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180501/9241d7ba/attachment.html>
More information about the llvm-dev
mailing list