[llvm-dev] ThinLTO + CFI

Peter Collingbourne via llvm-dev llvm-dev at lists.llvm.org
Tue May 1 11:16:59 PDT 2018


On Tue, May 1, 2018 at 11:10 AM, <dmitry.mikulin at sony.com> wrote:

>
>
> > On Apr 30, 2018, at 6:04 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
> >
> > On Mon, Apr 30, 2018 at 1:05 PM,  <dmitry.mikulin at sony.com> wrote:
> > Replacing direct calls to jump table entries with calls to real targets
> recovers most of the performance loss in my benchmark. Dealing with link
> order files is a bit cumbersome though: I can’t keep both, say “foo” and
> “foo.cfi” next to each other in link order, because if both exist, the
> linker reorders the jump table next to the real function. This is not what
> we want when the goal is to get rid of calls through jump tables.
> >
> > I would have thought that it could be beneficial to place the jump table
> next to one of the target functions, as it may reduce the chance of an
> additional page fault when performing an indirect call via a jump table. Do
> your benchmarks show otherwise?
>
> The problem is that the jump table entries end up in a large section of
> <name>.lto.o and the entire section is placed right next to one of the real
> functions. It breaks all locality in our case.
>

Is <name>.lto.o being compiled with function sections? That should allow
the linker to move just the jump table. I know that lld and the gold plugin
do that by default, but I don't know what your linker does.

>
> > I had to manually pick the functions which were renamed to “foo.cfi” and
> replace them in the link order file. The process can be automated, but it’s
> a bit flaky if we have to do it to make CFI work.
> >
> > I attached 2 patches here: one is the direct call replacement and the
> other is moving type test lowering pass to run later in the pipeline.
> Interestingly, running the pass later seems to help with performance as
> well. Though the comment in PassManagerBuilder implies that this pass needs
> to run early. Why is it different from full LTO?
> >
> > The issue with running the passes early is that the
> llvm.assume(llvm.type.test) pattern that WholeProgramDevirt looks for is
> somewhat fragile, so it needs to see the IR first before one of the other
> passes can break it. (This is probably something that we want to fix by
> using a different pattern.) I don't think that the same issue exists with
> any of the patterns that LowerTypeTests looks for, so it would probably be
> fine to move it later.
>
> Moving WholeProgramDevirt breaks some tests. LowerTypeTests seems OK, not
> sure how much we can trust the tests though ;)
> I’ll create a Phab review for this as well.
>

Thanks, I'll take a look.

Peter

>
>
> >
> > Peter
> >
> >
> >
> >
> >
> >
> > > On Apr 27, 2018, at 11:04 AM, via llvm-dev <llvm-dev at lists.llvm.org>
> wrote:
> > >
> > > For the test case below, I get the following IR for main() on entry to
> ThinLTO backend invocation of LowerTypeTestsModule::lower(). This is after
> I moved this pass down in the pipeline so it’s invoked after inlining.
> > >
> > > The declarations for foo() and bar() are read in at the time of module
> import, Importer.importFunctions() in lto::thinBackend(). They do not have
> type metadata attached to them.
> > > In lowerTypeTestCall() we check if the pointer in the type test is of
> a known type, so we look at bitcast and then select operands. foo and bar
> in select are global objects with no type metadata, so the type check
> cannot be guaranteed to be true and it can’t be eliminated. In full LTO
> case this works as expected: both foo and bar are the same known type and
> type check is gone.
> > >
> > > Maybe the problem is not with renaming but with the missing type
> metadata in this particular case.
> > > Though having so many direct calls routed through the jump table still
> seems problematic. Is there a feasible solution?
> > >
> > > --------------------------------
> > > define hidden i32 @main(i32, i8** nocapture readnone)
> local_unnamed_addr #0 !type !3 !type !4 {
> > >  %3 = add nsw i32 %0, -2
> > >  store i32 %3, i32* @i, align 4, !tbaa !5
> > >  %4 = icmp sgt i32 %0, 1
> > >  %5 = select i1 %4, i32 ()* @foo, i32 ()* @bar
> > >  %6 = bitcast i32 ()* %5 to i8*, !nosanitize !9
> > >  %7 = tail call i1 @llvm.type.test(i8* nonnull %6, metadata
> !"_ZTSFivE"), !nosanitize !9
> > >  br i1 %7, label %10, label %8, !prof !10, !nosanitize !9
> > >
> > > ; <label>:8:                                      ; preds = %2
> > >  %9 = ptrtoint i32 ()* %5 to i64
> > >  tail call void @__ubsan_handle_cfi_check_fail_abort(i8*
> getelementptr inbounds ({ i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x
> i8] }* }, { i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }* @anon.
> fad58de7366495db4650cfefac2fcd61.1, i64 0, i32 0), i64 %9, i64 undef) #4,
> !nosanitize !9
> > >  unreachable, !nosanitize !9
> > >
> > > ; <label>:10:                                     ; preds = %2
> > >  %11 = tail call i32 %5() #5
> > >  ret i32 %11
> > > }
> > >
> > > . . .
> > > declare hidden i32 @foo() #3
> > > declare hidden i32 @bar() #3
> > >
> > >
> > > —————————test case---------------
> > > b.c
> > > =================
> > > typedef int (*fptr_t) (void);
> > > fptr_t get_fptr();
> > > extern int i;
> > >
> > > int main(int argc, char *argv[])
> > > {
> > >  i = argc - 2;
> > >  fptr_t fp = get_fptr();
> > >  return fp();
> > > }
> > >
> > > v.c
> > > ================
> > > int i;
> > > typedef int (*fptr_t) (void);
> > > int foo(void) {  return 11; }
> > > int bar(void) {  return 22; }
> > > fptr_t get_fptr(void) {  return (i >= 0) ? foo : bar; }
> > >
> > >
> > >> On Apr 26, 2018, at 5:29 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
> > >>
> > >>> We could probably tolerate a certain amount of unused jump table
> entries. However, I just realized that all non-inline imported calls end up
> going through a jump table entry. Is that correct?
> > >>
> > >> In fact it is all calls that go through a function pointer type that
> is used anywhere in the program for an indirect call, but depending on your
> program that could be very close to "yes".
> > >>
> > >>> Moving the type check lowering pass further down the pipeline (after
> inlining) still does not solve the problem because CFI renaming happens
> early and symbols attached to the jump table do not have a matching type.
> > >>
> > >> As far as I know, renaming happens during the LowerTypeTests pass,
> after the type checks are lowered.
> > >> Lowering: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.
> cpp#1620
> > >> Renaming: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.
> cpp#1642
> > >> Do you have an example of what you are seeing?
> > >>
> > >> Peter
> > >>
> > >> On Thu, Apr 26, 2018 at 4:54 PM,  <dmitry.mikulin at sony.com> wrote:
> > >> Hi Peter,
> > >>
> > >> We could probably tolerate a certain amount of unused jump table
> entries. However, I just realized that all non-inline imported calls end up
> going through a jump table entry. Is that correct? Initially I thought you
> meant calls promoted from indirect. While this can be fixed by replacing
> direct calls to jump tables with direct calls to real targets, I found
> other cases where ThinLTO+CFI has issues.
> > >>
> > >> In ThinLTO backend, type test lowering happens very early in the
> pipeline, before inlining. When the type check after the call to get_fptr()
> is lowered (in my original example, below), the compiler cannot see that
> both targets belong to the same type and that the type check will always
> return ‘true’ and can be eliminated. Moving the type check lowering pass
> further down the pipeline (after inlining) still does not solve the problem
> because CFI renaming happens early and symbols attached to the jump table
> do not have a matching type.
> > >>
> > >> I’m trying to think if there’s a way to delay renaming until ThinLTO
> backend type check lowering pass. It would help with solving both problems.
> > >>
> > >> Thanks.
> > >> Dmitry.
> > >>
> > >>
> > >>
> > >>>>>>
> > >>>>>> a.c
> > >>>>>> =============================
> > >>>>>> typedef int (*fptr_t) (void);
> > >>>>>> fptr_t get_fptr();
> > >>>>>> int main(int argc, char *argv[])
> > >>>>>> {
> > >>>>>> fptr_t fp = get_fptr();
> > >>>>>> return fp();
> > >>>>>> }
> > >>>>>>
> > >>>>>>
> > >>>>>> b.c
> > >>>>>> =============================
> > >>>>>> typedef int (*fptr_t) (void);
> > >>>>>> int foo(void) { return 11; }
> > >>>>>> int bar(void) { return 22; }
> > >>>>>>
> > >>>>>> static fptr_t fptr = bar;
> > >>>>>> static int i = 53;
> > >>>>>>
> > >>>>>> fptr_t get_fptr(void)
> > >>>>>> {
> > >>>>>> if (i >= 0)
> > >>>>>>   fptr = foo;
> > >>>>>> else
> > >>>>>>   fptr = bar;
> > >>>>>>
> > >>>>>> return fptr;
> > >>>>>> }
> > >>>>>>
> > >>
> > >>
> > >>
> > >>
> > >>> On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
> > >>>
> > >>> Regarding the orderfile, yes, I was thinking more about ordering the
> real functions.
> > >>>
> > >>> In that case it sounds like your best option may be to implement the
> optimization pass to make direct calls go directly to the real function.
> From a performance perspective I don't think it would make much difference
> if there are unused jump table entries.
> > >>>
> > >>> Peter
> > >>>
> > >>> On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> > >>> Teresa, Peter,
> > >>>
> > >>> Thanks for your help!
> > >>> I need to re-run my experiments as the compiler I used did not have
> the latest changes like r327254.
> > >>> The fact that the decision about routing calls through jump table
> entries is made early may be problematic. In my experiments with FreeBSD
> kernel, ThinLTO produced thousands jump table entries compared to only
> dozens with full LTO. As for re-ordering jump table entries, I don’t think
> it’s going to work as they are placed in the same section. Including *.cfi
> names into a link order file will take care of re-ordering real functions
> routed through jump table entries, but in our case we need to force some
> functions to be on the same page. So not having jump table entries for the
> functions that don't really need them would be ideal.
> > >>>
> > >>> Thanks.
> > >>> Dmitry.
> > >>>
> > >>>
> > >>>> On Apr 18, 2018, at 6:11 PM, Teresa Johnson <tejohnson at google.com>
> wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Wed, Apr 18, 2018 at 4:49 PM,  <dmitry.mikulin at sony.com> wrote:
> > >>>> Hi Teresa,
> > >>>>
> > >>>> Thanks for the info!
> > >>>> This example is my attempt to reduce FreeBSD kernel to something
> more manageable :)
> > >>>>
> > >>>> I will take a look at why globals are not being imported in this
> case. What’s the best tool to look into ThinLTO objects and their
> summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…
> > >>>>
> > >>>> Sadly there isn't a really great way to dump the summaries. =(
> There was a patch awhile back by a GSOC student to dump in YAML format, but
> there was resistance from some who preferred dumping to llvm assembly via
> llvm-dis and support reading in the summary from llvm assembly. It's been
> on my list of things to do, hasn't yet risen high enough in priority to
> work on that. For now, you have to use llvm-bcanalyzer -dump and look at
> the raw format.
> > >>>>
> > >>>> Teresa
> > >>>>
> > >>>>
> > >>>> Hopefully Peter can chime in regarding CFI related issues.
> > >>>>
> > >>>> Thanks.
> > >>>> Dmitry.
> > >>>>
> > >>>>
> > >>>>> On Apr 17, 2018, at 9:37 AM, Teresa Johnson <tejohnson at google.com>
> wrote:
> > >>>>>
> > >>>>> Hi Dmitry,
> > >>>>>
> > >>>>> Sorry for the late reply. For CFI specific code generation, pcc is
> a better person to answer. But on the issue of global variables being
> optimized, that hasn't happened yet. That would be great if you wanted to
> pick that up!
> > >>>>>
> > >>>>> In your original email example, it seems like the file static i=53
> could be constant propagated since there are no other defs, and the code in
> get_fptr simplified during the compile step, but I assume this is part of a
> more complex example where it is not possible to do this? Also note that
> with r327254 we started importing global variables. Do you know why we
> don't import in your case? I wonder if it has to do with it being CFI
> inserted code?
> > >>>>>
> > >>>>> Teresa
> > >>>>>
> > >>>>> On Tue, Apr 17, 2018 at 9:17 AM <dmitry.mikulin at sony.com> wrote:
> > >>>>> I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and
> it sounded like adding global variable information to the summaries was in
> the works, or at least in planning. Can someone (Teresa?) please share the
> current status? If it’s part of future plans, are there any specific
> proposals that can be picked up and worked on?
> > >>>>>
> > >>>>> Thanks!
> > >>>>>
> > >>>>>
> > >>>>>> On Apr 9, 2018, at 6:51 PM, via llvm-dev <llvm-dev at lists.llvm.org>
> wrote:
> > >>>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> I’m working on setting up ThinLTO+CFI for a C application which
> uses a lot of function pointers. While functionally it appears stable, it’s
> performance is significantly degraded, to the tune of double digit
> percentage points compared to regular LTO+CFI.
> > >>>>>>
> > >>>>>> Looking into possible causes I see that under ThinLTO+CFI iCall
> type checks almost always generate jump table entries for indirect calls,
> which creates another level of indirection for every such call. On top of
> that it breaks the link order layout because real function names point to
> jump table entries. It appears that I’m hitting a limitation in ThinLTO on
> how much information it can propagate across modules, particularly
> information about constants. In the example below, the fact that “i” is
> effectively a constant, is lost under ThinLTO, and the inlined copy of
> b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI
> purposes requires to generate a type check/jump table.
> > >>>>>>
> > >>>>>> I was wondering if there was a way to mitigate this limitation.
> > >>>>>>
> > >>>>>> a.c
> > >>>>>> =============================
> > >>>>>> typedef int (*fptr_t) (void);
> > >>>>>> fptr_t get_fptr();
> > >>>>>> int main(int argc, char *argv[])
> > >>>>>> {
> > >>>>>> fptr_t fp = get_fptr();
> > >>>>>> return fp();
> > >>>>>> }
> > >>>>>>
> > >>>>>>
> > >>>>>> b.c
> > >>>>>> =============================
> > >>>>>> typedef int (*fptr_t) (void);
> > >>>>>> int foo(void) { return 11; }
> > >>>>>> int bar(void) { return 22; }
> > >>>>>>
> > >>>>>> static fptr_t fptr = bar;
> > >>>>>> static int i = 53;
> > >>>>>>
> > >>>>>> fptr_t get_fptr(void)
> > >>>>>> {
> > >>>>>> if (i >= 0)
> > >>>>>>   fptr = foo;
> > >>>>>> else
> > >>>>>>   fptr = bar;
> > >>>>>>
> > >>>>>> return fptr;
> > >>>>>> }
> > >>>>>>
> > >>>>>> _______________________________________________
> > >>>>>> LLVM Developers mailing list
> > >>>>>> llvm-dev at lists.llvm.org
> > >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Teresa Johnson |   Software Engineer |      tejohnson at google.com
> |   408-460-2413
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Teresa Johnson |    Software Engineer |      tejohnson at google.com
> |   408-460-2413
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> LLVM Developers mailing list
> > >>> llvm-dev at lists.llvm.org
> > >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> --
> > >>> Peter
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> --
> > >> Peter
> > >
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > llvm-dev at lists.llvm.org
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
> >
> >
> >
> > --
> > --
> > Peter
>
>


-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180501/429dffe5/attachment.html>


More information about the llvm-dev mailing list