[llvm-dev] ThinLTO + CFI

Peter Collingbourne via llvm-dev llvm-dev at lists.llvm.org
Tue May 1 15:11:50 PDT 2018


The problem as I recall was that we do not want jumptable sections to
appear before __cfi_check, so we gave these jump tables a name so that the
bfd default linker script would sort them after __cfi_check. This is the
relevant part of the default linker script:

  .text           :
  {
    [...]
    *(.text.hot .text.hot.*)
    *(.text .stub .text.* .gnu.linkonce.t.*)
  }

So I guess the idea was that if __cfi_check was in .text and the jumptables
were in .text.something then the jumptables would appear after __cfi_check?
But that can't be right, because as I understand it, a *(...) grouping does
not control the order of sections mentioned within it, only between other
groups. So a .text section can appear in any order relative to .text.*, but
all the .text.* and .text sections must appear after .text.hot and
.text.hot.* sections.

The section naming trick also wouldn't work if function sections was
enabled, and at the time when that code was added I think that function
sections was disabled by default with LTO. So this is probably only working
by chance.

I think I'd be happy to remove this line entirely then, as long as
check-cfi (i.e. the execution tests for CFI, including cross-DSO CFI) still
passes. If that breaks check-cfi, we can probably leave the logic in place
for cross-DSO CFI only (i.e. module flag named "Cross-DSO CFI" present) and
we can figure out a better way to do this later.

Peter

On Tue, May 1, 2018 at 2:04 PM, <dmitry.mikulin at sony.com> wrote:

> Jump table sections already have this prefix: ".text..L.cfi.jumptable.”,
> without forcing the name with setSection(). Is that good enough?
>
>
> > On May 1, 2018, at 1:16 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> >
> > I think we just need to change the section name for non-MachO object
> formats because "__TEXT,__text,regular,pure_instructions" is the default
> text section name in MachO.
> >
> > There seems to be a way to get the ".cfi" suffix on section names that
> doesn't involve setting a custom section name (which would invoke the
> "everything goes in the same section" behaviour), and that's to use the
> confusingly-named "section prefix" feature. If you replace that line with:
> > if (ObjectFormat != Triple::MachO) F->setSectionPrefix(".cfi");
> > that should make all jump table section names begin with ".text.cfi",
> which I think should be enough to cause the behaviour needed for cross-DSO
> CFI here (eugenis can probably confirm).
> >
> > Peter
> >
> > On Tue, May 1, 2018 at 12:59 PM, <dmitry.mikulin at sony.com> wrote:
> > That seems to do the trick.
> > I get “symbol ordering file: no such symbol: <blah>” warnings for
> symbols that didn’t get renamed.
> > Maybe we'll have a separate order file for CFI builds...
> >
> > What about the comment above that line?
> >
> >
> > > On May 1, 2018, at 11:46 AM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
> > >
> > > On Tue, May 1, 2018 at 11:29 AM,  <dmitry.mikulin at sony.com> wrote:
> > > It’s not the functions themselves, it’s the jump tables that end up in
> the same section of the thin-link temporary object. If I have, say, foo and
> bar and both of them need a jump table entry, both entries will be in the
> same section of the temp object. The real functions will be renamed to
> foo.cfi and bar.cfi. If I list “foo.cfi, foo, bar.cfi, bar” in the link
> order file, the linker (I use lld) will place foo.cfi followed by the
> entire jump table, followed by bar. In our case the jump table section is
> about 80K, so the expectation of foo being on the same page as bar breaks
> down.
> > >
> > > I see what the problem is, it's that we use a custom section name for
> jump tables. Normally, each function (the jump table for each type is a
> separate function) gets a separate section, but that isn't the case for
> functions with custom section names.
> > >
> > > If you comment out this line of LowerTypeTests.cpp:
> > > http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1274
> > > does that help reduce the size of the individual sections?
> > >
> > > Peter
> > >
> > >
> > > > On May 1, 2018, at 11:16 AM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
> > > >
> > > >
> > > >
> > > > On Tue, May 1, 2018 at 11:10 AM,  <dmitry.mikulin at sony.com> wrote:
> > > >
> > > >
> > > > > On Apr 30, 2018, at 6:04 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
> > > > >
> > > > > On Mon, Apr 30, 2018 at 1:05 PM,  <dmitry.mikulin at sony.com> wrote:
> > > > > Replacing direct calls to jump table entries with calls to real
> targets recovers most of the performance loss in my benchmark. Dealing with
> link order files is a bit cumbersome though: I can’t keep both, say “foo”
> and “foo.cfi” next to each other in link order, because if both exist, the
> linker reorders the jump table next to the real function. This is not what
> we want when the goal is to get rid of calls through jump tables.
> > > > >
> > > > > I would have thought that it could be beneficial to place the jump
> table next to one of the target functions, as it may reduce the chance of
> an additional page fault when performing an indirect call via a jump table.
> Do your benchmarks show otherwise?
> > > >
> > > > The problem is that the jump table entries end up in a large section
> of <name>.lto.o and the entire section is placed right next to one of the
> real functions. It breaks all locality in our case.
> > > >
> > > > Is <name>.lto.o being compiled with function sections? That should
> allow the linker to move just the jump table. I know that lld and the gold
> plugin do that by default, but I don't know what your linker does.
> > > >
> > > > >
> > > > > I had to manually pick the functions which were renamed to
> “foo.cfi” and replace them in the link order file. The process can be
> automated, but it’s a bit flaky if we have to do it to make CFI work.
> > > > >
> > > > > I attached 2 patches here: one is the direct call replacement and
> the other is moving type test lowering pass to run later in the pipeline.
> Interestingly, running the pass later seems to help with performance as
> well. Though the comment in PassManagerBuilder implies that this pass needs
> to run early. Why is it different from full LTO?
> > > > >
> > > > > The issue with running the passes early is that the
> llvm.assume(llvm.type.test) pattern that WholeProgramDevirt looks for is
> somewhat fragile, so it needs to see the IR first before one of the other
> passes can break it. (This is probably something that we want to fix by
> using a different pattern.) I don't think that the same issue exists with
> any of the patterns that LowerTypeTests looks for, so it would probably be
> fine to move it later.
> > > >
> > > > Moving WholeProgramDevirt breaks some tests. LowerTypeTests seems
> OK, not sure how much we can trust the tests though ;)
> > > > I’ll create a Phab review for this as well.
> > > >
> > > > Thanks, I'll take a look.
> > > >
> > > > Peter
> > > >
> > > >
> > > > >
> > > > > Peter
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On Apr 27, 2018, at 11:04 AM, via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> > > > > >
> > > > > > For the test case below, I get the following IR for main() on
> entry to ThinLTO backend invocation of LowerTypeTestsModule::lower(). This
> is after I moved this pass down in the pipeline so it’s invoked after
> inlining.
> > > > > >
> > > > > > The declarations for foo() and bar() are read in at the time of
> module import, Importer.importFunctions() in lto::thinBackend(). They do
> not have type metadata attached to them.
> > > > > > In lowerTypeTestCall() we check if the pointer in the type test
> is of a known type, so we look at bitcast and then select operands. foo and
> bar in select are global objects with no type metadata, so the type check
> cannot be guaranteed to be true and it can’t be eliminated. In full LTO
> case this works as expected: both foo and bar are the same known type and
> type check is gone.
> > > > > >
> > > > > > Maybe the problem is not with renaming but with the missing type
> metadata in this particular case.
> > > > > > Though having so many direct calls routed through the jump table
> still seems problematic. Is there a feasible solution?
> > > > > >
> > > > > > --------------------------------
> > > > > > define hidden i32 @main(i32, i8** nocapture readnone)
> local_unnamed_addr #0 !type !3 !type !4 {
> > > > > >  %3 = add nsw i32 %0, -2
> > > > > >  store i32 %3, i32* @i, align 4, !tbaa !5
> > > > > >  %4 = icmp sgt i32 %0, 1
> > > > > >  %5 = select i1 %4, i32 ()* @foo, i32 ()* @bar
> > > > > >  %6 = bitcast i32 ()* %5 to i8*, !nosanitize !9
> > > > > >  %7 = tail call i1 @llvm.type.test(i8* nonnull %6, metadata
> !"_ZTSFivE"), !nosanitize !9
> > > > > >  br i1 %7, label %10, label %8, !prof !10, !nosanitize !9
> > > > > >
> > > > > > ; <label>:8:                                      ; preds = %2
> > > > > >  %9 = ptrtoint i32 ()* %5 to i64
> > > > > >  tail call void @__ubsan_handle_cfi_check_fail_abort(i8*
> getelementptr inbounds ({ i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x
> i8] }* }, { i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }* @anon.
> fad58de7366495db4650cfefac2fcd61.1, i64 0, i32 0), i64 %9, i64 undef) #4,
> !nosanitize !9
> > > > > >  unreachable, !nosanitize !9
> > > > > >
> > > > > > ; <label>:10:                                     ; preds = %2
> > > > > >  %11 = tail call i32 %5() #5
> > > > > >  ret i32 %11
> > > > > > }
> > > > > >
> > > > > > . . .
> > > > > > declare hidden i32 @foo() #3
> > > > > > declare hidden i32 @bar() #3
> > > > > >
> > > > > >
> > > > > > —————————test case---------------
> > > > > > b.c
> > > > > > =================
> > > > > > typedef int (*fptr_t) (void);
> > > > > > fptr_t get_fptr();
> > > > > > extern int i;
> > > > > >
> > > > > > int main(int argc, char *argv[])
> > > > > > {
> > > > > >  i = argc - 2;
> > > > > >  fptr_t fp = get_fptr();
> > > > > >  return fp();
> > > > > > }
> > > > > >
> > > > > > v.c
> > > > > > ================
> > > > > > int i;
> > > > > > typedef int (*fptr_t) (void);
> > > > > > int foo(void) {  return 11; }
> > > > > > int bar(void) {  return 22; }
> > > > > > fptr_t get_fptr(void) {  return (i >= 0) ? foo : bar; }
> > > > > >
> > > > > >
> > > > > >> On Apr 26, 2018, at 5:29 PM, Peter Collingbourne <
> peter at pcc.me.uk> wrote:
> > > > > >>
> > > > > >>> We could probably tolerate a certain amount of unused jump
> table entries. However, I just realized that all non-inline imported calls
> end up going through a jump table entry. Is that correct?
> > > > > >>
> > > > > >> In fact it is all calls that go through a function pointer type
> that is used anywhere in the program for an indirect call, but depending on
> your program that could be very close to "yes".
> > > > > >>
> > > > > >>> Moving the type check lowering pass further down the pipeline
> (after inlining) still does not solve the problem because CFI renaming
> happens early and symbols attached to the jump table do not have a matching
> type.
> > > > > >>
> > > > > >> As far as I know, renaming happens during the LowerTypeTests
> pass, after the type checks are lowered.
> > > > > >> Lowering: http://llvm-cs.pcc.me.uk/lib/
> Transforms/IPO/LowerTypeTests.cpp#1620
> > > > > >> Renaming: http://llvm-cs.pcc.me.uk/lib/
> Transforms/IPO/LowerTypeTests.cpp#1642
> > > > > >> Do you have an example of what you are seeing?
> > > > > >>
> > > > > >> Peter
> > > > > >>
> > > > > >> On Thu, Apr 26, 2018 at 4:54 PM,  <dmitry.mikulin at sony.com>
> wrote:
> > > > > >> Hi Peter,
> > > > > >>
> > > > > >> We could probably tolerate a certain amount of unused jump
> table entries. However, I just realized that all non-inline imported calls
> end up going through a jump table entry. Is that correct? Initially I
> thought you meant calls promoted from indirect. While this can be fixed by
> replacing direct calls to jump tables with direct calls to real targets, I
> found other cases where ThinLTO+CFI has issues.
> > > > > >>
> > > > > >> In ThinLTO backend, type test lowering happens very early in
> the pipeline, before inlining. When the type check after the call to
> get_fptr() is lowered (in my original example, below), the compiler cannot
> see that both targets belong to the same type and that the type check will
> always return ‘true’ and can be eliminated. Moving the type check lowering
> pass further down the pipeline (after inlining) still does not solve the
> problem because CFI renaming happens early and symbols attached to the jump
> table do not have a matching type.
> > > > > >>
> > > > > >> I’m trying to think if there’s a way to delay renaming until
> ThinLTO backend type check lowering pass. It would help with solving both
> problems.
> > > > > >>
> > > > > >> Thanks.
> > > > > >> Dmitry.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>>>>>
> > > > > >>>>>> a.c
> > > > > >>>>>> =============================
> > > > > >>>>>> typedef int (*fptr_t) (void);
> > > > > >>>>>> fptr_t get_fptr();
> > > > > >>>>>> int main(int argc, char *argv[])
> > > > > >>>>>> {
> > > > > >>>>>> fptr_t fp = get_fptr();
> > > > > >>>>>> return fp();
> > > > > >>>>>> }
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> b.c
> > > > > >>>>>> =============================
> > > > > >>>>>> typedef int (*fptr_t) (void);
> > > > > >>>>>> int foo(void) { return 11; }
> > > > > >>>>>> int bar(void) { return 22; }
> > > > > >>>>>>
> > > > > >>>>>> static fptr_t fptr = bar;
> > > > > >>>>>> static int i = 53;
> > > > > >>>>>>
> > > > > >>>>>> fptr_t get_fptr(void)
> > > > > >>>>>> {
> > > > > >>>>>> if (i >= 0)
> > > > > >>>>>>   fptr = foo;
> > > > > >>>>>> else
> > > > > >>>>>>   fptr = bar;
> > > > > >>>>>>
> > > > > >>>>>> return fptr;
> > > > > >>>>>> }
> > > > > >>>>>>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>> On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <
> peter at pcc.me.uk> wrote:
> > > > > >>>
> > > > > >>> Regarding the orderfile, yes, I was thinking more about
> ordering the real functions.
> > > > > >>>
> > > > > >>> In that case it sounds like your best option may be to
> implement the optimization pass to make direct calls go directly to the
> real function. From a performance perspective I don't think it would make
> much difference if there are unused jump table entries.
> > > > > >>>
> > > > > >>> Peter
> > > > > >>>
> > > > > >>> On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> > > > > >>> Teresa, Peter,
> > > > > >>>
> > > > > >>> Thanks for your help!
> > > > > >>> I need to re-run my experiments as the compiler I used did not
> have the latest changes like r327254.
> > > > > >>> The fact that the decision about routing calls through jump
> table entries is made early may be problematic. In my experiments with
> FreeBSD kernel, ThinLTO produced thousands jump table entries compared to
> only dozens with full LTO. As for re-ordering jump table entries, I don’t
> think it’s going to work as they are placed in the same section. Including
> *.cfi names into a link order file will take care of re-ordering real
> functions routed through jump table entries, but in our case we need to
> force some functions to be on the same page. So not having jump table
> entries for the functions that don't really need them would be ideal.
> > > > > >>>
> > > > > >>> Thanks.
> > > > > >>> Dmitry.
> > > > > >>>
> > > > > >>>
> > > > > >>>> On Apr 18, 2018, at 6:11 PM, Teresa Johnson <
> tejohnson at google.com> wrote:
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On Wed, Apr 18, 2018 at 4:49 PM,  <dmitry.mikulin at sony.com>
> wrote:
> > > > > >>>> Hi Teresa,
> > > > > >>>>
> > > > > >>>> Thanks for the info!
> > > > > >>>> This example is my attempt to reduce FreeBSD kernel to
> something more manageable :)
> > > > > >>>>
> > > > > >>>> I will take a look at why globals are not being imported in
> this case. What’s the best tool to look into ThinLTO objects and their
> summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…
> > > > > >>>>
> > > > > >>>> Sadly there isn't a really great way to dump the summaries.
> =( There was a patch awhile back by a GSOC student to dump in YAML format,
> but there was resistance from some who preferred dumping to llvm assembly
> via llvm-dis and support reading in the summary from llvm assembly. It's
> been on my list of things to do, hasn't yet risen high enough in priority
> to work on that. For now, you have to use llvm-bcanalyzer -dump and look at
> the raw format.
> > > > > >>>>
> > > > > >>>> Teresa
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Hopefully Peter can chime in regarding CFI related issues.
> > > > > >>>>
> > > > > >>>> Thanks.
> > > > > >>>> Dmitry.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>> On Apr 17, 2018, at 9:37 AM, Teresa Johnson <
> tejohnson at google.com> wrote:
> > > > > >>>>>
> > > > > >>>>> Hi Dmitry,
> > > > > >>>>>
> > > > > >>>>> Sorry for the late reply. For CFI specific code generation,
> pcc is a better person to answer. But on the issue of global variables
> being optimized, that hasn't happened yet. That would be great if you
> wanted to pick that up!
> > > > > >>>>>
> > > > > >>>>> In your original email example, it seems like the file
> static i=53 could be constant propagated since there are no other defs, and
> the code in get_fptr simplified during the compile step, but I assume this
> is part of a more complex example where it is not possible to do this? Also
> note that with r327254 we started importing global variables. Do you know
> why we don't import in your case? I wonder if it has to do with it being
> CFI inserted code?
> > > > > >>>>>
> > > > > >>>>> Teresa
> > > > > >>>>>
> > > > > >>>>> On Tue, Apr 17, 2018 at 9:17 AM <dmitry.mikulin at sony.com>
> wrote:
> > > > > >>>>> I watched  Teresa’s talk on ThinLTO from last year’s CppCon,
> and it sounded like adding global variable information to the summaries was
> in the works, or at least in planning. Can someone (Teresa?) please share
> the current status? If it’s part of future plans, are there any specific
> proposals that can be picked up and worked on?
> > > > > >>>>>
> > > > > >>>>> Thanks!
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>> On Apr 9, 2018, at 6:51 PM, via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> > > > > >>>>>>
> > > > > >>>>>> Hi,
> > > > > >>>>>>
> > > > > >>>>>> I’m working on setting up ThinLTO+CFI for a C application
> which uses a lot of function pointers. While functionally it appears
> stable, it’s performance is significantly degraded, to the tune of double
> digit percentage points compared to regular LTO+CFI.
> > > > > >>>>>>
> > > > > >>>>>> Looking into possible causes I see that under ThinLTO+CFI
> iCall type checks almost always generate jump table entries for indirect
> calls, which creates another level of indirection for every such call. On
> top of that it breaks the link order layout because real function names
> point to jump table entries. It appears that I’m hitting a limitation in
> ThinLTO on how much information it can propagate across modules,
> particularly information about constants. In the example below, the fact
> that “i” is effectively a constant, is lost under ThinLTO, and the inlined
> copy of b.c:get_fptr() in a.c does not eliminate the conditional, which,
> for CFI purposes requires to generate a type check/jump table.
> > > > > >>>>>>
> > > > > >>>>>> I was wondering if there was a way to mitigate this
> limitation.
> > > > > >>>>>>
> > > > > >>>>>> a.c
> > > > > >>>>>> =============================
> > > > > >>>>>> typedef int (*fptr_t) (void);
> > > > > >>>>>> fptr_t get_fptr();
> > > > > >>>>>> int main(int argc, char *argv[])
> > > > > >>>>>> {
> > > > > >>>>>> fptr_t fp = get_fptr();
> > > > > >>>>>> return fp();
> > > > > >>>>>> }
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> b.c
> > > > > >>>>>> =============================
> > > > > >>>>>> typedef int (*fptr_t) (void);
> > > > > >>>>>> int foo(void) { return 11; }
> > > > > >>>>>> int bar(void) { return 22; }
> > > > > >>>>>>
> > > > > >>>>>> static fptr_t fptr = bar;
> > > > > >>>>>> static int i = 53;
> > > > > >>>>>>
> > > > > >>>>>> fptr_t get_fptr(void)
> > > > > >>>>>> {
> > > > > >>>>>> if (i >= 0)
> > > > > >>>>>>   fptr = foo;
> > > > > >>>>>> else
> > > > > >>>>>>   fptr = bar;
> > > > > >>>>>>
> > > > > >>>>>> return fptr;
> > > > > >>>>>> }
> > > > > >>>>>>
> > > > > >>>>>> _______________________________________________
> > > > > >>>>>> LLVM Developers mailing list
> > > > > >>>>>> llvm-dev at lists.llvm.org
> > > > > >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> --
> > > > > >>>>> Teresa Johnson |   Software Engineer |
> tejohnson at google.com |   408-460-2413
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> --
> > > > > >>>> Teresa Johnson |    Software Engineer |
> tejohnson at google.com |   408-460-2413
> > > > > >>>
> > > > > >>>
> > > > > >>> _______________________________________________
> > > > > >>> LLVM Developers mailing list
> > > > > >>> llvm-dev at lists.llvm.org
> > > > > >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> --
> > > > > >>> --
> > > > > >>> Peter
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> --
> > > > > >> Peter
> > > > > >
> > > > > > _______________________________________________
> > > > > > LLVM Developers mailing list
> > > > > > llvm-dev at lists.llvm.org
> > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > --
> > > > > Peter
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > --
> > > > Peter
> > >
> > >
> > >
> > >
> > > --
> > > --
> > > Peter
> >
> >
> >
> >
> > --
> > --
> > Peter
>
>


-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180501/c1471f94/attachment.html>


More information about the llvm-dev mailing list