[llvm-dev] target-features attribute prevents inlining?

Craig Topper via llvm-dev llvm-dev at lists.llvm.org
Fri Jun 12 23:58:34 PDT 2020


On Fri, Jun 12, 2020 at 11:48 PM David Blaikie via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On Fri, Jun 12, 2020 at 10:42 PM Haoran Xu <haoranxu510 at gmail.com> wrote:
> >
> > Thank you so much David! After thinking a bit more I agree with you that
> attempting to add 'target-features' to my functions seems to be the safest
> approach of all.
> >
> > I noticed that if I mark the clang++ function as 'AlwaysInline', the
> inlining is performed normally. Is this a potential bug, given what you
> said that LLVM may accidentally move code using advanced cpu features
> outside the condition-check?
>
> I guess that's probably just one of those "you get what you asked for"
> situations - if you're mixing target attributes and forcing inlining,
> it's assumed you've weighed the risks/figured out how to make that
> work safely. But I'm not entirely sure.


Clang checks the target features for attribute(always_inline) and will fail
to compile if the caller's features aren't a superset of the callee's.
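
To make the superset rule concrete, here is a minimal C++ sketch that models
the check on 'target-features' strings. It is a simplified model and not
Clang's or LLVM's actual code: the real check works on subtarget feature bits
and also takes the target CPU into account. The helper names
enabledFeatures, calleeFeaturesAreSubset and checkReproCase are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Collect the features enabled with a leading '+' in a string such as
// "+sse,+sse2,-avx". Entries with any other prefix are ignored.
std::set<std::string> enabledFeatures(const std::string& attr) {
  std::set<std::string> out;
  std::stringstream ss(attr);
  std::string tok;
  while (std::getline(ss, tok, ',')) {
    if (tok.size() > 1 && tok[0] == '+') out.insert(tok.substr(1));
  }
  return out;
}

// Simplified model of the rule: inlining is acceptable only when every
// feature the callee enables is also enabled in the caller.
bool calleeFeaturesAreSubset(const std::string& caller,
                             const std::string& callee) {
  const std::set<std::string> c = enabledFeatures(caller);
  const std::set<std::string> e = enabledFeatures(callee);
  // std::includes requires sorted ranges; std::set iterates in sorted order.
  return std::includes(c.begin(), c.end(), e.begin(), e.end());
}

// Usage example based on the attribute string from the repro in this thread.
void checkReproCase() {
  const std::string clangFn = "+cx8,+fxsr,+mmx,+sse,+sse2,+x87";
  assert(!calleeFeaturesAreSubset("", clangFn));      // 'testfn' has no features
  assert(calleeFeaturesAreSubset(clangFn, clangFn));  // after adding #0 to 'testfn'
}
```

In this model an empty caller string fails the check against the repro's
attribute string, while a caller carrying the same string passes it, which
matches the behavior reported in the original message.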



>
> > Also, may I ask an additional (kind of irrelevant) question?
> > The functions I extracted from clang++ output are already optimized. I
> wanted to have some way to prevent LLVM from wasting time in optimizing
> them again at runtime, when those functions are fed together with my
> functions (which should be optimized) into the optimizer. Is there any way
> to achieve this? I do not think the 'optnone' attribute is the solution
> since it prevents the function from being inlined. I am currently marking
> those clang++ functions to have 'available_externally' linkage, which I
> feel is the closest to what I want from my understanding of the document,
> though I'm not sure if this is the right approach, or if there is a better
> approach. Could you kindly give some pointers on this question?
>
> available_externally seems problematic - what that does is, if LLVM
> fails to inline the available_externally definition, it can
> drop/delete the definition and rely on a definition being available in
> some other object file/module that this one is linked to. So if you
> add that attribute and the function is not inlined, you'll probably
> get a linker error about a missing symbol definition.
>
> "optnone" is about all we have for this sort of thing - so if that's
> not what you're looking for, probably best to just let LLVM
> re-optimize the function. In general optimizations should be cheap if
> they're not doing any work/the function is already optimized.
>
> - Dave
>
> >
> > Thanks again!
> >
> > Best,
> > Haoran
> >
> >
> > David Blaikie <dblaikie at gmail.com> wrote on Fri, Jun 12, 2020 at 10:17 PM:
> >>
> >> On Fri, Jun 12, 2020 at 10:10 PM Haoran Xu <haoranxu510 at gmail.com>
> wrote:
> >> >
> >> > Hi David,
> >> >
> >> > Thanks for your quick response!
> >> >
> >> > I now understand the reason that inlining cannot be done on functions
> with different target-attributes. Thanks for your explanation!
> >> >
> >> > However, I think I didn't fully understand your solution; it would be
> great if you could elaborate a bit more. Here's a bit more info on
> my current workflow:
> >> >
> >> > (1) The clang++ compiler builds C++ source file (a.cpp), which
> contains the implementation '_Z2fnP10TestStructi', into bitcode (a.bc).
> >> > (2) A parser parses 'a.bc', extracts the IR of '_Z2fnP10TestStructi'
> and generates a data header file (a.h), containing the raw bitcode of that
> function.
> >> > (3) The data header is then built with the main program, so the main
> program has access to the raw bitcode data.
> >> > (4) At runtime, the main program generates 'testfn' using
> llvm::IRBuilder (similar to what the Kaleidoscope tutorial does). The
> 'testfn' does not have any of those attributes or MetadataNodes of course.
> >> > (5) The raw bitcode data and the 'testfn' are combined into a single
> module using LLVM's linkInModule API, then fed into the optimizer.
> >> >
> >> > What do you think is the proper fix for my use case? I can think of a
> few, but I don't think I have enough context to determine which is the most
> proper fix.
> >> > (1) Remove all MetadataNode and attributes from the bitcode files. Is
> this sufficient to prevent all weird cases like this one? What would be the
> drawback if all MetadataNodes and attributes are removed?
> >>
> >> I don't know if dropping attributes is always safe/correct. (metadata
> >> is certainly droppable (or at least intended to be) while maintaining
> >> correctness - they're meant to be optional value-add without being
> >> mandatory)
> >>
> >> > (2) Remove only the 'target-features' attribute from the bitcode
> file. Is this sufficient to prevent all weird cases like this one?
> >>
> >> Don't know for sure.
> >>
> >> > (3) Add 'target-features' attribute to all the functions I generated.
> Is this sufficient to prevent all weird cases like this one? Do I have the
> guarantee that the 'target-features' attribute of all bitcode files
> generated by clang++ are identical?
> >>
> >> That's sort of what I was getting at - suggesting you figure out how
> >> Clang is determining the attributes and replicate or otherwise reuse
> >> the same logic. Not sure how feasible that approach is - but it'd be
> >> where I'd look to start at least.
> >>
> >> - Dave
> >>
> >> >
> >> > Thanks!
> >> >
> >> > Haoran
> >> >
> >> >
> >> > David Blaikie <dblaikie at gmail.com> wrote on Fri, Jun 12, 2020 at 9:54 PM:
> >> >>
> >> >> (+Eric Christopher for target attributes)
> >> >> (+Lang Hames for JIT things)
> >> >>
> >> >> The problem is that those target features enable the compiler to select
> >> >> instructions that would only be valid if the target CPU has those
> >> >> features (eg: a function without the "+mmx" attribute might be
> >> >> intended to be run on a CPU that doesn't have the mmx instruction
> >> >> set). It's possible that a function with mmx could be called from a
> >> >> function without mmx if the caller checked the CPU features to ensure
> >> >> they matched before making the call. Since there's any number of ways
> >> >> that test might be done - LLVM can't be sure once it inlines the
> >> >> mmx-using function into the not-mmx having caller, that LLVM won't
> >> >> accidentally move the mmx-using code around beyond the condition. So
> >> >> the inlining is disabled.
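
The guarded-call pattern described above can be sketched in C++ as follows.
This is an illustration under assumptions: it relies on the GCC/Clang
extensions __attribute__((target("avx2"))) and __builtin_cpu_supports, it
uses AVX2 rather than MMX as the example feature, and the function names are
hypothetical. On other compilers or architectures it falls back to the
baseline path.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__x86_64__) && (defined(__clang__) || defined(__GNUC__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

// Baseline version: compiled with the translation unit's default features.
static int sumBaseline(const int* p, std::size_t n) {
  int total = 0;
  for (std::size_t i = 0; i < n; ++i) total += p[i];
  return total;
}

#if HAVE_X86_DISPATCH
// Same loop, but the compiler may use AVX2 instructions in this function.
// It must only run on a CPU that actually supports AVX2.
__attribute__((target("avx2")))
static int sumAvx2(const int* p, std::size_t n) {
  int total = 0;
  for (std::size_t i = 0; i < n; ++i) total += p[i];
  return total;
}
#endif

// Caller without AVX2 enabled: the runtime check is what makes the call safe.
// If sumAvx2 were inlined here, its AVX2 code would have to stay inside the
// guarded branch, which is the property the optimizer cannot promise to keep.
int sum(const int* p, std::size_t n) {
  assert(p != nullptr || n == 0);
#if HAVE_X86_DISPATCH
  if (__builtin_cpu_supports("avx2")) return sumAvx2(p, n);
#endif
  return sumBaseline(p, n);
}
```

Calling sum(data, 4) on an array holding 1, 2, 3, 4 returns the same total on
either path; only the instructions used to compute it may differ.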
> >> >>
> >> >> In the broadest sense, you probably want to compile things the same
> >> >> way for both your IR generators - lifting whatever set of flags/etc is
> >> >> used to generate the target and attributes from clang for your runtime
> >> >> generated code would probably be the best thing.
> >> >>
> >> >> - Dave
> >> >>
> >> >> On Fri, Jun 12, 2020 at 9:21 PM Haoran Xu via llvm-dev
> >> >> <llvm-dev at lists.llvm.org> wrote:
> >> >> >
> >> >> > Hello,
> >> >> >
> >> >> > I'm new to LLVM and I recently hit a weird problem about inlining
> behavior. I managed to get a minimal repro and the symptom of the issue,
> but I couldn't understand the root cause or how I should properly handle
> this issue.
> >> >> >
> >> >> > Below is IR code consisting of two functions,
> '_Z2fnP10TestStructi' and 'testfn', with the latter calling the former. One
> would expect the optimizer to inline the calls to '_Z2fnP10TestStructi',
> but it doesn't. (The command line I used is 'opt -O3 test.ll -o test2.bc')
> >> >> >
> >> >> >> source_filename = "a.cpp"
> >> >> >> target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
> >> >> >> target triple = "x86_64-unknown-linux-gnu"
> >> >> >>
> >> >> >> %struct.TestStruct = type { i8*, i32 }
> >> >> >>
> >> >> >> define dso_local i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %1) #0 {
> >> >> >>   %3 = getelementptr inbounds %struct.TestStruct, %struct.TestStruct* %0, i64 0, i32 0
> >> >> >>   %4 = load i8*, i8** %3, align 8
> >> >> >>   %5 = icmp eq i8* %4, null
> >> >> >>   %6 = add nsw i32 %1, 1
> >> >> >>   %7 = shl nsw i32 %1, 1
> >> >> >>   %8 = select i1 %5, i32 %6, i32 %7
> >> >> >>   ret i32 %8
> >> >> >> }
> >> >> >>
> >> >> >> define i32 @testfn(%struct.TestStruct* %0) {
> >> >> >> body:
> >> >> >>   %1 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 1)
> >> >> >>   %2 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %1)
> >> >> >>   %3 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %2)
> >> >> >>   %4 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %3)
> >> >> >>   %5 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %4)
> >> >> >>   %6 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %5)
> >> >> >>   %7 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %6)
> >> >> >>   %8 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %7)
> >> >> >>   %9 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %8)
> >> >> >>   %10 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %9)
> >> >> >>   ret i32 %10
> >> >> >> }
> >> >> >>
> >> >> >> attributes #0 = { "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" }
> >> >> >
> >> >> >
> >> >> > It turns out that the failure to inline is caused by the
> 'target-features' attribute in the last line. The function inlines properly
> if I remove the 'target-features' attribute from '_Z2fnP10TestStructi', or
> if I add 'attribute #0' to 'testfn'.
> >> >> >
> >> >> > So I think the symptom is that inlining does not work when two
> functions have different 'target-features' attributes. However, I could not
> understand what is the reasoning behind this, or how I should prevent this
> issue properly.
> >> >> >
> >> >> > Just for additional information, in my use case, the function
> '_Z2fnP10TestStructi' is automatically extracted from IR generated by
> clang++ with -O3, so the IR contains a bunch of attributes and
> MetadataNodes. The function 'testfn' is generated by my logic using
> llvm::IRBuilder at runtime, so the function does not contain any of those
> attributes and MetadataNodes initially. The functions generated by clang++
> and my functions are then fed together into optimization passes, and I
> expect the optimizer to inline clang++ functions into my functions as
> needed.
> >> >> >
> >> >> > So, what is the proper workaround for this? Should I delete all
> the attributes and MetadataNodes from the clang++-generated IR (and if yes,
> is that sufficient to prevent all those weird cases like this one)? I
> thought it was a bad idea because they provide more info to the optimizer. If
> not, what is the proper way of handling this?
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >> > Best regards,
> >> >> > Haoran
> >> >> >
> >> >> >
> >> >> >
> >> >> > _______________________________________________
> >> >> > LLVM Developers mailing list
> >> >> > llvm-dev at lists.llvm.org
> >> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-- 
~Craig