[llvm-dev] target-features attribute prevents inlining?

Fri Jun 12 21:53:57 PDT 2020

(+Eric Christopher for target attributes)
(+Lang Hames for JIT things)

The problem is that those target features let the compiler select
instructions that are only valid if the target CPU has those
features (e.g. a function without the "+mmx" attribute might be
intended to run on a CPU that doesn't have the mmx instruction
set). A function with mmx could legitimately be called from a
function without mmx if the caller checked the CPU features before
making the call. But there are any number of ways that check might
be written, so once LLVM inlines the mmx-using function into the
non-mmx caller, it can't be sure it won't accidentally move the
mmx-using code out from under the condition. So the inlining is
disabled.
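
As a rough illustration of the rule described above, here is a
simplified model in plain C++. It is not LLVM's actual implementation:
the names parseFeatures and inlineCompatible are made up for this
sketch, and it assumes the check amounts to "every feature enabled in
the callee must also be enabled in the caller".

```cpp
#include <set>
#include <sstream>
#include <string>

// Parse a "target-features" string such as "+mmx,+sse" into the set of
// enabled feature names. Entries that do not start with '+' are ignored.
// (Hypothetical helper for illustration, not an LLVM API.)
std::set<std::string> parseFeatures(const std::string &attr) {
    std::set<std::string> enabled;
    std::stringstream ss(attr);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.size() > 1 && item[0] == '+')
            enabled.insert(item.substr(1));
    }
    return enabled;
}

// Simplified model (an assumption, not LLVM's code): inlining is allowed
// only if every feature the callee relies on is also enabled in the caller.
bool inlineCompatible(const std::string &callerAttr,
                      const std::string &calleeAttr) {
    const std::set<std::string> caller = parseFeatures(callerAttr);
    for (const std::string &feature : parseFeatures(calleeAttr)) {
        if (caller.count(feature) == 0)
            return false;
    }
    return true;
}
```

With the attribute string from the report, a caller with no
"target-features" at all is rejected by this model, while a caller
carrying the same string as the callee is accepted, which matches the
behavior described in the original message.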

In the broadest sense, you probably want to compile things the same
way for both your IR generators - taking whatever set of flags/etc
clang used to generate the target attributes and applying the same
ones to your runtime-generated code would probably be the best thing.
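
One way to picture that suggestion is the sketch below. It uses a
stand-in struct rather than real LLVM types so that it is
self-contained: FnModel and copyTargetAttrs are hypothetical names, and
it is an assumption that "target-cpu" and "target-features" are the
attributes that need to match. With the real C++ API the same idea
would go through llvm::Function::getFnAttribute and
llvm::Function::addFnAttr.

```cpp
#include <initializer_list>
#include <map>
#include <string>

// Hypothetical stand-in for a function's string attributes.
// This is not an LLVM type; it only models key/value attributes.
struct FnModel {
    std::map<std::string, std::string> attrs;
};

// Copy the target-related attributes from a clang-generated function onto
// a runtime-generated one, so that both describe the same target.
// Attributes missing on the source are left untouched on the destination.
void copyTargetAttrs(const FnModel &from, FnModel &to) {
    for (const char *key : {"target-cpu", "target-features"}) {
        auto it = from.attrs.find(key);
        if (it != from.attrs.end())
            to.attrs[key] = it->second;
    }
}
```

Applied to the example, the runtime-generated 'testfn' would end up
with the same "target-features" string as '_Z2fnP10TestStructi',
which is the situation the original message reports as inlining
correctly.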

- Dave

On Fri, Jun 12, 2020 at 9:21 PM Haoran Xu via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> Hello,
>
> I'm new to LLVM and I recently hit a weird problem with inlining behavior. I managed to get a minimal repro and identify the symptom, but I couldn't understand the root cause or how I should properly handle it.
>
> Below is IR consisting of two functions, '_Z2fnP10TestStructi' and 'testfn', with the latter calling the former. One would expect the optimizer to inline the calls to '_Z2fnP10TestStructi', but it doesn't. (The command line I used is 'opt -O3 test.ll -o test2.bc')
>
>> source_filename = "a.cpp"
>> target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
>> target triple = "x86_64-unknown-linux-gnu"
>>
>> %struct.TestStruct = type { i8*, i32 }
>>
>> define dso_local i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %1) #0 {
>>   %3 = getelementptr inbounds %struct.TestStruct, %struct.TestStruct* %0, i64 0, i32 0
>>   %4 = load i8*, i8** %3, align 8
>>   %5 = icmp eq i8* %4, null
>>   %6 = add nsw i32 %1, 1
>>   %7 = shl nsw i32 %1, 1
>>   %8 = select i1 %5, i32 %6, i32 %7
>>   ret i32 %8
>> }
>>
>> define i32 @testfn(%struct.TestStruct* %0) {
>> body:
>>   %1 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 1)
>>   %2 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %1)
>>   %3 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %2)
>>   %4 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %3)
>>   %5 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %4)
>>   %6 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %5)
>>   %7 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %6)
>>   %8 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %7)
>>   %9 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %8)
>>   %10 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %9)
>>   ret i32 %10
>> }
>>
>> attributes #0 = { "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" }
>
>
> It turns out that the failure to inline is caused by the 'target-features' attribute in the last line. The function inlines properly if I remove the 'target-features' attribute from '_Z2fnP10TestStructi', or if I add 'attribute #0' to 'testfn'.
>
> So I think the symptom is that inlining does not work when two functions have different 'target-features' attributes. However, I could not understand the reasoning behind this, or how I should properly avoid the issue.
>
> Just for additional information, in my use case, the function '_Z2fnP10TestStructi' is automatically extracted from IR generated by clang++ with -O3, so the IR contains a bunch of attributes and MetadataNodes. The function 'testfn' is generated by my logic using llvm::IRBuilder at runtime, so the function does not contain any of those attributes and MetadataNodes initially. The functions generated by clang++ and my functions are then fed together into optimization passes, and I expect the optimizer to inline clang++ functions into my functions as needed.
>
> So, what is the proper workaround for this? Should I delete all the attributes and MetadataNodes from the clang++-generated IR (and if yes, is that sufficient to prevent all weird cases like this one)? I thought that was a bad idea because they provide more info to the optimizer. If not, what is the proper way of handling this?
>
> Thanks!
>
> Best regards,
> Haoran
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev