[cfe-dev] Should _has_builtin be aware of target features?

Akira Hatanaka via cfe-dev cfe-dev at lists.llvm.org
Mon Jan 25 11:53:19 PST 2016


On Wed, Jan 20, 2016 at 3:06 PM, Eric Christopher <echristo at gmail.com>
wrote:

>
>
> On Tue, Jan 19, 2016 at 9:47 PM Akira Hatanaka <ahatanak at gmail.com> wrote:
>
>> I have to ask a few more questions to understand better what you are
>> proposing:
>>
>> I guess you are thinking about emitting IR for both branches and delete
>> one of the branches during back-end code-generation. Is that correct?
>>
>> if (__builtin_has_builtin("__builtin_ia32_paddsw256")) {
>>   // This is dead code unless "-mavx2" is on the command line.
>>   // use __builtin_ia32_paddsw256.
>> } else {
>>   ...
>> }
>>
>>
> Yes I was.
>
>
>> In that case, it looks like clang can complain (in
>> CodeGenFucntion::checkTargetFeatures) that the builtin cannot be emitted
>> because features are missing even when the builtin is in the dead branch
>> (this can happen when the code above is compiled without "-mavx2" on the
>> command line).
>>
>>
> Right. I completely forgot about this error, and this is why I couldn't
> come up with something that didn't warn in the first place when we came up
> with this. It's possible we could make the error control flow sensitive,
> however...
>
>
>> Also, not sure why __builtin_cpu_supports has to be emitted too. Doesn't
>> that builtin check the features at run-time?
>>
>
> The idea I had was that if you were depending on target features in
> conditional compilation then you could propagate them into the check and
> remove the conditional. In retrospect I don't think you can do that because
> you might be depending on the feature presence for more just legality, but
> also for performance reasons. Which would make it not ok to remove. Proper
> documentation that the builtin can be removed would solve this issue though.
>
>
Yes, I agree that we can remove the conditional if it's documented that it
can be removed at compile time.


> The reason I don't know that we can do this at IR generation time is
> because we can then override the cpu/feature set at LTO time if we're
> emitting IR later and we wouldn't want to remove the conditional if it
> could change (otherwise yes, at IR gen time makes perfect sense).
>
>
I'm not sure about this. My understanding is that, when you are doing LTO,
you want an executable that is functionally equivalent to a one built
without LTO. If we allow overriding the cpu/feature set at LTO time, in
some cases the LTO and the non-LTO executables will have different
behaviors.

For example, what happens if someone compiles foo1.c with -mavx2 and foo2.c
without the option?

$ cat foo1.c
int foo1() {
  return __builtin_has_builtin("some-avx2-builtin");
}

$ cat foo2.c
int foo2() {
  return __builtin_has_builtin("some-avx2-builtin");
}

I know this is a contrived example, but it seems to me that evaluating the
builtin at IR-gen time fits our LTO model.


> if (__builtin_cpu_supports("avx2"))
>    function_that_calls_mm_foo();
> else
>    function_that_does_it_slower();
>
> static void function_that_calls_mm_foo() {
>    // because using __builtin is a bad idea
>    _mm_foo();
> }
>
> which means that it should get inlined and avoid the warning
> (attribute(__always_inline__) is also a possibility here). And then if we
> document that __builtin_cpu_supports can be optimized away at code
> generation time (which, I believe, has been a goal on the gcc side anyways)
> then the optimization here should work. The complication is that we'd need
> a late inlining pass to make this work as I was originally picturing the
> lowering happening during codegen prepare (as the earliest "I'm going to
> generate code, I promise" pass).
>
>
I think it's possible to use this scheme, but it's a lot more complicated
than evaluating the builtin (whether it's a builtin or new macro) in the
frontend. We are looking for a simple builtin or macro that answers the
question "is this builtin supported?". If it becomes too complicated,
people will be less inclined to use it.


> Thoughts?
>
> -eric
>
>
>> On Tue, Jan 19, 2016 at 1:39 PM, Eric Christopher <echristo at gmail.com>
>> wrote:
>>
>>> I don't think at IR gen is the proper time, I think you'll want to do it
>>> as a code generation time and handle it there. That way you can still DCE
>>> the code that you don't want to dynamic dispatch on via
>>> __builtin_cpu_supports. Then you know that the code you're generating for
>>> an entire function will necessarily support an ISA extension or not.
>>>
>>> -eric
>>>
>>> On Tue, Jan 19, 2016 at 12:30 PM Akira Hatanaka <ahatanak at gmail.com>
>>> wrote:
>>>
>>>> ping.
>>>>
>>>> How about defining a new builtin (__builtin_has_builtin?) that
>>>> evaluates to true of false at compile time based on the target features set
>>>> by both the command line options and target attributes on the function?
>>>>
>>>> For example, clang's code-gen will not emit IR for the else statement
>>>> in the following piece of code if the builtin evaluates to true:
>>>>
>>>> if (__builtin_has_builtin("somebuiltin")) {
>>>>   // use "somebuiltin"
>>>> } else {
>>>>   ...
>>>> }
>>>>
>>>> On Mon, Nov 2, 2015 at 3:55 PM, Akira Hatanaka <ahatanak at gmail.com>
>>>> wrote:
>>>>
>>>>> _has_builtin is a function-like macro that evaluates to 1 if the
>>>>> builtin function name that's passed to it is supported by the current
>>>>> version of clang or the architecture that is being targeted.
>>>>>
>>>>>
>>>>> http://clang.llvm.org/docs/LanguageExtensions.html#feature-checking-macros
>>>>>
>>>>> Some people have expressed interest in enhancing the functionality of
>>>>> this macro and making it aware of target features.
>>>>>
>>>>> For example, let's say I had the following program and wanted to emit
>>>>> avx2  builtin __builtin_ia32_paddsw256 if the target architecture supported
>>>>> it and emit a plain addition if it was unsupported (I don't know whether
>>>>> the users will specify -mavx2 on the command line):
>>>>>
>>>>> $ cat add1.c
>>>>>
>>>>> typedef unsigned short v16hi __attribute__((__vector_size__(32)));
>>>>>
>>>>> v16hi func1(v16hi a, v16hi b) {
>>>>> #if __has_builtin(__builtin_ia32_paddsw256)
>>>>>   return __builtin_ia32_paddsw256(a, b);
>>>>> #else
>>>>>   return a + b;
>>>>> #endif
>>>>> }
>>>>>
>>>>> Currently, clang fails to compile this program unless -mavx2 is
>>>>> explicitly specified on the command line or the target supports avx2:
>>>>>
>>>>> $ clang add1.c -S -o - -v -emit-llvm
>>>>>
>>>>> *add1.c:5:10: **error: **'__builtin_ia32_paddsw256' needs target
>>>>> feature avx2*
>>>>>
>>>>>   return __builtin_ia32_paddsw256(a, b);
>>>>>
>>>>> This happens because _has_builtin always evaluates to 1 if the current
>>>>> version of clang supports the builtin for the target architecture (x86 in
>>>>> the case above) regardless of which target features (e.g., avx2) are set.
>>>>> So in this case, we want _has_builtin to know which features are set and
>>>>> evaluate to 0 if feature avx2 is not set.
>>>>>
>>>>> However, making _has_builitin smarter breaks the following use case:
>>>>>
>>>>> $ cat add1.c
>>>>> v16hi __attribute__((target("avx2"))) func1(v16hi a, v16hi b) {
>>>>> #if __has_builtin(__builtin_ia32_paddsw256)
>>>>>   return __builtin_ia32_paddsw256(a, b);
>>>>> #else
>>>>>   return a + b;
>>>>> #endif
>>>>> }
>>>>>
>>>>> $ clang add1.c -S -o - -v -emit-llvm
>>>>>
>>>>> If a user compiles the program with clang today, __has_builtin returns
>>>>> 1 because the builtin is supported. In this case, this is what the user
>>>>> wants: the user doesn't need to know if avx2 instructions are available
>>>>> (the target attribute says avx2 is available) but just wants to use
>>>>> _has_builitin to find out whether the current version of clang supports the
>>>>> builtin. However, if we make _has_builtin aware of the target feature,
>>>>> __has_builtin will evaluate to 0, which results in clang emitting the plain
>>>>> addition instead of the builtin (note that the preprocessor is not aware of
>>>>> what target attributes are on the function).
>>>>>
>>>>> I can see how making _has_builtin aware of target features can improve
>>>>> programmers' experience. But I've also heard opinions from people who are
>>>>> concerned about the use cases this change will break, so I'm sending out
>>>>> this email to a wider audience.
>>>>>
>>>>
>>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20160125/4c4a3ff9/attachment.html>


More information about the cfe-dev mailing list