[cfe-dev] Should _has_builtin be aware of target features?

Eric Christopher via cfe-dev cfe-dev at lists.llvm.org
Wed Jan 20 15:06:35 PST 2016


On Tue, Jan 19, 2016 at 9:47 PM Akira Hatanaka <ahatanak at gmail.com> wrote:

> I have to ask a few more questions to understand better what you are
> proposing:
>
> I guess you are thinking about emitting IR for both branches and delete
> one of the branches during back-end code-generation. Is that correct?
>
> if (__builtin_has_builtin("__builtin_ia32_paddsw256")) {
>   // This is dead code unless "-mavx2" is on the command line.
>   // use __builtin_ia32_paddsw256.
> } else {
>   ...
> }
>
>
Yes I was.


> In that case, it looks like clang can complain (in
> CodeGenFucntion::checkTargetFeatures) that the builtin cannot be emitted
> because features are missing even when the builtin is in the dead branch
> (this can happen when the code above is compiled without "-mavx2" on the
> command line).
>
>
Right. I completely forgot about this error, and this is why I couldn't
come up with something that didn't warn in the first place when we came up
with this. It's possible we could make the error control flow sensitive,
however...


> Also, not sure why __builtin_cpu_supports has to be emitted too. Doesn't
> that builtin check the features at run-time?
>

The idea I had was that if you were depending on target features in
conditional compilation then you could propagate them into the check and
remove the conditional. In retrospect I don't think you can do that because
you might be depending on the feature presence for more just legality, but
also for performance reasons. Which would make it not ok to remove. Proper
documentation that the builtin can be removed would solve this issue though.

The reason I don't know that we can do this at IR generation time is
because we can then override the cpu/feature set at LTO time if we're
emitting IR later and we wouldn't want to remove the conditional if it
could change (otherwise yes, at IR gen time makes perfect sense).

A pattern to get around the warning that could work is:

if (__builtin_cpu_supports("avx2"))
   function_that_calls_mm_foo();
else
   function_that_does_it_slower();

static void function_that_calls_mm_foo() {
   // because using __builtin is a bad idea
   _mm_foo();
}

which means that it should get inlined and avoid the warning
(attribute(__always_inline__) is also a possibility here). And then if we
document that __builtin_cpu_supports can be optimized away at code
generation time (which, I believe, has been a goal on the gcc side anyways)
then the optimization here should work. The complication is that we'd need
a late inlining pass to make this work as I was originally picturing the
lowering happening during codegen prepare (as the earliest "I'm going to
generate code, I promise" pass).

Thoughts?

-eric


> On Tue, Jan 19, 2016 at 1:39 PM, Eric Christopher <echristo at gmail.com>
> wrote:
>
>> I don't think at IR gen is the proper time, I think you'll want to do it
>> as a code generation time and handle it there. That way you can still DCE
>> the code that you don't want to dynamic dispatch on via
>> __builtin_cpu_supports. Then you know that the code you're generating for
>> an entire function will necessarily support an ISA extension or not.
>>
>> -eric
>>
>> On Tue, Jan 19, 2016 at 12:30 PM Akira Hatanaka <ahatanak at gmail.com>
>> wrote:
>>
>>> ping.
>>>
>>> How about defining a new builtin (__builtin_has_builtin?) that evaluates
>>> to true of false at compile time based on the target features set by both
>>> the command line options and target attributes on the function?
>>>
>>> For example, clang's code-gen will not emit IR for the else statement in
>>> the following piece of code if the builtin evaluates to true:
>>>
>>> if (__builtin_has_builtin("somebuiltin")) {
>>>   // use "somebuiltin"
>>> } else {
>>>   ...
>>> }
>>>
>>> On Mon, Nov 2, 2015 at 3:55 PM, Akira Hatanaka <ahatanak at gmail.com>
>>> wrote:
>>>
>>>> _has_builtin is a function-like macro that evaluates to 1 if the
>>>> builtin function name that's passed to it is supported by the current
>>>> version of clang or the architecture that is being targeted.
>>>>
>>>>
>>>> http://clang.llvm.org/docs/LanguageExtensions.html#feature-checking-macros
>>>>
>>>> Some people have expressed interest in enhancing the functionality of
>>>> this macro and making it aware of target features.
>>>>
>>>> For example, let's say I had the following program and wanted to emit
>>>> avx2  builtin __builtin_ia32_paddsw256 if the target architecture supported
>>>> it and emit a plain addition if it was unsupported (I don't know whether
>>>> the users will specify -mavx2 on the command line):
>>>>
>>>> $ cat add1.c
>>>>
>>>> typedef unsigned short v16hi __attribute__((__vector_size__(32)));
>>>>
>>>> v16hi func1(v16hi a, v16hi b) {
>>>> #if __has_builtin(__builtin_ia32_paddsw256)
>>>>   return __builtin_ia32_paddsw256(a, b);
>>>> #else
>>>>   return a + b;
>>>> #endif
>>>> }
>>>>
>>>> Currently, clang fails to compile this program unless -mavx2 is
>>>> explicitly specified on the command line or the target supports avx2:
>>>>
>>>> $ clang add1.c -S -o - -v -emit-llvm
>>>>
>>>> *add1.c:5:10: **error: **'__builtin_ia32_paddsw256' needs target
>>>> feature avx2*
>>>>
>>>>   return __builtin_ia32_paddsw256(a, b);
>>>>
>>>> This happens because _has_builtin always evaluates to 1 if the current
>>>> version of clang supports the builtin for the target architecture (x86 in
>>>> the case above) regardless of which target features (e.g., avx2) are set.
>>>> So in this case, we want _has_builtin to know which features are set and
>>>> evaluate to 0 if feature avx2 is not set.
>>>>
>>>> However, making _has_builitin smarter breaks the following use case:
>>>>
>>>> $ cat add1.c
>>>> v16hi __attribute__((target("avx2"))) func1(v16hi a, v16hi b) {
>>>> #if __has_builtin(__builtin_ia32_paddsw256)
>>>>   return __builtin_ia32_paddsw256(a, b);
>>>> #else
>>>>   return a + b;
>>>> #endif
>>>> }
>>>>
>>>> $ clang add1.c -S -o - -v -emit-llvm
>>>>
>>>> If a user compiles the program with clang today, __has_builtin returns
>>>> 1 because the builtin is supported. In this case, this is what the user
>>>> wants: the user doesn't need to know if avx2 instructions are available
>>>> (the target attribute says avx2 is available) but just wants to use
>>>> _has_builitin to find out whether the current version of clang supports the
>>>> builtin. However, if we make _has_builtin aware of the target feature,
>>>> __has_builtin will evaluate to 0, which results in clang emitting the plain
>>>> addition instead of the builtin (note that the preprocessor is not aware of
>>>> what target attributes are on the function).
>>>>
>>>> I can see how making _has_builtin aware of target features can improve
>>>> programmers' experience. But I've also heard opinions from people who are
>>>> concerned about the use cases this change will break, so I'm sending out
>>>> this email to a wider audience.
>>>>
>>>
>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20160120/17602691/attachment.html>


More information about the cfe-dev mailing list