[LLVMdev] Incorrect loop optimization when building the Linux kernel

David Majnemer david.majnemer at gmail.com
Mon Dec 8 11:46:45 PST 2014


I'm pretty sure this is fixed in r223684.  This particular use of
zero-sized arrays should defeat any chance of compile-time address equality.

However, this commit does not close any open questions about the validity
of the source.

On Mon, Dec 8, 2014 at 10:32 AM, Reid Kleckner <rnk at google.com> wrote:

> On Sun, Dec 7, 2014 at 11:16 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
>
>> Chengyu Song wrote:
>>
>>> It's difficult to say without a full example, but I'm very suspicious
>>>> of those global declarations. I think the compiler would be entirely
>>>> justified in assuming you could *never* get from __start_builtin_fw to
>>>> __end_builtin_fw, let alone on the first iteration: they're distinct
>>>> array objects and by definition (within C99) can't overlap.
>>>>
>>>
>>> I think this should be the root cause. Once I changed the declaration to
>>> pointers (instead of arrays):
>>>
>>> extern struct builtin_fw* __start_builtin_fw;
>>> extern struct builtin_fw* __end_builtin_fw;
>>>
>>> The generated code will not skip the first comparison.
>>>
>>> Sadly, Linux kernel may contain more such declarations.
>>>
>>
>> That's a great point. Maybe clang could issue a warning on these, any for
>> loop that declares a pointer to the address of a non-weak object, compares
>> it to a different object, and increments only the pointer (without doing
>> other changes to the pointer inside the loop body). We know that's a buggy
>> pattern. Could you file a feature request for this on llvm.org/bugs ?
>
>
> How is detecting the "buggy" pattern helpful to the user? The linker has
> provided them with the interface of __start_my_section and __end_my_section
> symbols, and they are using it in the most obvious way possible. I have
> personally written this pattern before. I can't think of any other way to
> use this interface.
>
> What will the user do if the compiler tells them their code is wrong and
> optimizes it away? They will probably just mutilate the code until the
> compiler fails to warn and optimize on this case. We need something better.
>
> We need a way to express iteration over the concatenated contents of a
> section in LLVM IR that has a logical lowering from C. We should either
> adopt the GNU binutils convention of special __start_* / __end_* symbols,
> or invent a new representation that is compelling.
>
> Joerg suggested adding 'weak' as a workaround, but it doesn't seem
> particularly compelling to me:
>
> extern void *__start_my_ptr_section[];
> extern void *__end_my_ptr_section[] __attribute__((weak)); // I'm assuming
> Joerg implied the attribute goes here...
> void iterate_my_section() {
>   for (void **i = __start_my_ptr_section; i != __end_my_ptr_section; i++) {
>     use_ptr(*i);
>   }
> }
>
> LLVM has had the concept of "appending" globals for a very long time, but
> it doesn't have an actual object file lowering. It also provides no
> mechanism for iteration, because it is impossible to determine the ending
> point of the concatenated array after linking.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141208/5d70d8f5/attachment.html>


More information about the llvm-dev mailing list