[cfe-dev] Determining if it is a bug in Clang or my code

Fri May 8 12:08:41 PDT 2015

Thank you for the responses. I spent a lot of time trying to distill the
kernel into a simple repro, but the problem didn't turn out to be so
complicated after all. Mats had it right with the emulated processor being
"not quite so advanced". The trick was adding `-mno-mmx -mno-sse` to the
build. Now the clang-backed compiles work just fine.
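
As a rough sketch of how a build could guard against this (the helper names `built_with_sse` and `built_with_mmx` are hypothetical, not from the kernel): Clang and GCC predefine `__SSE__` and `__MMX__` when those instruction sets are enabled for the target, so the code can report at compile time whether the flags took effect.

```cpp
// Hypothetical helpers, not part of the original kernel.
// __SSE__ / __MMX__ are predefined by Clang and GCC when the
// corresponding instruction set is enabled for the target.
constexpr bool built_with_sse() {
#if defined(__SSE__)
    return true;   // the compiler is allowed to emit SSE instructions
#else
    return false;  // e.g. built with -mno-sse
#endif
}

constexpr bool built_with_mmx() {
#if defined(__MMX__)
    return true;   // the compiler is allowed to emit MMX instructions
#else
    return false;  // e.g. built with -mno-mmx
#endif
}
```

A kernel that must not use either extension could then add something like `static_assert(!built_with_sse() && !built_with_mmx(), "build with -mno-mmx -mno-sse");` to one of its source files, so a misconfigured build fails at compile time instead of trapping in the emulator.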

On Thu, Apr 30, 2015 at 2:16 PM, mats petersson <mats at planetcatfish.com>
wrote:

> It seems rather more likely that the problem is in your code than the
> compiler - although not IMPOSSIBLE that it's the compiler. The reason I say
> it's unlikely is that it would require that the compiler actually generates
> an invalid opcode for your processor [aside from UD2A as described by Reid]
> - although one possibility is of course that your actual compiled code is
> built for some fancy processor, and your emulated processor is "not quite
> so advanced" [e.g. SSE or AVX instructions are generated, where these are
> not supported in the emulation]. Use the "most basic" processor model that
> is supported to see if that is the case (e.g. i486 or x86-64)
>
> Of course, getting things messed up or using uninitialized data as
> code-pointers (vtables, function pointers, etc), returning on an
> overwritten stack and/or getting memory mapping wrong can easily lead to
> this exact scenario (and the exact effect will depend on the exact code
> generated, meaning different compilers will also give different results). I
> have worked with operating systems long enough to have seen just about
> every one of these - my "favourite" is when I've messed up the arguments to
> memset, and the code starts to fill from `len` for `destination` bytes [and
> `destination` is the other side of the code vs. `len`], and eventually you
> overwrite the current code with the fill value - which usually leads to
> REALLY bizarre crashes. Or forgetting to set the A20 gate in x86-machines,
> so your first MB overlaps/shadows second MB (and if you cleverly put your
> code at 0..1MB, and the stack, heap, etc at 1..2MB, it gives really nice
> undecipherable crashes when you execute the just allocated memory or just
> allocated stack...)
>
> Figuring out "where you came from" (call-stack) as well as "what is this
> instruction" would be a good place to start looking at "why is this going
> wrong".
>
> --
> Mats
>
> On 30 April 2015 at 21:06, Reid Kleckner <rnk at google.com> wrote:
>
>> Can you see if the invalid opcode is ud2a? Clang sometimes emits
>> those after encountering certain kinds of UB.
>>
>> I think the most common is falling off the end of a function that is
>> supposed to return a value. If you compile your code with -Wreturn-type (it
>> should be on by default), you should see a warning for it, but not if the
>> code is in a system header. There are other more obscure ways to trigger
>> it, like passing non-POD objects through a vararg pack, but my money is on
>> -Wreturn-type.
>>
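The fall-off-the-end case Reid describes can be sketched as below. The function `sign` is a made-up example, not code from the kernel. The buggy variant is left in a comment because calling it with `x == 0` would be undefined behavior, and that is the kind of path where Clang may place a `ud2` instead of a return.

```cpp
// Buggy variant (comment only). -Wreturn-type should flag it, since
// control can reach the closing brace without returning a value:
//
//   int sign(int x) {
//       if (x > 0) return 1;
//       if (x < 0) return -1;
//   }   // x == 0 falls off the end: undefined behavior in C++

// Fixed variant: every control path returns a value.
// (constexpr is used only so the results can be checked at compile time.)
constexpr int sign(int x) {
    if (x > 0) return 1;
    if (x < 0) return -1;
    return 0;  // the path the buggy variant forgot
}
```

Building with `-Werror=return-type` turns the warning into a hard error, which is one way to rule this cause out across a whole codebase.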
>> On Thu, Apr 30, 2015 at 12:31 PM, Chris Smith <chrsmith at google.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I've been working on an operating system as a hobby project for a few
>>> months now, and finally tried converting the codebase to Clang. While the
>>> code compiles fine, I now get a surprising Interrupt 0x06 "Invalid Opcode"
>>> being fired when executing some C++ code. (This happens under both bochs
>>> and qemu.)
>>>
>>> The same codebase works fine when compiled under GCC; the faulty code(?)
>>> only appears when built under Clang. That part of the code isn't invoking
>>> any assembly (inline or otherwise), and the C++ itself is fairly
>>> straightforward. (See below.)
>>>
>>> My questions are:
>>>
>>> - Is the fact that this interrupt fires while executing pure C++ code
>>> proof of a compiler bug? Or is it possible to generate invalid opcodes
>>> through undefined C++ behavior, etc.?
>>>
>>> - How likely is it that this is actually a Clang codegen bug? I worked
>>> on the F# compiler at Microsoft, and know quite well that "I found a bug in
>>> the compiler" is Latin for "I don't understand how this language works";
>>> though the fact the code is triggering a CPU interrupt is concerning.
>>>
>>> - Would it be worthwhile to distill my os-project down and try to
>>> produce a minimal repro? If so, where should I send it?
>>>
>>> As for the code itself, the problem seems to be occurring in my
>>> implementation of printf. I'm using variadic template arguments to do it in
>>> a typesafe way. Is "variadic template codegen for 32-bit" a particularly
>>> rough area of the Clang/LLVM codebase?
>>>
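The kernel's actual printf is not shown in this thread, so the following is only an illustrative sketch of the general technique (a type-safe formatter built on variadic templates). The names `kprint_to` and `ksprint` and the bare `%` placeholder syntax are assumptions made for the example, and it writes to a `std::ostringstream` so it can run in a hosted environment; a freestanding kernel would write to its own console routine instead.

```cpp
#include <sstream>
#include <string>

// Base case: no arguments left, so copy the rest of the format string.
// Any '%' still present is printed literally.
inline void kprint_to(std::ostringstream& out, const char* fmt) {
    out << fmt;
}

// Recursive case: copy text up to the next '%', stream the first argument
// in its place, then recurse on the remaining format text and arguments.
template <typename T, typename... Rest>
void kprint_to(std::ostringstream& out, const char* fmt,
               const T& value, const Rest&... rest) {
    for (; *fmt != '\0'; ++fmt) {
        if (*fmt == '%') {
            out << value;                   // type comes from T, not from the format
            kprint_to(out, fmt + 1, rest...);
            return;
        }
        out << *fmt;
    }
    // Format string ended first: surplus arguments are silently dropped.
}

// Convenience wrapper that returns the formatted text as a string.
template <typename... Args>
std::string ksprint(const char* fmt, const Args&... args) {
    std::ostringstream out;
    kprint_to(out, fmt, args...);
    return out.str();
}
```

Because each argument's type is carried by the template parameter pack, there is no `va_list` involved, so the non-POD-through-varargs problem Reid mentions does not apply to this style of printf.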
>>> Any insight would be appreciated.
>>>
>>> Thanks,
>>> -Chris
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>
>>>
>