[llvm-dev] Linking Linux kernel with LLD

Fri Feb 3 18:22:30 PST 2017

On Thu, Feb 2, 2017 at 4:35 PM, Dmitry Golovin <dima at golovin.in> wrote:

> I have just checked it: startup.elf and realmode.elf are fine. Only a
> few changes are required for the mainline kernel; one commit has to be
> reverted from lld and a few patches have to be applied.
>
> The only step where I used BFD is linking vmlinux. I manually set the
> LD variable in the vmlinux_link() function. The vmlinux produced by lld
> doesn't work yet. I will compare it to the one produced by GNU ld and try
> to figure out what is wrong (maybe you can suggest some useful objdump
> flags?)
>

With objdump I would recommend looking at the program headers (objdump -p),
in particular the PT_LOADs, and at the dynamic symbol table (objdump -T).
Anything in the dynamic table is also worth scrutinizing. One thing to keep
an eye out for is addresses/offsets that look "weird"; e.g. maybe the LLD
version thinks a symbol has address 0 or some insane value, where BFD/gold
has a saner value.
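
As a rough illustration of the PT_LOAD comparison, here is a minimal Python
sketch that parses the program headers of two ELF images and reports the
fields that differ. It assumes 64-bit little-endian ELF files (as for an
x86-64 kernel) and pairs segments by index; the function names are made up
for this example and are not part of any tool.

```python
import struct

PT_LOAD = 1  # p_type value for loadable segments in the ELF specification
FIELDS = ("offset", "vaddr", "paddr", "filesz", "memsz", "flags", "align")


def read_program_headers(data):
    """Parse the program headers of a 64-bit little-endian ELF image (bytes)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian")
    # ELF64 header: e_phoff is at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    headers = []
    for i in range(e_phnum):
        # Elf64_Phdr field order: p_type, p_flags, p_offset, p_vaddr,
        # p_paddr, p_filesz, p_memsz, p_align.
        p_type, flags, offset, vaddr, paddr, filesz, memsz, align = (
            struct.unpack_from("<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        )
        headers.append({
            "type": p_type, "flags": flags, "offset": offset,
            "vaddr": vaddr, "paddr": paddr, "filesz": filesz,
            "memsz": memsz, "align": align,
        })
    return headers


def load_segments(data):
    """Return only the PT_LOAD program headers."""
    return [ph for ph in read_program_headers(data) if ph["type"] == PT_LOAD]


def diff_load_segments(lld_data, ref_data):
    """List human-readable differences between the PT_LOADs of two images."""
    segs_a, segs_b = load_segments(lld_data), load_segments(ref_data)
    lines = []
    if len(segs_a) != len(segs_b):
        lines.append(f"PT_LOAD count differs: {len(segs_a)} vs {len(segs_b)}")
    # Segments are paired by position; extra segments on one side are only
    # reported through the count line above.
    for i, (sa, sb) in enumerate(zip(segs_a, segs_b)):
        for field in FIELDS:
            if sa[field] != sb[field]:
                lines.append(
                    f"PT_LOAD[{i}] {field}: {sa[field]:#x} vs {sb[field]:#x}"
                )
    return lines
```

To use it, read both files with open(path, "rb").read() and print each line
returned by diff_load_segments(lld_bytes, bfd_bytes). Some differences are
expected between linkers, so treat the output as a list of places to look,
not as a list of bugs.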

Also, set up your system so that you rebuild/reinstall the bootloader too,
so that you can add printfs in there to home in on where the boot is going
wrong. The following workflow might be useful:

Step 1: add a printf to the bootloader to try to home in on the exact place
where things are going wrong
Step 2: rebuild/reinstall/reboot the new bootloader with the LLD-linked
kernel
Step 3: boot and observe the prints (or maybe things crashed before
reaching your print, which is just as useful to know)
Step 4: think about what you observed in Step 3, then go to Step 1, using
these results to inform the next set of prints to add

With appropriate scripts (and a nice qemu setup), one iteration of this may
take about 10 minutes. You may have to repeat it about 20 times to pinpoint
the exact place where things are going wrong (e.g. "the bootloader is
crashing in the memcpy for the second PT_LOAD" or "the boot is failing
because the bootloader is reading from a bogus address that it got from
this part of the binary"). That is about 200 minutes, which isn't too bad.
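
One way to script a single iteration of the loop is sketched below in
Python. It runs a boot command with a timeout, keeps whatever console output
was produced (even if the boot hangs), and reports the last debug print that
was reached. The "DBG:" prefix and the function names are assumptions made up
for this example; the actual build and qemu commands depend on your setup
and are left to the caller.

```python
import subprocess


def run_and_capture(cmd, timeout_s):
    """Run cmd, returning its combined stdout/stderr as text.

    If the command does not finish within timeout_s seconds (e.g. the boot
    hangs), it is killed and the output captured so far is returned.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
        raw = done.stdout
    except subprocess.TimeoutExpired as exc:
        raw = exc.output or b""
    return raw.decode("utf-8", errors="replace")


def last_marker(output, prefix="DBG:"):
    """Return the last output line containing prefix, or None if none did.

    The prefix is a hypothetical tag you would put in your own printfs.
    """
    found = None
    for line in output.splitlines():
        if prefix in line:
            found = line.strip()
    return found
```

The last marker seen tells you which print was reached before things went
wrong, which is the observation Step 3 asks for; a result of None means the
failure happened before the first print.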

One thing to keep in mind is that this is not like debugging a race
condition or other nasty nondeterministic bug. This should be quite
deterministic so you just have to be systematic and keep narrowing down
until you find where things go wrong. It just requires determination.

Once you have narrowed it down, you should have a clear indication of where
to look in the binary and compare with gold/bfd, and hopefully the
discrepancy is pretty clear. Then you "just" need to figure out why LLD
produces this result and what to change to avoid the problem.
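
For the symbol side of that comparison, a minimal Python sketch: it parses
the text output of nm for both links and lists the symbols whose addresses
differ, with symbols that LLD placed at address 0 sorted first. It assumes
the usual three-column nm output (address, type, name); the function names
are made up for this example. If several local symbols share a name, only
the last one is kept, so this is a coarse filter and not a full diff.

```python
def parse_nm(text):
    """Map symbol name -> address from nm output lines (address type name)."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols have no address column; skip them
        addr, _kind, name = parts
        try:
            symbols[name] = int(addr, 16)
        except ValueError:
            continue  # first column was not a hex address
    return symbols


def diff_symbols(lld_syms, ref_syms):
    """Return (name, lld_addr, ref_addr) for symbols whose addresses differ.

    Only symbols present in both maps are compared. Entries where the LLD
    address is 0 come first, since those are the most suspicious.
    """
    diffs = [
        (name, lld_syms[name], ref_syms[name])
        for name in sorted(lld_syms.keys() & ref_syms.keys())
        if lld_syms[name] != ref_syms[name]
    ]
    diffs.sort(key=lambda d: d[1] != 0)  # stable sort: zero addresses first
    return diffs
```

Feed it the saved output of nm on each vmlinux. Different linkers can
legitimately lay things out differently, so a mismatch is a hint about
where to look and not proof of a bug.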

One amazing tool if you are working with object files is "010 Editor"
https://www.sweetscape.com/010editor/ with a "binary template" for ELF
files. I think there is an ELF "binary template" for 010 Editor floating
around the net, but the best one is Michael's one that he has evolved over
the years (ask him for it). If you haven't done so already, I recommend
that you sit down at Michael's desk one day and work with him to debug one
of these nasty "what is wrong with this binary and why?" issues so you can
see him do his thing; he's amazingly good at it.

Also, if you need a quick refresher about this x86 boot stuff (to be
somewhat oriented about the environment in which all this stuff is
happening), you may want to skim:
http://duartes.org/gustavo/blog/post/how-computers-boot-up/
http://duartes.org/gustavo/blog/post/kernel-boot-process/
http://duartes.org/gustavo/blog/post/memory-translation-and-segmentation/
http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection/

-- Sean Silva

>
> Regards,
> Dmitry
>
>
> 03.02.2017, 02:23, "Sean Silva" <chisophugis at gmail.com>:
>
>
>
> On Thu, Feb 2, 2017 at 12:38 AM, George Rimar <grimar at accesssoftek.com>
> wrote:
>
> > As far as the setup, I would recommend setting up qemu for actually
> > running the LLD-linked kernel and custom bootloader etc. because then you
> > can have a single script that rebuilds the bootloader and kernel and
> > copies the files to the VM. This reduces iteration time significantly.
> > Davide is the one that set that up and could probably provide more
> > details, but qemu docs might be good enough that you can set things up
> > without much effort (not sure though).
> >
> > -- Sean Silva
>
> By the way, yesterday I configured the "smallest possible kernel", linked
> it with BFD and launched it under QEMU.
> It is very small and takes only a few seconds to build from scratch for me.
> I used this article:
> http://mgalgs.github.io/2015/05/16/how-to-build-a-custom-linux-kernel-for-qemu-2015-edition.html
>
> Now I am going to link it with LLD and check whether it boots or not.
> I think that should be the fastest way: boot that little core and then
> enable features one by one or group by group and fix other things along
> the way.
>
>
> My experience with linker bugs is that usually when things are mis-linked,
> they are in the "core". E.g. startup code. So linking a small kernel may
> not avoid as many bugs as you expect. For example, for FreeBSD, I don't
> think we hit any issues in anything that could have been configured out.
>
> -- Sean Silva
>
>
>
> Previously I also worked on patches for the kernel but did not try to
> minimize it and used some default configuration,
> which was probably good for finding multiple issues from all sides, but not
> an ideal way to fix/test startup and things.
>
> George.
>
>
>