[llvm-dev] Success: Bring-up of LLVM/clang-built Linux ARM(32-bit) kernel for Android - Nexus 5
Raghavan Santhanam via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 12 17:59:10 PDT 2018
Hello,
I would like to share my successful bring-up of an LLVM/clang-built Linux
ARM (32-bit) hammerhead kernel for Android, running on my Nexus 5 smartphone.
After successfully bringing up LLVM/clang-built Linux kernels (from v4.15.7
to the most recent v4.17) on x86_64, I was interested in accomplishing the
same on the ARM platform of my Nexus 5 Android smartphone. Here is the
complete report for those interested.
The main advantage of the *clang-built Android ARM (32-bit) hammerhead
kernel* on my Nexus 5 has been *better battery usage* compared to the
*gcc-built kernel* with the same kernel config and hardware (my Nexus 5
Android smartphone). Details can be found below.
*NOTE:* I came across some reports of ARM64 clang-built kernels for some
Android smartphones, but the information there did *not* help with my ARM32
clang-kernel case on the Nexus 5 (hammerhead). So I started this project
from *scratch*, and it took a lot of *entirely my own original work*, first
to successfully *build* the ARM32 clang-kernel for the Nexus 5 (hammerhead)
and second to make it *actually work* on the real hardware.
For easy reading with formatting, etc :
https://ubuntuforums.org/showthread.php?t=2394035
Cheers.
------------------------------
*Android ARM(32-bit) clang-kernel bring-up for Nexus 5(hammerhead)*
*[Android Version Information]* & *[Battery Usage of a clang-built kernel ~
better than that of gcc-built kernel (shows one of the instances)]*
*[1]* Android NDK r13b [LLVM/clang + binutils(as, ld, etc)]
*[2]* Android NDK r17 [LLVM/clang + binutils(as, ld, etc)]
*[3]* Main LLVM/clang + Android NDK r13b binutils(as, ld, etc)
*[4]* Main LLVM/clang + Android NDK r17 binutils(as, ld, etc)
*[5]* Snapdragon Qualcomm LLVM/clang + NDK r13b binutils(as, etc)
*[6]* Snapdragon Qualcomm LLVM/clang + NDK r17 binutils(as, etc)
*[Average Battery Usage]*
*BUILD SYSTEM INFORMATION*
Code:
#### Build system information ####
exp@exp:~$ sudo dmidecode -t system | grep "Manufacturer:\|Version:"
Manufacturer: LENOVO
Version: Lenovo Y50-70 Touch
exp@exp:~$ sudo dmidecode -t processor | grep "Version\|Family:"
Family: Core i7
Version: Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
exp@exp:~$ cat /proc/meminfo | grep MemTotal
MemTotal: 16332968 kB
exp@exp:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 17.10
Release: 17.10
Codename: artful
*BUILD SUMMARY*
Code:
#### Total build time ####
38m6.816s
#### Build times ####
[GCC NDK r13b] : 4m15.596s
[GCC NDK r17] : 4m13.983s
[Android NDK r13b : LLVM/clang + binutils(ld and as)] : 4m3.665s
[Android NDK r17 : LLVM/clang + binutils(ld and as)] : 4m4.683s
[Main LLVM/clang + Android NDK r13b binutils(ld and as)] : 6m8.064s
[Main LLVM/clang + Android NDK r17 binutils(ld and as)] : 6m3.457s
[Qualcomm Snapdragon LLVM/clang + Android NDK r13b binutils(ld and as)] : 4m32.581s
[Qualcomm Snapdragon LLVM/clang + Android NDK r17 binutils(ld and as)] : 4m44.779s
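As a quick cross-check, the eight per-build times can be summed and compared with the reported total. The sketch below assumes the times are plain `time`-style strings of the form `XmY.ZZZs`; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a duration such as "4m15.596s" to seconds.
 * Returns a negative value if the string does not match that form. */
double duration_seconds(const char *s)
{
    int minutes;
    double seconds;
    if (sscanf(s, "%dm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Sum the eight per-build times listed in the build summary. */
double total_build_seconds(void)
{
    static const char *const times[] = {
        "4m15.596s", "4m13.983s", "4m3.665s", "4m4.683s",
        "6m8.064s", "6m3.457s", "4m32.581s", "4m44.779s",
    };
    double total = 0.0;
    for (size_t i = 0; i < sizeof times / sizeof times[0]; i++) {
        double d = duration_seconds(times[i]);
        assert(d >= 0.0);   /* every entry above is well-formed */
        total += d;
    }
    return total;
}
```

The sum comes to about 2286.808 seconds (38m6.808s), within a few milliseconds of the reported total of 38m6.816s.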
*LONGEST AND SHORTEST BUILD*
Code:
##### Longest build ####
Name : Main LLVM/clang + Android NDK r13b binutils(ld and as)
Time : 6m8.064s
boot.img : boot-main-llvm-clang-ndk-r13b-binutils-ld-as.img
zImage-dtb : zImage-dtb-main-llvm-clang-ndk-r13b-binutils-ld-as
##### Shortest build ####
Name : Android NDK r13b : LLVM/clang + binutils(ld and as)
Time : 4m3.665s
boot.img : boot-ndk-r13b-clang-llvm-binutils-ld-as.img
zImage-dtb : zImage-dtb-ndk-r13b-clang-llvm-binutils-ld-as
*LARGEST AND SMALLEST IMAGES*
Code:
◈ Largest boot img : boot-ndk-r17-gcc.img ❏ Size : 13M (12984320 bytes)
◈ Smallest boot img 1 : boot-main-llvm-clang-ndk-r17-binutils-ld-as.img ❏ Size : 11M (11272192 bytes)
◈ Smallest boot img 2 : boot-main-llvm-clang-ndk-r13b-binutils-ld-as.img ❏ Size : 11M (11272192 bytes)
◈ Largest zImage-dtb : zImage-dtb-ndk-r17-gcc ❏ Size : 12M (11844904 bytes)
◈ Smallest zImage-dtb : zImage-dtb-main-llvm-clang-ndk-r13b-binutils-ld-as ❏ Size : 9.7M (10132600 bytes)
*BOOT IMAGES SUMMARY*
Code:
◈ boot-ndk-r17-gcc.img ❏ Size ~ 13M (12984320 bytes)
◈ boot-ndk-r13b-gcc.img ❏ Size ~ 13M (12978176 bytes)
◈ boot-ndk-r13b-clang-llvm-binutils-ld-as.img ❏ Size ~ 13M (12625920 bytes)
◈ boot-qualcomm-snapdragon-llvm-clang-ndk-r13b-binutils-ld-as.img ❏ Size ~ 12M (11610112 bytes)
◈ boot-qualcomm-snapdragon-llvm-clang-ndk-r17-binutils-ld-as.img ❏ Size ~ 12M (11610112 bytes)
◈ boot-ndk-r17-clang-llvm-binutils-ld-as.img ❏ Size ~ 11M (11476992 bytes)
◈ boot-main-llvm-clang-ndk-r13b-binutils-ld-as.img ❏ Size ~ 11M (11272192 bytes)
◈ boot-main-llvm-clang-ndk-r17-binutils-ld-as.img ❏ Size ~ 11M (11272192 bytes)
*ZIMAGE-DTB SUMMARY*
Code:
◈ zImage-dtb-ndk-r17-gcc ❏ Size ~ 12M (11844904 bytes)
◈ zImage-dtb-ndk-r13b-gcc ❏ Size ~ 12M (11837640 bytes)
◈ zImage-dtb-ndk-r13b-clang-llvm-binutils-ld-as ❏ Size ~ 11M (11487176 bytes)
◈ zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r13b-binutils-ld-as ❏ Size ~ 10M (10469728 bytes)
◈ zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r17-binutils-ld-as ❏ Size ~ 10M (10469680 bytes)
◈ zImage-dtb-ndk-r17-clang-llvm-binutils-ld-as ❏ Size ~ 10M (10336624 bytes)
◈ zImage-dtb-main-llvm-clang-ndk-r17-binutils-ld-as ❏ Size ~ 10M (10132608 bytes)
◈ zImage-dtb-main-llvm-clang-ndk-r13b-binutils-ld-as ❏ Size ~ 10M (10132600 bytes)
*RAMDISK INFORMATION*
Code:
Ramdisk(pre-built - from RR) :
◈ boot.img-ramdisk.gz ❏ Size ~ 1M (1136400 bytes)
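The reported boot.img sizes are consistent with the usual Android boot image layout: one header page, followed by the kernel and the ramdisk, each padded to a page boundary. The sketch below reproduces the arithmetic. The 2048-byte page size and the absence of a second-stage image are assumptions that the post does not state; they are chosen because they match the listed byte counts. The function names are hypothetical.

```c
#include <assert.h>

/* Round `size` up to a whole number of `page`-byte pages. */
static unsigned long pad_to_page(unsigned long size, unsigned long page)
{
    return (size + page - 1) / page * page;
}

/* Expected boot.img size: one header page, then the kernel and the
 * ramdisk, each padded to a page boundary. Assumes no second-stage
 * image and no trailing data. */
unsigned long boot_img_size(unsigned long kernel, unsigned long ramdisk,
                            unsigned long page)
{
    assert(page > 0);
    return page + pad_to_page(kernel, page) + pad_to_page(ramdisk, page);
}
```

For example, `boot_img_size(10132600, 1136400, 2048)` gives 11272192, the size listed for boot-main-llvm-clang-ndk-r13b-binutils-ld-as.img, and `boot_img_size(11844904, 1136400, 2048)` gives 12984320, the size listed for boot-ndk-r17-gcc.img.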
*Clang-KERNEL INFORMATION(from each of the zImage-dtb images)*
Code:
exp@exp:~$ ../show_kernel_compiler_all.sh
#### Kernel compiler information ####
NOTE : Analyzing the images by decompressing them based on lz4 magic
(\x02\x21\x4c\x18) . . .
Image metadata(zImage-dtb-main-llvm-clang-ndk-r13b-binutils-ld-as) :
+++ grep -P -a -b --only-matching '\x02\x21\x4c\x18'
zImage-dtb-main-llvm-clang-ndk-r13b-binutils-ld-as
+++ tail -1
+++ cut -d: -f 1
++ pos1=5381
++ dd if=zImage-dtb-main-llvm-clang-ndk-r13b-binutils-ld-as bs=5381 skip=1
++ lz4 -d
++ eclang
++ head -1
+++ strings -a
+++ grep 'clang version'
Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Flash
clang version 7.0.332826 (https://git.llvm.org/git/clang
4029c7ddda99ecbfa144f0afec44a192c442b6e5)
(https://git.llvm.org/git/llvm
1181c40e0e24e0cca32e2609686db1f14151fc1a) (based on LLVM 7.0.332826) -
android-ndk-r13b) #1 SMP PREEMPT Mon Jun 4 00:33:21 PDT 2018
++ set +x
Image metadata(zImage-dtb-main-llvm-clang-ndk-r17-binutils-ld-as) :
+++ grep -P -a -b --only-matching '\x02\x21\x4c\x18'
zImage-dtb-main-llvm-clang-ndk-r17-binutils-ld-as
+++ cut -d: -f 1
+++ tail -1
++ pos1=5373
++ dd if=zImage-dtb-main-llvm-clang-ndk-r17-binutils-ld-as bs=5373 skip=1
++ lz4 -d
++ eclang
++ head -1
+++ strings -a
+++ grep 'clang version'
Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Flash
clang version 7.0.332826 (https://git.llvm.org/git/clang
4029c7ddda99ecbfa144f0afec44a192c442b6e5)
(https://git.llvm.org/git/llvm
1181c40e0e24e0cca32e2609686db1f14151fc1a) (based on LLVM 7.0.332826) -
android-ndk-r17) #1 SMP PREEMPT Mon Jun 4 00:39:25 PDT 2018
++ set +x
Image metadata(zImage-dtb-ndk-r13b-clang-llvm-binutils-ld-as) :
+++ grep -P -a -b --only-matching '\x02\x21\x4c\x18'
zImage-dtb-ndk-r13b-clang-llvm-binutils-ld-as
+++ tail -1
+++ cut -d: -f 1
++ pos1=5509
++ dd if=zImage-dtb-ndk-r13b-clang-llvm-binutils-ld-as bs=5509 skip=1
++ lz4 -d
++ eclang
++ head -1
+++ grep 'clang version'
+++ strings -a
Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Android
clang version 3.8.256229 (based on LLVM 3.8.256229) -
android-ndk-r13b) #1 SMP PREEMPT Mon Jun 4 00:23:08 PDT 2018
++ set +x
Image metadata(zImage-dtb-ndk-r17-clang-llvm-binutils-ld-as) :
+++ grep -P -a -b --only-matching '\x02\x21\x4c\x18'
zImage-dtb-ndk-r17-clang-llvm-binutils-ld-as
+++ tail -1
+++ cut -d: -f 1
++ pos1=5449
++ dd if=zImage-dtb-ndk-r17-clang-llvm-binutils-ld-as bs=5449 skip=1
++ lz4 -d
++ eclang
++ head -1
+++ strings -a
+++ grep 'clang version'
Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Android
(4691093 based on r316199) clang version 6.0.2
(https://android.googlesource.com/toolchain/clang
183abd29fc496f55536e7d904e0abae47888fc7f)
(https://android.googlesource.com/toolchain/llvm
34361f192e41ed6e4e8f9aca80a4ea7e9856f327) (based on LLVM 6.0.2svn) -
android-ndk-r17) #1 SMP PREEMPT Mon Jun 4 00:27:14 PDT 2018
++ set +x
Image metadata(zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r13b-binutils-ld-as)
:
+++ grep -P -a -b --only-matching '\x02\x21\x4c\x18'
zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r13b-binutils-ld-as
+++ tail -1
+++ cut -d: -f 1
++ pos1=5417
++ dd if=zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r13b-binutils-ld-as
bs=5417 skip=1
++ lz4 -d
++ eclang
++ head -1
+++ strings -a
+++ grep 'clang version'
Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Snapdragon
LLVM ARM Compiler 4.0.2 for Android NDK (based on llvm.org 4.0+) -
clang version 4.0.2 for Android NDK - android-ndk-r13b) #1 SMP PREEMPT
Mon Jun 4 00:43:58 PDT 2018
++ set +x
Image metadata(zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r17-binutils-ld-as)
:
+++ grep -P -a -b --only-matching '\x02\x21\x4c\x18'
zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r17-binutils-ld-as
+++ tail -1
+++ cut -d: -f 1
++ pos1=5409
++ dd if=zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r17-binutils-ld-as
bs=5409 skip=1
++ lz4 -d
++ eclang
++ head -1
+++ strings -a
+++ grep 'clang version'
Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Snapdragon
LLVM ARM Compiler 4.0.2 for Android NDK (based on llvm.org 4.0+) -
clang version 4.0.2 for Android NDK - android-ndk-r17) #1 SMP PREEMPT
Mon Jun 4 00:48:42 PDT 2018
++ set +x
exp@exp:~$
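The trace above locates the compressed kernel payload by searching each zImage-dtb for the LZ4 legacy magic bytes (02 21 4c 18), taking the last match, and decompressing from that offset. A minimal C sketch of the offset search alone is given below; the function name is hypothetical and decompression is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte sequence of the LZ4 legacy frame magic, as searched for above. */
static const unsigned char LZ4_LEGACY_MAGIC[4] = { 0x02, 0x21, 0x4c, 0x18 };

/* Return the offset of the last occurrence of the magic in buf[0..len),
 * or -1 if it does not occur. Mirrors `grep -b ... | tail -1`. */
long last_lz4_magic_offset(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < sizeof LZ4_LEGACY_MAGIC)
        return -1;
    /* Scan backwards so the first hit is the last occurrence. */
    for (size_t i = len - sizeof LZ4_LEGACY_MAGIC + 1; i-- > 0; ) {
        if (memcmp(buf + i, LZ4_LEGACY_MAGIC, sizeof LZ4_LEGACY_MAGIC) == 0)
            return (long)i;
    }
    return -1;
}
```

The returned offset plays the role of `pos1` in the trace: `dd` with `bs=pos1 skip=1` starts reading the image at exactly that byte.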
*Clang-KERNEL INFORMATION(from dmesg extracted from each of the boot
instances)*
Code:
exp@exp:~$ cat android1/android1_dmesg.txt | grep "clang\|Machine"
[ 0.000000] Linux version 3.4.113-unicornblood-hammerhead-o+
(exp@exp) (Android clang version 3.8.256229 (based on LLVM 3.8.256229)
- android-ndk-r13b) #1 SMP PREEMPT Mon Jun 4 00:23:08 PDT 2018
[ 0.000000] Machine: Qualcomm MSM 8974 HAMMERHEAD (Flattened Device
Tree), model: LGE MSM 8974 HAMMERHEAD
exp@exp:~$ cat android2/android2_dmesg.txt | grep "clang\|Machine"
[ 0.000000] Linux version 3.4.113-unicornblood-hammerhead-o+
(exp@exp) (Android (4691093 based on r316199) clang version 6.0.2
(https://android.googlesource.com/toolchain/clang
183abd29fc496f55536e7d904e0abae47888fc7f)
(https://android.googlesource.com/toolchain/llvm
34361f192e41ed6e4e8f9aca80a4ea7e9856f327) (based on LLVM 6.0.2svn) -
android-ndk-r17) #1 SMP PREEMPT Mon Jun 4 00:27:14 PDT 2018
[ 0.000000] Machine: Qualcomm MSM 8974 HAMMERHEAD (Flattened Device
Tree), model: LGE MSM 8974 HAMMERHEAD
exp@exp:~$ cat main1/main1_dmesg.txt | grep "clang\|Machine"
[ 0.000000] Linux version 3.4.113-unicornblood-hammerhead-o+
(exp@exp) (Flash clang version 7.0.332826
(https://git.llvm.org/git/clang
4029c7ddda99ecbfa144f0afec44a192c442b6e5)
(https://git.llvm.org/git/llvm
1181c40e0e24e0cca32e2609686db1f14151fc1a) (based on LLVM 7.0.332826) -
android-ndk-r13b) #1 SMP PREEMPT Mon Jun 4 00:33:21 PDT 2018
[ 0.000000] Machine: Qualcomm MSM 8974 HAMMERHEAD (Flattened Device
Tree), model: LGE MSM 8974 HAMMERHEAD
exp@exp:~$ cat main2/main2_dmesg.txt | grep "clang\|Machine"
[ 0.000000] Linux version 3.4.113-unicornblood-hammerhead-o+
(exp@exp) (Flash clang version 7.0.332826
(https://git.llvm.org/git/clang
4029c7ddda99ecbfa144f0afec44a192c442b6e5)
(https://git.llvm.org/git/llvm
1181c40e0e24e0cca32e2609686db1f14151fc1a) (based on LLVM 7.0.332826) -
android-ndk-r17) #1 SMP PREEMPT Mon Jun 4 00:39:25 PDT 2018
[ 0.000000] Machine: Qualcomm MSM 8974 HAMMERHEAD (Flattened Device
Tree), model: LGE MSM 8974 HAMMERHEAD
exp@exp:~$ cat qualcomm1/qualcomm1_dmesg.txt | grep "clang\|Machine"
[ 0.000000] Linux version 3.4.113-unicornblood-hammerhead-o+
(exp@exp) (Snapdragon LLVM ARM Compiler 4.0.2 for Android NDK (based
on llvm.org 4.0+) - clang version 4.0.2 for Android NDK -
android-ndk-r13b) #1 SMP PREEMPT Mon Jun 4 00:43:58 PDT 2018
[ 0.000000] Machine: Qualcomm MSM 8974 HAMMERHEAD (Flattened Device
Tree), model: LGE MSM 8974 HAMMERHEAD
exp@exp:~$ cat qualcomm2/qualcomm2_dmesg.txt | grep "clang\|Machine"
[ 0.000000] Linux version 3.4.113-unicornblood-hammerhead-o+
(exp@exp) (Snapdragon LLVM ARM Compiler 4.0.2 for Android NDK (based
on llvm.org 4.0+) - clang version 4.0.2 for Android NDK -
android-ndk-r17) #1 SMP PREEMPT Mon Jun 4 00:48:42 PDT 2018
[ 0.000000] Machine: Qualcomm MSM 8974 HAMMERHEAD (Flattened Device
Tree), model: LGE MSM 8974 HAMMERHEAD
exp@exp:~$
*[ANDROID ARM LLVM/CLANG-KERNEL ~ RESEARCH PROJECT OVERVIEW]*
1. Finding the right kernel source and kernel config that works with the
Android version I had on my Nexus 5.
- Android(version) on my Nexus 5 is Resurrection Remix(RR) oreo with
3.4.113 hammerhead kernel(unicornblood config)
- Experimenting with different kernel source and hammerhead kernel
config including that of AOSP
2. Finding the most compact and quickest way to just build the kernel
out of tree but using the tree's build tools
- Syncing the Resurrection Remix's source from its repo onto my machine
- Finding the make targets for building just the ramdisk instead of
building the whole RR ROM which was not the goal
- Finding the tool and the arguments for that to build the boot.img
that can be fastboot-ed
3. Building a working kernel from source for my RR on Nexus 5, first
with Android NDK r13b gcc
- Finding the DirtyUnicorn kernel source, building it with gcc in the
first place, generating the boot.img
- Finding the right kernel config for the hammerhead kernel -
unicornblood that's used by DirtyUnicorn repo
- Using the pre-built ramdisk from RR zip instead of self-built one
- Disabling SELinux for allowing the kernel to be fastboot-ed
- Debugging the ADB-over-USB not working with the DirtyUnicorn kernel
image built
- Discussing on the Resurrection Remix forum and with the kernel
developer(uname:voidz) to know the actual kernel source used in RR.
4. Successfully booting the gcc-built kernel for RR on my Nexus 5
- Working kernel confirmed in the first place for my ultimate Android ARM
clang-kernel goal
5. Setting up the clang-kernel build
- Finding the right LLVM/clang toolchain to begin with - Android NDK
r13b LLVM/clang
- Using the binutils - assembler, linker, etc that's in Android NDK
r13b
6. Launching the clang-kernel build
- Disabling the options that clang doesn't recognize
- Examining the initial compilation errors - invalid instruction
error(thrown actually by x86_64 GNU as)
- Fixing the assembler path to the ARM assembler
- Making sure it uses the right external assembler(Android EABI GNU
assembler)
7. Fixing the subsequent build errors
- RCU header had a static code check which had to be disabled, since
it was an error only with clang and not with gcc
- VLAIS in various kernel components had to be changed to non-VLAIS
to work with clang
8. Updating VLAIS to non-VLAIS in various kernel components
- Disk encryption
- USB Gadget/Function Filesystem
- CRC32
- Netfilter
9. Fixing linker errors for duplicate exception sections that clang
generates
- Instructing the linker to leave the exceptions out while generating
the final kernel image
10. Fixing more linker errors
- Added missing ARM EABI memory manipulation implementations that some
of the kernel code needed that the toolchain didn't provide
- Building successfully the kernel image with clang for the first time
11. Booting the clang-built kernel image for the first time
- Booting gets stuck at "Google" logo
- Checking whether there was any kernel panic by booting to recovery
mode and checking /proc/last_kmsg
- Not finding anything relevant to clang-kernel boot in
/proc/last_kmsg
12. Running over a plethora of possibilities for the
stuck-at-google-logo case
- Kernel might have not been loaded at all by bootloader for some reason
- Generated boot.img might have incorrect offsets for kernel,
ramdisk, etc
- Generated kernel image might be corrupt
- Bootloader might be expecting gcc-specific compiler metadata
instead of LLVM/clang, in kernel image header
- Fastboot might have some way of logging the overall boot sequence
which might have given some hint
13. Attempting to boot the clang-kernel and ramdisk with QEMU/ARM to
debug like I did for x86_64 clang-kernel
- Booting on QEMU/ARM with both kernel and ramdisk, next with just the
kernel, with GUI and without GUI
- Booting the kernel and ramdisk built with Android emulator,
specifically QEMU/ARMel that's part of the SDK
- Cross-compiling Android LittleKernel bootloader for ARM and using
that to boot the clang-kernel on QEMU/ARM
- Not seeing anything happening at all in any of the above scenarios
14. Carrying out more debugging
- Turning off the Nexus 5 and doing a fastboot to get fresh
/proc/last_kmsg if possible
- Adding various debug parameters on kernel command-line
- Trying out different options that fastboot has for specifying
offsets, etc
- Researching on external hardware-based debugging like adding UART
interface to get early boot logs
15. Researching on the very first code that runs when the kernel is
loaded - kernel entry point
- Finding start_kernel() inside init/main.c
- Adding some debug statements over there and not seeing them for
obvious console-not-yet-initialized reason
- Then finding the actual kernel entry point in head.S assembly source
- Researching on ways to print anything at all in the ARM assembly
code within head.S
- Finding printascii that's used for the above case but realizing
it's for a serial console(UART)
16. Understanding the ARM assembly code within head.S and its siblings
and the inline documentation therein
- Getting to know the prerequisites prior to entering the kernel entry
point in head.S
17. Exploring the methods to confirm whether the control is indeed
reaching head.S or not
- Checking if the LED on the bottom of Nexus 5 can be turned on/off with
different colors as an indication
- Researching on doing something with ARM CPU like raising an
exception, or a reset event as an indication
- Researching on different ways of restarting Nexus 5 - checking how
a "reboot" command works at kernel level
- Looking into machine restart logic - translating that to a reset
logic to be used within head.S
18. Using reset logic for more fine-grained debugging, instruction by
instruction, within head.S and its siblings
- Noticing PC write was problematic
- Finding ways to branch off to destination instead of modifying PC
which is not recommended as per the docs
- Understanding the end-to-end control flow since the kernel entry
point till start_kernel() of init/main.c
- Following the inter-working of head.S and processor-specific
assembly code during the setup
- Locating some control register access being problematic like that
with PC
19. Researching on other available clang/LLVM toolchains to see if they
work
- Using Android NDK r17's LLVM/clang - not helpful - same outcome -
stuck-at-google-logo case
- Finding main LLVM/clang source and building it to use with kernel
source - not helpful - same outcome
- Finding Snapdragon Qualcomm LLVM/clang toolchain and using it - not
helpful - same outcome
20. Using different diff-tools to compare two binaries : clang-built
kernel and gcc-built kernel
- hexdiff - saw some differences
- vimdiff - some other differences
21. Using different binary analysis tools to examine the differences
between the gcc-built and clang-built kernels
- Android EABI readelf - saw some ELF header information differences
- Android EABI objdump - compared disassembly, symbols, sections and
their flags, etc
22. Adding mechanism to use the same assembly code settings as that of
gcc
- Using same assembler options that gcc uses while invoking the
assembler for intermediate assembly code
- Using same assembly code setup as that of gcc for data, target
architecture, etc for intermediates
- Automating the above so that it works for every intermediate
assembly file that clang generates
23. Reducing the optimization level of kernel build
- Keeping the oversmart optimization aside - O1 and Os - didn't change
the stuck-at-google-logo case
- Disabling optimization completely - O0 - didn't work - the kernel
doesn't support it, based on what I read online
24. Disabling the caches in kernel config as needed by head.S
- Updating kernel config to disable I-cache and D-cache - didn't help
25. Trying out different assembler options
- Experimenting with different SP sizes, EABI versions - didn't help
26. Trying out different clang options
- Using different possible stack alignments to address any incorrect
assumptions around that - didn't help
27. Correcting the SP access
- Updating access to SP in one of the thread access kernel code as per
one of the online notes
28. Researching more along the lines of what clang does that gcc
doesn't
- Disabling all the optimizations and clang-only features if any
- Disabling all the intrinsic features that clang uses internally -
device rebooted after a failed boot!
- No more stuck-at-google-logo case with the above change
- Witnessing the device auto-reboot with the above change - must be a
kernel panic!
- Checking /proc/last_kmsg within recovery mode - yes, it was the
clang-kernel that panic'ed - good sign!
- Finally, the clang-kernel has started executing after an exhaustive
set of attempts - breakthrough!
29. Locating the source of kernel panic
- Looking at the stacktrace revealed one of the audio codec had a buffer
overrun - fixed it
- Rebuilding the kernel with the fix and retrying
- Seeing some more kernel panics - another audio codec source which
had similar issue - fixed it
30. Booting to the Android GUI with clang-kernel for the first time
- Fixing the kernel panics mentioned above allowed boot to move on
- Seeing Android animation(RR logo in my case) for the first time!
- Getting to the Android home screen after few seconds of wait -
mission accomplished!
31. Verifying all the system information
- Checking kernel version to be showing LLVM/clang toolchain version, etc
- Examining kernel dmesg for clang specific information
- Checking /proc/version for the same
- Checking Settings/About for the same
32. Verifying all the features work
- Checking Camera, Bluetooth, ADB over USB, etc
- Checking WiFi - didn't work - "connected, no internet"
33. Noticing WiFi symbol had a cross(x) symbol on it
- Browsing failed as expected due to no internet availability
- Disabling WiFi to check if cellular(LTE) network works for internet -
it didn't
34. Noticing mobile network also had a cross(x) symbol on it after
disabling WiFi as above
- Browsing with mobile network as well failed as expected due to no
internet availability
- Verifying phone calls work - yes, they worked!
35. Checking logcat, dmesg for any network error
- Noticing SELinux denials for some of the network related actions
- Locating the error stating bandwidth module not loaded
- Narrowing down to the kernel code where the possible issue is
present
36. Fixing the netfilter code for the above issue
- Updating one of the netfilter code with the latest code from that of
the mainline kernel
- Rebuilt the kernel - WiFi and mobile network - both worked!
37. Realizing all the features are now working with a clang-built ARM
kernel for Android!
- Planning to repeat the same with all the other remaining LLVM/clang
toolchains
38. Using Android NDK r13b's LLVM/clang in place of main LLVM/clang used
so far for building the kernel
- Noticing Kernel panic
- Tracking down the kernel panic to one of the Camera MSM driver code
39. Fixing the Camera MSM driver code in terms of device id specification
- Comparing with other Camera MSM driver source code and finding the
difference if any
- Completing the device/driver id specification with the missing item -
fixed the panic
- Booting to Android home screen this time with even the Android NDK
r13b's LLVM/clang-built kernel!
40. Picking the next remaining LLVM/clang toolchains
- Android NDK r17's LLVM/clang - no issues booting the kernel as updated
thus far
- Qualcomm Snapdragon LLVM/clang - no issues booting the kernel as
updated thus far
41. Performing round up of all the toolchains and combinations of NDK
binutils
- Verifying all the combinations(total 8) :
- Android NDK r13b [gcc + binutils(as, ld, etc)]
- Android NDK r17 [gcc + binutils(as, ld, etc)]
- Android NDK r13b [LLVM/clang + binutils(as, ld, etc)]
- Android NDK r17 [LLVM/clang + binutils(as, ld, etc)]
- Main LLVM/clang + Android NDK r13b binutils(as, ld, etc)
- Main LLVM/clang + Android NDK r17 binutils(as, ld, etc)
- Snapdragon Qualcomm LLVM/clang + Android NDK r13b binutils(as,
ld, etc)
- Snapdragon Qualcomm LLVM/clang + Android NDK r17 binutils(as,
ld, etc)
- Confirming all of the above work - yes, worked!
42. Automating all of the above builds and the testing of the images
- Facilitating automation of all the 8 combination builds
- Collecting the statistics - build time, image sizes, etc
- Summarizing the longest/shortest builds, largest/smallest
zImage-dtb, largest/smallest boot.img
- Testing all of the images one by one for the final time
43. Consolidating the data from the above automation
- Collecting the complete kernel boot(dmesg) log from each build
- Taking snapshots of the kernel version, Android version, build
number from each build - About/System info
44. Wrapping my Android clang-kernel research project!
- Done and dusted. Period.
------------------------------
*[ANDROID ARM LLVM/CLANG-KERNEL ~ RESEARCH WALK-THROUGH]*
*Stage 1:* First of all, finding the right kernel source for the hammerhead
kernel installed as part of Resurrection Remix(RR) on my Nexus 5 was a
challenge. Initially, based on the kernel version information, I took the
dirtyunicorn kernel source from its repo and built it with the unicornblood
hammerhead kernel config using the Android NDK r13b gcc toolchain. I also
built the ramdisk from the Resurrection Remix Android source base. With
these I created a boot.img and tried it with fastboot. It didn't boot - the
usual Google logo appeared and the device looped back to the bootloader.
After examining the logs in recovery mode, I noticed SELinux-specific
denials, so I tried to fastboot again with SELinux set to permissive. This
didn't boot either. After some research, I found that my boot.img was
smaller than the one that came with the Resurrection Remix zip, because my
ramdisk was much smaller than the one in the zip. The ramdisk needed other
rootfs utils, which would have required more building from the RR source.
Since I was only keen on the kernel part, I took the ramdisk from the zip
and generated the boot.img from it along with my kernel. This booted.
Before arriving at this, I had also experimented with the AOSP kernel
source and other kernel sources for hammerhead. Some of them didn't even
boot, so I had to go back to dirtyunicorn.
*Stage 2:* The booted kernel had almost everything working except for the
ADB over USB feature. ADB over network worked, but when I enabled ADB over
USB, dmesg on my host PC showed a "USB disconnect" event, and the device
would reconnect only if I disabled ADB over USB. I tried everything I could
to get this working, including USB driver debugging - nothing consistently
gave ADB over USB access. I noticed there were some missing sysfs entries
for the Android USB function filesystem, so I debugged around that code and
added extra logic to make sure the USB wouldn't get disconnected when ADB
over USB was enabled - none of it helped. After a lot more research and
experimenting, I noticed that the kernel that came with the RR zip had been
built by the username voidz. Based on that, I learned from the RR forum
that this person had an updated dirtyunicorn kernel branch, which I took
and built as above with the same unicornblood hammerhead kernel config. I
generated the boot.img with this kernel and the ramdisk from the RR zip
file and fastboot-ed it. The image booted, and ADB over USB worked as
needed.
*Stage 3:* With the above prerequisite of a working kernel(source) for my
Nexus 5 running RR(oreo), I was motivated to build the Android kernel with
LLVM/clang, as I had already succeeded in building and bringing up the
x86_64 Linux kernel(4.15.7 onwards - latest 4.17) for my host system
running Ubuntu 17.10 x86_64. Those reports are available at the end of this
post for those interested. To begin with, I chose the LLVM/clang toolchain
present in Android NDK r13b and started building the same kernel I had
built with Android NDK r13b gcc above. As with the x86_64 kernel
clang-build, there were some build/compilation errors involving unknown
options, etc.
*Stage 4:* To get the Android kernel clang-build further, I disabled the
integrated assembler and removed the unsupported compiler options/flags
across kernel components. This allowed the clang-build to proceed further.
But there were still some issues in recognizing the inline ARM assembly
code correctly, since the assembler being picked up was still wrong. Setting
CLANG_TRIPLE to arm-linux-gnu and the like, in separate attempts, didn't
help in any way. After much research, I found that I could use the "-target"
option to specify the androideabi target. With that, clang searched for an
assembler with the target as its prefix, which was not present under
/usr/bin by default. Hence it fell back to the host x86_64 GNU assembler,
which complained about unknown assembly instructions. So I created the
necessary symlinks there, pointing to the Android NDK r13b binutils
binaries - this fixed the issue of picking the right assembler. By the way,
I saw some reports of a successful ARM64/AARCH64 clang-kernel for Google
Pixel devices, but the steps therein didn't work for my ARM(32-bit)
clang-kernel case, so I had to continue with my independent efforts.
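The assembler mix-up described in this stage can be modelled roughly as a prefixed-tool lookup with a fallback. The C sketch below is a simplified model written for illustration, not clang's actual driver code; the function names and the return-value convention are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Look for an executable called `name` in each directory of the
 * colon-separated list `path`. On success, write the full path to `out`
 * and return 1; otherwise return 0. */
static int search_path(const char *path, const char *name,
                       char *out, size_t outlen)
{
    const char *p = path;
    while (p && *p) {
        const char *end = strchr(p, ':');
        size_t dirlen = end ? (size_t)(end - p) : strlen(p);
        int n = snprintf(out, outlen, "%.*s/%s", (int)dirlen, p, name);
        if (n > 0 && (size_t)n < outlen && access(out, X_OK) == 0)
            return 1;
        p = end ? end + 1 : NULL;
    }
    return 0;
}

/* Returns 2 if "<triple>-<tool>" was found, 1 if only the unprefixed
 * host tool was found, and 0 if neither exists on `path`. */
int find_tool(const char *path, const char *triple, const char *tool,
              char *out, size_t outlen)
{
    char prefixed[256];
    assert(path && triple && tool && out);
    int n = snprintf(prefixed, sizeof prefixed, "%s-%s", triple, tool);
    if (n > 0 && (size_t)n < sizeof prefixed &&
        search_path(path, prefixed, out, outlen))
        return 2;
    return search_path(path, tool, out, outlen) ? 1 : 0;
}
```

In this model, a return value of 1 corresponds to the failure seen here: no target-prefixed assembler was found, so the host `as` was used. Adding symlinks named with the target prefix makes the first lookup succeed.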
*Stage 5:* Progressing further, an RCU-specific static code check failed.
After looking around, I disabled it as it was harmless, and the build
progressed further. Next, the build failed due to VLAIS (variable-length
arrays in structs) used in several of the kernel components, including disk
encryption, USB Gadget/Function Filesystem, CRC32, netfilter, etc. I
changed these to non-VLAIS code, which addressed the compilation errors.
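For readers unfamiliar with VLAIS (a variable-length array inside a struct, a GCC extension that clang rejects), the sketch below shows the general shape of a non-VLAIS rewrite. It is not taken from the kernel patches used in this project: the struct, the function name, and the use of `malloc` (chosen so the example runs on a host) are all assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size header that is followed in memory by a
 * context area whose size is only known at run time. */
struct desc_hdr {
    size_t ctx_len;
};

/*
 * VLAIS form (GCC-only), shown as a comment because clang rejects it:
 *
 *     struct {
 *         struct desc_hdr hdr;
 *         unsigned char ctx[ctx_len];   // array size is not a constant
 *     } desc;
 */

/* Non-VLAIS form: one byte buffer holding header + context, with the
 * two parts addressed through pointers. Returns 0 on success and stores
 * the byte sum of `data` in *sum, or returns -1 if allocation fails. */
int sum_via_desc(const unsigned char *data, size_t ctx_len,
                 unsigned long *sum)
{
    assert(data != NULL && sum != NULL);

    unsigned char *raw = malloc(sizeof(struct desc_hdr) + ctx_len);
    if (raw == NULL)
        return -1;

    struct desc_hdr *hdr = (struct desc_hdr *)raw;       /* header part  */
    unsigned char *ctx = raw + sizeof(struct desc_hdr);  /* context part */

    hdr->ctx_len = ctx_len;
    memcpy(ctx, data, ctx_len);

    *sum = 0;
    for (size_t i = 0; i < hdr->ctx_len; i++)
        *sum += ctx[i];

    free(raw);
    return 0;
}
```

The point of the rewrite is that the runtime-sized part no longer appears as a struct member; only the total buffer size depends on the runtime value.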
*Stage 6:* Next was to address the linker errors. clang generated duplicate
exception sections across kernel components, which caused a failure when
the linker tried to link all of them together. To address this, I
researched a lot on the internet to understand what these sections are, why
clang adds them but gcc doesn't, how to get rid of them if they are not
needed, and so on. I experimented with options to disable exceptions,
unwind-tables, etc - none helped. There were also warnings that unwinding
was not expected to work for some of the kernel components. Enabling stack
unwinding in the kernel config addressed the linker error, but I wanted to
avoid a config change that only the clang-build needed, as gcc didn't need
it.
*Stage 7:* To address the duplicate exception sections linker error
mentioned above, after realizing that gcc avoids this error by not
generating these sections in the first place, I researched more to find
ways to exclude them. I found a way to tell the linker to leave these
exception sections out when generating the final kernel image - that solved
the issue.
*Stage 8:* After fixing the above linker error of duplicate exception
sections, there were some more linker complaints about missing
implementations of the ARM EABI versions of memory manipulation functions.
I researched these and found that they are usually available as part of
the toolchain but, if absent, need to be implemented by hand. So, I
wrote my own implementations for them and restarted the build - I didn't
see those linker errors any more, as expected.
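For readers who have not met these helpers: the ARM run-time ABI names them
__aeabi_memcpy, __aeabi_memset, __aeabi_memclr and so on, and their
signatures are not identical to the C library ones. The sketch below is not
my actual kernel code; it uses hypothetical sketch_ prefixed names so that it
builds on any host, and plain byte loops for clarity rather than speed.

```c
#include <assert.h>
#include <stddef.h>

/* Same parameter order as memcpy(), but returns nothing. */
void sketch_aeabi_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert((d != NULL && s != NULL) || n == 0);
    while (n--)
        *d++ = *s++;
}

/* Note the order: (dest, n, c), unlike memset(dest, c, n). */
void sketch_aeabi_memset(void *dest, size_t n, int c)
{
    unsigned char *d = dest;

    assert(d != NULL || n == 0);
    while (n--)
        *d++ = (unsigned char)c;
}

/* Zero-fill helper: equivalent to the memset variant with c == 0. */
void sketch_aeabi_memclr(void *dest, size_t n)
{
    sketch_aeabi_memset(dest, n, 0);
}
```

The swapped (n, c) order in the memset variant is easy to get wrong. Also,
in a freestanding build a compiler may turn such loops back into library
calls, so the build flags matter there.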
*Stage 9:* With the changes so far, the kernel image got successfully built
with clang for the first time. The zImage-dtb was ready to be combined with
the pre-built ramdisk used above to generate the quintessential boot.img
that I can "fastboot" on my Nexus 5 for the first time.
*Stage 10:* Tried to fastboot with boot.img thus built - the image appeared
to boot but got stuck with the "Google" logo with the unlocked symbol
underneath. Having seen this many times even with gcc, I expected the
boot to proceed with the Android logo animation(RR logo in this case) after
some time, since I thought booting time might differ between gcc/clang-built
kernels. But, it didn't proceed. Having seen a similar thing with the x86_64
clang-built kernel for my host system running Ubuntu 17.10 x86_64, my
immediate interpretation was that the kernel "panic-ed" somewhere along the
boot, assuming that the kernel at least got loaded and started to boot to
some extent. So, I rebooted to recovery mode to retrieve the kernel logs of
this unsuccessful boot. But, /proc/last_kmsg didn't have any logs pertaining
to the clang-built Android ARM kernel I fastboot-ed. Instead, it had
some logs from a previous gcc-built kernel boot.
*Stage 11:* The above absence of clang-kernel logs itself in
/proc/last_kmsg gave rise to numerous questions :
- Is the kernel getting loaded in the very first place?
- Is the kernel offset, ramdisk offset wrong or off by some bytes for
the clang-built kernel and not the gcc-built kernel?
- Is the mkbootimg command having the base address right, does it have
to be relative or absolute?
- Is there some incorrect zImage-dtb header(wrong version magic, etc)
for some reason, that Android bootloader was considering erroneous, and
hence bailing out and sitting idle after displaying the "Google" logo?
- Is there a way to get the fastboot logs, bootloader logs, where's the
bootloader code of Android?
- How does the Android bootloader(found out that LittleKernel was the
bootloader) source locate and load the kernel with offsets, etc?
- Is there some boot signature that's wrong and hence bootloader isn't
able to load kernel?
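Several of the questions above come down to what the boot.img header
actually says. As a minimal sketch, assuming the original (version 0)
Android boot image layout, the following reads the load addresses from the
header and derives where the kernel and ramdisk sit inside the file. The
struct and function names are hypothetical; only the field order and the
"ANDROID!" magic come from the boot image format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fields of interest from the start of an Android boot image (v0 layout). */
struct boot_info {
    uint32_t kernel_size, kernel_addr;
    uint32_t ramdisk_size, ramdisk_addr;
    uint32_t tags_addr, page_size;
    uint32_t kernel_off, ramdisk_off;   /* byte offsets inside boot.img */
};

/* Read a little-endian 32-bit value regardless of host endianness. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is not a plausible boot image. */
int boot_info_parse(const unsigned char *img, size_t len, struct boot_info *out)
{
    uint32_t kernel_pages;

    assert(img != NULL && out != NULL);
    if (len < 40 || memcmp(img, "ANDROID!", 8) != 0)
        return -1;

    out->kernel_size  = le32(img + 8);
    out->kernel_addr  = le32(img + 12);
    out->ramdisk_size = le32(img + 16);
    out->ramdisk_addr = le32(img + 20);
    /* Offsets 24 and 28 hold the second-stage size and address. */
    out->tags_addr    = le32(img + 32);
    out->page_size    = le32(img + 36);
    if (out->page_size == 0)
        return -1;

    /* The header occupies page 0; the kernel follows, padded to a page. */
    kernel_pages = out->kernel_size / out->page_size +
                   (out->kernel_size % out->page_size != 0);
    out->kernel_off  = out->page_size;
    out->ramdisk_off = out->page_size * (1 + kernel_pages);
    return 0;
}
```

Running this over the gcc and clang boot.img files and comparing the two
results would show quickly whether the addresses and offsets differ at all.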
*Stage 12:* With no clue of whether the kernel was getting loaded at all as
explained above, I wanted to see if it's possible to boot the clang-built
Android ARM kernel and the ramdisk used, using QEMU/ARM - just like the
debugging method I had employed for x86_64 when I saw the Linux
kernel boot stop at "Loading ..." on my host system with
Ubuntu 17.10 x86_64. However, none of the attempts to boot the Android ARM
kernel and ramdisk succeeded - nothing was seen at all either on the QEMU
GUI or on the console in no-GUI mode. I tried different ARM machine types in
the QEMU options, and tried to boot even just the kernel to see if it panics
on not finding the rootfs/ramdisk as expected - nothing helped.
*Stage 13:* As an additional effort in bringing up the Android ARM
clang-kernel on QEMU/ARM, I thought that if I could cross-compile the
LittleKernel bootloader source for ARM and use it as the bootloader with
QEMU/ARM, I could see some action - didn't work. Also, I tried to use a UEFI
firmware cross-compiled for ARM to boot the same - didn't help. Then I tried
to use the standard Android emulator(via Android Studio, AVD, etc) to see if
I could specify my Android ARM clang-kernel as the kernel to boot the
emulator with the necessary arguments - didn't help, though it took
considerable effort to find out how to load a custom kernel from the
command-line of the Android emulator, as the "emulator" script/binary
eventually launches QEMU/ARMel, whose command-line invocation I finally
figured out and used for my debugging in this stage.
*Stage 14:* In order to confirm whether the kernel is at least starting to
execute, I retried fastboot-ing some more times in different modes :
- Powering off the Nexus 5 and powering it on to bootloader mode, so
that old /proc/last_kmsg won't be present and hence no confusion of whether
the kernel logs therein is of my clang-kernel boot or of some other
previous gcc-kernel boot.
- Booting into gcc-kernel after failed clang-kernel boot to see if I can
get the /proc/last_kmsg from this valid Android boot mode instead of in
recovery mode.
- Enabled earlyprintk and UART debugging in kernel config for the
clang-kernel being built and added those parameters in the kernel cmdline.
- Increased kernel log buffer size to 4M.
- Increased the kernel log-level to maximum.
- Researched on who actually renders the "Google" logo if kernel is not
even starting to execute - is it then the bootloader which renders that?
- Why is that with some kernel versions, booting fails and it actually
loops back to bootloader and not in this clang-kernel case?
- Is the DTB appended correctly and TAGS offset correct for the kernel
to find it - if DTB is not present, fastboot itself fails saying the DTB
can't be found in the given boot.img.
- Tried to give the kernel(zImage-dtb) and ramdisk separately in
fastboot command with offsets, etc - fastboot creates the boot.img on the
fly and tries to boot it.
- Checked if fastboot has some useful options to debug more - like some
logging over USB back to host(my PC).
- Researched around to see if I can add some UART module over USB to get
the bootloader/kernel logs before the console(tty) even gets initialized
and I can see the kernel logs - I actually found some article of someone
having added an external UART module to some Android smartphone - but after
some time, I thought it's overkill for my project - so, dropped that idea
of buying something like that and got back to debugging with whatever I had
- just my Nexus 5!
- Reattempted to boot with QEMU/ARM with different TTY*s, etc.
None of the above worked though they helped me in understanding better of
how the Android booting works and its internals.
*Stage 15:* Then came the point of finding what actually runs first when
the kernel is made to run - like some low-level kernel code that runs to get
the actual high-level code up and running to start dumping some kernel
logs? At first I found the C kernel entry-point to be start_kernel() in
init/main.c. I tried putting some extra printk() statements at the beginning
of it, only to find that the console gets initialized much later, after a
whole bunch of other initializations. So, all the printk output would get
buffered until then - no use. So, is this the very function that gets called
from the Android bootloader or is there some other low-level code that gets
entered into?
*Stage 16:* After some research, found the architecture specific head.S(ARM
specific kernel entry point from the bootloader!). So, time to dive into
the assembly world - one step closer to the hardware - excited! I spent
some time understanding the instructions over there and the inline comments
over there to see what actually happens where - by that I also looked
around in head-common.S, etc. Finally, realized that this *is* the code to
debug in the first place before the start_kernel() even gets called
ultimately. But, what to debug, where to print some logs, how to debug - no
UART/serial console, no way of finding whether this entry point is even
getting hit - all I was seeing was the "Google" logo(which I found out in
some way that it's not rendered by kernel but the
bootloader/some-other-early-code) and nothing else.
*Stage 17:* Upon reading the documentation over there in head.S, I doubted
whether the hardware is setup properly before entering the head.S : it says
certain things about MMU, I-cache, D-cache, etc - are these rightly setup?
At the same time, I also thought, if there was something wrong in that, how
did the gcc-built kernel boot up without any issue - same bootloader, so
same hardware pre-initializations before head.S is entered, same head*.S
code and the same kernel except the compiler being gcc instead of clang -
so, is clang screwing something up in generating the final binary so that
this head.S is not even entered from bootloader?
*Stage 18:* Next was to determine a way to check if the very instruction of
head.S was executed or not. With no external debugging aid, this took a lot
of brainstorming and researching on the internet - can I place some code in
head.S to turn on the small LED on the Android Smartphone in some way to
indicate the head.S was indeed getting hit or is there some other way to
know the same? All of the available debugging methods involved some
external hardware-aid which I didn't have.
*Stage 19:* There came a thought of doing it with the ARM CPU itself - can it
be made to raise an exception, interrupt, etc through some ARM instruction?
Can it be reset? Can it be made to jump to the reset vector? Yes, that was
the "bang on" moment! I researched the method, or rather the instruction, to
reset the ARM CPU - some articles said some of the registers need to be
written into to enable/disable something so that the ARM CPU gets reset, but
none of them worked for the ARM environment I had(the Snapdragon 800 of the
Nexus 5).
*Stage 20:* After a fair amount of research into how to reset the ARM CPU in
bare-metal/assembly mode, I realized that there must be a piece of low-level
code which resets/reboots the ARM CPU when the user issues a "reboot"
command on console/terminal - where is it? After tracing down the
reboot=>..=>sys_reboot()=>...some low-level code to restart the machine, I
located the logic which could restart the system. Using that as a reference
for my implementation of ARM CPU reset, I wrote a reset logic in ARM assembly
and placed it in head.S. After experimenting substantially, I came up with a
reset logic that seemed to work and called it from the kernel entry point
within head.S. With this, I rebuilt the clang-kernel, regenerated the
boot.img, tried to fastboot it as usual on my Nexus 5 - the "Google" logo
appeared as anticipated, then after a few seconds, as I hoped, the boot
process looped back to bootloader! Delighted! My reset logic worked, which
meant the kernel entry point in head.S was indeed getting hit! Great find
after a lot of struggle and what not!
*Stage 21:* With the above confirmation of the control coming to the kernel
entry point in head.S from the bootloader, I moved the reset logic a few
instructions down and repeated the above process of
rebuilding-regenerating-fastbooting and again, I could see the Nexus 5
getting reset to bootloader - proved again that the control flow was fine so
far within head.S. Eventually, there was a place where the program counter
was getting updated directly to branch off to a different assembly
procedure - I placed my reset logic just after that one - control didn't
reach it this time - there was no reset - the booting process got stuck at
the "Google" logo like many of the earlier times. This made me think what's
wrong with updating the program counter - is it not accessible/modifiable
directly? How was it not an issue for the gcc-kernel?
*Stage 22:* After finding the issue with direct update of program counter
above, I tried a branch off to the destination using the branch instruction
- it worked to my surprise as I moved the reset logic to destination and
got the Nexus 5 reset to bootloader this time! So, there was some issue
with the underlying binary generated by clang with direct update of PC
instead of branching to it like I did. This made me actually think, since
there were some unsupported ABI-specific options for clang that were
removed, is there something wrong with the way program counter is interpreted?
Hence, I tried experimenting with different program counter sizes,
stack-alignment, and all that to the assembler in addition to the given
arguments from the clang for generating the object code for the
intermediate assembly code generated by clang - none helped, same behavior.
Did more of watching the arguments that are being given by clang to the
assembler and compared with that of the gcc - what were the differences?
Whatever were missing like EABI version, DWARF2 debugging information,
little endian flag, etc, I added them manually along with the arguments
given by clang so as to run the assembler with the same set of options that
gcc was doing - no change.
*Stage 23:* Since I had found a way to move the control forward using
branch instruction, I studied the control flow and replaced all the program
counter updates to corresponding branch instructions - entry point =>
processor initialization => getting MMU ready => turning MMU on => enabling
MMU => starting the kernel where the C code in init/main.c starts
executing. This took a lot of time since the above control transfers were
across assembly files and carefully going through them and understanding
them demanded certain level of effort as I was dealing with entirely ARM
assembly code for the first time on my own with no one to ask any questions
I had - ARM reference manuals, online documentation, open source
discussions were handy in understanding which instructions(in different
*.S) do what. Also, I had to combine several of the assembly code spread
across files together to be able to branch off by name instead of addresses
of the procedures stored in registers.
*Stage 24:* With the replaced assembly code as above, there were still some
issues with respect to the assembly instructions modifying the control
registers - control wouldn't go beyond them for some reason. Just to see if
such an access was really required, I commented it out, and the control would
move on, but the same issue would occur in another place with some access to
the control registers. So, I was not sure whether it's okay to leave out all
those control register reads/writes - so reverted them to their originals.
*Stage 25:* With additional corrected changes to branching, etc, control
eventually reached the start_kernel() of init/main.c - for confirming this
I implemented the reset logic I wrote in ARM assembly for use in head*.S
and other assembly code, in C and called it within main.c - though the same
kind of reset to bootloader didn't happen right away, with some more
corrections to my C implementation of reset logic, I saw the same reset
behavior - start_kernel() is getting hit for sure as I checked it multiple
times with and without my C-reset logic(all of the earlier reset calls were
disabled, of course).
*Stage 26:* After confirming that start_kernel() was being entered, I
tried moving the reset logic down one function call at a time - the reset
didn't happen after a few of the function calls - I went inside the function
after which the reset was not happening - it turned out to be accessing
CPUID(again some sort of control register access, which had been an issue
earlier) and the control seemed not to return from it to start_kernel().
*Stage 27:* Tried to change the linkage of the first few functions being
called to be ASM linkage to see if it makes any difference, as
start_kernel() was being called(rather branched into) without any issue but
not the subsequent functions - didn't make any difference. The goal was to
find the underlying issue in any possible way - so, even though the methods
employed often seemed not so relevant, I tried them anyway so that I could
confidently rule them out from further consideration if they didn't work
out or didn't help.
*Stage 28:* Tried to move the console initializations ahead of the other
initializations to be able to see printk messages as early as possible -
didn't work anyway.
*Stage 29:* After all the above, somewhere I thought what if there are some
fixes in the newer Android NDK LLVM/clang that were absent in the one I was
using(NDK r13b) - downloaded the latest NDK r17 and rebuilt the kernel
using the LLVM/clang that was in NDK r17, regenerated boot.img as usual,
fastboot-ed it - didn't boot - same result as earlier - got stuck in
"Google" logo.
*Stage 30:* Looked around on the internet for any reports on how to use
clang for Android kernel - found ARM64/AARCH64 report which mentioned using
main LLVM for building the LLVM/clang on our own. So, followed the steps
there, got the LLVM source, built it as needed, got the toolchain binaries
ready : clang, clang++, etc. Used it for rebuilding the kernel, repeated
the rest of process - didn't boot - same result as earlier.
*Stage 31:* Researched some more to see if there's any other LLVM toolchain
for Android that people have used. Found Qualcomm Snapdragon LLVM
toolchain. Downloaded it from Qualcomm site. Used the clang within it to
rebuild the kernel and repeated the rest of the process - did *not* boot -
same result as earlier. So, decided to continue rest of the effort with
main LLVM/clang download and built in the above stage instead of Android
NDK r13b's LLVM/clang.
*Stage 32:* As the next resort, tried to see differences between the kernel
images built by clang and gcc using hexdiff, vimdiff, etc - saw some
differences but couldn't find out anything useful.
*Stage 33:* Similarly did a hexdiff between the boot.img(s) with clang and
gcc kernels - is the kernel offset the same in both the cases and so on?
*Stage 34:* Used Android NDK's readelf to examine the kernel images from
both clang and gcc builds(vmlinux - both compressed and uncompressed) in
terms of the ELF header. Saw some metadata differences - analyzed them to
understand if they affected the kernel executions in any way.
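To make the kind of ELF header comparison described here concrete, below is
a small sketch that pulls a few fields out of a little-endian ELF32 header
and reports whether two headers differ. It is not what readelf does
internally, and the names are hypothetical; the field offsets, the ARM
machine number (40) and the EABI version living in the top byte of e_flags
come from the ELF and ARM ELF specifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARM_EABI_MASK 0xFF000000u   /* EABI version bits in e_flags */

struct elf32_summary {
    uint16_t type;      /* e_type    (offset 16) */
    uint16_t machine;   /* e_machine (offset 18), 40 for ARM */
    uint32_t entry;     /* e_entry   (offset 24) */
    uint32_t flags;     /* e_flags   (offset 36) */
};

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if buf is not a little-endian ELF32 header. */
int elf32_summarize(const unsigned char *buf, size_t len,
                    struct elf32_summary *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 52 || memcmp(buf, "\177ELF", 4) != 0)
        return -1;
    if (buf[4] != 1 || buf[5] != 1)   /* 32-bit class, little-endian data */
        return -1;

    out->type    = rd16(buf + 16);
    out->machine = rd16(buf + 18);
    out->entry   = rd32(buf + 24);
    out->flags   = rd32(buf + 36);
    return 0;
}

/* EABI version as encoded in the top byte of e_flags (5 for EABI v5). */
unsigned elf32_arm_eabi_version(const struct elf32_summary *s)
{
    return (s->flags & ARM_EABI_MASK) >> 24;
}

/* Nonzero if the two summaries differ in any field. */
int elf32_summary_differs(const struct elf32_summary *a,
                          const struct elf32_summary *b)
{
    return a->type != b->type || a->machine != b->machine ||
           a->entry != b->entry || a->flags != b->flags;
}
```

Feeding the first 52 bytes of the gcc-built and clang-built vmlinux into
this would flag, for instance, a differing entry address or EABI version.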
*Stage 35:* Used Android NDK's objdump to determine differences in the
sections within the kernel images built using clang and gcc - compared the
disassembled code from both cases - saw the differences - went back to the
assembler invocation - checked the arguments being passed - added manually
whatever was missing from the clang invocation of the assembler when compared
to the gcc invocation of the same assembler. Even then there were
differences in terms of the number of sections(text, etc) and their flags,
symbols, and so on - spent a considerable amount of time to understand if
they matter with respect to the reason behind the clang-kernel not booting.
*Stage 36:* Going one step further, added a mechanism to save the
intermediate assembly file that clang generated for each source(*.S and
*.c), and compared that with that of gcc - there were some differences in
terms of optimizations, instruction uses, etc - this made me think there
could be some optimizations(or over-optimizations) that might be resulting
in the unexpected behavior that I was seeing when booting the clang-kernel.
*Stage 37:* Based on the above, added a mechanism to replace the assembler
directives that were placed by clang with that of gcc, for all of the
intermediate assembler code generated - this was to make sure that the ELF
generated by clang for the kernel has exactly same ELF header specifics as
that of the kernel generated by gcc - automating the whole thing took a lot
of work and rework. There was also some unnamed intermediate assembly file
with architecture set to ARMv5 instead of ARMv7 - so did that replacement
as well to make sure I get the exact assembly code specifics as that with
gcc.
*Stage 38:* To keep the differences between the code generated by clang and
gcc at minimum, reduced the optimization level of kernel from O2 to O1 -
rebuilt the kernel and redid the rest of the process - didn't help in
booting the kernel - same outcome as earlier.
*Stage 39:* Totally disabled optimization with O0 - this didn't even
compile - failed with some errors - researched on the internet and came to
know that the Linux kernel code is meant to be built with optimization at
least of O1.
*Stage 40:* Changed the optimization criteria to be for size(Os) - repeated
the whole process of building, etc - didn't help.
*Stage 41:* Based on what I read in the documentation at the entry point in
head.S, disabled the D-cache, I-cache, etc in kernel config to see if they
bring any change - it made no difference.
*Stage 42:* Revisited the changes I had made including the memory
manipulation functions I had implemented - Are they right? Are there any
alignment considerations I need to take care of? What does a standard
implementation look like? Researched more along these lines, found the actual
signatures of these functions, corrected my implementations accordingly -
rebuilt the kernel, repeated the rest of the process - didn't change anything.
*Stage 43:* Looked around on the internet again why these memory
manipulations aren't present by default - can this be because of the
optimization level chosen? Is there a way to enable these instead of
implementing it myself? - didn't find anything of that sort.
*Stage 44:* Meanwhile, came across some report on the stack pointer access
for thread local storage needing some explicit assembly-level access.
Included that as well, redid the whole thing - no change in the kernel
booting behavior though.
*Stage 45:* Eventually, I came to the thought that there is something extra
being done by clang that's resulting in some unneeded optimization
that's leading to some misinterpretation of the given assembly and/or C code
- so I disabled every other optimization, including the intrinsic ones that
are built into the compiler's code generation, after exploring the relevant
discussions and clang documentation. Rebuilt the kernel and repeated the
rest of the process - to my great satisfaction, this time it didn't get
stuck at the "Google" logo but it rebooted to boot with the existing
kernel(installed on my Nexus 5) - I knew this happens only if there was
some kernel panic for any legitimate reason and, as I had already disabled
all the "reset logic" everywhere, I was pretty sure that I could get the
dump of this kernel panic or oops in the recovery mode via /proc/last_kmsg.
*Stage 46:* With the above first milestone of finally achieving the loading
and running of the clang-built kernel on Nexus 5, even though it looped back
to reboot with the installed kernel because it encountered some kernel panic,
I was immensely happy and the overall project gained incredible momentum,
since I had to go through tons of research, attempts, experiments,
documentation and articles day and night just to get the kernel started and
see some logs from it. And yes, I went to recovery mode and had a look at
/proc/last_kmsg, which showed the kernel panic - having started from the
zeroth level in this project and having seen my Nexus 5 get stuck at the
"Google" logo hundreds of times across numerous attempts spread across day
and night, it was a great indication of eventually having a working
clang-built Android kernel on my Nexus 5, since I at least saw my
clang-built Android kernel alive for some time for the first time ever!
*Stage 47:* Just to confirm, I reverted the most recent changes mentioned
above to see if they were "actually" the reason to get the clang-kernel up
and running - they were indeed the reason - confirmed it couple of times -
great!
*Stage 48:* Focusing on the kernel panic, it was from one of the audio
codecs with respect to a buffer overrun - fixed it, rebuilt the kernel,
re-fastboot-ed it - there was another audio codec with the same failure -
fixed that as well, repeated the process and there were some more of these
and fixed all of them - kernel moved forward, saw the RR logo
animation(Android logo in case of AOSP)! Android is finally booting!
*Stage 49:* After a plethora of attempts above, finally I got the Android
successfully booted up on the clang-built ARM kernel on my Nexus 5 -
confirmed the system information from "adb shell - dmesg - proc-version,
etc" and also from "About/System" section of Android settings which showed
the clang-built kernel version information in terms of the clang compiler
version used to build the kernel, etc. Immensely happy!
*Stage 50:* Noticed the WiFi got connected but had a cross(x) symbol on it
- "connected, no internet" was the status. Checked if data over LTE works
if not WiFi - didn't - same cross mark if I disabled WiFi and tried to use
Cellular data(LTE) - phone calls would work but no internet - checked
logcat/dmesg - showed some issue with SELinux denials - spent some time on
whether it's the kernel or ramdisk issue, etc.
*Stage 51:* Tracked down the above WiFi issue to be some bandwidth module
being not loaded - checked for different entries for WiFi/wlan under sysfs
and their status and so on - found out a particular piece of netfilter code
had some VLAIS in use which I had fixed as per clang requirements - later
on figured out that the latest mainline kernel had an updated code for that
piece - incorporated that, rebuilt and retried - WiFi worked - no more
cross symbol - same case with cellular data. Also, verified that ADB over
USB, etc work - so far so good.
*Stage 52:* With everything working so far, I had used main LLVM/clang so
far as the clang compiler, so wanted to try the same with all the other
clang toolchains I had earlier experimented with, for the sake of
universality. So, at first, picked Android NDK r13b's clang/LLVM - rebuilt
the kernel and repeated the rest of the process - didn't boot - looped back
to boot with the existing kernel - sign of some kernel panic.
*Stage 53:* Examined the /proc/last_kmsg as usual to debug the above issue
- saw some kernel panic in Camera MSM driver code - tracked down to the
actual function as per the stack-trace - after substantial debugging
without finding the actual reason for the panic, I decided to move on to
the other remaining LLVM/clang toolchains as this was only seeming to
happen with Android NDK r13b's clang(which is not the latest as well) - so
why waste any more time?
*Stage 54:* Took Android NDK r17's LLVM/clang for rebuilding the kernel and
tried bringing it up - worked without any issue(no panic, etc) - Android
came up - WiFi, etc all were working - verified the kernel version
information, compiler version, etc - all good.
*Stage 55:* Next was to use Qualcomm Snapdragon's LLVM/clang to rebuild the
kernel and verify everything works - did that - everything worked.
*Stage 56:* With all verified except with the Android NDK r13b's LLVM/clang
with which there was a kernel panic from Camera MSM driver code, I decided
to go back to it and fix it once for all as I didn't have anything else
pending before wrapping this project up!
*Stage 57:* Examined the Camera MSM driver code across different files and
compared them with each other, added printk-logs. Surprisingly, I noticed
that an incorrect driver's code was being called back for the device
id table registered - there was some string manipulation within
these functions - suspected them of overrunning the stack or so and
corrupting something, causing such a behavior - dumped various flags,
data, numbers, names, etc for the driver, device and its parameters to find
out who was calling the wrong driver - found the init code was calling this
as a callback after the driver got registered.
*Stage 58:* For one moment, had to check the device tree source for this
device for which wrong driver was called back - didn't find anything
incorrect as the same works for every other clang-toolchain and even gcc.
*Stage 59:* Tried to debug the init code which was calling this based on
the device id matching the driver, went through different device driver
bare-bone code to understand how a device-and-driver gets mapped onto and
the driver's file operations are called - found a difference between the
suspected driver and the actual driver which should have been called for
the wrongly mapped device, in terms of device id specification.
*Stage 60:* Added the missing item in the suspected driver to have the
device id specification as expected by the device driver bare-bone
framework - rebuilt the kernel, repeated the rest of the process - worked!
Android booted even with Android NDK r13b's LLVM/clang!
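The stages above do not spell out which item was missing from the device id
specification. One common instance of this class of bug is a match table
that lacks its empty terminating entry, so the sketch below assumes that
case purely for illustration. The types, the table contents and match_dev()
are hypothetical, loosely modeled on how kernel match tables are walked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a kernel match-table entry. */
struct dev_id {
    const char *compatible;   /* NULL marks the end of the table */
    int driver_data;
};

/*
 * The table is walked until the terminating entry, so the final
 * { NULL, 0 } sentinel is mandatory; without it the walk runs past the
 * end of the array into whatever the compiler placed next.
 */
const struct dev_id camera_ids[] = {
    { "vendor,camera-sensor", 1 },
    { "vendor,camera-flash",  2 },
    { NULL, 0 },              /* sentinel */
};

/* Returns the matching entry, or NULL if no entry matches. */
const struct dev_id *match_dev(const struct dev_id *table,
                               const char *compatible)
{
    assert(table != NULL && compatible != NULL);
    for (; table->compatible != NULL; table++) {
        if (strcmp(table->compatible, compatible) == 0)
            return table;
    }
    return NULL;
}
```

What lies after an unterminated table depends on how the compiler and linker
lay out data, which would be consistent with a bug that shows up with only
one toolchain.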
*Stage 61:* With that, everything was done and dusted. For the sake of
completeness, I redid the whole thing with different combinations of
LLVM/clang and NDK's binutils.
*Stage 62:* Repeated the process with Android NDK r13b's gcc - all good -
no regressions because of changes done for clang.
*Stage 63:* Repeated the process with Android NDK r17's gcc - all good - no
regressions because of changes done for clang.
*Stage 63:* Repeated the process with Android NDK r13b's (LLVM/clang +
binutils(as, ld, etc)) - no issues
*Stage 64:* Repeated the process with Android NDK r17's (LLVM/clang +
binutils(as, ld, etc)) - no issues
*Stage 65:* Repeated the process with (Main LLVM/clang) + (Android NDK
r13b's binutils(as, ld, etc)) - no issues
*Stage 66:* Repeated the process with (Main LLVM/clang) + (Android NDK
r17's binutils(as, ld, etc)) - no issues
*Stage 67:* Repeated the process with (Snapdragon Qualcomm LLVM/clang) +
(Android NDK r13b's binutils(as, ld, etc)) - no issues
*Stage 68:* For the sake of completeness and compactness, silenced the
various noisy warnings for the clang case. Despite that, the Snapdragon
Qualcomm LLVM/clang build was still warning that vectorization was
unavailable because of missing options needed to enable it - tried to add the
missing ones - didn't silence the warning - researched on that - came to know
that there was little benefit from it - ignored those warnings and didn't
silence them.
*Stage 69:* Repeated the process with (Snapdragon Qualcomm LLVM/clang) +
(Android NDK r17's binutils(as, ld, etc)) - no issues
*Stage 70:* Added clang-specific build-level changes under clang-only
category to not mess with gcc-builds
*Stage 71:* Reverified that everything is fine with all the toolchains with
thus far set of changes
*Stage 72:* Automated the whole of the above combinations, generated the
boot.imgs, compared the build-times, zImage-sizes, boot.img's size - listed
out the shortest/longest builds, smallest/largest zImage, smallest/largest
boot.img.
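The comparison step can be sketched as below, assuming one record per
toolchain combination. The record layout and function names are
hypothetical, and no measured numbers are included here; the functions only
pick out the extremes from whatever records are passed in.

```c
#include <assert.h>
#include <stddef.h>

/* One record per toolchain combination (compiler + binutils). */
struct build_record {
    const char *toolchain;
    double build_seconds;
    unsigned long zimage_bytes;
    unsigned long bootimg_bytes;
};

/* Index of the shortest build, or -1 for an empty list. First one wins ties. */
int shortest_build(const struct build_record *r, size_t n)
{
    int best = -1;
    size_t i;

    assert(r != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (best < 0 || r[i].build_seconds < r[best].build_seconds)
            best = (int)i;
    }
    return best;
}

/* Index of the largest zImage, or -1 for an empty list. First one wins ties. */
int largest_zimage(const struct build_record *r, size_t n)
{
    int best = -1;
    size_t i;

    assert(r != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (best < 0 || r[i].zimage_bytes > r[best].zimage_bytes)
            best = (int)i;
    }
    return best;
}
```

The longest build, smallest zImage and the boot.img extremes follow the same
pattern with the comparison or the field swapped.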
*Stage 73:* Ran the overall automation several times without any other
activity on my system and with other activity to see the difference in build
times, etc. Noted down the results.
*Stage 74:* Wrote test automation to fastboot all of the images one by one
and checked the sanity of all the images - all good - all done. Period.