[cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Wed Apr 18 10:13:46 PDT 2018

Hi,

This is regarding support for -fno-delete-null-pointer-checks in clang (PR
9251).
The Linux kernel uses this flag to keep null pointer checks from getting
optimized away.  Since clang does not currently support this flag, it often
invites comments from kernel devs that clang is not yet *ready* to compile
the Linux kernel.
I have also heard that developers working on firmware, bare-metal tools and
other low-level tools often want to use this flag as well.

Therefore, I would like to implement support for this flag (maybe with a
different name), and would like to hear opinions on how it should be
implemented.

Some options/opinions:

1. (From Duncan Sands' comment on PR 9251)
> This could be implemented by putting pointers in a non-default address
> space when this flag is passed.
Any ideas on how this would work for languages/targets that actively make use
of non-default address spaces, e.g. OpenCL/GPU targets?  Also, I believe
that allocas still need to be kept in the default address space, so they
would need to be bitcast to the *non-default* address space before use.

2. Treat this flag like any other optimization flag: find and fix all
optimizations in clang/LLVM that rely on null pointer accesses being undefined.

Thanks,
Manoj

Background:
https://lkml.org/lkml/2018/4/4/601

From Linus Torvalds:

Note that we explicitly use "-fno-delete-null-pointer-checks" because
we do *not* want NULL pointer check removal even in the case of a bug.

See commit a3ca86aea507 ("Add '-fno-delete-null-pointer-checks' to gcc
CFLAGS") for the reason: we had buggy code that accessed a pointer
before the NULL pointer check, but the bug was "benign" as long as the
compiler didn't actually remove the check.

And note how the buggy code *looked* safe. It was doing the right
check, it was just that the check was hidden and disabled by another
bug.

Removing the NULL pointer check turned a benign bug into a trivially
exploitable one by just mapping user space data at NULL (which avoided
the kernel oops, and then made the kernel use the user value!).

Now, admittedly we have a ton of other security features in place to
avoid these kinds of issues - SMAP helps on the hardware side, and not
allowing users to mmap() NULL in the first place helps with most
distributions, but the point remains: the kernel generally really
doesn't want optimizations that are perhaps allowed by the standard,
but that result in code generation that doesn't match the source code.

The NULL pointer removal is one such thing: don't remove checks in the
kernel based on "standard says". It's ok to do optimizations that are
based on "hardware does the exact same thing", but not on "the
standard says this is undefined so we can remove it".