[cfe-dev] [llvm-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Mon Apr 30 11:53:02 PDT 2018

[I want to preface this by saying that given I was late to this
discussion, I'd be fine if we proceed with a function attribute based
solution despite my objections]

On Mon, Apr 30, 2018 at 11:19 AM, Reid Kleckner <rnk at google.com> wrote:
> Personally I don't think adding a new address space to express the idea that
> loads and stores may successfully dereference address zero is the best
> design.
>
> The existing code to support the handling of address spaces does not inspire
> a lot of confidence. I am a simple non-GPU compiler developer. I have no
> idea how address spaces are supposed to work. When I read code that deals
> with them, I assume nothing about address spaces. I treat them
> conservatively. They are a black box, I ignore them.

As an aside, there are non-GPU users of address spaces too.  For
instance, Azul uses (or at least used to when I was there :) )
represent GC-able pointers using address space 1, and we spent a lot
of time ensuring that all of the normal scalar optimizations kicked in
for address space 1 pointers (though I'm not sure if all of those
changes are upstream yet).

In any case, while I agree with your assessment of the problem, I
don't agree with the proposed solution.  "not everyone (wants to)
understands x" should not be a reason to re-invent a subset of x in
some other form.  Instead we should double down making the existing
thing more clearly and explicitly specified so that understanding it
is easy!

Of course, there are boundaries.  E.g. to me it isn't a big deal if a
backend wants to introduce an intrinsic or an attribute for
convenience despite it overlapping with existing things.  However,
here we're talking injecting new behavior into core LLVM passes so the
scope is larger.

> This may sound bad, but this is normal. We all have different reasons to use
> and contribute to LLVM, and as much as we benefit by sharing designs and
> development resources as possible, sometimes we end up creating and playing
> in our own sandboxes and ignoring possible design space overlap with other
> users. A small amount of duplication of design and effort is OK if it allows
> people to make forward progress independently. I think this is an area where
> we want to do that.

To me, his does not look like a small amount of duplication.  For
instance, we have to answer questions like what happens when we inline
a function marked "null-ptr-is-dereferenceable" into a regular
function (my current understanding is that we can't inline them) and
whether malloc() can return null or not.  I'd rather answer these
kinds of questions in one place if possible.

-- Sanjoy

> I think we can bear the code complexity cost of having to check for both
> address spaces and this option, and I think both parties (address space
> users and -fno-delete-null-checks users) are willing to audit LLVM twice for
> these transformations.
>
> On Sat, Apr 28, 2018 at 1:12 PM Sanjoy Das via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>>
>> On Thu, Apr 19, 2018 at 3:56 PM, Manoj Gupta via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> > My understanding is we only want to disable  the optimizations regarding
>> > undefined behavior
>> > related to null pointer deference optimizations. And address space
>> > checks
>> > will end up
>> > disabling more optimizations than needed.
>>
>> [Repeating what others have already mentioned on this thread]
>>
>> I this it is better to avoid framing this as "disable optimizations that
>> exploit
>> UB" and instead frame this as "define some behavior that was undefined
>> before".
>>
>> > I did look at some of the optimizations/transforms and there are some
>> > that
>> > we definitely want to keep.
>> >
>> > Just a quick example from grepping:
>> > lib/Transforms/Scalar/LoopIdiomRecognize.cpp
>> > ...........
>> >              // Don't create memset_pattern16s with address spaces.
>> >              StorePtr->getType()->getPointerAddressSpace() == 0 &&
>> >              (PatternValue = getMemSetPatternValue(StoredVal, DL))) {
>> >     // It looks like we can use PatternValue!
>> >     return LegalStoreKind::MemsetPattern;
>> >   }
>> >
>> > Even worse, Sanitizers do NOT work with address spaces which is a big
>> > deal
>> > breaker IMO.
>>
>> IMO fixing these seems less engineering overhead in the long term than
>> introducing
>> yet another knob to the IR.
>>
>> More importantly, we should _not_ be doing these optimizations without
>> auditing
>> them individually.  For instance with -fno-delete-null-pointer checks, it
>> isn't
>> okay to change a memset loop to a call to llvm.memset unless we've ensured
>> llvm
>> DTRT for llvm.memset and null checks (i.e. the null check in
>> "llvm.memset(ptr,
>> ...); if (!ptr) {}" does not get optimized away).
>>
>> > Since address spaces and null pointers are really orthogonal issues, I
>> > would
>> > prefer to
>> > not conflate them.
>>
>> I'm not sure I agree with this.  Address spaces are a mechanism to provide
>> additional semantics to pointers.  In this case the additional property we
>> want is "address 0 may be dereferenceable", and using address spaces seems
>> appropriate.
>>
>> > In addition, It is already not easy to convince Linux Kernel maintainers
>> > to
>> > accept clang specific patches.
>>
>> Maybe I'm missing something, but I thought in this address space scheme
>> we'd
>> still provide a -fno-delete-null-ptr-checks flag -- it is clang that would
>> mark
>> pointers with address space n in this mode.
>>
>> > From Linux kernel maintainers POV, Clang built kernels are already
>> > "getting
>> > lucky". So I am not too
>> > worried about missing a few cases.
>> > I'll be glad to fix the missing cases whenever reported.
>>
>> It seems to me that adding back missing (but correct) optimizations when
>> reported is better than removing existing (but incorrect) optimizations
>> when
>> reported.  If I were a kernel developer (which I am not) I'd rather have a
>> kernel that boots slower than a security vulnerability.
>>
>> > I also have some other concerns with address spaces e.g. how to pick a
>> > safe
>> > address space not used by
>> >  any target e.g. many targets (in and out-of-tree) actively use non-zero
>> > address spaces.
>> > User code can also specify any address space via
>> > __attribute__((address_space(N))) so mixing object
>> > files will be tricky in such cases.
>>
>> I think for that we can start a thread on cfe-dev and llvm-dev about
>> reserving
>> address spaces.  While at it, we should reserve a range of address spaces
>> for
>> LLVM middle-end use so that we can do more things like
>> -fno-delete-null-pointer-checks without worrying about address space
>> clashes.
>>
>> -- Sanjoy
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev