[LLVMdev] RFC: implicit null checks in llvm

Sanjoy Das sanjoy at playingwithpointers.com
Tue Jun 2 14:42:02 PDT 2015


I decided to go with Andy's suggestion of lowering explicit null
checks into implicit null checks late, after register allocation.  The
tip of the change is at http://reviews.llvm.org/D10201.

-- Sanjoy

On Wed, Apr 29, 2015 at 6:52 PM, Andrew Trick <atrick at apple.com> wrote:
>
>> On Apr 24, 2015, at 4:14 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>>
>> I don't think we can expose the memory operations directly from a
>> semantic, theoretical point of view.  Whether practically we can do
>> this or not is a different question.
>>
>> Does LLVM do optimizations like these at the machine instruction
>> level?
>>
>>
>>   if (condition)
>>     T = *X  // normal load, condition guards against null
>>
>>   EH_LABEL // clobbers all
>>   U = *X  // implicit null check, branches out on fault
>>   EH_LABEL // clobbers all
>>   ...
>>
>> =>
>>
>>  since the second "load" from X always happens, X must be
>>  dereferenceable
>>
>>
>>   T = *X  // miscompile here
>>
>>   EH_LABEL // clobbers all
>>   U = *X  // implicit null check, branches out on fault
>>   EH_LABEL // clobbers all
>>   ...
>>
>> The fundamental problem, of course, is that we're hiding the real
>> control flow which is
>>
>> if (!is_dereferenceable(X))  branch_out;
>> U = *X
>
> That’s a good description of the problem.
>
> Lowering to real loads will *probably* just work because you are being saved by EH_LABEL instructions, which are conservatively modeled as having unknown side effects. The feature that saves you will also defeat optimization of those loads. I don't see any advantage of this in terms of optimizing codegen. It is just a workaround to avoid defining pseudo instructions.
>
> The optimal implementation would be to leave the explicit null check in place. Late in the pipeline, just before post-ra scheduling, a pass would combine and+cmp+br+load when it is profitable using target hooks like getLdStBaseRegImmOfsWidth(). Note that we still have alias information in the form of machine mem operands.
>
> You could take a step in that direction without doing much backend work by lowering to pseudo-loads during ISEL instead of using EH_LABEL. Then the various load/store optimizations could be taught to explicitly optimize normal loads and stores over the pseudo loads but not among them.
>
> Andy
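
For readers following the thread, the late combine Andy describes can be
sketched roughly as below. This is only a toy model under stated
assumptions, not the code in D10201: Inst, Op, foldNullChecks and
kPageSize are hypothetical names, the instruction list stands in for a
machine basic block, and the "offset below one page" test stands in for
whatever a real target hook would report. It folds an explicit null test
into the load that immediately follows it when the load uses the same
base register and a small non-negative offset.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction; this is NOT LLVM's MachineInstr.
enum class Op { TestNullBranch, Load, FaultingLoad, Other };

struct Inst {
    Op op;
    int base_reg;        // register holding the pointer that is tested or dereferenced
    std::int64_t offset; // byte offset from base_reg (loads only)
    int dest_reg;        // register receiving the loaded value (loads only)
    std::string target;  // null-handler label (branch and faulting load only)
};

// Assumption: the page at address zero is unmapped, so [null + offset]
// faults for any 0 <= offset < kPageSize.
constexpr std::int64_t kPageSize = 4096;

// Fold "test base for null, branch to handler" + "load [base + offset]"
// into a single faulting load that carries the handler label.
std::vector<Inst> foldNullChecks(const std::vector<Inst>& in) {
    std::vector<Inst> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const Inst& check = in[i];
        if (check.op == Op::TestNullBranch && i + 1 < in.size()) {
            const Inst& next = in[i + 1];
            const bool foldable = next.op == Op::Load &&
                                  next.base_reg == check.base_reg &&
                                  next.offset >= 0 &&
                                  next.offset < kPageSize;
            if (foldable) {
                out.push_back(Inst{Op::FaultingLoad, next.base_reg,
                                   next.offset, next.dest_reg, check.target});
                ++i;  // the load was consumed together with the check
                continue;
            }
        }
        out.push_back(check);  // anything else is kept unchanged
    }
    return out;
}
```

Because the explicit check stays in place until this point, earlier
passes still see the real control flow, which is what avoids the
miscompile shown in the quoted example. A load whose offset could reach
past the unmapped page keeps its explicit check.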



