[llvm-dev] The undef story

John Regehr via llvm-dev llvm-dev at lists.llvm.org
Thu Jun 29 11:28:27 PDT 2017


On 6/29/17 9:41 AM, Peter Lawrence via llvm-dev wrote:

> This doesn’t make sense to me, a shift amount of 48 is “undefined” for
> unsigned char,
> How do we know this isn’t a source code bug?
> What makes us think the user intended the result to be “0”?
>
> This strikes me as odd, we are mis-interpreting the user’s code
> In such a way so as to improve performance, but that isn’t necessarily
> what the user intended.

The quoted text above is indicative of a serious misunderstanding and I 
would like to stop it from leading anyone else astray.

The error is in thinking that we should consider the intent of a 
developer when we decide which optimizations to perform. That isn't how 
this works. LLVM code has a mathematical meaning: it describes 
computations. Any transformation that we do is either mathematically 
correct or it isn't.

A transformation is correct when it refines the meaning of a piece of
IR. Refinement mostly means "preserves equivalence", but not quite,
because it also allows undefined behaviors to be removed: the result may
be more defined than the original, but never less. For example, "add nsw"
is not equivalent to "add", but an "add nsw" can always be turned into an
"add". The opposite transformation is only permissible when the add can
be proven not to overflow.
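
To make the direction of refinement concrete, here is a minimal sketch in
Python of a toy 8-bit model. It is not LLVM code: the names POISON, wrap,
add, add_nsw and refines are all invented for this illustration, and poison
is modeled as a simple sentinel value. The check is exhaustive over every
pair of i8 inputs.

```python
from itertools import product

POISON = "poison"  # stand-in for LLVM's poison value in this toy model
WIDTH = 8          # model i8 so the check can be exhaustive


def wrap(x):
    """Interpret the low WIDTH bits of x as a signed two's complement integer."""
    x &= (1 << WIDTH) - 1
    return x - (1 << WIDTH) if x >= 1 << (WIDTH - 1) else x


def add(a, b):
    """Plain `add`: wraps on overflow and is always defined."""
    return wrap(a + b)


def add_nsw(a, b):
    """`add nsw`: poison when the signed sum does not fit in WIDTH bits."""
    s = a + b
    lo, hi = -(1 << (WIDTH - 1)), (1 << (WIDTH - 1)) - 1
    return s if lo <= s <= hi else POISON


def refines(src, tgt):
    """True if replacing src with tgt is a refinement in this model.

    Wherever src is defined, tgt must give the same result.
    Wherever src is poison, tgt may produce anything.
    """
    values = range(-(1 << (WIDTH - 1)), 1 << (WIDTH - 1))
    return all(
        src(a, b) == POISON or src(a, b) == tgt(a, b)
        for a, b in product(values, repeat=2)
    )


print(refines(add_nsw, add))  # add nsw -> add: prints True
print(refines(add, add_nsw))  # add -> add nsw: prints False
```

The second check fails on inputs such as 127 + 1, where "add" is defined
(it wraps to -128) but "add nsw" is poison. That is exactly the case a
proof of no overflow would have to rule out.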

This is like the laws of physics for compiler optimizations: it is not
open to debate.

The place to consider developer intent, if one wanted to do that, is in 
the frontend that generates IR. If we don't want undef or poison to ever 
happen, then we must make the frontend generate IR that includes 
appropriate checks in front of operations that are sometimes undefined. 
To do this we have sanitizers and safe programming languages.
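
As a rough illustration of what "appropriate checks in front of
operations" means, here is a small Python sketch. It is a toy model, not
how any real sanitizer or frontend is implemented; POISON, shl and
checked_shl are names made up for this example. The only fact taken from
LLVM is that "shl" yields poison when the shift amount is greater than or
equal to the bit width.

```python
POISON = "poison"  # stand-in for LLVM's poison value in this toy model


def shl(value, amount, width=8):
    """Toy model of `shl`: poison if the shift amount is >= the bit width."""
    if amount >= width:
        return POISON
    return (value << amount) & ((1 << width) - 1)


def checked_shl(value, amount, width=8):
    """What a checking frontend could emit: a guard first, then the shift.

    The guard makes the out-of-range case a defined trap, so the shift
    that follows can never produce poison.
    """
    if not 0 <= amount < width:
        raise OverflowError(f"shift amount {amount} out of range for i{width}")
    return shl(value, amount, width)


print(shl(1, 3))          # prints 8
print(shl(1, 48))         # prints poison
print(checked_shl(1, 3))  # prints 8
try:
    checked_shl(1, 48)
except OverflowError as err:
    print("trapped:", err)
```

The point is that the decision about what a shift by 48 should mean is
made here, when the IR is generated. Once the guard is in the IR, the
optimizer is obligated to preserve it.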

SUMMARY: The intent, whatever it is, must be translated into IR. The 
LLVM middle end and backends are then obligated to preserve that 
meaning. They generally do this extremely well. But they are not, and 
must not be, obligated to infer the mental state of the developer who 
wrote the code that is being translated.

John

