[llvm-dev] A tagged architecture, the elephant in the undef / poison room

Thu Jun 15 16:15:06 PDT 2017

> It seems like a odd choice given that none of our current targets
> are tagged architectures, all of this tagged IR has to somehow be
> reduced back down to normal target machine instructions.

I agree with Krzysztof that an analogy with tagged architectures is 
strained. One way to see this is to look at the interpreter mode of lli, 
which is a proper LLVM interpreter that doesn't need to tag values as 
poison or not. It completely ignores poison afaik.

> Correct me if I am wrong, but it seems no one has outlined a proof
> that C programs can be translated into IR+”poison", optimized according
> to the definition of “poison”, and always be guaranteed to still
> be the same C program. Right now all we have are beliefs
> based on intuition that this can be done, but no actual proof.

It isn't clear that producing such a proof is a good value proposition, 
since Clang isn't a major source of miscompilation bugs, as far as I 
know. The problems under discussion (e.g, in the freeze paper) are in 
the middle end, not the front end.

> In other words,  everyone believed that “undef” was the way to go until
> counter examples started showing up, and now everyone believes
> that “poison” is the way to go, which will be true until counter examples
> start showing up. We are jumping from one shaky precipice to another,
> we still are not yet on solid ground.

I disagree with this characterization. Allow me to propose a different one.

- In the beginning, there was no undef and no poison. This was a 
workable situation, a perfectly suitable basis for a correct compiler, 
but certain desirable optimizations could not be performed.

- Undef was added to allow the IR to express don't-care values, with the 
result that writing some optimizations got a bit harder but also more 
optimizations could be performed.

- Poison was added to allow the IR to express additional optimizations 
that cannot be justified by undef, with the result that writing some 
optimizations got a bit harder still, but again more optimizations could 
be performed. This is the current situation and there are three 
problems. (1) People weren't always sure which of poison or undef to 
use, this confusion persists and needs to get fixed. The common case (I 
think) is a comment or documentation saying undef where it should be 
poison. (2) Undef and poison interact in tricky ways. (3) Conflicting 
assumptions were sometimes made about how UB works, leading to problems 
such as ones described in the freeze paper.

But let's be clear: UB problems are in corner cases and they don't 
affect very many developers or users. The sense of the community (as far 
as I can tell) is that a fundamental overhaul isn't called for. Various 
proposals have been made, including ours, about how to fix the UB model, 
but none has yet been adopted.

Peter, you are obviously entitled to your views, but my take is that it 
probably isn't productive to start multiple UB-related threads in a 
short time period, nor is it productive to sort of throw stones at all 
aspects of the model. We can't debate everything at once and we're not 
going to throw out a huge base of optimizations built around the current 
model. I'll also repeat Daniel's request that you tone down the 
open-ended requests for proofs/counterexamples/whatever that would 
generate a lot of work for anyone trying to discuss these matters with 
you. Instead of producing abstract arguments, please provide concrete 
examples, ideally accompanied by code.

John