[llvm-dev] A tagged architecture, the elephant in the undef / poison room
John Regehr via llvm-dev
llvm-dev at lists.llvm.org
Thu Jun 15 16:15:06 PDT 2017
> It seems like a odd choice given that none of our current targets
> are tagged architectures, all of this tagged IR has to somehow be
> reduced back down to normal target machine instructions.
I agree with Krzysztof that an analogy with tagged architectures is
strained. One way to see this is to look at the interpreter mode of lli,
which is a proper LLVM interpreter that doesn't need to tag values as
poison or not. It completely ignores poison afaik.
> Correct me if I am wrong, but it seems no one has outlined a proof
> that C programs can be translated into IR+”poison", optimized according
> to the definition of “poison”, and always be guaranteed to still
> be the same C program. Right now all we have are beliefs
> based on intuition that this can be done, but no actual proof.
It isn't clear that producing such a proof is a good value proposition,
since Clang isn't a major source of miscompilation bugs, as far as I
know. The problems under discussion (e.g, in the freeze paper) are in
the middle end, not the front end.
> In other words, everyone believed that “undef” was the way to go until
> counter examples started showing up, and now everyone believes
> that “poison” is the way to go, which will be true until counter examples
> start showing up. We are jumping from one shaky precipice to another,
> we still are not yet on solid ground.
I disagree with this characterization. Allow me to propose a different one.
- In the beginning, there was no undef and no poison. This was a
workable situation, a perfectly suitable basis for a correct compiler,
but certain desirable optimizations could not be performed.
- Undef was added to allow the IR to express don't-care values, with the
result that writing some optimizations got a bit harder but also more
optimizations could be performed.
- Poison was added to allow the IR to express additional optimizations
that cannot be justified by undef, with the result that writing some
optimizations got a bit harder still, but again more optimizations could
be performed. This is the current situation and there are three
problems. (1) People weren't always sure which of poison or undef to
use, this confusion persists and needs to get fixed. The common case (I
think) is a comment or documentation saying undef where it should be
poison. (2) Undef and poison interact in tricky ways. (3) Conflicting
assumptions were sometimes made about how UB works, leading to problems
such as ones described in the freeze paper.
But let's be clear: UB problems are in corner cases and they don't
affect very many developers or users. The sense of the community (as far
as I can tell) is that a fundamental overhaul isn't called for. Various
proposals have been made, including ours, about how to fix the UB model,
but none has yet been adopted.
Peter, you are obviously entitled to your views, but my take is that it
probably isn't productive to start multiple UB-related threads in a
short time period, nor is it productive to sort of throw stones at all
aspects of the model. We can't debate everything at once and we're not
going to throw out a huge base of optimizations built around the current
model. I'll also repeat Daniel's request that you tone down the
open-ended requests for proofs/counterexamples/whatever that would
generate a lot of work for anyone trying to discuss these matters with
you. Instead of producing abstract arguments, please provide concrete
examples, ideally accompanied by code.
More information about the llvm-dev