[llvm] r209007 - Fix most of PR10367.

Sun May 18 02:30:08 PDT 2014

John McCall wrote:
> On May 18, 2014, at 12:50 AM, Nick Lewycky<nicholas at mxc.ca>  wrote:
>> John McCall wrote:
>>> On May 17, 2014, at 10:42 PM, Nick Lewycky<nicholas at mxc.ca>   wrote:
>>>> John McCall wrote:
>>>>> On May 17, 2014, at 9:03 PM, Nick Lewycky<nicholas at mxc.ca>    wrote:
>>>>>> Suppose we come up with a new set of constants that represent the set of relocations available in the targets we support. This would permit us to generate optimal code for those targets, instead of our current situation of producing worse code on Linux/ELF because Mach-O doesn't support certain kinds of relocation.
>>>>>
>>>>> Your proposal is seriously to introduce relocations as a first-class concept in LLVM IR?
>>>>
>>>> Yes, filling the role ConstantExpr fills now.
>>>>
>>>>> What possible benefit does that have over just allowing them to be expressed naturally as ConstantExprs and then ensuring that front-ends and transforms are smart about not introducing ConstantExprs that the tool chain / target can’t handle?  You know, like they already do today, both for relocations and for a thousand other features that might be impossible on the target.
>>>>
>>>> I need to pull this apart a bit to answer it. Why get rid of ConstantExpr?
>>>>
>>>> ConstantExpr has the wrong design. The constants available in LLVM are typed bags of bits, the address of a function, the address of a global variable, an alias, and relocations. ConstantExpr is modeled as a mirror of the set of LLVM instructions -- and which of those things in the previous list correspond to instructions? Immediately we see problems with the model, there are instructions aren't mirrored -- nobody has a call ConstantExpr, or a load, or unreachable. Trapping is just another problem on the list.
>>>
>>> …because control flow and side effects aren't constants.  Are there side-effectful relocations you’re concerned about representing?  What is the use of a side effect which occurs in an unreliable order at load time?
>>
>> Yeah, there's IFUNC. Good uses include checking the cpuid or cache line size and resolving the relocation to the more efficient function implementation or lookup table. Bad uses include side-effecting functions, my personal favourite being a scheme that requires the dynamic loader to be re-entrant (sigh).
>
> That’s a fair point, but I thought a large part of the point of IFUNC is that users of the symbol don’t have to know it’s defined with IFUNC.  I understand that that doesn’t solve the question of how to represent an IFUNC definition.
>
> Also, I don’t think we want to guarantee much about the exact timing of the execution of IFUNC functions.
>
>>> Then LLVM pervasively mis-uses canonicalization.
>>
>> This is a great point of discussion. You can design a compiler IR either way, deciding that we'll go with the notion that this is okay behaviour or the notion that llvm is doing this wrong, and either way is internally consistent.
>
> Well.  I think the historical evidence in LLVM has told us pretty consistently that this approach is insanely brittle and leads to rampant miscompilations with really awkward fixes.  It does not fail safely.

*nod*

This is a really good argument we're having, but I think we should 
shelve it until there's a more concrete proposal to argue over -- at 
which time it will also be an important argument. I think what we're 
going to end up doing is arguing it twice (once now and once later) 
instead of once (once later).

>>> ConstantExprs can be meaningfully interpreted by target-independent code, even if they can’t always be validly generated by it.  Or really, they can always be generated by target-independent code (assuming it can be proven not to trap), they just can’t necessarily be used in certain places.
>>
>> Oh that reminded me. Getting rid of ConstantExpr's lets us get rid of much of the llvm::Operator layer and the undefined behaviour it brings, caused by people trying to treat instructions and constantexpr's with the same code.
>>
>>> Both of our proposals involve target-specific IR emission/transformation using target-specific information to produce a constant that’s only valid on specific targets.  The only difference is that you want to make constant completely opaque to target-independent code, even when it’s something obvious like “take the low 17 bits of this pointer”, because you find the possibility that somebody might accidentally form such a constant — and thus require the backend to reject the code — aesthetically unappealing.  In contrast, I feel that allowing the IR to express things that can’t necessarily be emitted by the backend in every context preserves a lot of very useful flexibility.
>>
>> Then our proposals are indeed similar. I see no need for these objects to be opaque, I expect optimizations to learn handle them like they handle any other instruction or Constant. We may simplify that by the addition of common base classes that permit unified handling of common features.
>
> Isn’t that basically reintroducing the llvm::Operator layer?

I meant more like GlobalValue.

llvm::Operator is a class derived from Value (which has a vtable) that 
you placement-new on top of another Value that isn't derived from 
Operator then call non-virtual methods on. It allows us to unify access 
across the ConstantExpr and Instruction hierarchies, and prevents us 
from being ubsan clean.

I was imagining a base class for SymbolPlusConstantOffset that different 
relocations could derive from (presupposing a world where they don't 
just all use one IR node because we can't agree on type or somesuch), 
which itself would be derived from ConstantRelocation.

(As an aside, there is another clean way to fix the llvm::Operator 
morass and still get its benefits: introduce mixins to llvm. That's 
another discussion I'm not prepared to have now.)

>>> We strongly disagree on this design, and that’s fine.  But I do not see why it matters, because you not only haven’t proposed it, but you haven’t even fleshed it out to yourself yet, which is, again, fine as far as your proposal for ConstantExpr goes.  The problem is that you and Rafael have decided to go ahead and use the basic thrust of this less-than-half-baked proposal to completely redesign an existing LLVM feature in a way that intentionally constrains and complicates future extensions.  That’s using trunk as an experimental lab for IR design, and it seems totally inappropriate to me.
>>
>> Right. Dropping the rest, I fundamentally agree here. It's nice when a feature dovetails with other plans, but a change being made now has to make sense with llvm as it stands now (consider the discussion to make datalayout mandatory -- it makes sense for llvm regardless of whether we go down the path of target-specific IR nodes).
>>
>> That said, the same applies to your proposed solution of making the verifier fail any constant expression with C->canTrap() true.
>
> That’s fair.
>
>> By the way, constant propagation does not do this analysis when deciding whether it can turn an instruction into a constant, it just always turns instructions with constant arguments into constants. This means that we can't safely sink/hoist constantexpr's around, and indeed, we've taught the optimizers to check whether operands have trapping constant expressions before moving instructions around. Mostly. I bet you could find bugs.
>
> That… is not how I would have advised solving that.  Doesn’t it obviously reorder the trap to the point where the division result is used, potentially eliminating it entirely?

Yup. Good thing traps caused by division (and all constantexpr's ever) 
are undefined behaviour.

At least we no longer sink them down into phi node operands where they 
go from being evaluated conditionally into being evaluated unconditionally.

Nick