[LLVMdev] RFC: GEP as canonical form for pointer addressing

Philip Reames listmail at philipreames.com
Tue Feb 18 11:21:23 PST 2014


On 02/15/2014 03:55 PM, Andrew Trick wrote:
> On Feb 14, 2014, at 5:18 PM, Philip Reames <listmail at philipreames.com> wrote:
>
>> RFC: GEP as canonical form for pointer addressing
>>
>> I would like to propose that we designate GEPs as the canonical form for pointer addressing in LLVM IR before CodeGenPrepare.
>>
>> Corollaries
>> 1) It is legal for an optimizer to convert ptrtoint+arithmetic+inttoptr sequences to GEPs, but not vice versa.
>> 2) Input IR which does not contain inttoptr instructions will never contain inttoptr instructions (before CodeGenPrepare.)
>>
>> I've spoken with Nick Lewycky & Owen Anderson offline at the last social.  On first reflection, both were okay with the proposal, but I'd like broader buy-in and discussion.  Nick & Owen, if I've accidentally misrepresented our discussion or you've had second thoughts since, please speak up.
> FWIW, I think it would be nice if standard optimization passes have this property of being well behaved with respect to pointer types, and I don’t see a good reason for canonical IR passes to lose pointer types. I also think it’s the only way to mix the optimization of pointer values with precise GC. It seems that you just want LLVM developers to generally agree that certain passes will be well behaved (you can disable any others). It may just be a matter of documenting those passes.
You could phrase it this way.  I would push for "everything before 
CodeGenPrepare", but am open to counterarguments on why a smaller set 
should be selected.  :)
> Ideally we could formalize this by declaring a pass as pointer-safe and verifying. Can we easily verify that no memory access is based on inttoptr?
Yes.  The only slight complication is dealing with phi nodes and 
selects (which feed into GEPs), but assuming you're willing to accept a 
slightly conservative answer, it's definitely doable.

I have a pass locally which effectively does this.  It's a side effect 
of its primary purpose, but getting that extracted as a distinct pass 
(or part of the verifier) shouldn't be difficult.
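
To make the shape of that check concrete, here is a minimal sketch on a 
toy IR model.  It is not the pass I mentioned and it does not use the 
LLVM API; the Value class, the opcode strings, and both function names 
are hypothetical, chosen only for illustration.  The idea is to walk 
backwards from the address of each load or store through GEPs, bitcasts, 
phis, and selects, and to flag the access if any path reaches an 
inttoptr.  Treating a phi or select as tainted when any one input is 
tainted is what makes the answer conservative.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(eq=False)
class Value:
    """Toy stand-in for an IR value (hypothetical, not the LLVM API).

    `operands` lists only the pointer operands: the base of a GEP, the
    incoming values of a phi, the two arms of a select, or the address
    of a load/store.
    """
    op: str
    operands: List["Value"] = field(default_factory=list)


# Opcodes whose pointer result is derived from their pointer operands.
PASS_THROUGH = {"gep", "bitcast", "phi", "select"}


def may_be_based_on_inttoptr(ptr: Value) -> bool:
    """Conservatively report whether `ptr` may derive from an inttoptr."""
    seen: Set[int] = set()
    worklist = [ptr]
    while worklist:
        v = worklist.pop()
        if id(v) in seen:  # phis can form cycles through loops
            continue
        seen.add(id(v))
        if v.op == "inttoptr":
            return True
        if v.op in PASS_THROUGH:
            worklist.extend(v.operands)
        # Any other opcode (alloca, argument, call, ...) is assumed
        # here to be a clean base pointer.
    return False


def unsafe_accesses(accesses: List[Value]) -> List[Value]:
    """Return the loads/stores whose address may come from an inttoptr."""
    return [a for a in accesses if may_be_based_on_inttoptr(a.operands[0])]
```

In this model a loop-carried phi over GEPs of an alloca comes back 
clean, while a select with one arm derived from an inttoptr is flagged 
even if that arm is never taken at runtime.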
>
> -Andy
>
>> Background & Motivation
>>
>> We want to support precise garbage collection(1) in LLVM.  To do so, we have written a pass which inserts safepoints and read and write barriers as appropriate.  This pass needs to be able to reliably(2) identify pointer vs non-pointer values.  It's advantageous to run this pass as late as practical in the optimization pipeline, but we can schedule it before lowering begins (i.e. before CodeGenPrepare).
>>
>> We control the initial IR which is generated and can ensure that it does not contain any inttoptr instructions.  We're looking to have a guarantee(*) that a random LLVM optimization pass will not decide to replace GEPs with a sequence of ptrtoint, int arithmetic, and inttoptr which are hard for us to reason about.
>>
>> * "guarantee" isn't really the right word here.  I'm really just looking to make sure that the community is comfortable with GEPs as canonical form.  If some pass decides to insert inttoptr instructions into otherwise clean IR, I want some assurance a patch fixing that would stand a good chance of being accepted.  I'm happy to do any cleanup required.
>>
>> In addition to my own use case, here are a few others which might come up:
>> - Backends for targets which support different operations on pointers vs integers.  Examples would be some of the older mainframe architectures.  (There'd be a lot more work needed to support this.)
>> - Various security related applications (e.g. CFI w.r.t. function pointers)
>>
>> I don't really want to get into these applications in detail, mostly because I'm not particularly knowledgeable on those topics.  I'd appreciate any other applications anyone wants to throw out, but let's try to keep from derailing the discussion.  (As I did to Nick's original thread on DataLayout. :))
>>
>> Notes:
>> 1) We're not using the existing gc.root implementation strategy.  I plan on explaining why in a lot more detail once we're closer to having a complete implementation that we can upstream.  That should be coming relatively shortly.  (i.e. months, not weeks, not years)
>>
>> 2) As Nick pointed out in a separate thread, other types of typecasts can obscure pointer vs integer classifications.  (i.e. casting the base type of a pointer we then load through could load a field of the "wrong" type.)  I plan on responding to his point separately, but let's leave that out of this discussion for the moment.  Having GEPs as canonical form is a step forward by itself, even if I decide to propose something further down the road.
>>
>> Philip
>>