[LLVMdev] RFC: GEP as canonical form for pointer addressing

Hal Finkel hfinkel at anl.gov
Sat Feb 15 07:22:57 PST 2014


----- Original Message -----
> From: "Philip Reames" <listmail at philipreames.com>
> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Friday, February 14, 2014 7:18:21 PM
> Subject: [LLVMdev] RFC: GEP as canonical form for pointer addressing
> 
> RFC: GEP as canonical form for pointer addressing
> 
> I would like to propose that we designate GEPs as the canonical form
> for
> pointer addressing in LLVM IR before CodeGenPrepare.

Is this not already the case? I did not think that any passes introduce ptrtoint+arithmetic+inttoptr prior to CodeGenPrepare (CGP). On the other hand, we don't convert ptrtoint+arithmetic+inttoptr into GEP when we can (which is PR14226, where Eli (cc'd) said this is unsafe in general).
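For anyone less familiar with the two forms being contrasted, here is a rough C sketch (not taken from any LLVM pass or from PR14226). The function names are made up for illustration. The assumption is that a typical front end such as Clang emits a GEP for the first function and a ptrtoint, integer arithmetic, inttoptr sequence for the second; the exact IR depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pointer-typed addressing: the value stays a pointer throughout.
 * Assumption: a front end would typically express this as a GEP. */
static int *addr_via_pointer_arith(int *base, size_t index) {
    return base + index;
}

/* Integer round trip: the pointer is cast to an integer, adjusted,
 * and cast back. Assumption: this typically becomes
 * ptrtoint + integer arithmetic + inttoptr in the IR. */
static int *addr_via_integer_math(int *base, size_t index) {
    uintptr_t raw = (uintptr_t)base;   /* pointer -> integer */
    raw += index * sizeof(int);        /* plain integer arithmetic */
    return (int *)raw;                 /* integer -> pointer */
}

/* Hypothetical helper: checks that both forms compute the same address
 * for an in-bounds index. */
static void check_forms_agree(int *base, size_t index) {
    assert(addr_via_pointer_arith(base, index) ==
           addr_via_integer_math(base, index));
}
```

Both functions compute the same address for in-bounds indices on common flat-address targets, but only the first keeps the "this is a pointer derived from base" information visible to a pass that needs to tell pointers from integers.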

 -Hal

> 
> Corollaries
> 1) It is legal for an optimizer to convert
> ptrtoint+arithmetic+inttoptr
> sequences to GEPs, but not vice versa.
> 2) Input IR which does not contain inttoptr instructions will never
> contain inttoptr instructions (before CodeGenPrepare.)
> 
> I've spoken with Nick Lewycky & Owen Anderson offline at the last
> social.  On first reflection, both were okay with the proposal, but
> I'd
> like broader buy-in and discussion.  Nick & Owen, if I've
> accidentally
> misrepresented our discussion or you've had second thoughts since,
> please speak up.
> 
> 
> Background & Motivation
> 
> We want to support precise garbage collection(1) in LLVM.  To do so,
> we
> have written a pass which inserts safepoints and read and write
> barriers
> as appropriate.  This pass needs to be able to reliably(2) identify
> pointer vs non-pointer values.  It's advantageous to run this pass as
> late as practical in the optimization pipeline, but we can schedule
> it
> before lowering begins (i.e. before CodeGenPrepare).
> 
> We control the initial IR which is generated and can ensure that it
> does
> not contain any inttoptr instructions.  We're looking to have a
> guarantee(*) that a random LLVM optimization pass will not decide to
> replace GEPs with a sequence of ptrtoint, int arithmetic, and
> inttoptr
> which are hard for us to reason about.
> 
> * "guarantee" isn't really the right word here.  I'm really just
> looking
> to make sure that the community is comfortable with GEPs as canonical
> form.  If some pass decides to insert inttoptr instructions into
> otherwise clean IR, I want some assurance a patch fixing that would
> stand a good chance of being accepted.  I'm happy to do any cleanup
> required.
> 
> In addition to my own use case, here are a few others which might come
> up:
> - Backends for targets which support different operations on pointers
> vs
> integers.  Examples would be some of the older mainframe
> architectures.
> (There'd be a lot more work needed to support this.)
> - Various security related applications (e.g. CFI w.r.t. function
> pointers)
> 
> I don't really want to get into these applications in detail, mostly
> because I'm not particularly knowledgeable on those topics.  I'd
> appreciate any other applications anyone wants to throw out, but let's
> try to keep from derailing the discussion.  (As I did to Nick's
> original
> thread on DataLayout. :))
> 
> Notes:
> 1) We're not using the existing gc.root implementation strategy.  I
> plan
> on explaining why in a lot more detail once we're closer to having a
> complete implementation that we can upstream.  That should be coming
> relatively shortly.  (i.e. months, not weeks, not years)
> 
> 2) As Nick pointed out in a separate thread, other types of typecasts
> can obscure pointer vs integer classifications.  (i.e. casting the
> base
> type of a pointer we then load through could load a field of the
> "wrong"
> type.)  I plan on responding to his point separately, but let's leave
> that out of this discussion for the moment.  Having GEPs as canonical
> form is a step forward by itself, even if I decide to propose
> something
> further down the road.
> 
> Philip
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory


