[LLVMdev] RFC: GEP as canonical form for pointer addressing
Philip Reames
listmail at philipreames.com
Fri Feb 14 17:18:21 PST 2014
I would like to propose that we designate GEPs as the canonical form for
pointer addressing in LLVM IR before CodeGenPrepare.
Corollaries
1) It is legal for an optimizer to convert ptrtoint+arithmetic+inttoptr
sequences to GEPs, but not vice versa.
2) If the input IR contains no inttoptr instructions, no pass will
introduce any (before CodeGenPrepare).
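To make the two corollaries concrete, here is a toy Python model. It is not LLVM's API: every class and function name is hypothetical, offsets are plain byte counts (think of a GEP over i8), and real-world subtleties such as inbounds and pointer provenance are ignored. It shows a `base_object` query of the kind a pass inserting safepoints and barriers needs (recovering which object a pointer derives from), and a `canonicalize` rewrite that only goes in the direction corollary 1 allows.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Toy model of a few LLVM IR instructions; NOT the LLVM C++ or Python API.
# All names here are hypothetical and chosen for illustration only.

@dataclass(frozen=True)
class Alloc:
    """A GC-managed object; the root every derived pointer should trace to."""
    name: str

@dataclass(frozen=True)
class GEP:
    """getelementptr with a single byte offset (think: GEP over i8)."""
    base: "Value"
    offset: int

@dataclass(frozen=True)
class PtrToInt:
    ptr: "Value"

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Add:
    lhs: "Value"
    rhs: "Value"

@dataclass(frozen=True)
class IntToPtr:
    val: "Value"

Value = Union[Alloc, GEP, PtrToInt, Const, Add, IntToPtr]


def base_object(v: Value) -> Optional[Alloc]:
    """Return the object a pointer value derives from, or None if unknown."""
    # A GEP has exactly one pointer operand, so walk back through it.
    while isinstance(v, GEP):
        v = v.base
    if isinstance(v, Alloc):
        return v
    # An inttoptr's integer operand could mix arbitrary values, so a
    # conservative pass cannot name the object it points into.
    return None


def canonicalize(v: Value) -> Value:
    """Rewrite inttoptr(add(ptrtoint p, c)) into gep p, c.

    Only this direction is performed; GEPs are never turned back into
    integer arithmetic.
    """
    if isinstance(v, GEP):
        return GEP(canonicalize(v.base), v.offset)
    if isinstance(v, IntToPtr) and isinstance(v.val, Add):
        lhs, rhs = v.val.lhs, v.val.rhs
        if isinstance(lhs, PtrToInt) and isinstance(rhs, Const):
            return GEP(canonicalize(lhs.ptr), rhs.value)
    return v


obj = Alloc("obj")
laundered = IntToPtr(Add(PtrToInt(obj), Const(8)))
print(base_object(laundered))                 # None
print(base_object(canonicalize(laundered)))   # Alloc(name='obj')
```

In this model, every pointer built purely from GEPs can be walked back to its object, while the same address computed through integers yields only the conservative answer `None`. That asymmetry is why the rewrite is allowed in one direction only.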
I've spoken with Nick Lewycky & Owen Anderson offline at the last
social. On first reflection, both were okay with the proposal, but I'd
like broader buy-in and discussion. Nick & Owen, if I've accidentally
misrepresented our discussion or you've had second thoughts since,
please speak up.
Background & Motivation
We want to support precise garbage collection(1) in LLVM. To do so, we
have written a pass which inserts safepoints, read barriers, and write barriers
as appropriate. This pass needs to be able to reliably(2) identify
pointer vs non-pointer values. It's advantageous to run this pass as
late as practical in the optimization pipeline, but we can schedule it
before lowering begins (i.e. before CodeGenPrepare).
We control the initial IR which is generated and can ensure that it does
not contain any inttoptr instructions. We're looking to have a
guarantee(*) that a random LLVM optimization pass will not decide to
replace GEPs with a sequence of ptrtoint, int arithmetic, and inttoptr
which are hard for us to reason about.
* "guarantee" isn't really the right word here. I'm really just looking
to make sure that the community is comfortable with GEPs as canonical
form. If some pass decides to insert inttoptr instructions into
otherwise clean IR, I want some assurance a patch fixing that would
stand a good chance of being accepted. I'm happy to do any cleanup
required.
In addition to my own use case, here are a few others that might come up:
- Backends for targets which support different operations on pointers vs
integers. Examples would be some of the older mainframe architectures.
(There'd be a lot more work needed to support this.)
- Various security-related applications (e.g. CFI w.r.t. function pointers)
I don't really want to get into these applications in detail, mostly
because I'm not particularly knowledgeable on those topics. I'd
appreciate any other applications anyone wants to throw out, but let's
try to keep from derailing the discussion. (As I did to Nick's original
thread on DataLayout. :))
Notes:
1) We're not using the existing gc.root implementation strategy. I plan
on explaining why in a lot more detail once we're closer to having a
complete implementation that we can upstream. That should be coming
relatively shortly. (i.e. months, not weeks, not years)
2) As Nick pointed out in a separate thread, other types of typecasts
can obscure pointer vs integer classifications (e.g. casting the base
type of a pointer we then load through could load a field of the "wrong"
type). I plan on responding to his point separately, but let's leave
that out of this discussion for the moment. Having GEPs as canonical
form is a step forward by itself, even if I decide to propose something
further down the road.
Philip