[llvm-dev] [RFC] Introducing the opaque pointer type

Tue May 11 02:19:38 PDT 2021

On 11/05/2021 07:59, pawel k. via llvm-dev wrote:
> I am very much a beginner with opaque pointers, but I am also a
> minimalist, in the sense that entities shouldn't be multiplied but
> rather divided where applicable.
> 
> Can someone point me to article(s) describing what problems opaque 
> pointers solve that can't be solved with forward declarations and typed 
> pointers etc?
> 
> My first gut feeling, when learning about the idea of opaque pointers,
> was that they're not much more than void* with all its issues from a
> static analysis, compiler design, code readability, code quality, code
> security perspective. Can someone correct a newbie? Very open to
> changing my mind.

There are a few problems with the current representation and they 
largely mirror the old problem with signed vs unsigned integers in the 
IR 15 years ago.  In early versions of LLVM, integers were explicitly 
signed.  This meant that the IR was cluttered with bitcasts from signed 
to unsigned integers, which slowed down analysis and didn't convey any 
useful semantics.  Worse, there were a bunch of things conflated: for 
example, does unsigned imply wrapping?  Some time in the 2.x series 
(2.0?  My memory is fuzzy here), LLVM moved to just i{size} types for 
integers and moved all of the semantics to the operations.  It's now 
explicit whether an operation is signed or unsigned, whether overflow 
wraps or has undefined behaviour, and so on.
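
As a rough illustration (a hand-written sketch with a made-up function
name, not taken from any real module), the type is just i32 and the
operation or its flags say how the bits are interpreted:

    define i32 @example(i32 %x, i32 %y) {
      ; plain add: two's complement, wraps on overflow
      %wrap   = add i32 %x, %y
      ; nsw flag: signed overflow is not defined (poison result)
      %nowrap = add nsw i32 %x, %y
      ; signedness is chosen by the operation, not by the type
      %sq     = sdiv i32 %wrap, %y
      %uq     = udiv i32 %nowrap, %y
      %r      = xor i32 %sq, %uq
      ret i32 %r
    }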

Pointers have a similar set of problems.  Pointers carry a type, but 
that type doesn't actually carry any semantics.  There are a lot of 
things that don't care about the type of the pointer, but they have no 
way of specifying this and generally use i8*.  This means that the IR is 
full of bitcasts from {something}* to i8* and then back again.
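
A minimal sketch of what that looks like, assuming a hypothetical @use
function that doesn't care what it is pointed at.  With typed pointers:

    %struct.S = type { i32, i32 }

    declare void @use(i8*)

    define void @f(%struct.S* %p) {
      ; no-op cast whose only job is to satisfy the type checker
      %raw = bitcast %struct.S* %p to i8*
      call void @use(i8* %raw)
      ret void
    }

The same thing as a separate module using opaque pointers, assuming the
type is spelled ptr:

    declare void @use(ptr)

    define void @f(ptr %p) {
      call void @use(ptr %p)
      ret void
    }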

This is particularly important for code that wants to use non-zero 
address spaces, because a lot of code does casts via i8* and forgets to 
change this to i8*-in-another-address-space.
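
For example (again a sketch with hypothetical names, using address
space 1), with typed pointers the cast has to repeat the address space:

    declare void @use1(i8 addrspace(1)*)

    define void @g(i32 addrspace(1)* %p) {
      ; Casting to plain i8* here would not verify: bitcast cannot
      ; change the address space of a pointer.
      %raw = bitcast i32 addrspace(1)* %p to i8 addrspace(1)*
      call void @use1(i8 addrspace(1)* %raw)
      ret void
    }

With opaque pointers the address space is the only thing left in the
type, and there is no cast to get wrong:

    declare void @use1(ptr addrspace(1))

    define void @g(ptr addrspace(1) %p) {
      call void @use1(ptr addrspace(1) %p)
      ret void
    }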

The fact that a pointer is a pointer to some struct type currently 
doesn't imply anything about whether the pointed-to data actually has 
that type, and it's completely valid to bitcast a pointer to a random 
type and back again in an optimisation.  The real type info (where 
applicable) is carried by TBAA metadata, dereferenceability info by 
attributes, and so on.
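
A sketch of where that information lives, written with opaque pointers
and assuming the ptr spelling.  The function name is made up and the
metadata nodes are modelled on what a C front end typically emits for
an int access:

    define i32 @load_int(ptr nonnull dereferenceable(4) %p) {
      ; The access type comes from the load itself, the aliasing
      ; type from !tbaa, and dereferenceability from the parameter
      ; attribute.  None of it comes from a pointee type.
      %v = load i32, ptr %p, align 4, !tbaa !0
      ret i32 %v
    }

    !0 = !{!1, !1, i64 0}
    !1 = !{!"int", !2, i64 0}
    !2 = !{!"omnipotent char", !3, i64 0}
    !3 = !{!"Simple C/C++ TBAA"}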

TL;DR: The pointee type has no (or worse, misleading) semantics and 
forces a load of bitcasts.  Opaque pointers remove this.

David